Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

effects: add new @consistent_overlay macro #54322

Merged
merged 5 commits into from
Jun 18, 2024
Merged

Conversation

aviatesk
Copy link
Member

@aviatesk aviatesk commented May 1, 2024

This PR serves to replace #51080 and close #52940.
It extends the :nonoverlayed to UInt8 and introduces the CONSISTENT_OVERLAY effect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. Additionally, this PR adds a new effect override called :consistent_overlay.
I've also included a relatively accurate description of :consistent_overlay, as pointed out in #51080. Quoting from the newly added docstrings:

:consistent_overlay

The :consistent_overlay setting asserts that any overlayed methods potentially called by
the method are :consistent with their original, non-overlayed counterparts. For the exact
definition of :consistent, refer to the earlier explanation.

More formally, when evaluating a generic function call f(x) at a specific world-age i,
and the regular method call fᵢ(x) is redirected to an overlay method fᵢ′(x), this
setting requires that fᵢ(x) ≡ fᵢ′(x).

!!! note
Note that the requirements for :consistent-cy include not only that the return values
are egal, but also that the manner of termination is the same.
However, it's important to aware that when they throw exceptions, the exceptions
themselves don't necessarily have to be egal as explained in the note of :consistent.
In other words, if fᵢ(x) throws an exception, this settings requires fᵢ′(x) to
also raise one, but the exact exceptions may differ.

!!! note
This setting isn't supported at the callsite; it has to be applied at the definition
site. Also, given its nature, it's expected to be used together with Base.Experimental.@overlay.

Still, the explanation might not be comprehensive enough. I welcome any feedback.


EDIT: Now this feature is implemented with a new macro @consistent_overlay instead of extending the override set of @assume_effects.

@aviatesk aviatesk requested review from maleadt and Keno May 1, 2024 10:12
@aviatesk aviatesk added the backport 1.11 Change should be backported to release-1.11 label May 1, 2024
@oscardssmith
Copy link
Member

could we also add a weaker version of this effect that only requires the overlay to return an egal answer when the regular function returns an answer? (e.g. for things like NaNMath)

@aviatesk
Copy link
Member Author

aviatesk commented May 1, 2024

That sounds exactly like the requirements for :consistent_overlay. Can you explain how that version is considered weaker?

@oscardssmith
Copy link
Member

If I'm reading this right, it wouldn't be valid to use this extension for log_overlay(x) = x < 0 ? NaN : log(x) since log_overlay returns a value rather than throwing for negative numbers.

@aviatesk
Copy link
Member Author

aviatesk commented May 1, 2024

Ah, did you mean @overlay NAN_MATH_TABLE log(x) = x < 0 ? NaN : log(x), right?
We can create such a :nonoverlay effect bit (, which allows concrete evaluation with bailing out of it if the original method's execution raises an issue). But that nuanced setting would be more complicated than the already complex :consistent_overlay, so I'd prefer to introduce it only when there's a demonstrated need.

@aviatesk aviatesk force-pushed the avi/gpucompiler-384-again branch from 146a43a to c675f21 Compare May 1, 2024 16:03
@Keno
Copy link
Member

Keno commented May 2, 2024

I'm ok with this, but it seems a bit weird to put this on @assume_effects. I don't even mind this going into in the (internal) EffectsOverride, I'm just thinking in terms of where to put the user-facing part of this. I think it makes more sense as an option to the @overlay macro.

@KristofferC KristofferC mentioned this pull request May 6, 2024
59 tasks
@aviatesk aviatesk force-pushed the avi/gpucompiler-384-again branch from c675f21 to 6b294ec Compare May 8, 2024 12:19
@aviatesk aviatesk changed the title effects: add new :consistent_overlay override effects: add new @consistent_overlay macro May 8, 2024
@aviatesk
Copy link
Member Author

aviatesk commented May 8, 2024

I think it makes more sense as an option to the @overlay macro.

I agree. I've created a new @consistent_overlay macro and included a thorough documentation.

@maleadt
Copy link
Member

maleadt commented May 23, 2024

Sorry for the slow response, upgrading the GPU stack to LLVM 17 took longer than expected.


Using @consistent_overlay instead of @overlay doesn't seem sufficient to fix the issues in JuliaGPU/CUDA.jl#2241 (comment). Some have been fixed on 1.12 regardless of this PR, but one that remains:

using CUDA

cudacall(f, types::Type, args...; kwargs...) = nothing

function outer(f)
    @inline cudacall(f, Tuple{}; stream=Ref(42), shmem=1)
    return
end

using InteractiveUtils
InteractiveUtils.code_llvm(outer, Tuple{Nothing})
# vs
CUDA.code_llvm(outer, Tuple{Nothing})
; Function Signature: outer(Nothing)
;  @ REPL[4]:1 within `outer`
define void @julia_outer_6994() #0 {
top:
;  @ REPL[4] within `outer`
  ret void
}

vs

;  @ REPL[4]:1 within `outer`
define void @julia_outer_9973() local_unnamed_addr {
top:
  %jlcallframe1 = alloca [4 x ptr], align 8
;  @ REPL[4]:2 within `outer`
; ┌ @ REPL[3]:1 within `cudacall`
; │┌ @ iterators.jl:276 within `pairs`
; ││┌ @ essentials.jl:473 within `Pairs`
; │││┌ @ namedtuple.jl:234 within `eltype`
; ││││┌ @ namedtuple.jl:236 within `nteltype`
; │││││┌ @ tuple.jl:272 within `eltype`
; ││││││┌ @ tuple.jl:292 within `_compute_eltype`
; │││││││┌ @ promotion.jl:175 within `promote_typejoin`
          %0 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 256), align 8
          %1 = call fastcc nonnull ptr @julia_typejoin_9980(ptr readonly %0, ptr readonly inttoptr (i64 128551383252656 to ptr))
; ││││││││ @ promotion.jl:176 within `promote_typejoin`
          %2 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 64), align 8
          store ptr %2, ptr %jlcallframe1, align 8
          %3 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
          store ptr %0, ptr %3, align 8
          %4 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 2
          store ptr inttoptr (i64 128551383252656 to ptr), ptr %4, align 8
          %5 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 3
          store ptr %1, ptr %5, align 8
          %6 = call nonnull ptr @jl_f_apply_type(ptr null, ptr nonnull %jlcallframe1, i32 4)
; └└└└└└└└
;  @ REPL[4]:3 within `outer`
  ret void
}

Do you want me to reduce this to something that doesn't require CUDA.jl?


I'm also wondering if @consistent_overlay isn't a bit specific, and whether this shouldn't be a kwarg to @overlay (@overlay effects=:consistent or something). As proposed in #52940, I would think that consistent overlays pose some danger, i.e., what if the effects of the overlay function aren't identical? It seems tricky to guarantee that without a very good understanding of our optimizer. That's why I assumed that consistent overlays would only be needed for, e.g., GPU intrinsics, while a hypothetical executable overlay effects type indicating that a overlay method can be safely irinterpreted would be fine for overlay methods that do not call GPU-only code, or for other non-GPU overlay use cases.

Of course, the above is speculation without a good understanding of the optimizer / our effects system, so feel free to poke holes.

@aviatesk
Copy link
Member Author

Using @consistent_overlay instead of @overlay doesn't seem sufficient to fix the issues in JuliaGPU/CUDA.jl#2241 (comment). Some have been fixed on 1.12 regardless of this PR, but one that remains:

To fix this exact case, we actually need is #54323, and don't need this one. We can @consistent_overlay Core.throw_inexacterror since it would always throw regardless of where it's executed, but the case can be fully optimized without it if on #54323.

I'm also wondering if @consistent_overlay isn't a bit specific, and whether this shouldn't be a kwarg to @overlay (@overlay effects=:consistent or something). As proposed in #52940, I would think that consistent overlays pose some danger, i.e., what if the effects of the overlay function aren't identical? It seems tricky to guarantee that without a very good understanding of our optimizer. That's why I assumed that consistent overlays would only be needed for, e.g., GPU intrinsics, while a hypothetical executable overlay effects type indicating that a overlay method can be safely irinterpreted would be fine for overlay methods that do not call GPU-only code, or for other non-GPU overlay use cases.

If overlay options other than @overlay effects=:consistent are possible, it might be better to use kwargs, but I can't think of any extensions other than consistent_overlay, so I think it's fine to keep it as @consistent_overlay. In any case, these are under Base.Experimental, so we can relatively easily change them if necessary.
It is true that @consistent_overlay requires an understanding of our compiler, but I hope the detailed explanation added in this PR will enable safe usage.

@maleadt
Copy link
Member

maleadt commented May 23, 2024

I can't think of any extensions other than consistent_overlay

What about the executable one I suggested above, wouldn't that be a safer option for e.g. CassetteOverlay.jl (where the overlay methods are guaranteed to be executable)?

@aviatesk
Copy link
Member Author

@consistent_overlay is a subset of what :executable implies, and @consistent_overlay allows the target methods to be executed at compile time if they are :foldable, but I guess I'm missing your point?

@maleadt
Copy link
Member

maleadt commented May 23, 2024

I guess I'm mixing up effects analysis with concrete evaluation. In any case, I was suggesting a more relaxed version of the @overlay macro where the overlay method wouldn't have to be consistent, instead allowing the compiler to analyze its effects / perform concrete evaluation. That would make it possible to register overlays without having to fully understand the compiler's effects, but also without the performance issues of the current behavior (which IIUC was instated because of the GPU intrinsics that are incompatible with constprop/irinterp).

In any case, for the GPU use case I only care about :consistent, so the currently implementation is fine by me.

@vchuravy
Copy link
Member

For me @consistent_overlay and executable are orthogonal.

  • consistent_overlay states: You may execute the original method instead of the overlay and you are guaranteed to obtain the same results
  • executable you may execute this function inside of the compiler and it will return the same as it would one the target.

E.g. we need consistent_overlay for the GPU since we can't execute the actual overlay.
But for use-cases like CassetteOverlays we shouldn't be executing the original method, but rather execute the overlay method during concrete evaluation and there is no consistency requirement.

@aviatesk
Copy link
Member Author

Thank you for clarifying. That definition of :executable sounds to assume that the system for actually executing overlay methods is available for the compiler. Currently, there is no such system, and we actually need to bring something like CassetteOverlay into base. And if there is such a system for executing overlay methods fully available, I think :executable would no longer be necessary: if the overlay method has good effects, that system could be used for concrete evaluation.

KristofferC added a commit that referenced this pull request May 28, 2024
Backported PRs:
- [x] #53665 <!-- use afoldl instead of tail recursion for tuples -->
- [x] #53976 <!-- LinearAlgebra: LazyString in interpolated error
messages -->
- [x] #54005 <!-- make `view(::Memory, ::Colon)` produce a Vector -->
- [x] #54010 <!-- Overload `Base.literal_pow` for `AbstractQ` -->
- [x] #54069 <!-- Allow PrecompileTools to see MI's inferred by foreign
abstract interpreters -->
- [x] #53750 <!-- inference correctness: fields and globals can revert
to undef -->
- [x] #53984 <!-- Profile: fix heap snapshot is valid char check -->
- [x] #54102 <!-- Explicitly compute stride in unaliascopy for SubArray
-->
- [x] #54070 <!-- Fix integer overflow in `skip(s::IOBuffer,
typemax(Int64))` -->
- [x] #54013 <!-- Support case-changes to Annotated{String,Char}s -->
- [x] #53941 <!-- Fix writing of AnnotatedChars to AnnotatedIOBuffer -->
- [x] #54137 <!-- Fix typo in docs for `partialsortperm` -->
- [x] #54129 <!-- use correct size when creating output data from an
IOBuffer -->
- [x] #54153 <!-- Fixup IdSet docstring -->
- [x] #54143 <!-- Fix `make install` from tarballs -->
- [x] #54151 <!-- LinearAlgebra: Correct zero element in
`_generic_matvecmul!` for block adj/trans -->
- [x] #54213 <!-- Add `public` statement to `Base.GC` -->
- [x] #54222 <!-- Utilize correct tbaa when emitting stores of unions.
-->
- [x] #54233 <!-- set MAX_OS_WRITE on unix -->
- [x] #54255 <!-- fix `_checked_mul_dims` in the presence of 0s and
overflow. -->
- [x] #54259 <!-- Fix typo in `readuntil` -->
- [x] #54251 <!-- fix typo in gc_mark_memory8 when chunking a large
array -->
- [x] #54276 <!-- Fix solve for complex `Hermitian` with non-vanishing
imaginary part on diagonal -->
- [x] #54248 <!-- ensure package callbacks are invoked when no valid
precompile file exists for an "auto loaded" stdlib -->
- [x] #54308 <!-- Implement eval-able AnnotatedString 2-arg show -->
- [x] #54302 <!-- Specialised substring equality for annotated strs -->
- [x] #54243 <!-- prevent `package_callbacks` to run multiple time for a
single package -->
- [x] #54350 <!-- add a precompile signature to Artifacts code that is
used by JLLs -->
- [x] #54331 <!-- correctly track freed bytes in
jl_genericmemory_to_string -->
- [x] #53509 <!-- revert moving "creating packages" from Pkg.jl -->
- [x] #54335 <!-- When accessing the data pointer for an array, first
decay it to a Derived Pointer -->
- [x] #54239 <!-- Make sure `fieldcount` constant-folds for `Tuple{...}`
-->
- [x] #54288
- [x] #54067
- [x] #53715 <!-- Add read/write specialisation for IOContext{AnnIO} -->
- [x] #54289 <!-- Rework annotation ordering/optimisations -->
- [x] #53815 <!-- create phantom task for GC threads -->
- [x] #54130 <!-- inference: handle `LimitedAccuracy` in
`handle_global_assignment!` -->
- [x] #54428 <!-- Move ConsoleLogging.jl into Base -->
- [x] #54332 <!-- Revert "add unsetindex support to more copyto methods
(#51760)" -->
- [x] #53826 <!-- Make all command-line options documented in all
related files -->
- [x] #54465 <!-- typeintersect: conservative typevar subtitution during
`finish_unionall` -->
- [x] #54514 <!-- typeintersect: followup cleanup for the nothrow path
of type instantiation -->
- [x] #54499 <!-- make `@doc x` work without REPL loaded -->
- [x] #54210 <!-- attach finalizer in `mmap` to the correct object -->
- [x] #54359 <!-- Pkg REPL: cache `pkg_mode` lookup -->

Non-merged PRs with backport label:
- [ ] #54471 <!-- Actually setup jit targets when compiling
packageimages instead of targeting only one -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #54323 <!-- inference: fix too conservative effects for recursive
cycles -->
- [ ] #54322 <!-- effects: add new `@consistent_overlay` macro -->
- [ ] #54191 <!-- make `AbstractPipe` public -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #53882 <!-- Warn about cycles in extension precompilation -->
- [ ] #53707 <!-- Make ScopedValue public -->
- [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} -->
- [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` -->
- [ ] #53286 <!-- Raise an error when using `include_dependency` with
non-existent file or directory -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
@KristofferC KristofferC mentioned this pull request May 29, 2024
60 tasks
@aviatesk aviatesk force-pushed the avi/gpucompiler-384-again branch 2 times, most recently from ee313e5 to f3ff647 Compare June 15, 2024 14:25
@aviatesk aviatesk force-pushed the avi/gpucompiler-384-again branch 2 times, most recently from 32c6c99 to f005c05 Compare June 18, 2024 08:13
@aviatesk
Copy link
Member Author

CI failures are unrelated to this PR. Going to merge.

@aviatesk aviatesk merged commit c5994e4 into master Jun 18, 2024
5 of 7 checks passed
@aviatesk aviatesk deleted the avi/gpucompiler-384-again branch June 18, 2024 12:00
aviatesk added a commit that referenced this pull request Jun 18, 2024
This PR serves to replace #51080 and close #52940.
It extends the `:nonoverlayed` to `UInt8` and introduces the
`CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of
overlay methods using the original non-overlayed counterparts when
applied. This newly added `:nonoverlayed`-bit is enabled through the
newly added `Base.Experimental.@consistent_overlay mt def` macro.
`@consistent_overlay` is similar to `@overlay`, but it sets the
`:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method
definition, allowing the method to be concrete-evaluated.
To use this feature safely, I have also added quite precise
documentation to `@consistent_overlay`.
@aviatesk aviatesk removed the backport 1.11 Change should be backported to release-1.11 label Jun 18, 2024
aviatesk added a commit that referenced this pull request Jun 18, 2024
This PR serves to replace #51080 and close #52940.
It extends the `:nonoverlayed` to `UInt8` and introduces the
`CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of
overlay methods using the original non-overlayed counterparts when
applied. This newly added `:nonoverlayed`-bit is enabled through the
newly added `Base.Experimental.@consistent_overlay mt def` macro.
`@consistent_overlay` is similar to `@overlay`, but it sets the
`:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method
definition, allowing the method to be concrete-evaluated.
To use this feature safely, I have also added quite precise
documentation to `@consistent_overlay`.
KristofferC pushed a commit that referenced this pull request Jun 18, 2024
This PR serves to replace #51080 and close #52940.
It extends the `:nonoverlayed` to `UInt8` and introduces the
`CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of
overlay methods using the original non-overlayed counterparts when
applied. This newly added `:nonoverlayed`-bit is enabled through the
newly added `Base.Experimental.@consistent_overlay mt def` macro.
`@consistent_overlay` is similar to `@overlay`, but it sets the
`:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method
definition, allowing the method to be concrete-evaluated.
To use this feature safely, I have also added quite precise
documentation to `@consistent_overlay`.
KristofferC added a commit that referenced this pull request Jun 25, 2024
Backported PRs:
- [x] #54361 <!-- [LBT] Upgrade to v5.9.0 -->
- [x] #54474 <!-- Unalias source from dest in copytrito -->
- [x] #54548 <!-- Fixes for bitcast bugs with LLVM 17 / opaque pointers
-->
- [x] #54191 <!-- make `AbstractPipe` public -->
- [x] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` -->
- [x] #53356 <!-- Rename at-scriptdir project argument to at-script and
search upwards for Project.toml -->
- [x] #54545 <!-- typeintersect: fix incorrect innervar handling under
circular env -->
- [x] #54586 <!-- Set storage class of julia globals to dllimport on
windows to avoid auto-import weirdness. Forward port of #54572 -->
- [x] #54587 <!-- Accomodate for rectangular matrices in `copytrito!`
-->
- [x] #54617 <!-- CLI: Use `GetModuleHandleExW` to locate libjulia.dll
-->
- [x] #54605 <!-- Allow libquadmath to also fail as it is not available
on all systems -->
- [x] #54634 <!-- Fix trampoline assembly for build on clang 18 on apple
silicon -->
- [x] #54635 <!-- Aggressive constprop in trevc! to stabilize triangular
eigvec -->
- [x] #54645 <!-- ensure we set the right value to gc_first_tid -->
- [x] #54554 <!-- make elsize public -->
- [x] #54648 <!-- Construct LazyString in error paths for tridiag -->
- [x] #54658 <!-- fix missing uuid check on extension when finding the
location of an extension -->
- [x] #54594 <!-- Switch to Pkg mode prompt immediately and load Pkg in
the background -->
- [x] #54669 <!-- Improve error message in inplace transpose -->
- [x] #54671 <!-- Add boundscheck in bindingkey_eq to avoid OOB access
due to data race -->
- [x] #54672 <!-- make: Fix `sed` command for LLVM libraries with no
symbol versioning -->
- [x] #54624 <!-- more precise aliasing checks for SubArray -->
- [x] #54679 <!-- 🤖 [master] Bump the Distributed stdlib from 6a07d98 to
6c7cdb5 -->
- [x] #54604 <!-- Fix tbaa annotation on union selector bytes inside of
structs -->
- [x] #54690 <!-- Fix assertion/crash when optimizing function with dead
basic block -->
- [x] #54704 <!-- LazyString in reinterpretarray error messages -->
- [x] #54718 <!-- fix prepend StackOverflow issue -->
- [x] #54674 <!-- Reimplement dummy pkg prompt as standard prompt -->
- [x] #54737 <!-- LazyString in interpolated error messages involving
types -->
- [x] #54642 <!-- Document GenericMemory and AtomicMemory -->
- [x] #54713 <!-- make: use `readelf` for LLVM symbol version detection
-->
- [x] #54760 <!-- REPL: improve prompt! async function handler -->
- [x] #54606 <!-- fix double-counting and non-deterministic results in
`summarysize` -->
- [x] #54759 <!-- REPL: Fully populate the dummy Pkg prompt -->
- [x] #54702 <!-- lowering: Recognize argument destructuring inside
macro hygiene -->
- [x] #54678 <!-- Don't let setglobal! implicitly create bindings -->
- [x] #54730 <!-- Fix uuidkey of exts in fast path of `require_stdlib`
-->
- [x] #54765 <!-- Handle no-postdominator case in finalizer pass -->
- [x] #54591 <!-- Don't expose guard pages to malloc_stack API consumers
-->
- [x] #54755 <!-- [TOML] remove Dates hack, replace with explicit usage
-->
- [x] #54721 <!-- add try/catch around scheduler to reset sleep state
-->
- [x] #54631 <!-- Avoid concatenating LazyString in setindex! for
triangular matrices -->
- [x] #54322 <!-- effects: add new `@consistent_overlay` macro -->
- [x] #54785
- [x] #54865
- [x] #54815
- [x] #54795
- [x] #54779
- [x] #54837 

Contains multiple commits, manual intervention needed:
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
- [ ] #54649 <!-- Less restrictive copyto! signature for triangular
matrices -->

Non-merged PRs with backport label:
- [ ] #54779 <!-- make recommendation clearer on manifest version
mismatch -->
- [ ] #54739 <!-- finish implementation of upgradable stdlibs -->
- [ ] #54738 <!-- serialization: fix relocatability bug -->
- [ ] #54574 <!-- Make ScopedValues public -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} -->
- [ ] #53286 <!-- Raise an error when using `include_dependency` with
non-existent file or directory -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve optimization of overlay methods
5 participants