-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
effects: add new @consistent_overlay
macro
#54322
Conversation
could we also add a weaker version of this effect that only requires the overlay to return an egal answer when the regular function returns an answer? (e.g. for things like NaNMath) |
That sounds exactly like the requirements for |
If I'm reading this right, it wouldn't be valid to use this extension for |
Ah, did you mean |
146a43a
to
c675f21
Compare
I'm ok with this, but it seems a bit weird to put this on |
c675f21
to
6b294ec
Compare
:consistent_overlay
override@consistent_overlay
macro
I agree. I've created a new |
Sorry for the slow response, upgrading the GPU stack to LLVM 17 took longer than expected. Using using CUDA
cudacall(f, types::Type, args...; kwargs...) = nothing
function outer(f)
@inline cudacall(f, Tuple{}; stream=Ref(42), shmem=1)
return
end
using InteractiveUtils
InteractiveUtils.code_llvm(outer, Tuple{Nothing})
# vs
CUDA.code_llvm(outer, Tuple{Nothing}) ; Function Signature: outer(Nothing)
; @ REPL[4]:1 within `outer`
define void @julia_outer_6994() #0 {
top:
; @ REPL[4] within `outer`
ret void
} vs ; @ REPL[4]:1 within `outer`
define void @julia_outer_9973() local_unnamed_addr {
top:
%jlcallframe1 = alloca [4 x ptr], align 8
; @ REPL[4]:2 within `outer`
; ┌ @ REPL[3]:1 within `cudacall`
; │┌ @ iterators.jl:276 within `pairs`
; ││┌ @ essentials.jl:473 within `Pairs`
; │││┌ @ namedtuple.jl:234 within `eltype`
; ││││┌ @ namedtuple.jl:236 within `nteltype`
; │││││┌ @ tuple.jl:272 within `eltype`
; ││││││┌ @ tuple.jl:292 within `_compute_eltype`
; │││││││┌ @ promotion.jl:175 within `promote_typejoin`
%0 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 256), align 8
%1 = call fastcc nonnull ptr @julia_typejoin_9980(ptr readonly %0, ptr readonly inttoptr (i64 128551383252656 to ptr))
; ││││││││ @ promotion.jl:176 within `promote_typejoin`
%2 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 64), align 8
store ptr %2, ptr %jlcallframe1, align 8
%3 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
store ptr %0, ptr %3, align 8
%4 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 2
store ptr inttoptr (i64 128551383252656 to ptr), ptr %4, align 8
%5 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 3
store ptr %1, ptr %5, align 8
%6 = call nonnull ptr @jl_f_apply_type(ptr null, ptr nonnull %jlcallframe1, i32 4)
; └└└└└└└└
; @ REPL[4]:3 within `outer`
ret void
} Do you want me to reduce this to something that doesn't require CUDA.jl? I'm also wondering if Of course, the above is speculation without a good understanding of the optimizer / our effects system, so feel free to poke holes. |
To fix this exact case, we actually need is #54323, and don't need this one. We can
If overlay options other than |
What about the |
|
I guess I'm mixing up effects analysis with concrete evaluation. In any case, I was suggesting a more relaxed version of the In any case, for the GPU use case I only care about |
For me
E.g. we need |
Thank you for clarifying. That definition of |
6b294ec
to
a406fcd
Compare
Backported PRs: - [x] #53665 <!-- use afoldl instead of tail recursion for tuples --> - [x] #53976 <!-- LinearAlgebra: LazyString in interpolated error messages --> - [x] #54005 <!-- make `view(::Memory, ::Colon)` produce a Vector --> - [x] #54010 <!-- Overload `Base.literal_pow` for `AbstractQ` --> - [x] #54069 <!-- Allow PrecompileTools to see MI's inferred by foreign abstract interpreters --> - [x] #53750 <!-- inference correctness: fields and globals can revert to undef --> - [x] #53984 <!-- Profile: fix heap snapshot is valid char check --> - [x] #54102 <!-- Explicitly compute stride in unaliascopy for SubArray --> - [x] #54070 <!-- Fix integer overflow in `skip(s::IOBuffer, typemax(Int64))` --> - [x] #54013 <!-- Support case-changes to Annotated{String,Char}s --> - [x] #53941 <!-- Fix writing of AnnotatedChars to AnnotatedIOBuffer --> - [x] #54137 <!-- Fix typo in docs for `partialsortperm` --> - [x] #54129 <!-- use correct size when creating output data from an IOBuffer --> - [x] #54153 <!-- Fixup IdSet docstring --> - [x] #54143 <!-- Fix `make install` from tarballs --> - [x] #54151 <!-- LinearAlgebra: Correct zero element in `_generic_matvecmul!` for block adj/trans --> - [x] #54213 <!-- Add `public` statement to `Base.GC` --> - [x] #54222 <!-- Utilize correct tbaa when emitting stores of unions. --> - [x] #54233 <!-- set MAX_OS_WRITE on unix --> - [x] #54255 <!-- fix `_checked_mul_dims` in the presence of 0s and overflow. --> - [x] #54259 <!-- Fix typo in `readuntil` --> - [x] #54251 <!-- fix typo in gc_mark_memory8 when chunking a large array --> - [x] #54276 <!-- Fix solve for complex `Hermitian` with non-vanishing imaginary part on diagonal --> - [x] #54248 <!-- ensure package callbacks are invoked when no valid precompile file exists for an "auto loaded" stdlib --> - [x] #54308 <!-- Implement eval-able AnnotatedString 2-arg show --> - [x] #54302 <!-- Specialised substring equality for annotated strs --> - [x] #54243 <!-- prevent `package_callbacks` to run multiple time for a single package --> - [x] #54350 <!-- add a precompile signature to Artifacts code that is used by JLLs --> - [x] #54331 <!-- correctly track freed bytes in jl_genericmemory_to_string --> - [x] #53509 <!-- revert moving "creating packages" from Pkg.jl --> - [x] #54335 <!-- When accessing the data pointer for an array, first decay it to a Derived Pointer --> - [x] #54239 <!-- Make sure `fieldcount` constant-folds for `Tuple{...}` --> - [x] #54288 - [x] #54067 - [x] #53715 <!-- Add read/write specialisation for IOContext{AnnIO} --> - [x] #54289 <!-- Rework annotation ordering/optimisations --> - [x] #53815 <!-- create phantom task for GC threads --> - [x] #54130 <!-- inference: handle `LimitedAccuracy` in `handle_global_assignment!` --> - [x] #54428 <!-- Move ConsoleLogging.jl into Base --> - [x] #54332 <!-- Revert "add unsetindex support to more copyto methods (#51760)" --> - [x] #53826 <!-- Make all command-line options documented in all related files --> - [x] #54465 <!-- typeintersect: conservative typevar subtitution during `finish_unionall` --> - [x] #54514 <!-- typeintersect: followup cleanup for the nothrow path of type instantiation --> - [x] #54499 <!-- make `@doc x` work without REPL loaded --> - [x] #54210 <!-- attach finalizer in `mmap` to the correct object --> - [x] #54359 <!-- Pkg REPL: cache `pkg_mode` lookup --> Non-merged PRs with backport label: - [ ] #54471 <!-- Actually setup jit targets when compiling packageimages instead of targeting only one --> - [ ] #54457 <!-- Make `String(::Memory)` copy --> - [ ] #54323 <!-- inference: fix too conservative effects for recursive cycles --> - [ ] #54322 <!-- effects: add new `@consistent_overlay` macro --> - [ ] #54191 <!-- make `AbstractPipe` public --> - [ ] #53957 <!-- tweak how filtering is done for what packages should be precompiled --> - [ ] #53882 <!-- Warn about cycles in extension precompilation --> - [ ] #53707 <!-- Make ScopedValue public --> - [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} --> - [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [ ] #53286 <!-- Raise an error when using `include_dependency` with non-existent file or directory --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
a406fcd
to
34a94d8
Compare
ee313e5
to
f3ff647
Compare
f3ff647
to
dfca4c2
Compare
32c6c99
to
f005c05
Compare
f005c05
to
b1e031e
Compare
CI failures are unrelated to this PR. Going to merge. |
This PR serves to replace #51080 and close #52940. It extends the `:nonoverlayed` to `UInt8` and introduces the `CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. This newly added `:nonoverlayed`-bit is enabled through the newly added `Base.Experimental.@consistent_overlay mt def` macro. `@consistent_overlay` is similar to `@overlay`, but it sets the `:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method definition, allowing the method to be concrete-evaluated. To use this feature safely, I have also added quite precise documentation to `@consistent_overlay`.
This PR serves to replace #51080 and close #52940. It extends the `:nonoverlayed` to `UInt8` and introduces the `CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. This newly added `:nonoverlayed`-bit is enabled through the newly added `Base.Experimental.@consistent_overlay mt def` macro. `@consistent_overlay` is similar to `@overlay`, but it sets the `:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method definition, allowing the method to be concrete-evaluated. To use this feature safely, I have also added quite precise documentation to `@consistent_overlay`.
This PR serves to replace #51080 and close #52940. It extends the `:nonoverlayed` to `UInt8` and introduces the `CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. This newly added `:nonoverlayed`-bit is enabled through the newly added `Base.Experimental.@consistent_overlay mt def` macro. `@consistent_overlay` is similar to `@overlay`, but it sets the `:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method definition, allowing the method to be concrete-evaluated. To use this feature safely, I have also added quite precise documentation to `@consistent_overlay`.
Backported PRs: - [x] #54361 <!-- [LBT] Upgrade to v5.9.0 --> - [x] #54474 <!-- Unalias source from dest in copytrito --> - [x] #54548 <!-- Fixes for bitcast bugs with LLVM 17 / opaque pointers --> - [x] #54191 <!-- make `AbstractPipe` public --> - [x] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [x] #53356 <!-- Rename at-scriptdir project argument to at-script and search upwards for Project.toml --> - [x] #54545 <!-- typeintersect: fix incorrect innervar handling under circular env --> - [x] #54586 <!-- Set storage class of julia globals to dllimport on windows to avoid auto-import weirdness. Forward port of #54572 --> - [x] #54587 <!-- Accomodate for rectangular matrices in `copytrito!` --> - [x] #54617 <!-- CLI: Use `GetModuleHandleExW` to locate libjulia.dll --> - [x] #54605 <!-- Allow libquadmath to also fail as it is not available on all systems --> - [x] #54634 <!-- Fix trampoline assembly for build on clang 18 on apple silicon --> - [x] #54635 <!-- Aggressive constprop in trevc! to stabilize triangular eigvec --> - [x] #54645 <!-- ensure we set the right value to gc_first_tid --> - [x] #54554 <!-- make elsize public --> - [x] #54648 <!-- Construct LazyString in error paths for tridiag --> - [x] #54658 <!-- fix missing uuid check on extension when finding the location of an extension --> - [x] #54594 <!-- Switch to Pkg mode prompt immediately and load Pkg in the background --> - [x] #54669 <!-- Improve error message in inplace transpose --> - [x] #54671 <!-- Add boundscheck in bindingkey_eq to avoid OOB access due to data race --> - [x] #54672 <!-- make: Fix `sed` command for LLVM libraries with no symbol versioning --> - [x] #54624 <!-- more precise aliasing checks for SubArray --> - [x] #54679 <!-- 🤖 [master] Bump the Distributed stdlib from 6a07d98 to 6c7cdb5 --> - [x] #54604 <!-- Fix tbaa annotation on union selector bytes inside of structs --> - [x] #54690 <!-- Fix assertion/crash when optimizing function with dead basic block --> - [x] #54704 <!-- LazyString in reinterpretarray error messages --> - [x] #54718 <!-- fix prepend StackOverflow issue --> - [x] #54674 <!-- Reimplement dummy pkg prompt as standard prompt --> - [x] #54737 <!-- LazyString in interpolated error messages involving types --> - [x] #54642 <!-- Document GenericMemory and AtomicMemory --> - [x] #54713 <!-- make: use `readelf` for LLVM symbol version detection --> - [x] #54760 <!-- REPL: improve prompt! async function handler --> - [x] #54606 <!-- fix double-counting and non-deterministic results in `summarysize` --> - [x] #54759 <!-- REPL: Fully populate the dummy Pkg prompt --> - [x] #54702 <!-- lowering: Recognize argument destructuring inside macro hygiene --> - [x] #54678 <!-- Don't let setglobal! implicitly create bindings --> - [x] #54730 <!-- Fix uuidkey of exts in fast path of `require_stdlib` --> - [x] #54765 <!-- Handle no-postdominator case in finalizer pass --> - [x] #54591 <!-- Don't expose guard pages to malloc_stack API consumers --> - [x] #54755 <!-- [TOML] remove Dates hack, replace with explicit usage --> - [x] #54721 <!-- add try/catch around scheduler to reset sleep state --> - [x] #54631 <!-- Avoid concatenating LazyString in setindex! for triangular matrices --> - [x] #54322 <!-- effects: add new `@consistent_overlay` macro --> - [x] #54785 - [x] #54865 - [x] #54815 - [x] #54795 - [x] #54779 - [x] #54837 Contains multiple commits, manual intervention needed: - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility --> - [ ] #54649 <!-- Less restrictive copyto! signature for triangular matrices --> Non-merged PRs with backport label: - [ ] #54779 <!-- make recommendation clearer on manifest version mismatch --> - [ ] #54739 <!-- finish implementation of upgradable stdlibs --> - [ ] #54738 <!-- serialization: fix relocatability bug --> - [ ] #54574 <!-- Make ScopedValues public --> - [ ] #54457 <!-- Make `String(::Memory)` copy --> - [ ] #53957 <!-- tweak how filtering is done for what packages should be precompiled --> - [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} --> - [ ] #53286 <!-- Raise an error when using `include_dependency` with non-existent file or directory --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
This PR serves to replace #51080 and close #52940.
It extends the
:nonoverlayed
toUInt8
and introduces theCONSISTENT_OVERLAY
effect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. Additionally, this PR adds a new effect override called:consistent_overlay
.I've also included a relatively accurate description of
:consistent_overlay
, as pointed out in #51080. Quoting from the newly added docstrings:Still, the explanation might not be comprehensive enough. I welcome any feedback.
EDIT: Now this feature is implemented with a new macro
@consistent_overlay
instead of extending the override set of@assume_effects
.