Get rid of reinterpret in its current form #22849
Comments
I think the issue we have is that we allow However, do we still need that capability in general? In the parts of the IO system that used to use this, I've largely switched them to using |
I'm fine with implementing |
Right, but how would you express that? Currently we don't allow any sort of struct-type-punning in If we copy the elements to the stack, wouldn't that still imply that the source and destination TBAA are different, inhibiting optimization around it (since LLVM doesn't have a TBAA union type)? |
The fact that source and destination TBAA can't be different is a limitation of the current memcpy implementation, not of the underlying semantics. I plan to fix that. That's an optimization however. The reinterpret changes would be breaking and would have to be done pre-1.0. |
So do you mean to change the implementation of reinterpret, or the syntax? |
The implementation. But the return type would change, so it's breaking. |
Then I'd like to see a better motivation than "I don't like the way reinterpret is currently implemented." |
It's in the original post. It violates alignment guarantees (so you now get errors unless you know about alignment issues) and it prevents us from strengthening TBAA. |
I would love to see this. As a user I appreciate this interface. |
Cross-ref additional issues with reinterpret: #16652 |
What about this API:
This requires a little forethought but it should be doable. Another possible API for when that's not possible is to extract elements of a different type from an array without reinterpreting it, as if it had a different element type. That would potentially be slow if the alignment disagrees, but it wouldn't be wrong at least. We can have both as they're somewhat complementary. |
Will this be in the type or in the value? |
One should also be able to query the maximal alignment of an array. |
I'm still confused by what you mean. Do you mean the alignment is determined by the array or the eltype? |
The alignment would have to be a property of the array. The original element type would determine the minimal alignment that an array must have, but the idea is that you could request coarser alignment if you know it's going to be needed. Something like this:

    a = Array{UInt8}(n)
    b = reinterpret(UInt128, a) # illegal
    a = Array{UInt8}(2^20, alignment=128)
    b = reinterpret(UInt128, a) # legal

For the second suggestion, I had something like this in mind:

    reinterpret(UInt128, a, i) # reinterpret(UInt128, a)[i] but legal

Keno's idea of having a |
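The element-extraction idea can be illustrated with a rough sketch that builds the value from individual bytes, so no wider-than-byte load ever touches the source array. The name `load_as`, the restriction to unsigned integer types, and the little-endian byte order are assumptions made for this illustration; this is not the API proposed in the thread, and it assumes Julia 1.x syntax.

```julia
# Hypothetical helper, not part of Base: read the i-th value of unsigned type T
# out of a byte vector without reinterpreting the vector itself.
function load_as(::Type{T}, bytes::AbstractVector{UInt8}, i::Integer) where {T<:Unsigned}
    n = sizeof(T)
    start = firstindex(bytes) + (i - 1) * n
    x = zero(T)
    for k in 0:n-1
        # Assemble in little-endian order; each byte access is bounds-checked.
        x |= T(bytes[start + k]) << (8k)
    end
    return x
end

bytes = UInt8[0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08]
load_as(UInt32, bytes, 2)   # 0x08070605
```

Because only single bytes are read, the result does not depend on how `bytes` happens to be aligned, which is the "slow but not wrong" trade-off described above.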
I should perhaps clarify that Yichao is "confused" because it is utterly useless to have alignment be part of the data. We can already compute the information for free from the pointer itself. However, only alignments that are guaranteed at compile time (for various combinations of register types and hardware) are performant. Everything else (especially testing for runtime alignment properties) is potentially a substantial performance penalty. |
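As a small illustration of computing alignment from the pointer, the sketch below derives the largest power of two that divides an array's data address. `pointer_alignment` is a name invented here, and the 16-byte threshold used for `UInt128` is an assumption (its size), not a statement about any particular ABI. Julia 1.x is assumed.

```julia
# Largest power of two that divides the address of the array's first element.
# `pointer_alignment` is a made-up name for this sketch.
pointer_alignment(a::Array) = 1 << trailing_zeros(UInt(pointer(a)))

a = zeros(UInt8, 1024)
align = pointer_alignment(a)

# A runtime property of this particular allocation, not a compile-time guarantee:
ok_for_uint128 = align >= sizeof(UInt128)
```

The value is only known at run time, which is the point made above: a check like this cannot help the compiler choose aligned instructions ahead of time.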
It's not really useless since the concern of not allowing an accidentally aligned array to be reinterpreted is that it'll hide the errors until someone gets unlucky. If the property has to be explicitly specified this won't be an issue anymore. That said, such an object-specific property will make |
The point is to make it possible to allocate an array of one size in a way that can be safely reinterpreted later to a larger size – which we can currently only do by allocating in the bigger size first and then reinterpreting the other way. There are no runtime checks on element access since you never allow an array to be reinterpreted to a size that's not safe and pre-requested. There is a runtime check on reinterpret, but that doesn't matter. The |
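The workaround mentioned here, allocating with the larger element type and reinterpreting in the narrowing direction, can be sketched as follows. This assumes Julia 1.x, where `reinterpret` on an array returns a view that shares memory with its parent.

```julia
backing = zeros(UInt128, 4)          # allocated with UInt128's alignment
bytes = reinterpret(UInt8, backing)  # narrowing direction: always safe

bytes[1] = 0x2a                      # writes go through to `backing`
length(bytes)                        # 4 * sizeof(UInt128) == 64
```

The byte view can be filled by I/O code while `backing` stays usable as `UInt128` data, at the cost of having to know the wider type up front.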
Accessing a non-reinterpreted array should in all cases be preferred to the reinterpreted case, performance tradeoff wise. LLVM will do a fine job folding everything to give decent performance with the |
How is that possible? You're still accessing an array with the wrong alignment? If this were no slower why would we bother insisting on alignment in the first place? Or by "decent performance" do you mean, not optimal, but as good as one can get with potentially misaligned storage? |
Yes, which is really the best you can hope for without pessimizing everything else. |
Decided: we're going to do the |
Note to self: get rid of this as part of this: https://github.com/JuliaLang/julia/blob/master/base/bitarray.jl#L292 |
This redoes `reinterpret` in Julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API is essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
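For readers who want to see the lazy behaviour from the user's side, here is a short sketch. It assumes Julia 1.x, where `reinterpret` on an array returns a `Base.ReinterpretArray`. The variable names are illustrative only.

```julia
v = Int64[1, 2, 3, 4]
ar = reinterpret(Complex{Int64}, v)   # lazy view over `v`; no copy is made

ar isa Base.ReinterpretArray          # true
ar[2]                                 # 3 + 4im, built from v[3] and v[4]

ar[1] = 10 + 20im                     # writes go through to the parent array
v[1:2]                                # [10, 20]
```

Since the shape argument was dropped, changing dimensions is a separate step, for example `reshape(ar, 2, 1)`.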
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in Julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API is essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in Base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
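To make the "lazy wrapper" idea concrete, here is a minimal sketch of how such a type could work. It is not the code from this PR: `LazyReinterpret` is a hypothetical name, it only covers 1-based vectors where `sizeof(T)` is a multiple of `sizeof(S)`, and it assumes Julia 1.x (`isbitstype`). Unlike the real `ReinterpretArray`, it allocates a scratch buffer on every access, so it shows the semantics rather than the performance.

```julia
# Hypothetical sketch; `LazyReinterpret` is an illustrative name, not the
# `ReinterpretArray` type described above. Assumes Julia 1.x and a 1-based parent.
struct LazyReinterpret{T,S,A<:AbstractVector{S}} <: AbstractVector{T}
    parent::A
end

function LazyReinterpret{T}(a::A) where {T,S,A<:AbstractVector{S}}
    isbitstype(T) && isbitstype(S) ||
        throw(ArgumentError("element types must be bits types"))
    (sizeof(S) > 0 && sizeof(T) > 0 && sizeof(T) % sizeof(S) == 0) ||
        throw(ArgumentError("sizeof(T) must be a nonzero multiple of sizeof(S)"))
    (length(a) * sizeof(S)) % sizeof(T) == 0 ||
        throw(ArgumentError("parent length does not divide evenly into T elements"))
    return LazyReinterpret{T,S,A}(a)
end

# Minimal AbstractArray interface: size, index style, getindex.
Base.size(r::LazyReinterpret{T,S}) where {T,S} =
    (length(r.parent) * sizeof(S) ÷ sizeof(T),)

Base.IndexStyle(::Type{<:LazyReinterpret}) = IndexLinear()

function Base.getindex(r::LazyReinterpret{T,S}, i::Int) where {T,S}
    @boundscheck checkbounds(r, i)
    n = sizeof(T) ÷ sizeof(S)
    # Gather the n parent elements through the parent's own getindex, so the
    # parent's memory is never punned and any AbstractVector works.
    tmp = Vector{S}(undef, n)
    for k in 1:n
        tmp[k] = r.parent[(i - 1) * n + k]
    end
    # Read the freshly allocated scratch buffer back as a single T.
    return GC.@preserve tmp unsafe_load(convert(Ptr{T}, pointer(tmp)))
end

v = Int64[1, 2, 3, 4]
c = LazyReinterpret{Complex{Int64}}(v)
c[2]        # 3 + 4im
v[1] = 10   # the wrapper is a view: later reads see the new parent contents
c[1]        # 10 + 2im
```

Because every read goes through the parent's `getindex`, the wrapper never depends on how the parent stores its data, which is the property that lets the PR's version accept a sparse matrix.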
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR: ``` julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000)); julia> typeof(ar) Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}} julia> f(ar) = @inbounds return ar[1] f (generic function with 1 method) julia> @code_llvm f(ar) ; Function f ; Location: REPL[2] define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 { top: ; Location: REPL[2]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)* %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8 %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)* %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)* %7 = load i64*, i64* addrspace(11)* %6, align 8 %8 = load i64, i64* %7, align 8 %9 = getelementptr i64, i64* %7, i64 1 %10 = load i64, i64* %9, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret 
void } julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1] g (generic function with 1 method) julia> @code_llvm g(randn(1000)) ; Function g ; Location: REPL[4] define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 { top: ; Location: REPL[4]:1 ; Function getindex; { ; Location: reinterpretarray.jl:31 %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)* %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)* %4 = load double*, double* addrspace(11)* %3, align 8 %5 = bitcast double* %4 to i64* %6 = load i64, i64* %5, align 8 %7 = getelementptr double, double* %4, i64 1 %8 = bitcast double* %7 to i64* %9 = load i64, i64* %8, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0 store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8 %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1 store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8 ;} ret void } ``` In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision): ``` invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10)) 5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}: 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 1.0+0.0im 0.0+1.0im ``` The remaining todo is to audit the uses of 
reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray. Fixes #22849 Fixes #19238
This redoes `reinterpret` in Julia rather than punning the memory of the actual array. The motivation for this is to avoid the API limitations of the current reinterpret implementation (Array only, preventing strong TBAA, alignment problems). The surface API is essentially unchanged, though the shape argument to reinterpret is removed, since those concepts are now orthogonal. The return type from `reinterpret` is now `ReinterpretArray`, which implements the AbstractArray interface and does the reinterpreting lazily on demand. The compiler is able to fold away the abstraction and generate very tight IR:

```
julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}
```

In addition, the new `reinterpret` implementation is able to handle any AbstractArray (whether useful or not is a separate decision):

```
invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im
```

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238
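
To make the "lazily on demand" behaviour concrete, here is a small usage sketch. It assumes a recent Julia 1.x release, where `reinterpret` on an array returns a wrapper around the parent rather than a retyped `Array`; the variable names are illustrative only and do not come from the PR.

```julia
# Assumption: recent Julia 1.x. `a`, `r`, and `m` are illustrative names.
a = Int64[1, 2, 3, 4]
r = reinterpret(Complex{Int64}, a)   # lazy wrapper around `a`; no data is copied

r[1]            # reads a[1] and a[2] as one Complex{Int64}
a[1] = 10       # writes to the parent are visible through the wrapper
r[2] = 7 + 8im  # writes through the wrapper land in the parent

# Shape is a separate concern now: reshape the wrapper instead of
# passing dimensions to `reinterpret`.
m = reshape(r, 1, 2)
```

Because the wrapper only reads and writes the parent's elements, `a` keeps its `Int64` storage throughout, which is the property the description above relies on.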
For myself, and maybe others with the same initial reception of this:
|
I don't like the way reinterpret is currently implemented. It puns on the notion of an array. One problem we recently had to deal with is that alignment guarantees differ between types (reinterpreting between types with mismatched alignment is now disallowed). Another is that it prevents us from doing stricter TBAA on array element types. I think we should get rid of reinterpret completely and replace it with a `ReinterpretArray` type, whose `getindex` method performs the appropriate load from the original array. That way we never have an `Array` with incorrectly typed storage, but can retain the convenience (and once again allow reinterpret for types with mismatched alignment).
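
A minimal sketch of that idea, under stated assumptions: `BitsView` and `bitsview` are hypothetical names, not the proposed `ReinterpretArray`, and the sketch only handles element types of equal size (exercised here with primitive types such as `Float64` and `UInt64`). The point is that the parent keeps its own element type and the conversion happens per element inside `getindex`.

```julia
# Hypothetical wrapper for illustration; not Base.ReinterpretArray.
struct BitsView{T,S,A<:AbstractVector{S}} <: AbstractVector{T}
    parent::A
end

# Hypothetical constructor helper: view `parent` as elements of type T.
function bitsview(::Type{T}, parent::A) where {T,S,A<:AbstractVector{S}}
    isbitstype(T) && isbitstype(S) ||
        throw(ArgumentError("both element types must be bits types"))
    sizeof(T) == sizeof(S) ||
        throw(ArgumentError("this sketch requires equal element sizes"))
    return BitsView{T,S,A}(parent)
end

Base.size(v::BitsView) = size(v.parent)
Base.IndexStyle(::Type{<:BitsView}) = IndexLinear()

# Load an S from the parent, then reinterpret that single value as a T.
Base.@propagate_inbounds function Base.getindex(v::BitsView{T}, i::Int) where {T}
    return reinterpret(T, v.parent[i])
end

# Store by converting back to the parent's element type first.
Base.@propagate_inbounds function Base.setindex!(v::BitsView{T,S}, x, i::Int) where {T,S}
    v.parent[i] = reinterpret(S, convert(T, x))
    return v
end

v = bitsview(UInt64, [1.0, 2.0])
v[1]                          # 0x3ff0000000000000, the bit pattern of 1.0
w = bitsview(UInt64, 1.0:2.0) # works for a range parent, not just Array
```

Since the wrapper goes through the parent's own `getindex`, it makes no assumption about how the parent stores its data, which is why a range works as a parent here.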