copyto! does not vectorize for views with UnitRange indices #53430
This also appears to be a regression from v1.10.1, where both operations are equally performant:
julia> a = rand(100,100); b = similar(a); av = view(a, axes(a)...); bv = view(b, axes(b)...); bv2 = view(b, UnitRange.(axes(b))...);
julia> @btime copyto!($bv, $av);
9.646 μs (0 allocations: 0 bytes)
julia> @btime copyto!($bv2, $av);
9.560 μs (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.10.1
Commit 7790d6f0641 (2024-02-13 20:41 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
LD_LIBRARY_PATH = :/usr/lib/x86_64-linux-gnu/gtk-3.0/modules
JULIA_EDITOR = subl
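For readers comparing the two destination views from the benchmark above, here is a small sketch that uses only Base functions to show where they differ. Both views cover the same elements of `b`; only the type of the indices stored in the view changes. The variable names follow the benchmark, and the checks are illustrative rather than part of the original report.

```julia
# Rebuild the arrays and views from the benchmark above.
a = rand(100, 100)
b = similar(a)
bv  = view(b, axes(b)...)             # indices are Base.OneTo ranges
bv2 = view(b, UnitRange.(axes(b))...) # indices are UnitRange ranges

# The views wrap the same parent and have the same shape ...
@assert parent(bv) === parent(bv2)
@assert axes(bv) == axes(bv2)

# ... but the stored parent indices have different types.
@assert parentindices(bv)  isa NTuple{2, Base.OneTo{Int}}
@assert parentindices(bv2) isa NTuple{2, UnitRange{Int}}
```

Because the two views alias the same memory with the same shape, any timing difference between `copyto!($bv, $av)` and `copyto!($bv2, $av)` comes from how the index types are handled, not from the data being copied.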
Bisected the regression to #51760.
On df39cee,
julia> a = rand(100,100); b = similar(a); av = view(a, axes(a)...); bv2 = view(b, UnitRange.(axes(b))...);
julia> @btime copyto!($bv2, $av);
2.428 μs (0 allocations: 0 bytes)
whereas on f0a28e9:
julia> @btime copyto!($bv2, $av);
20.690 μs (0 allocations: 0 bytes)
This reverts commit f0a28e9. That commit introduced a try/catch inside the inner loop of `copyto!` in general, and it also causes performance regressions in other cases (#53430). Since it was added without any tests and "is not-quite-public API", it seems easiest to just revert it. It was added for Memory-to-Array copies and vice versa, but dedicated methods could be added for that if desirable.
Fixes #53430, #52070
Even with the revert of #51760, there still seems to be some slowdown: 1.10:
1.11:
But I am not sure it is bad enough to require a milestone...
Oddly, for me, the performance regression persists on a recent nightly:
julia> @btime copyto!($bv, $av); # fast, indices are Base.OneTos
2.375 μs (0 allocations: 0 bytes)
julia> @btime copyto!($bv2, $av); # slow, indices are UnitRanges
23.745 μs (0 allocations: 0 bytes)
julia> VERSION
v"1.12.0-DEV.560"
I also see a similar issue on the
julia> @btime copyto!($bv, $av);
2.597 μs (0 allocations: 0 bytes)
julia> @btime copyto!($bv2, $av);
24.499 μs (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.11.0-beta2.2
Commit 862f863e0f* (2024-05-29 10:49 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
LD_LIBRARY_PATH = :/usr/lib/x86_64-linux-gnu/gtk-3.0/modules
JULIA_EDITOR = subl
@KristofferC Did the performance improve for you after reverting the PR?
Edit: I guess I had missed the previous comment.
Profiling just the relevant branch shows
julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line Function
=========================================================
╎4762 @Base/client.jl:568 _start()
╎ 4762 @Base/client.jl:593 repl_main
╎ 4762 @Base/client.jl:511 run_main_repl(interactive::Bool, quiet::Bool, banner::Symbol, history_file::Bool)
╎ 4762 @Base/essentials.jl:1045 invokelatest
╎ 4762 @Base/essentials.jl:1048 #invokelatest#1
╎ 4762 @Base/client.jl:490 run_std_repl(REPL::Module, quiet::Bool, banner::Symbol, history_file::Bool)
╎ ╎ 4762 @REPL/src/REPL.jl:606 run_repl(repl::REPL.AbstractREPL, consumer::Any)
╎ ╎ 4762 @REPL/src/REPL.jl:620 run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
╎ ╎ 4762 @REPL/src/REPL.jl:454 start_repl_backend
╎ ╎ 4762 @REPL/src/REPL.jl:457 start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
╎ ╎ 4762 @REPL/src/REPL.jl:472 repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
╎ ╎ ╎ 4762 @REPL/src/REPL.jl:360 eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
╎ ╎ ╎ 4762 @REPL/src/REPL.jl:335 toplevel_eval_with_hooks
╎ ╎ ╎ 4762 @REPL/src/REPL.jl:342 toplevel_eval_with_hooks(mod::Module, ast::Any, toplevel_file::Any, toplevel_line::Any)
╎ ╎ ╎ 4762 @REPL/src/REPL.jl:342 toplevel_eval_with_hooks(mod::Module, ast::Any, toplevel_file::Any, toplevel_line::Any)
╎ ╎ ╎ 4762 @REPL/src/REPL.jl:342 toplevel_eval_with_hooks(mod::Module, ast::Any, toplevel_file::Any, toplevel_line::Any)
╎ ╎ ╎ ╎ 4762 @REPL/src/REPL.jl:338 toplevel_eval_with_hooks(mod::Module, ast::Any, toplevel_file::Any, toplevel_line::Any)
╎ ╎ ╎ ╎ 4762 @REPL/src/REPL.jl:331 __repl_entry_eval_expanded_with_loc(mod::Module, ast::Any, toplevel_file::Ref{Ptr{UInt8}}, toplevel_line::Ref{Int32})
╎ ╎ ╎ ╎ 4762 @BenchmarkTools/src/execution.jl:136 kwcall(::@NamedTuple{warmup::Bool}, ::typeof(run), b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
╎ ╎ ╎ ╎ 4762 @BenchmarkTools/src/execution.jl:144 run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, kwar…
╎ ╎ ╎ ╎ 4762 @BenchmarkTools/src/execution.jl:47 run_result
╎ ╎ ╎ ╎ ╎ 4762 @BenchmarkTools/src/execution.jl:48 #run_result#31
╎ ╎ ╎ ╎ ╎ 4762 @Base/essentials.jl:1045 invokelatest
╎ ╎ ╎ ╎ ╎ 4762 @Base/essentials.jl:1050 #invokelatest#1
╎ ╎ ╎ ╎ ╎ 4762 @BenchmarkTools/src/execution.jl:109 kwcall(::@NamedTuple{warmup::Bool}, ::typeof(BenchmarkTools._run), b::BenchmarkTools.Benchmark, p::BenchmarkTools.Par…
╎ ╎ ╎ ╎ ╎ 26 @BenchmarkTools/src/execution.jl:119 _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, warmup::Bool, kwargs::Ba…
╎ ╎ ╎ ╎ ╎ ╎ 26 @BenchmarkTools/src/execution.jl:570 var"##sample#280"(::Tuple{SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, …
╎ ╎ ╎ ╎ ╎ ╎ 26 @BenchmarkTools/src/execution.jl:561 var"##core#279"(bv2#277::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, …
2╎ ╎ ╎ ╎ ╎ ╎ 2 REPL[1]:0 copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat…
╎ ╎ ╎ ╎ ╎ ╎ 15 REPL[1]:7 copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat…
╎ ╎ ╎ ╎ ╎ ╎ 5 @Base/abstractarray.jl:1341 getindex
╎ ╎ ╎ ╎ ╎ ╎ 5 @Base/abstractarray.jl:1387 _getindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 5 @Base/subarray.jl:316 getindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 5 @Base/array.jl:928 getindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/abstractarray.jl:1376 _to_linear_index
╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/abstractarray.jl:3077 _sub2ind
╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/abstractarray.jl:3093 _sub2ind
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/abstractarray.jl:3109 _sub2ind_recurse
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/abstractarray.jl:3109 _sub2ind_recurse
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/abstractarray.jl:3116 offsetin
2╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/int.jl:86 -
2╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/int.jl:87 +
1╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/essentials.jl:910 getindex
╎ ╎ ╎ ╎ ╎ ╎ 10 @Base/abstractarray.jl:1442 setindex!
╎ ╎ ╎ ╎ ╎ ╎ 10 @Base/abstractarray.jl:1472 _setindex!
╎ ╎ ╎ ╎ ╎ ╎ ╎ 10 @Base/subarray.jl:386 setindex!
4╎ ╎ ╎ ╎ ╎ ╎ ╎ 7 @Base/array.jl:992 setindex!
╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/abstractarray.jl:1376 _to_linear_index
╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/abstractarray.jl:3077 _sub2ind
╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/abstractarray.jl:3093 _sub2ind
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/abstractarray.jl:3109 _sub2ind_recurse
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/abstractarray.jl:3109 _sub2ind_recurse
1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/int.jl:88 *
1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/int.jl:87 +
╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/subarray.jl:294 reindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/views.jl:150 maybeview
╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/array.jl:3082 getindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 @Base/range.jl:945 _getindex
2╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/int.jl:87 +
1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/int.jl:86 -
3╎ ╎ ╎ ╎ ╎ ╎ 9 REPL[1]:8 copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat…
╎ ╎ ╎ ╎ ╎ ╎ 6 @Base/multidimensional.jl:423 iterate
4╎ ╎ ╎ ╎ ╎ ╎ 6 @Base/multidimensional.jl:448 __inc
╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/operators.jl:321 !=
2╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/promotion.jl:639 ==
2╎ ╎ ╎ ╎ ╎ 4736 @BenchmarkTools/src/execution.jl:125 _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, warmup::Bool, kwargs::Ba…
╎ ╎ ╎ ╎ ╎ ╎ 4734 @BenchmarkTools/src/execution.jl:570 var"##sample#280"(::Tuple{SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, …
╎ ╎ ╎ ╎ ╎ ╎ 4734 @BenchmarkTools/src/execution.jl:561 var"##core#279"(bv2#277::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, …
96╎ ╎ ╎ ╎ ╎ ╎ 96 REPL[1]:0 copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat…
╎ ╎ ╎ ╎ ╎ ╎ 2730 REPL[1]:7 copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat…
╎ ╎ ╎ ╎ ╎ ╎ 622 @Base/abstractarray.jl:1341 getindex
╎ ╎ ╎ ╎ ╎ ╎ 622 @Base/abstractarray.jl:1387 _getindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 622 @Base/subarray.jl:316 getindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 622 @Base/array.jl:928 getindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 418 @Base/abstractarray.jl:1376 _to_linear_index
╎ ╎ ╎ ╎ ╎ ╎ ╎ 418 @Base/abstractarray.jl:3077 _sub2ind
╎ ╎ ╎ ╎ ╎ ╎ ╎ 418 @Base/abstractarray.jl:3093 _sub2ind
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 418 @Base/abstractarray.jl:3109 _sub2ind_recurse
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 418 @Base/abstractarray.jl:3109 _sub2ind_recurse
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 155 @Base/abstractarray.jl:3116 offsetin
155╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 155 @Base/int.jl:86 -
110╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 110 @Base/int.jl:88 *
153╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 153 @Base/int.jl:87 +
204╎ ╎ ╎ ╎ ╎ ╎ ╎ 204 @Base/essentials.jl:910 getindex
╎ ╎ ╎ ╎ ╎ ╎ 2108 @Base/abstractarray.jl:1442 setindex!
╎ ╎ ╎ ╎ ╎ ╎ 2108 @Base/abstractarray.jl:1472 _setindex!
╎ ╎ ╎ ╎ ╎ ╎ ╎ 2108 @Base/subarray.jl:386 setindex!
914╎ ╎ ╎ ╎ ╎ ╎ ╎ 1859 @Base/array.jl:992 setindex!
╎ ╎ ╎ ╎ ╎ ╎ ╎ 945 @Base/abstractarray.jl:1376 _to_linear_index
╎ ╎ ╎ ╎ ╎ ╎ ╎ 945 @Base/abstractarray.jl:3077 _sub2ind
╎ ╎ ╎ ╎ ╎ ╎ ╎ 945 @Base/abstractarray.jl:3093 _sub2ind
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 945 @Base/abstractarray.jl:3109 _sub2ind_recurse
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 945 @Base/abstractarray.jl:3109 _sub2ind_recurse
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 469 @Base/abstractarray.jl:3116 offsetin
469╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 469 @Base/int.jl:86 -
204╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 204 @Base/int.jl:88 *
272╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 272 @Base/int.jl:87 +
╎ ╎ ╎ ╎ ╎ ╎ ╎ 249 @Base/subarray.jl:294 reindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 249 @Base/views.jl:150 maybeview
╎ ╎ ╎ ╎ ╎ ╎ ╎ 249 @Base/array.jl:3082 getindex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 249 @Base/range.jl:945 _getindex
150╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 150 @Base/int.jl:87 +
99╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 99 @Base/int.jl:86 -
290╎ ╎ ╎ ╎ ╎ ╎ 1908 REPL[1]:8 copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat…
╎ ╎ ╎ ╎ ╎ ╎ 1618 @Base/multidimensional.jl:423 iterate
╎ ╎ ╎ ╎ ╎ ╎ 95 @Base/multidimensional.jl:447 __inc
95╎ ╎ ╎ ╎ ╎ ╎ ╎ 95 @Base/int.jl:87 +
499╎ ╎ ╎ ╎ ╎ ╎ 1523 @Base/multidimensional.jl:448 __inc
╎ ╎ ╎ ╎ ╎ ╎ ╎ 1024 @Base/operators.jl:321 !=
1024╎ ╎ ╎ ╎ ╎ ╎ ╎ 1024 @Base/promotion.jl:639 ==
Total snapshots: 4762. Utilization: 100% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
This performance difference appears to arise from a lack of vectorization in indexing. In the first case, the output of
contains
whereas
contains
In particular, if I add a @simd declaration, this appears to improve performance considerably:
Versioninfo:
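The snippet that originally accompanied the `@simd` remark is not preserved here. As a rough illustration only, the following sketch shows what an element-wise copy loop with a `@simd` annotation could look like. The name `copyto_simd!` and its body are hypothetical and are not the code from Base or from the original report.

```julia
# Hypothetical helper (not part of Base): copy `src` into `dest` element by
# element over Cartesian indices, with the loop annotated with @simd.
# Assumes `dest` and `src` have identical axes and do not alias each other,
# since @simd asserts that loop iterations are independent.
function copyto_simd!(dest::AbstractArray, src::AbstractArray)
    axes(dest) == axes(src) ||
        throw(DimensionMismatch("axes of dest and src must match"))
    @inbounds @simd for I in CartesianIndices(src)
        dest[I] = src[I]
    end
    return dest
end

# Same setup as in the benchmarks: a source view and a destination view
# whose parent indices are UnitRanges.
a = rand(100, 100)
b = similar(a)
av  = view(a, axes(a)...)
bv2 = view(b, UnitRange.(axes(b))...)

copyto_simd!(bv2, av)
@assert b == a   # the copy went through the view into the parent array
```

Whether this loop actually vectorizes depends on the Julia version and CPU. One way to check is to inspect `@code_llvm copyto_simd!(bv2, av)` for vector types such as `<4 x double>`, and to time it with `@btime` as in the benchmarks above.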