-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
specialize copyto!
for performance
#23
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #23 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 1 1
Lines 33 57 +24
=========================================
+ Hits 33 57 +24 ☔ View full report in Codecov by Sentry. |
With this change: julia> using FixedSizeArrays, BenchmarkTools
julia> @benchmark z .= v setup=(v=convert(FixedSizeVector,randn(Float64, 1024*1024)); z=similar(v);)
BenchmarkTools.Trial: 491 samples with 1 evaluation.
Range (min … max): 1.344 ms … 4.687 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.687 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.317 ms ± 684.430 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█▇ ▁ ▂▂
▂▃▃▄███▄▄▅▅▃▃▃▂▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄█▇▆████▆▄▅▃▆▆▃▄▄▃▁▃▁▂▂ ▃
1.34 ms Histogram: frequency by time 3.43 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark z .= v setup=(v=randn(Float64, 1024*1024); z=similar(v);)
BenchmarkTools.Trial: 793 samples with 1 evaluation.
Range (min … max): 1.521 ms … 3.384 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.795 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.862 ms ± 271.172 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂▃▇██▆▄▄▃▃▂
▆▄▁▅██████████████▆▆▇▄▅▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▄▄▁▅▁▇█▆▆▆ █
1.52 ms Histogram: log(frequency) by time 3.1 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.12.0-DEV.387
Commit b5bfd83a3d0 (2024-04-22 13:12 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores) |
The fallback generic implementation of `copyto!` uses `setindex!` in an elementwise manner. Instead base the implementation on `unsafe_copyto!`. This improves the performance of `copyto!`, thus it also improves the performance of `copy!`, `copy` and of broadcasted operations. This change only applies when copying between two `FixedSizeArray`s, or between a `Memory` and a `FixedSizeArray`. Fixes #19
TBH the tests would ideally be more comprehensive, even though they should now (with the force push) have full source line coverage. |
BTW, the code in this PR is adapted from base/array.jl in the Julia source. From around here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this PR, for small arrays the broacast operations are pretty much equivalent:
julia> @benchmark z .= v setup=(v = randn(Float64, 1024); z = similar(v))
BenchmarkTools.Trial: 10000 samples with 943 evaluations.
Range (min … max): 100.919 ns … 158.051 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 101.670 ns ┊ GC (median): 0.00%
Time (mean ± σ): 102.948 ns ± 3.040 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
███▅▅▅▄▃▄▄▂▁▂▂▃▂▂▁▁▂▁ ▁▁ ▂
████████████████████████▇██▇▇██████████▇▆▆▆▇▆▆▇▆▆▆▇▇▆▆▆▆▅▆▆▆▆ █
101 ns Histogram: log(frequency) by time 115 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark z .= v setup=(v = convert(FixedSizeVector, randn(Float64, 1024)); z = similar(v))
BenchmarkTools.Trial: 10000 samples with 942 evaluations.
Range (min … max): 101.025 ns … 188.960 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 101.911 ns ┊ GC (median): 0.00%
Time (mean ± σ): 103.364 ns ± 4.435 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▄██▆▅▅▄▄▃▃▃▃▃▃▂▂▂▁▁ ▂
█████████████████████▇██▇█▆█▇▇▇▇▆▆▆▆▆▆▆▆▆▇▆▆▇▆▆▅▆▆▅▆▅▅▄▅▃▅▅▅▄ █
101 ns Histogram: log(frequency) by time 119 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
For larger arrays there's still some difference, also visible in your benchmarks:
julia> @benchmark z .= v setup=(v = randn(Float64, 2 ^ 20); z = similar(v))
BenchmarkTools.Trial: 1601 samples with 1 evaluation.
Range (min … max): 364.459 μs … 876.416 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 434.208 μs ┊ GC (median): 0.00%
Time (mean ± σ): 434.333 μs ± 50.884 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁█▅▄▃ ▃▁▃▂▃ ▃▄ ▇▄▁
▅████████████▇▄█▄▄▄▄▆█████████▆▆▅▅▆▅▅▃▃▃▃▂▂▂▃▂▃▃▂▃▂▂▂▂▃▁▁▂▂▁▂ ▄
364 μs Histogram: frequency by time 577 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark z .= v setup=(v = convert(FixedSizeVector, randn(Float64, 2 ^ 20)); z = similar(v))
BenchmarkTools.Trial: 1177 samples with 1 evaluation.
Range (min … max): 432.250 μs … 873.959 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 492.125 μs ┊ GC (median): 0.00%
Time (mean ± σ): 513.554 μs ± 64.203 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▄▅█▇▄▂▁
▂▃▅████████▇▇▆▆▇▇▇▇█▇▅▆▆▄▅▄▄▅▃▃▃▃▃▄▃▃▄▃▃▃▃▂▃▃▂▂▁▂▂▂▂▁▂▂▂▁▁▂▂▂ ▄
432 μs Histogram: frequency by time 749 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
Compare to current main
(dd77e40) of this package:
julia> @benchmark z .= v setup=(v = convert(FixedSizeVector, randn(Float64, 2 ^ 20)); z = similar(v))
BenchmarkTools.Trial: 1093 samples with 1 evaluation.
Range (min … max): 775.125 μs … 1.205 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 822.417 μs ┊ GC (median): 0.00%
Time (mean ± σ): 836.816 μs ± 42.380 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▆██▆▇▃▃▁▂▄▂
▂▁▁▂▃▄▆▇████████████▆▆▄▅▃▄▃▄▃▃▃▃▃▄▃▄▃▄▄▄▄▅▄▅▄▃▄▄▃▃▃▂▃▁▂▂▃▂▂▂ ▄
775 μs Histogram: frequency by time 958 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
This is definitely going in the right direction, nice work!
On my machine, the benchmarks favor julia> using FixedSizeArrays, BenchmarkTools
julia> @benchmark z .= v setup=(v = randn(Float64, 2 ^ 20); z = similar(v);)
BenchmarkTools.Trial: 801 samples with 1 evaluation.
Range (min … max): 1.555 ms … 3.311 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.793 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.955 ms ± 407.934 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃█▆▄
▃▃▂▅▄▄█████▇▄▃▃▃▄▃▂▃▂▂▂▂▂▂▂▁▁▁▁▂▁▁▁▂▁▁▁▁▁▂▂▁▁▂▁▂▂▃▃▂▃▃▃▃▃▃▄ ▃
1.56 ms Histogram: frequency by time 3.04 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark z .= v setup=(v = convert(FixedSizeVector, randn(Float64, 2 ^ 20)); z = similar(v);)
BenchmarkTools.Trial: 514 samples with 1 evaluation.
Range (min … max): 1.270 ms … 3.203 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.870 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.355 ms ± 721.781 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▇▂ ▇▁ █▇
▃▃▃▄▆██▆▆▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▃▅▇▄██▄██▃▃ ▃
1.27 ms Histogram: frequency by time 3.08 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.12.0-DEV.387
Commit b5bfd83a3d0 (2024-04-22 13:12 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores) What's your versioninfo? |
|
Profiling shows that Perhaps this is just profiling noise? Perhaps it's because your Julia commit is slightly older? Not sure, just throwing around possibilities. |
Yeah, maybe there's some noise, if I use a fixed vector as input, instead of random ones, I get more stable and comparable results:
|
The fallback generic implementation of
copyto!
usessetindex!
in an elementwise manner. Instead base the implementation onunsafe_copyto!
. This improves the performance ofcopyto!
, thus it also improves the performance ofcopy!
,copy
and of broadcasted operations.This change only applies when copying between two
FixedSizeArray
s, or between aMemory
and aFixedSizeArray
.Fixes #19