Copying small arrays with broadcasting syntax should be faster #45487
Comments
Looks like the for loop version is faster only because it's fully inlined.
My Bench Code:
function floop(a, b)
    @inbounds for i in eachindex(a, b)
        a[i] = b[i]
    end
    a
end
_copyto!(a, b) = a .= b
# Call f repeatedly; the call-site @noinline (Julia 1.8+) keeps f from being inlined here
_repeat(times, f, a, b) = for _ in 1:times
    @noinline f(a, b)
end
using BenchmarkTools
ele = [1, 2, 3, 4, 5, 10, 20, 50, 63, 64, 65, 100]
# One row per array length; columns: floop, copyto!, broadcast
times = zeros(length(ele), 3)
for n in 1:length(ele)
    a = zeros(ele[n])
    b = randn(ele[n])
    times[n, 1] = @belapsed _repeat(100, floop, $a, $b)
    times[n, 2] = @belapsed _repeat(100, copyto!, $a, $b)
    times[n, 3] = @belapsed _repeat(100, _copyto!, $a, $b)
end
Hmm... Looking at The code generated for the broadcast does have a reference to Meanwhile, the for loop version just has a regular-register copying loop. |
Here I'm comparing if I remove the
Still some overhead from the broadcast, but better scaling |
IIRC, only |
@N5N3 - going back to your original observation, if I decorate all Though I suspect there are reasons those functions aren't already marked as inlined? |
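For readers unfamiliar with the inlining annotations being discussed, here is a minimal sketch of the two places an inlining hint can go in Julia: on a function definition, or on a single call site (Julia 1.8 and later). The function names below are hypothetical and are not the Base functions referred to in the comment above.

```julia
# Hypothetical helper, not from Base: a small copy kernel that carries an
# inlining hint on its definition.
@inline function copy_kernel!(dest, src)
    # eachindex(dest, src) throws if the index ranges differ, so @inbounds is safe.
    @inbounds for i in eachindex(dest, src)
        dest[i] = src[i]
    end
    return dest
end

# Call-site annotations (Julia 1.8+) apply to one call only and take
# precedence over the hint on the definition.
copy_inlined!(dest, src) = @inline copy_kernel!(dest, src)
copy_outlined!(dest, src) = @noinline copy_kernel!(dest, src)

src = [1.0, 2.0, 3.0]
@assert copy_inlined!(zeros(3), src) == src
@assert copy_outlined!(zeros(3), src) == src
```

Both wrappers produce the same result; only the compiler's inlining decision differs, which is what a benchmark such as the one earlier in the thread would need to measure.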
c.f. also #28126 |
The extra time seems to be spent in this line, which is new in 1.11: Line 891 in 8eaf83c
I'll make a new issue for this specific case. |
The 4 allocations vs 2 is just that the |
For loops are 2-10x faster (depending on size, destination) for copying small arrays than copyto!, which the broadcasting machinery relies on. It would be amazing if the broadcasting syntax could be faster for small arrays automatically.
I am aware of LoopVectorization; the intent here is to 1) use the tools available in Base, 2) use the elegant broadcasting syntax, and 3) provide a reasonable default.
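To make the request concrete, here is a rough sketch of what a size-aware default could look like using only Base. The name small_copyto! and the cutoff of 64 elements are assumptions chosen for illustration; this is not how Base or the broadcasting machinery is implemented, and whether such a branch pays off would have to be measured.

```julia
# Hypothetical helper, not part of Base: use a plain loop below an arbitrary
# size cutoff, and defer to copyto! otherwise.
function small_copyto!(dest::AbstractArray, src::AbstractArray; cutoff::Integer = 64)
    if length(src) <= cutoff && axes(dest) == axes(src)
        # Axes match, so every index of src is valid for dest.
        @inbounds for i in eachindex(dest, src)
            dest[i] = src[i]
        end
        return dest
    end
    return copyto!(dest, src)
end

src = randn(5)
dest = zeros(5)
small_copyto!(dest, src)
@assert dest == src
```

The branch keeps the existing copyto! behaviour for large or mismatched arrays, so the loop path is only taken in the small-array case the issue is about.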