Move bounds checks on copyto!(dst, n, src)
#43517
sounds good to me
```julia
if haslength(src)
    checkbounds(dest, i)
    checkbounds(dest, i + length(src) - 1)
    for x in src
```
How about replacing this with

```julia
I = eachindex(dest)[i]
@inbounds for x in src
    dest[I] = x
    I = nextind(dest, I)
end
```

Might be faster for `IndexCartesian` cases.
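The suggestion above can be sketched as a standalone, runnable function (`copy_from!` is a hypothetical name for illustration, not the PR's code):

```julia
# Sketch of the suggested approach: fetch the i-th index from eachindex(dest)
# once, then advance with nextind, so IndexCartesian arrays never have to
# convert a linear index inside the loop.
function copy_from!(dest::AbstractArray, i::Integer, src)
    I = eachindex(dest)[i]
    @inbounds for x in src
        dest[I] = x
        I = nextind(dest, I)
    end
    return dest
end

a = zeros(3, 3)
copy_from!(a, 4, (1.0, 2.0, 3.0))  # fills linear positions 4, 5, 6 (the second column)
```

For a plain `Array`, `eachindex` is a linear range and `nextind` is just `i + 1`; for an `IndexCartesian` view it walks `CartesianIndex` values directly.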
And it seems reasonable to improve `copyto!(dest::AbstractArray, src)` (L893) in this PR.
Using `eachindex(dest)[i]` doesn't seem to help on the things I tried. But I agree that the whole-array method just above has room for comparable improvement.
I could see some difference in the following example:

```julia
using BenchmarkTools

f(dest, i, src) = begin
    I = eachindex(dest)[i]
    @inbounds for x in src
        dest[I] = x
        I = nextind(dest, I)
    end
end

g(dest, i, src) = begin
    @inbounds for x in src
        dest[i] = x
        i += 1
    end
end

a = view(randn(100, 100), 1:100, 1:100)
@btime f($a, 555, $(i + 1 for i in 1:1000))  # 2.144 μs (0 allocations: 0 bytes)
@btime g($a, 555, $(i + 1 for i in 1:1000))  # 2.956 μs (0 allocations: 0 bytes)
```

For longer `src` or higher dimensions, the gain might be bigger?
Just when I thought I'd convinced myself... these give me:

```julia
julia> @btime f($a, 555, $(i + 1 for i in 1:1000))
  min 2.718 μs, mean 2.759 μs (0 allocations)

julia> @btime g($a, 555, $(i + 1 for i in 1:1000))
  min 804.804 ns, mean 810.625 ns (0 allocations)
```

This does not affect the 2-arg method?

```julia
julia> @btime Base.copyto!($a, 555, $(i + 1 for i in 1:1000));
  min 973.938 ns, mean 985.105 ns (0 allocations)

julia> @btime _copyto!($a, 555, $(i + 1 for i in 1:1000)); # first commit of PR
  min 806.769 ns, mean 811.212 ns (0 allocations)

julia> @btime _copyto!($a, 555, $(i + 1 for i in 1:1000)); # PR with eaaefb1
  min 2.741 μs, mean 2.786 μs (0 allocations)

julia> @btime Base.copyto!($a, $(i + 1 for i in 1:length(a))); # 2-arg method
  min 12.875 μs, mean 13.022 μs (0 allocations)

julia> @btime _copyto!($a, $(i + 1 for i in 1:length(a))); # PR with 68e3d5e
  min 6.933 μs, mean 7.023 μs (0 allocations)
```
As for the 2-arg version on M1 native vs. master: I think the problem is that `firstindex` always returns 1 when `ndims != 1`. Using `first(eachindex(dest))` instead should make things consistent.

On the other hand, I just noticed that `nextind(A, ind::Base.SCartesianIndex2)` is not defined, so `dest` can't be a `reinterpret(reshape, args...)` array...

Not sure if it's OK to add the related definition in this PR? Or just use `eachindex(IndexStyle(dest) isa IndexLinear ? IndexLinear() : IndexCartesian(), dest)`.
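That workaround expression can be checked quickly (a sketch; `pick_inds` is a hypothetical helper name, not proposed API):

```julia
# Pick an index iterator that always supports nextind: a plain linear range
# for IndexLinear arrays, CartesianIndices otherwise. This sidesteps index
# types like Base.SCartesianIndex2 (from reinterpret(reshape, ...)) that
# have no nextind method.
pick_inds(dest) = eachindex(IndexStyle(dest) isa IndexLinear ? IndexLinear() : IndexCartesian(), dest)

b = view(rand(4, 4), 1:3, 1:4)   # non-contiguous view, so IndexCartesian
I = first(pick_inds(b))          # CartesianIndex(1, 1)
nextind(b, I)                    # defined for CartesianIndex, column-major order
```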
Ok, good point re `firstindex`. So that's just handling offsets right now.

On the Xeon, "g" is still an improvement over `Base.copyto!`, even if slower than "f" there. Is that true on your computer too?
Well, that's true; at least we eliminated the bounds check within the loop.

I have no M1 machine, so I can't test myself. Would you mind benchmarking the following?

```julia
a = view(randn(100, 100), 1:100, 1:100)
b = view(a, 1:99, 1:100)

f_each(x) = begin
    r = 0.0
    @inbounds for i in eachindex(x)
        r += x[i]
    end
    r
end

f_linear(x) = begin
    r = 0.0
    @inbounds for i in firstindex(x):lastindex(x)
        r += x[i]
    end
    r
end

@btime f_each($a)
@btime f_linear($a)
@btime f_each($b)
@btime f_linear($b)
```
Sure:

```julia
julia> @btime f_each($a)
  min 9.250 μs, mean 9.401 μs (0 allocations)
32.23877902877461

julia> @btime f_linear($a)
  min 9.250 μs, mean 9.401 μs (0 allocations)
32.23877902877461

julia> @btime f_each($b)
  min 9.166 μs, mean 9.315 μs (0 allocations)
22.794925499363792

julia> @btime f_linear($b)
  min 9.166 μs, mean 9.286 μs (0 allocations)
22.794925499363792
```

vs. the Xeon:

```julia
julia> @btime f_each($a)
  17.736 μs (0 allocations: 0 bytes)
153.39744409371883

julia> @btime f_linear($a)
  82.459 μs (0 allocations: 0 bytes)
153.39744409371883

julia> @btime f_each($b)
  17.586 μs (0 allocations: 0 bytes)
153.27406012213328

julia> @btime f_linear($b)
  81.627 μs (0 allocations: 0 bytes)
153.27406012213328
```
On my machine, the result is

```julia
@btime f_each($a)    # 9.100 μs (0 allocations: 0 bytes)
@btime f_linear($a)  # 22.700 μs (0 allocations: 0 bytes)
@btime f_each($b)    # 9.000 μs (0 allocations: 0 bytes)
@btime f_linear($b)  # 22.500 μs (0 allocations: 0 bytes)
```

Is this some "dark magic" of the M1? (Maybe we don't need `IndexCartesian` on the M1?)

Something that might be related: the M1 is about 10x faster than Intel at integer division, with a throughput of one 64-bit divide every two cycles. If this is true, I guess `reshape` would be faster if we omitted the current optimization via `MultiplicativeInverse`.
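For context, the multiplicative-inverse trick can be exercised directly; note this relies on the internal `Base.MultiplicativeInverses` module, so it's an assumption that this internal API stays as-is:

```julia
# reshape/SubArray index math avoids hardware integer division by using a
# precomputed multiply-and-shift "inverse" of the divisor; on chips with
# fast dividers (like the M1), plain div may win this tradeoff.
using Base.MultiplicativeInverses: SignedMultiplicativeInverse

d = SignedMultiplicativeInverse(100)      # precompute the inverse of 100
div(12345, d) == div(12345, 100)          # same quotient, no divide instruction
```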
copyto!(dst, n, src)
copyto!(dst, n, src) and copyto!(dest, src)
base/abstractarray.jl (Outdated)
```julia
throw(ArgumentError("destination has fewer elements than required"))
dest[y[1]] = x
y = iterate(destiter, y[2])
i = Int(firstindex(dest))
```
It doesn't seem right to me to switch from `eachindex` to `firstindex` just because `src` has a length. Also, in the past we have avoided annotating `@inbounds` in generic methods like this.
Ok, I will fiddle a bit more. At the moment, removing `@inbounds` removes the whole speed advantage of this method. But perhaps there's a smarter way.
And, I think the PR does not handle length zero correctly right now. It breaks these, which work on master but don't seem to have tests:

```julia
julia> firstindex(Int[])
1

julia> copyto!(Int[], ())
Int64[]

julia> copyto!(Int[], 1, ())
Int64[]
```
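These empty cases could be pinned down with small regression tests, e.g.:

```julia
using Test
# Empty source and/or destination must round-trip unchanged.
@test firstindex(Int[]) == 1
@test copyto!(Int[], ()) == Int[]
@test copyto!(Int[], 1, ()) == Int[]
```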
I've removed this 2-arg method, as in the cases I was testing I can't see a way to speed it up without using linear indexing & `@inbounds`.

I've also added tests for these empty cases, and fixed the 3-arg method to pass them.
copyto!(dst, n, src) and copyto!(dest, src)
copyto!(dst, n, src)
Pre-1.8 bump?

What's the status on this? Is it ready to merge?

@JeffBezanson posted some concerns on the review above. Although that part of the change has been reverted, I'm not sure whether the rest is OK with him.

Yes, that's accurate. I reverted to the smallest initial change, so this only speeds up `copyto!(dst, n, src)`.
This speeds up

```julia
@btime copyto!($(rand(3,10)), 7, (1.0, 2.0, 3.0));
```

from 2.708 ns to 1.375 ns.

By adding `@inline` and `@boundscheck`, we can get

```julia
@btime @inbounds copyto!($(rand(3,10)), 7, (1.0, 2.0, 3.0));
```

down to 0.875 ns. But this didn't seem to improve anything when used within a loop in #43334, so maybe that's not necessary.