Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify indexing #10

Merged
merged 11 commits into from
Jan 3, 2021
Merged

Simplify indexing #10

merged 11 commits into from
Jan 3, 2021

Conversation

mcabbott
Copy link

Before:

julia> ca = CircularArray(rand(Int8, 10,10));

julia> @btime [$ca[i,j] for i in 1:21, j in 1:21];
  52.352 μs (886 allocations: 48.91 KiB)

After:

julia> @btime [$ca[i,j] for i in 1:21, j in 1:21]
  4.009 μs (1 allocation: 544 bytes)
21×21 Matrix{Int8}:
 -69  -22   -83  -28    -7   -20   106  -119  …  -28    -7   -20   106  -119    59  -111  -69
  25  -72  -123  118    36   -54   -81   118     118    36   -54   -81   118   -90   -77   25
...

@codecov
Copy link

codecov bot commented Nov 16, 2020

Codecov Report

Merging #10 (a4d79af) into master (fe49430) will decrease coverage by 29.62%.
The diff coverage is 46.66%.

Impacted file tree graph

@@             Coverage Diff              @@
##            master      #10       +/-   ##
============================================
- Coverage   100.00%   70.37%   -29.63%     
============================================
  Files            1        1               
  Lines           18       27        +9     
============================================
+ Hits            18       19        +1     
- Misses           0        8        +8     
Flag Coverage Δ
unittests 70.37% <46.66%> (-29.63%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/CircularArrays.jl 70.37% <46.66%> (-29.63%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fe49430...a4d79af. Read the comment docs.

@mcabbott
Copy link
Author

One behaviour changed by this PR is:

julia> a1 = CircularArray(b_arr)
3×4 CircularArray{Int64, 2, Matrix{Int64}}:
  2   4   6   8
 10  12  14  16
 18  20  22  24

julia> a1[3]
18

julia> a1[4] # was == 2
4

So linear indexing will cover the whole matrix, not just the first column. It still has IndexStyle(a1) == IndexCartesian() but could inherit that from the parent, too.

@Vexatos
Copy link
Owner

Vexatos commented Nov 17, 2020

Thanks for the pull request! Here are some performance results from me.

Before:

julia> ca = CircularArray(rand(Int8, 10, 10));

julia> @btime [$ca[i,j] for i in 1:21, j in 1:21];
  43.500 μs (883 allocations: 48.77 KiB)

julia> @btime [$ca[i] for i in 1:(21*21)];
  2.125 μs (1 allocation: 544 bytes)

julia> ca = CircularArray(rand(Int8, 100));

julia> @btime [$ca[i] for i in 1:(21*21)];
  2.122 μs (1 allocation: 544 bytes)

After:

julia> ca = CircularArray(rand(Int8, 10, 10));

julia> @btime [$ca[i,j] for i in 1:21, j in 1:21];
  4.855 μs (1 allocation: 544 bytes)

julia> @btime [$ca[i] for i in 1:(21*21)];
  5.965 μs (1 allocation: 544 bytes)

julia> ca = CircularArray(rand(Int8, 100));

julia> @btime [$ca[i] for i in 1:(21*21)];
  2.124 μs (1 allocation: 544 bytes)

Linear indexing of linear arrays didn't change, 2D-indexing of matrices got a lot faster just as you reported and no longer does a lot of allocations, but linear indexing of matrices has gotten slower. All these results can be reproduced for larger arrays (200×200 elements accessed) and more dimensions. Linear indexing of a higher-dimensional array is consistently about 3 times slower with the PR than without for 2D arrays, 4 times slower with 3D arrays and 6 times slower with 4D arrays.

Before:

julia> ca = CircularArray(rand(Int8, 4, 4, 4));

julia> @btime [$ca[i] for i in 1:(21*21)];
  2.010 μs (1 allocation: 544 bytes)

julia> ca = CircularArray(rand(Int8, 3, 3, 3, 3));

julia> @btime [$ca[i] for i in 1:(21*21)];
  2.123 μs (1 allocation: 544 bytes)

After:

julia> ca = CircularArray(rand(Int8, 4, 4, 4));

julia> @btime [$ca[i] for i in 1:(21*21)];
  10.064 μs (1 allocation: 544 bytes)

julia> ca = CircularArray(rand(Int8, 3, 3, 3, 3));

julia> @btime [$ca[i] for i in 1:(21*21)];
  13.392 μs (1 allocation: 544 bytes)

Linear indexing of the array has so far been undocumented behaviour, so changing the behaviour doesn't seem to be much of a problem to me, I agree with the new behaviour a lot more. However, the slowdown getting worse the more dimensions the array has is worrying. That should probably be addressed.

Adding Base.parent is a good idea, thanks for that.

@Vexatos
Copy link
Owner

Vexatos commented Nov 17, 2020

If I had to guess, this slowdown may be caused by the linear indexing special case being removed, making linear and multi-dimensional indexing use the same method. In the old version, the special case for linear indexing did not use a splatting operator ..., which from my experience is very slow and might be the source of this slowdown during linear indexing.

Furthermore, I discovered that the arrays no longer error on indexing with bad dimensions:

Before:

julia> ca = CircularArray(rand(Int8, 3, 3, 3, 3));

julia> ca[1, 2]
ERROR: BoundsError: attempt to access 3×3×3×3 Array{Int8,4} at index [1, 2]
Stacktrace:
 [1] getindex(::Array{Int8,4}, ::Int64, ::Int64) at ./array.jl:810
 [2] getindex(::CircularArray{Int8,4,Array{Int8,4}}, ::Int64, ::Int64) at /[]/src/CircularArrays.jl:38
 [3] top-level scope at REPL[6]:1

After:

julia> ca = CircularArray(rand(Int8, 3, 3, 3, 3));

julia> ca[1, 2]
75						# equal to ca[1, 2, 1, 1]

@mcabbott
Copy link
Author

I'm not sure we should call the change in time for linear indexing of a matrix a slowdown, as it's doing a different operation. Instead of accessing the first column repeatedly, it reads out all the data, and either your calculation wants one or it wants the other. But I am not surprised that reading all the data takes longer, I'd point to thinks fitting in registers or cache or something like that.

And, to go from Cartesian to Linear indices is just multiplication, but the reverse involves div which is slower. And there's more diving in higher dims. For example, compare an Array to a trivial PermutedDimsArray just to fool linear indexing:

julia> quad = randn(10,10,10,10);

julia> IndexStyle(quad)
IndexLinear()

julia> @btime [$quad[i] for i in 1:10^4];
  7.216 μs (2 allocations: 78.20 KiB)

julia> pquad = PermutedDimsArray(quad, (1,2,3,4));

julia> IndexStyle(pquad)
IndexCartesian()

julia> @btime [$pquad[i] for i in 1:10^4];
  22.557 μs (2 allocations: 78.20 KiB)

The old behaviour didn't pay this cost, because it fixed all indices but the leftmost one to 1.

What is still weird about linear indexing is that the algorithm from Base it's using to convert to Cartesian seems not to behave well with negative numbers.

julia> a1
3×4 CircularArray{Int64, 2, Matrix{Int64}}:
  2   4   6   8
 10  12  14  16
 18  20  22  24

julia> a1[-2:9]' # after this PR
1×12 LinearAlgebra.Adjoint{Int64, CircularVector{Int64, Vector{Int64}}}:
 8  10  18  2  10  18  4  12  20  6  14  22

julia> a1[-2], a1[9]
(8, 22)

julia> CircularArray(1:3)[-2:6]'  # with a vector, no conversion to cartesian, no problem:
1×9 LinearAlgebra.Adjoint{Int64, CircularVector{Int64, Vector{Int64}}}:
 1  2  3  1  2  3  1  2  3

Before was even weirder, I think:

julia> a1[-2:9]' 
1×12 LinearAlgebra.Adjoint{Int64, CircularVector{Int64, Vector{Int64}}}:
 8  10  18  2  10  18  4  12  20  6  14  22

julia> a1[-2], a1[9]  # doesn't match
(2, 18)

@mcabbott
Copy link
Author

mcabbott commented Nov 17, 2020

Furthermore, I discovered that the arrays no longer error on indexing with bad dimensions:

Oh I didn't see that variant. I saw this one, and added a test:

julia> a1[2,3,99]
14

julia> size(a1, 3)
1

julia> mod(99, 1:1)
1

julia> a1[2,3,1]
14

I'm not sure how weird this is, given that ordinary arrays do handle things like this:

julia> cquad[3,4] == quad[3,4,1,1] == quad[3,4,1,1,1,1,1,1]
true

But if you like I'll restore the error.

@mcabbott
Copy link
Author

But when the underlying array supports fast linear indexing, then there's no need to go via Cartesian. So the latest commit speeds this up:

julia> @btime [$ca[i,j] for i in 1:21, j in 1:21];
  4.015 μs (1 allocation: 544 bytes)

julia> @btime [$ca[i] for i in 1:(21*21)];
  1.503 μs (1 allocation: 544 bytes)

and avoids some weirdness:

julia> a1
3×4 CircularArray{Int64, 2, Matrix{Int64}}:
  2   4   6   8
 10  12  14  16
 18  20  22  24

julia> a1[-2:9]'
1×12 LinearAlgebra.Adjoint{Int64, CircularVector{Int64, Vector{Int64}}}:
 8  16  24  2  10  18  4  12  20  6  14  22

julia> a1[-2], a1[9]
(8, 22)

and adds a bounds check to restore the error on too few indices.

But breaks one offset array test, not sure why yet.

@Vexatos
Copy link
Owner

Vexatos commented Nov 17, 2020

I'm not sure we should call the change in time for linear indexing of a matrix a slowdown, as it's doing a different operation.

Of course, I didn't think about that. However, re-adding the specialized method did fix it.

I'm not sure how weird this is, given that ordinary arrays do handle things like this

It pretty much just delegated the getindex call to the inner array, if you look at the error trace it mentions the Base.getindex in array.jl being the one that errors, as normal arrays don't support non-matching indices. This behaviour should probably be kept for consistency.

julia> a = rand(2, 2, 2);

julia> a[1, 2]
ERROR: BoundsError: attempt to access 2×2×2 Array{Float64,3} at index [1, 2]
Stacktrace:
 [1] getindex(::Array{Float64,3}, ::Int64, ::Int64) at ./array.jl:810
 [2] top-level scope at REPL[18]:1

So the latest commit speeds this up

Thank you for adding the special cases back! The slowdown is indeed resolved. Although, you should not use the Integer abstract type directly as julia will fail to specialize the methods like this, causing unnecessary slowdown. The best options would be to either use i::Int or i::I where I<:Integer. Furthermore, isn't eachindex(LinearIndices(arr.data)) always the same as eachindex(arr.data), as they have the same shape?

src/CircularArrays.jl Outdated Show resolved Hide resolved
@mcabbott
Copy link
Author

Although, you should not use the Integer abstract type directly as julia will fail to specialize the methods like this, causing unnecessary slowdown.

I don't think this is true, it should specialise here. The two weird cases are ::Function and ::Type. I think that most of Base's indexing uses ::Integer, so I try to follow unless there's a reason not to.

isn't eachindex(LinearIndices(arr.data)) always the same as eachindex(arr.data), as they have the same shape?

I was concerned about things like eachindex(rand(2,2)') which gives CartesianIndices. But this may be too complicated, eachindex(LinearIndices(OffsetArray(rand(3,3), 5,10))) is still one-based, so perhaps it's enough to use length(arr.data) here?

@Vexatos
Copy link
Owner

Vexatos commented Nov 17, 2020

I think that most of Base's indexing uses ::Integer, so I try to follow unless there's a reason not to.

In the past, I have had bad experiences with using abstract types in method signatures unless absolutely necessary, making julia unable to specialize the method for subtypes. getindex in Base seems to be often using Int instead, but not sure that's a better idea.

@mcabbott
Copy link
Author

Base is confusing... there are indeed more ::Int methods than I remember, sorry. Things like rand(3,3)[1,Int32(2)] do work, but via a slightly longer path. Maybe the same path would apply here if these methods were narrower. Nevertheless I am pretty sure that this should specialise; @code_typed a1[1,2] certainly says it does (although is not quite infallible).

@Vexatos
Copy link
Owner

Vexatos commented Nov 18, 2020

Thank you for the adjustments. I guess what's left is to figure out why it breaks with CartesianIndex. I already confirmed it has nothing to do with OffsetArrays.

julia> a = CircularArray(reshape(1:9,3,3))
3×3 CircularArray(reshape(::UnitRange{Int64}, 3, 3)):
 1  4  7
 2  5  8
 3  6  9

julia> c = CartesianIndex.(3:5, 3:5)
3-element Array{CartesianIndex{2},1}:
 CartesianIndex(3, 3)
 CartesianIndex(4, 4)
 CartesianIndex(5, 5)

julia> a[3,3]
9

julia> a[4,4]
1

julia> a[5,5]
5

julia> a[c]
3-element CircularVector(::Array{Int64,1}):
 9
 4
 8

@mcabbott
Copy link
Author

That's progress. Via @less, here's where it's going wrong:

julia> a[CartesianIndex(4,4)] # wrong
4

julia> Base.to_indices(a, (CartesianIndex(4,4),))
(4, 4)

julia> Base._getindex(IndexLinear(), a, (4,4)...)
4

julia> Base._getindex(IndexCartesian(), a, (4,4)...) # right
1

julia> Base._to_linear_index(a, 4, 4)
13

julia> a[13]
4

I suspect that to_indices is the right place to fix this.

@Vexatos
Copy link
Owner

Vexatos commented Nov 18, 2020

I don't think having to implement Base._getindex can possibly be the right solution to this. It's an internal and undocumented function and probably shouldn't be implemented.

@mcabbott
Copy link
Author

mcabbott commented Nov 18, 2020

Yes I don't like it at all.

What I expected to work was overloading to_indices(a, axes(a), (CartesianIndex(3,3), rest...)) which is I think an officially sanctioned way of messing with the internals... it peels off indices one by one so you can dispatch on the first element of the tuple, and that always corresponds to the first (first two, here) elements of the axes tuple.

But in fact this case goes to to_indices(a, (), (Cartesian...,)) with an empty axes tuple, as an optimisation because it was thought CartesianIndices would never need further information. I'm not sure what the right way around that is. Maybe it can be prevented from going down that path.

@mcabbott
Copy link
Author

I thought I saw a way, but now I invented some more evil test cases and it fails.

@@ -50,6 +69,14 @@ CircularArray(def::T, size) where T = CircularArray(fill(def, size))
CircularVector(data::AbstractArray{T, 1}) where T = CircularVector{T}(data)
CircularVector(def::T, size::Int) where T = CircularVector{T}(fill(def, size))

Base.IndexStyle(::Type{CircularArray{T,N,A}}) where {T,N,A} = IndexStyle(A)
Copy link
Owner

@Vexatos Vexatos Nov 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question came to me why this line would be needed. Wouldn't the issues with CartesianIndex be resolved if the IndexStyle defaulted to IndexCartesian for anyhting but CircularVector? That would result in the calls with CartesianIndex parameters to run through the second implementation of getindex instead of the first. Then all the hacks into to_indices could also be removed. I would be very interested to see your "more evil test cases" and see if they work with this, because all the current tests pass like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was necessary for speed, although it's entirely possible I have got lost in all the versions now. It causes eachindex(A) to prefer linear indexing, which can drop straight through to the underlying memory, whereas iterating CartesianIndices involves some multiplications by strides... necessary when the underlying array is transposed, say.

I've pushed what I had lying around.

@Vexatos
Copy link
Owner

Vexatos commented Nov 22, 2020

With my changes, all tests appear to pass, even your new ones, and I cannot find any noticeable slowdown in any of the benchmarks listed in this issue thread, at least with standard arrays. Perhaps you can try the proposed changes, if something turns out to have broken I will revert them.

@Vexatos
Copy link
Owner

Vexatos commented Jan 3, 2021

Since I heard no more feedback on this, I'll merge it. It seems to be working fine.

@Vexatos Vexatos merged commit 12b733b into Vexatos:master Jan 3, 2021
@mcabbott
Copy link
Author

mcabbott commented Jan 3, 2021

Sorry about vanishing. I forgot all the details now but if it passes those tests it must be working pretty well!

@mcabbott mcabbott deleted the getindex branch January 3, 2021 17:29
@Vexatos Vexatos mentioned this pull request Feb 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants