Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve construction and view performance #224

Merged
merged 4 commits into from
May 1, 2022
Merged

improve construction and view performance #224

merged 4 commits into from
May 1, 2022

Conversation

aplavin
Copy link
Member

@aplavin aplavin commented Apr 30, 2022

Make the checks type-stable, and only checkbounds once - not for each component.

Speedup example:

julia> sa = StructArray(a=rand(1:10, 10^6), b=rand(10^6), c=["abc" for _ in 1:10^6], d=[:abc for _ in 1:10^6])

# before
julia> @btime view($sa, $([5, 10, 100]))
  195.191 ns (9 allocations: 624 bytes)

# after first commit
julia> @btime view($sa, $([5, 10, 100]))
  32.490 ns (0 allocations: 0 bytes)

# after second commit
julia> @btime view($sa, $([5, 10, 100]))
  20.309 ns (0 allocations: 0 bytes)

# still significantly slower than plain Vector, but much closer than before:
julia> @btime view($(collect(sa)), $([5, 10, 100]))
  7.353 ns (0 allocations: 0 bytes)

@aplavin aplavin changed the title improve construction performance improve construction and view performance Apr 30, 2022
@piever
Copy link
Collaborator

piever commented May 1, 2022

Thanks, this is a nice change, I've left a couple of small comments.

As a general comment (though it should derail this PR, which is a nice performance improvement) one thing to consider is whether the view. overload is problematic. Maybe one should just have view(s::StructArray, I...) return a SubArray{T, N, StructArray} object. The order of double wrappers keeps confusing me (as in #218 (comment)) and is not entirely consistent at the moment.

Overall, it feels as though wrapping each component is generally preferable, as it makes it easier to access the component vectors. Unfortunately, it is also the trickier to implement in general (esp. for similar and reshape with arbitrary ranges).

Do you have a strong opinion either way on this? Would one way work better than the other for your use case?

@aplavin
Copy link
Member Author

aplavin commented May 1, 2022

Do you have a strong opinion either way on this? Would one way work better than the other for your use case?

I strongly prefer the current situation, when each component is viewed, and they are wrapped into a top-level StructArray. As you say, it keeps the component access, which is quite an important StructArrays feature.

Would also be best for StructArray + OffsetArray to have the same order of wrappers, but that doesn't seem easy according to your comments in #218. Again, the motivation is that such an order gives both - offset indexing and component access - while the opposite order loses the latter.

@piever
Copy link
Collaborator

piever commented May 1, 2022

Perfect, thanks again for the nice PR!

@piever piever merged commit 750cbfd into JuliaArrays:master May 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants