-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add _findall for AbstractVector{Bool} and use it in internal functions #2769
Conversation
Timings look acceptable:
|
Nice, I had not time to look on it during a week. |
Yes. I have just not have had time to finish these changes (as you can see PR is failing + I have not done benchmarks). I am working on it right now. |
Tests of things changed (all timings after compilation; I mostly focus on checking if we do not regress). This PR:
|
@@ -510,6 +510,28 @@ function Base.getindex(df::DataFrame, ::typeof(!), col_ind::SymbolOrString) | |||
end | |||
|
|||
# df[MultiRowIndex, MultiColumnIndex] => DataFrame | |||
|
|||
function _threaded_getindex(selected_rows::AbstractVector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the change that is leading to a regression.
Regression is minimal (microsecond level) and only if there are very many columns. Essentially - I have removed code duplication we had in the past differentiating the case when :
is used for column selection (now :
falls back to general column selector). This leads to a slight regression, but I thought that given it is very small it is OK to accept it.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
@pstorozenko - OK to merge or you want to have another look at it? |
Sure, merge it, I'm just looking around and trying to learn something :) |
@pstorozenko - thank you. Actually before merging your PR I have checked if |
I didn't know that such functions exist but after seeing them here I thought about using them in BitVector version. Good to know that you've already checked it. |
src/other/utils.jl
Outdated
|
||
# slow path returning Vector{Int} | ||
I = Vector{Int}(undef, nnzB) | ||
I[1:stop - start + 1] .= start:stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed here also to use a loop for consistency with #2771
I have added a news entry. It is not super precise, but I think we do not need to go into all details of when the performance is improved. |
I think that discussion in these two PRs is descriptive enough to understand the source of performance gain. |
src/other/utils.jl
Outdated
@@ -246,7 +248,7 @@ function _findall(B::BitVector)::Union{UnitRange{Int}, Vector{Int}} | |||
end | |||
if c == ~UInt64(0) | |||
if stop != -1 | |||
I = Vector{Int}(undef, nnzB) | |||
I = Vector{Int}(undef,nnzB) | |||
I[1:i-1] .= start:stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadcasting is still used here. Is it really a problem to use it in general, though? This function should be compiled only once for Vector{Bool}
and precompilation should work well, so latency probably doesn't matter a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already fixed in main, #2771.
The problem is not with latency but with allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalimilan - in particular #2771 (comment) shows that something bad is happening with Julia compiler in this case. Since adding a loop is not that much longer and is a "safe choice" I accepted #2771.
Anyway - are you OK with merging this PR? (as the issue you asked about is unrelated) Thank you!
Thank you! |
Fixes #2765