Fix-ups for sorting workspace/buffer (#45330) #45570
@@ -683,7 +683,7 @@ function radix_sort!(v::AbstractVector{U}, lo::Integer, hi::Integer, bits::Unsig
 t::AbstractVector{U}, chunk_size=radix_chunk_size_heuristic(lo, hi, bits)) where U <: Unsigned
 # bits is unsigned for performance reasons.
 mask = UInt(1) << chunk_size - 1
-counts = Vector{UInt}(undef, mask+2)
+counts = Vector{Int}(undef, mask+2)
`lo` may be negative and `counts` is also used to store offsets.
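A minimal sketch of why the element type matters here. The helper `bucket_offsets` is hypothetical and is not the Base implementation; it only mimics the counting step of one radix pass and assumes the first bucket's target index is seeded with `lo`:

```julia
# Hypothetical helper for illustration; this is not the Base implementation.
# Tallies keys by their low `chunk_size` bits, then turns the tallies into
# starting target indices for a destination range that begins at `lo`.
function bucket_offsets(::Type{T}, xs::AbstractVector{<:Unsigned},
                        lo::Integer, chunk_size::Integer) where T <: Integer
    mask = UInt(1) << chunk_size - 1
    counts = zeros(T, Int(mask) + 2)
    for x in xs
        counts[Int(x & mask) + 2] += 1   # bucket b is tallied at index b + 2
    end
    counts[1] = lo                       # InexactError if T is unsigned and lo < 0
    cumsum!(counts, counts)              # counts[b + 1] is now the first index for bucket b
    return counts
end

data = UInt8[0x01, 0x00, 0x01, 0x03]
offsets = bucket_offsets(Int, data, -3, 2)   # destination range is -3:0
```

With `T = Int` the offsets can start below zero. With `T = UInt` the assignment of a negative `lo` fails, which is the situation the review comment describes.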
To do workspace management right requires access to OffsetArrays. Hold pending that.
817e81c to 07d60b3
We can use only one-based workspaces as long as we convert inputs to one-based indexing to match. Thanks, @N5N3, for encouraging me to pursue this approach. Unfortunately, this conversion has a runtime penalty, so we only do it where necessary. The result is somewhat inelegant, but it works reasonably well and avoids the runtime penalty most of the time. It would still be nice to be able to construct OffsetVectors when handling offset vectors as input, but they are not necessary.
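The index shift behind this approach can be sketched as follows. `copy_to_one_based!` is a hypothetical name used only for illustration, and the sketch assumes the workspace is a plain one-based `Vector` at least `hi - lo + 1` long:

```julia
# Hypothetical helper for illustration; not part of Base.
# Copies v[lo:hi] into the front of a one-based workspace `t`, so that
# v[k] lands at t[k - lo + 1].
function copy_to_one_based!(t::Vector, v::AbstractVector, lo::Integer, hi::Integer)
    offset = 1 - lo
    for k in lo:hi
        t[k + offset] = v[k]
    end
    return view(t, 1:hi-lo+1)   # the filled part of the workspace
end

v = [50, 40, 30, 20, 10]
t = zeros(Int, 3)
w = copy_to_one_based!(t, v, 2, 4)
```

The copy is the runtime cost mentioned above, which is why it is only worth doing when the workspace does not already line up with `lo:hi`.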
All tests passed!!! That's better than master (and totally unrelated to this PR)
…so minor style changes and fixups from JuliaLang#45596 and local review.
if t !== nothing && checkbounds(Bool, t, lo:hi) # Fully preallocated and aligned workspace
u2 = radix_sort!(u, lo, hi, bits, reinterpret(U, t))
uint_unmap!(v, u2, lo, hi, o, u_min)
elseif t !== nothing && (applicable(resize!, t) || length(t) >= hi-lo+1) # Viable workspace
This branch is triggered in the case of sort(::OffsetMatrix; dims)
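The distinction between an aligned workspace and a merely viable one can be sketched like this. `workspace_kind` is a hypothetical helper, not a Base function, and the sketch leaves out the resizable case and only checks length:

```julia
# Hypothetical helper for illustration; not part of Base.
# Classifies a candidate workspace `t` for sorting the index range lo:hi.
function workspace_kind(t, lo::Integer, hi::Integer)
    len = hi - lo + 1
    if t !== nothing && checkbounds(Bool, t, lo:hi)
        return :aligned    # t can be indexed with lo:hi directly
    elseif t !== nothing && length(t) >= len
        return :viable     # long enough, but indices must be shifted
    else
        return :allocate   # no usable workspace, allocate a new one
    end
end

kinds = (workspace_kind(zeros(Int, 10), 3, 7),
         workspace_kind(zeros(Int, 5), 3, 7),
         workspace_kind(zeros(Int, 2), 3, 7),
         workspace_kind(nothing, 3, 7))
```

A workspace of length 5 cannot be indexed with `3:7`, so it falls into the second branch even though it holds enough elements.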
@@ -842,5 +841,6 @@ end
end
end
end
# The "searchsorted" testset is at the end of the file because it is slow.
overall, this looks good to me.
Thanks! Any next steps you'd like to see from me?
I don't see any necessary changes (but you know this part of the code better than I do)
It looks good to me too.
FYI this shouldn't have been merged with a failing whitespace CI check.
Sorry about that! Thanks for fixing it. I didn't check because I've been desensitized to one or two failed CI runs, but I'll make sure to check in the future. Is Win32 the only top-level check that's allowed to fail now?
Win32 shouldn't be failing. If a check fails, I recommend re-running it to make sure the failure isn't related to the PR. I also recommend checking the logs of failing jobs. Sometimes the logs alone can tell you whether or not the failure is related to your PR. In this case, win32 failed due to an OOM in the Profile tests, so it probably wasn't related to the PR. But it never hurts to rerun the failing job just to make sure.
Thanks!
Sorry to bother you again, but how do I rerun a failing job that I suspect is unrelated?
Never mind, I figured it out.
sort!(A::AbstractArray; dims)
`sort[!](A::AbstractArray; dims)`. Eliminating `t = similar(A::AbstractArray, 0); ... resize!(t, len)` is motivated by correctness concerns: `similar` returns an `AbstractVector` and `resize!` is defined for `Vector`s. I'd rather allocate a wee bit more than have this broken edge case. If radix sort is eventually used, this is the perfect size. If merge sort is used, it is twice as large as required, but with half as many allocations, and it doesn't matter much because it is only a single slice of the array. If quicksort (before #45222, "Stabilize, optimize, and increase robustness of QuickSort") or insertion sort is used, the allocation could be empty, but I have not observed a substantive performance cost associated with a slightly too large allocation in this case.
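As a rough illustration of allocating one slice-sized buffer up front instead of growing an empty one, here is a sketch. `sort_rows_with_buffer!` is a hypothetical function, it assumes a plain one-based `Matrix`, and it uses the buffer as a simple staging area rather than as a sorting workspace in the sense of this PR:

```julia
# Hypothetical helper for illustration; not how Base sorts along `dims`.
# Allocates a single buffer sized for one row and reuses it for every row.
function sort_rows_with_buffer!(A::AbstractMatrix)
    t = similar(A, size(A, 2))      # one slice's worth, allocated once; no resize! needed
    for i in axes(A, 1)
        copyto!(t, view(A, i, :))   # stage the row in the buffer
        sort!(t)
        A[i, :] = t                 # write the sorted row back
    end
    return A
end

A = [3 1 2; 9 7 8]
sort_rows_with_buffer!(A)
```

Because the buffer is sized from `size(A, 2)`, the code never calls `resize!` and so does not depend on `similar` returning a `Vector`.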