Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix-ups for sorting workspace/buffer (#45330) #45570

Merged
merged 6 commits into from
Jun 16, 2022

Conversation

LilithHafner
Copy link
Member

@LilithHafner LilithHafner commented Jun 3, 2022

  • Fix radix_sort! for negative firstindex (Fixes Bug in sorting offset vectors #45568)
  • Actually use a workspace vector for sort!(A::AbstractArray; dims)
  • Preallocate generously right away for sort[!](A::AbstractArray; dims). Eliminating t = similar(A::AbstractArray; 0); ... resize!(t, len) is motivated by correctness concerns: similar returns an AbstractVector and resize! is defined for Vectors. I'd rather allocate a wee bit more than have this broken edge case. If radix sort is eventually used, this is the perfect size. If merge sort, it is twice as large as required, but half as many allocations, and it doesn't matter much because it is only a single slice of the array. If quicksort pre Stabilize, optimize, and increase robustness of QuickSort #45222 or insertion sort is used the allocation could be empty, but I have not observed a substantive performance cost associated with a slightly too large allocation in this case.
  • handle workspaces in an unconventional indexing aware manner
  • minor comments and style changes

@@ -683,7 +683,7 @@ function radix_sort!(v::AbstractVector{U}, lo::Integer, hi::Integer, bits::Unsig
t::AbstractVector{U}, chunk_size=radix_chunk_size_heuristic(lo, hi, bits)) where U <: Unsigned
# bits is unsigned for performance reasons.
mask = UInt(1) << chunk_size - 1
counts = Vector{UInt}(undef, mask+2)
counts = Vector{Int}(undef, mask+2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lo may be negative and counts is also used to store offsets.

@LilithHafner LilithHafner marked this pull request as draft June 4, 2022 13:45
@LilithHafner
Copy link
Member Author

To do workspace management right requires access to OffsetArrays. Hold pending that.

@LilithHafner LilithHafner force-pushed the fix-negative-offset-array branch from 817e81c to 07d60b3 Compare June 5, 2022 12:26
@LilithHafner
Copy link
Member Author

We can have only one-based indexed workspaces so long as we convert inputs to one-based indexing to match. Thanks, @N5N3 for encouraging me to pursue this approach. Unfortunately, there is a runtime penalty for this conversion, so we only do it where necessary. This results in a somewhat inelegant solution but does work reasonably well and avoids runtime penalties most of the time.

It would still be nice to be able to construct OffsetVectors when handling offset vectors as input, but they are not necessary.

@LilithHafner LilithHafner marked this pull request as ready for review June 5, 2022 12:29
@LilithHafner
Copy link
Member Author

All tests passed!!! That's better than master (and totally unrelated to this PR)

Lilith Hafner added 2 commits June 7, 2022 07:07
@LilithHafner LilithHafner changed the title Fix sort! on long negative offset array Fix-ups for #45330 Jun 8, 2022
@LilithHafner LilithHafner changed the title Fix-ups for #45330 Fix-ups for sorting workspace/buffer (#45330) Jun 8, 2022
@LilithHafner LilithHafner requested a review from oscardssmith June 8, 2022 13:56
if t !== nothing && checkbounds(Bool, t, lo:hi) # Fully preallocated and aligned workspace
u2 = radix_sort!(u, lo, hi, bits, reinterpret(U, t))
uint_unmap!(v, u2, lo, hi, o, u_min)
elseif t !== nothing && (applicable(resize!, t) || length(t) >= hi-lo+1) # Viable workspace
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is triggered in the case of sort(::OffsetMatrix; dims)

@@ -842,5 +841,6 @@ end
end
end
end
# The "searchsorted" testset is at the end of the file because it is slow.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these comments because by default I put new tests at the end of the file and these comments remind me not to. (see #45233, #45234)

@oscardssmith
Copy link
Member

overall, this looks good to me.

@LilithHafner
Copy link
Member Author

Thanks! Any next steps you'd like to see from me?

@oscardssmith
Copy link
Member

I don't see any necessary changes (but you know this part of the code better than I do)

@LilithHafner
Copy link
Member Author

It looks good to me too.

@oscardssmith oscardssmith merged commit 6e79796 into JuliaLang:master Jun 16, 2022
@LilithHafner LilithHafner deleted the fix-negative-offset-array branch June 16, 2022 15:34
@DilumAluthge
Copy link
Member

DilumAluthge commented Jun 16, 2022

FYI this shouldn't have been merged with a failing whitespace CI check.

@DilumAluthge
Copy link
Member

DilumAluthge commented Jun 16, 2022

Whitespace check found 2 issues:
base/sort.jl:608 -- trailing whitespace
base/sort.jl:1563 -- trailing whitespace
make: *** [Makefile:104: check-whitespace] Error 1

@DilumAluthge
Copy link
Member

#45713

@LilithHafner
Copy link
Member Author

Sorry about that! Thanks for fixing it. I didn't check because I've been desensitized to one or two failed CI runs, but I'll make sure to check in the future.

Is Win32 the only top-level check that's allowed to fail now?

@DilumAluthge
Copy link
Member

DilumAluthge commented Jun 16, 2022

Win32 shouldn't be failing.

If a check fails, I recommend re-running it to make sure the failure isn't related to the PR.

I also recommend checking the logs of failing jobs. Sometimes the logs alone can tell you whether or not the failure is related to your PR.

In this cases, win32 failed due to an OOM in the Profile tests, so it probably wasn't related to the PR.

But it never hurts to rerun the failing job just to make sure.

@LilithHafner
Copy link
Member Author

Thanks!

@LilithHafner
Copy link
Member Author

Sorry to bother you again, but how do I rerun a failing job that I suspect is unrelated?

@LilithHafner
Copy link
Member Author

Nevermind, I figured it out.

@LilithHafner LilithHafner added the sorting Put things in order label Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sorting Put things in order
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in sorting offset vectors
3 participants