Fix type instability in sort for few columns case and fix issorted bug #2746
Conversation
We have a bad design of sorting in general:
I will push an update (and then the code will be simplified with recursion).

@nalimilan - this should be good to have a look at. The only issue is that in general sorting is expensive to compile. I will have to think about how to reduce this cost (though maybe it is not easy to do).
Timings after this PR (benchmark outputs elided): all is OK, the penalty is negligible.
Yeah, that little overhead is fantastic. Thank you!
Thanks! I wonder how this went unnoticed for so long.
Maybe one way to limit the compilation cost would be to sort column-wise rather than row-wise (starting with the last column)? Not sure whether that would be fast. Anyway that would require a deeper refactoring so this PR is useful even if we later change the approach.
```diff
 # If a user only specifies a few columns, the DataFrame
 # contained in the DFPerm only contains those columns, and
 # the permutation induced by this ordering is used to
 # sort the original (presumably larger) DataFrame

-struct DFPerm{O<:Union{Ordering, AbstractVector}, T<:Tuple} <: Ordering
+struct DFPerm{O<:Union{Ordering, Tuple{Vararg{Ordering}}},
```
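To make the tradeoff behind this signature concrete, here is a minimal, self-contained sketch (assumed names `TuplePerm`, `colorder`, `lexless`; this is not the actual DataFrames.jl code): a single shared `Ordering` covers the homogeneous `sort(df)` case cheaply, while a tuple of per-column orderings lets the compiler specialize the lexicographic comparison.

```julia
using Base.Order: Ordering, Forward, Reverse, lt

# Hypothetical sketch mirroring the DFPerm design above: `ord` is either one
# Ordering reused for all columns, or one Ordering per column in a Tuple.
struct TuplePerm{O<:Union{Ordering, Tuple{Vararg{Ordering}}}, T<:Tuple}
    ord::O   # shared ordering, or per-column orderings
    cols::T  # the column vectors being compared
end

# pick the ordering for column `i` in either representation
colorder(o::Ordering, i) = o
colorder(o::Tuple, i) = o[i]

# lexicographic "row a < row b" over the stored columns
function lexless(p::TuplePerm, a::Integer, b::Integer)
    for (i, col) in enumerate(p.cols)
        o = colorder(p.ord, i)
        lt(o, col[a], col[b]) && return true
        lt(o, col[b], col[a]) && return false
    end
    return false  # all keys equal
end
```

For example, `TuplePerm((Forward, Reverse), ([1, 1, 2], [10, 20, 30]))` sorts ascending on the first column and descending on the second.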
Would it make sense to make `O` always a `Tuple{Vararg{Ordering}}`? That would avoid the need for `ord isa Ordering` below.
The problem is that if you do `sort(df)` you want a single `Ordering` that is reused to avoid excessive compilation. I would assume that the `ord isa Ordering` check should be optimized out by the compiler, so it should have no performance penalty.
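As a small illustration of that point (hypothetical `Perm`/`pickord` names, not the PR's code): when the type parameter is concrete, `ord isa Ordering` is decided purely by the type, so the branch can be folded away at compile time and only one arm survives in the generated code.

```julia
using Base.Order: Ordering, Forward

# Hypothetical sketch: `p.ord isa Ordering` depends only on the concrete type
# parameter `O`, so for any given `Perm{O}` the compiler keeps just one branch.
struct Perm{O}
    ord::O
end

# shared ordering for every column, or one ordering per column
pickord(p::Perm, i::Int) = p.ord isa Ordering ? p.ord : p.ord[i]
```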
I tried simplifying it but I always ended up with the compiler allocating.
I don't get it. Why would wrapping the `Ordering` in a one-element tuple trigger more compilation or allocations?
Ah, a one-element `Tuple` is not a problem, but in this case we would anyway have to branch on whether the tuple has one element or matches the length of the column vector. What allocates is if we wanted to have one tuple where each element holds a pair consisting of an ordering and a column.
OK. Anyway, always wrapping the `Ordering` in a tuple sounds conceptually simpler.
I will change it if I can make it work fast.
What's the conclusion regarding this?
I decided to keep the union (as in the original design), since in the end the logic was simpler (handled through dispatch).
Co-authored-by: Milan Bouchet-Valat <[email protected]>
This would be slower AFAICT, because now we are short-circuiting (so in typical cases mostly only one comparison is needed, on the first column).
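The short-circuiting claim can be illustrated with a tiny counting helper (hypothetical code, not from the PR): when the first column already decides the order, only one column is ever touched.

```julia
# Hypothetical helper: lexicographic comparison that also reports how many
# column comparisons it performed before short-circuiting.
function lexless_counting(cols, a, b)
    ncmp = 0
    for col in cols
        ncmp += 1
        isless(col[a], col[b]) && return true, ncmp
        isless(col[b], col[a]) && return false, ncmp
    end
    return false, ncmp  # all keys equal: every column was inspected
end
```

On data where the first column rarely ties, the reported count stays at 1 for most row pairs.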
We cannot use recursion, as it would break on very wide data frames. I will fix it tomorrow.
Good point. But I guess that depends on the types of the sorted columns. When sorting on bitstype columns which can use optimized algorithms, I imagine that sorting one column at a time could be faster. I'm thinking about integer columns with small ranges, which use counting sort in Base; floating-point columns, which use a special quicksort IIRC; or other types which could use radix sort (currently only via SortingAlgorithms). But we would have to be very careful to do that only for cases known to be fast anyway, and that's tricky to get right.
I agree, but currently we have:
so I think that the first step should be to make sorting on a single column fast (which we should do at some point; in general, sorting and reshaping are things to work on in the near future, as these are the areas where no new things were added for a long time). But for this PR I would concentrate the design on fixing the type instability issues.
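For reference, the column-wise idea discussed above can be sketched as follows (hypothetical helper, not DataFrames.jl code): applying a stable sort from the last key column to the first reproduces the lexicographic order, which is what would let each column use its own specialized algorithm.

```julia
# Hypothetical sketch: stable column-wise sorting, last key column first,
# yields the same permutation as a row-wise lexicographic comparison.
function colwise_sortperm(cols...)
    p = collect(1:length(first(cols)))
    for col in reverse(cols)
        # stable sort of the current permutation by this column's values;
        # stability preserves the order established by later key columns
        p = p[sortperm(col[p]; alg=MergeSort)]
    end
    return p
end
```

This matches `sortperm` on the zipped rows, but each pass could in principle pick a per-column algorithm such as counting sort.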
Here are the benchmarks after the fix of the recursion (output elided). Additionally, I have discovered a bug in this PR.
@nalimilan - no rush, but it would be good to review and merge this, as it fixes a bug in `issorted`.
src/abstractdataframe/sort.jl (outdated)

```julia
function Sort.lt(o::DFPerm{<:Any, <:Tuple}, a, b)
    ord = o.ord
    cols = o.cols
    length(cols) > 16 && return unstable_lt(ord, cols, a, b)
```
Add a comment explaining how the 16 threshold was chosen?

```suggestion
# if there are too many columns, fall back to the type-unstable mode to avoid high compilation cost;
# it is expected that in practice users sort data frames on only a few columns
length(cols) > 16 && return unstable_lt(ord, cols, a, b)
```
Added. I have not tuned 16 specifically; I just assume that 16 is a safe threshold. Probably we could use some higher number here, but I think that normally one does not sort on more than something like 4 columns.
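A sketch of what the type-unstable fallback path above the threshold can look like (assumed name `unstable_lt_sketch`; the PR's actual `unstable_lt` may differ): the loop compiles once regardless of how many columns are sorted on, at the cost of dynamic dispatch per element access.

```julia
using Base.Order: Ordering, Forward, lt

# Assumed sketch of the fallback: abstractly typed containers keep compilation
# cost constant no matter how many columns are involved, trading away the
# per-column-type specialization of the tuple path.
function unstable_lt_sketch(ords::Vector{Ordering},
                            cols::Vector{AbstractVector}, a, b)
    for (o, col) in zip(ords, cols)
        lt(o, col[a], col[b]) && return true
        lt(o, col[b], col[a]) && return false
    end
    return false
end
```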
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Thank you!
Fixes #2745 by loop unrolling.
It is probably possible to do this in a smarter way using metaprogramming (but maybe not), so I am marking it as a draft.
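One possible metaprogramming formulation of the unrolling, as the description speculates (illustrative only; this is not what the PR ended up doing): a `@generated` function that emits one pair of comparisons per tuple element at compile time, giving the unrolled code without writing it by hand.

```julia
# Illustrative sketch: build the unrolled lexicographic comparison at compile
# time from the tuple type, one comparison pair per column.
@generated function unrolled_lt(cols::Tuple, a, b)
    n = length(cols.parameters)  # number of columns, known from the type
    ex = :(return false)         # all keys equal
    for i in n:-1:1
        ex = quote
            isless(cols[$i][a], cols[$i][b]) && return true
            isless(cols[$i][b], cols[$i][a]) && return false
            $ex
        end
    end
    return ex
end
```

Note this shares the drawback discussed earlier in the thread: the generated body grows with the number of columns, so a width cutoff would still be needed.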