Simple uniqueness checks for sorting-related functions #3312
Thank you for working on it. The PR should be implemented in a way that if non standard |
Thanks for the comments, I'll get to work on them :)
I'm sorry, I don't quite get what you mean by that. What would be an example of non-standard |
as commented in the issue e.g. |
I've addressed the case where I'm aware much better tests are needed, I'm waiting to implement them after the discussion about the other thing |
In general, as you probably noticed, we care a lot about test coverage. Issues like the ones I raised in my comments would be caught by proper tests. Thank you!
also can you please sync the PR with |
I've just added the checks for the subtypes we had discussed + a fallback for other subtypes of |
Also, I wanted to comment on something I came across while reading for the PR. I think we should mention that the detection of complex order clauses will not realize that for instance, |
I think we can skip this. Users who understand the difference will assume that it works this way anyway.
I think all that's left is to get the coverage back up; I did miss some lines in the coverage report. Other than that, is there something else you would like addressed?
Looks good. Just please update the tests and push them to GitHub and I will do a final review.
Status update: While testing I noticed something that could be problematic and I'm still working on it. Namely: Whenever
leaves no opportunity for the uniqueness checks inside the function:
I thought it might be uncommon for the end user to create a |
Can we make |
Yeah, that's a possibility. It would in fact simplify the current changes.
Probably adding a comment above the function definition is enough. We do not describe this method in the docstring.
@bkamins I reverted the uniqueness checks for |
Looks good. Thank you! Let us wait for a few days to see if there are any more comments on this PR. If not I will merge it.
Just a few stylistic comments.
src/abstractdataframe/sort.jl
`lt` keywords are being used, as their application can create duplicate items
inadvertently. Similarly, the use of `order(...)` clauses that specify either
I don't understand the reasoning behind "as their application can create duplicate items inadvertently". Doesn't that make that argument precisely useful to protect against inadvertent introduction of duplicate elements? We don't have to support this right away, but no need to provide misleading reasons for that. ;-)
Thank you for the comments!
My goal was to explain that the disallowed combination is not unsupported because we ran out of time, but rather that the process of sorting after the use of the `by` and `lt` functions is a bit unintuitive, as the actual sorting happens on numbers that the user never sees. This would make it harder to reason about why a particular dataframe and an exact copy look different even after being sorted the same way (which I think isn't supposed to happen because sorting is stable, but was something brought up in the issue that motivates this PR).
In other words, the message tries to say: "we assume you are concerned about duplicates, and it isn't always obvious why the use of `by` and `lt` can create 'invisible duplicates' at the time of sorting, and that is why we don't support it at the moment."
I don't know if I convey that properly in the current docstring; I'd love to hear your take.
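To make the "invisible duplicates" point concrete, here is a minimal sketch in plain Julia. It uses only Base functions (`sort`, `allunique`) on vectors rather than data frames, and the variable names are made up for illustration. It assumes a stable algorithm (`MergeSort`), so ties keep their input order.

```julia
# Illustration only: Base Julia on plain vectors, no DataFrames.jl API involved.
xs = [3, -1, 1, -3, 2]      # all elements are distinct
ys = [-3, 1, -1, 3, 2]      # the same elements in a different initial order

# The elements are unique, but the keys that `by = abs` sorts on are not.
sort_keys = abs.(xs)         # [3, 1, 1, 3, 2]
println(allunique(xs))       # true
println(allunique(sort_keys))  # false

# A stable sort breaks ties by input order, so the two results differ
# even though both inputs hold exactly the same elements.
sorted_xs = sort(xs; by = abs, alg = MergeSort)   # [-1, 1, 2, 3, -3]
sorted_ys = sort(ys; by = abs, alg = MergeSort)   # [1, -1, 2, -3, 3]
println(sorted_xs == sorted_ys)                   # false
```

The duplicates only exist among the transformed keys, which the user never sees, and that is why checking the visible values for uniqueness is not enough once `by` is involved.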
I just propose to remove the justification part. It is not strictly needed in the docstring (it needs to describe precisely what is being supported). It is just not supported now.
That's a good idea. I'll remove them and rework the docstrings a bit
Thank you! I hope you enjoyed it!
As discussed in #2159, this PR implements basic uniqueness checks for the functions `issorted`, `sortperm`, `sort` and `sort!` via a boolean kwarg set to false by default. The uniqueness checks do not yet take into account the possible effects of the `by` and `lt` keyword args.

This PR also adds tests for the new functionality, modifies the relevant docstrings, and adds a short description of the new functionality to NEWS.md.
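
As a rough sketch of the idea, an opt-in uniqueness check on the sort keys could look like the following. This is not the code from this PR: the helper `sort_rows_checked` and its `require_unique` keyword are hypothetical names chosen for illustration, and rows are modelled as plain tuples rather than a data frame.

```julia
# Hypothetical helper, NOT the DataFrames.jl implementation.
# Sorts a vector of tuples on one column and, if requested, first verifies
# that the values in that column are unique.
function sort_rows_checked(rows::Vector{<:Tuple}, col::Int;
                           require_unique::Bool = false)
    sort_keys = [r[col] for r in rows]
    if require_unique && !allunique(sort_keys)
        throw(ArgumentError("values in column $col are not unique, " *
                            "so the sorted order depends on the input order"))
    end
    # Stable sort on the selected column only.
    return sort(rows; by = r -> r[col], alg = MergeSort)
end

# The check is off by default, so existing behaviour is unchanged.
sort_rows_checked([(2, "b"), (1, "a"), (2, "a")], 1)

# With the check enabled, duplicate keys raise an error instead of
# silently producing an order that depends on the input order.
try
    sort_rows_checked([(2, "b"), (1, "a"), (2, "a")], 1; require_unique = true)
catch err
    println(err isa ArgumentError)
end
```

Keeping the flag false by default, as the description above states for the real keyword, means callers who do not care about ties pay no extra cost.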