Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize sorting options #771

Merged
merged 8 commits into from
Dec 14, 2023
Merged

Conversation

billylanchantin
Copy link
Contributor

@billylanchantin billylanchantin commented Dec 14, 2023

Closes:

Each of these options are now available on all the sorting functions:

  • direction: :asc | :desc
    • This takes the form of keys in the first arg on the DF functions
  • nils: :first | :last
  • parallel: true | false
    • This isn't available on the lazy functions
  • stable: true | false

lib/explorer/data_frame.ex Outdated Show resolved Hide resolved
lib/explorer/polars_backend/data_frame.ex Show resolved Hide resolved
lib/explorer/shared.ex Show resolved Hide resolved
Copy link
Member

@philss philss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

lib/explorer/data_frame.ex Outdated Show resolved Hide resolved
Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful work. The only suggestion I have is about the defaults.

Numpy and Pandas do not use stable sorting by default nor Polars. Polars is also parallel by default, which I believe we should try to be as well. So my suggestion is to make stable default to false and parallel default to true. This way, we can also simplify the explanation of parallel. We could say "Use parallel algorithms when possible. Set it to false to disable it.".

Thoughts?

@billylanchantin
Copy link
Contributor Author

@josevalim That sounds great. I was keeping the defaults we had, but I hadn't thought about what they ought to be. I can definitely have :stable default to false and :parallel default to true.

This way, we can also simplify the explanation of parallel. We could say "Use parallel algorithms when possible. Set it to false to disable it.".

That's a good tactic: describe the philosophy instead of belaboring the implementation (which may change). I think I'll still add a sentence about parallelism not always being available on lazy operations. That way the reader will have some idea of what we mean by "when possible".

@billylanchantin billylanchantin merged commit b80fd1a into main Dec 14, 2023
4 checks passed
@billylanchantin billylanchantin deleted the bl-normalize-sorting-options branch December 14, 2023 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants