Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Series.sort_by and Series.sort_with #762

Merged
merged 5 commits into from
Dec 10, 2023
Merged

Conversation

billylanchantin
Copy link
Contributor

@billylanchantin billylanchantin commented Dec 7, 2023

Adds:

  • Series.arrange (macro)
  • Series.arrange_with (function)

Closes:

I'm opening this up early. There are two decisions we need to make before I write docs:

  • Names:
    • Series.arrange{_with}
    • Series.sort{_with} (already taken)
  • API:
    • Series.arrange(s, _, direction: :asc) (matches Series.sort)
    • Series.arrange(s, asc: _) (matches DataFrame.arrange)

Copy link
Member

@philss philss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

For the name, I don't have a strong opinion about it. I think it would be nice to keep the sort naming, since it seems more clear in the context of the series (and matches Enum, as you said in the issue). But again, I'm fine with the arrange naming :)

For the API, I think the version you did here seems better. I would change the arrange from the DF to follow the same API.

@billylanchantin
Copy link
Contributor Author

@philss Thanks for looking!

I think it would be nice to keep the sort naming

Agreed. But can we without a breaking change? Today way support:

s = Explorer.Series.from_list([2, 1, 3])
Explorer.Series.sort(s, direction: :asc)

If we want to also support:

Explorer.Series.sort(s, remainder(_, 3))

There would be a conflict for the sort/2 namespace.

@billylanchantin
Copy link
Contributor Author

We could also do sort_by and sort_with.

(Or the cursed sort_by and sort_by_with 😅)

@philss
Copy link
Member

philss commented Dec 7, 2023

There would be a conflict for the sort/2 namespace.

Yeah, I see :/

We could also do sort_by and sort_with.

It's actually a good idea IMHO :D

@josevalim
Copy link
Member

I like sort_by a lot and had the same idea. Which means we should pass direction as option for symmetry with sort.

I am not ecstatic about sort_with. We could just allow a function in sort, but that would differ from map/filter, so I don’t know.

@cigrainger
Copy link
Member

I agree -- I like sort_by. Generally I'm keen to use sort consistently with DFs as well and rename arrange -> sort.

@josevalim
Copy link
Member

But would we make them all consistent? Arrange? Mutate?

To be honest, I dislike mutate, but it would be a huge departure for the library. I think rooting in R or Pandas make sense, and the general agreement is that R wins API-elegance wise.

@cigrainger
Copy link
Member

I still think mutate doesn't really apply to series so there's nothing to make consistent there. What else would you call it?

100% R wins elegance-wise but it's not necessarily in the names -- in fact, R has a lot of 'creative' naming to avoid namespace conflicts with base. The elegance comes from matching 'verbs' to (tidy data) principles and maintaining parsimony. Which I think we're doing well! I'm pretty sure arrange is one of the cases of avoiding namespace conflict in R.

@josevalim
Copy link
Member

For me a mutate could be a map, transform, or anything else that does not imply an in-memory change, but it is not a hill I would die on.

if we are renaming arrange, would we call it sort or sort_by?

@cigrainger
Copy link
Member

cigrainger commented Dec 8, 2023

Yeah no I totally agree. But mutate in R has always been a bit weird. It's specifically about changing a data frame in some way. Adding, replacing, or changing columns.

I'd just call it sort. I'm totally cool with unifying on arrange for both too. I mainly think they should be consistent. I just don't want to skip out on the more obvious sort just to maintain a naming convention from R that was an artifact of necessity rather than choice.

@josevalim
Copy link
Member

So maybe sort_by and sort_with for full symmetry with series?

@josevalim
Copy link
Member

Although in one we do asc:/desc: and the other does direction: :asc/:desc, so they wouldn’t quite mirror.

@cigrainger
Copy link
Member

So maybe sort_by and sort_with for full symmetry with series?

I'm cool with this.

Although in one we do asc:/desc: and the other does direction: :asc/:desc, so they wouldn’t quite mirror.

True but I think close enough. :)

@billylanchantin
Copy link
Contributor Author

Update:

  • Names:
    • sort_by, sort_with
  • API
    • sort_by(s, -1 * _, direction: :asc)

I've also found some mild inconsistencies in the sorting API (like stable is available on sort_by but not sort). But I think it'd be better to raise an issue and address that as part of a holistic pass that normalizes the entire sorting API.

@billylanchantin billylanchantin changed the title Add Series.arrange and Series.arrange_with Add Series.sort_by and Series.sort_with Dec 8, 2023
@josevalim
Copy link
Member

Yeah, I think there are two issues next: making sure the options are consistent and then renaming DF.arrange.

Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits and ship it!

billylanchantin and others added 2 commits December 10, 2023 09:23
Co-authored-by: José Valim <[email protected]>
Co-authored-by: José Valim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants