Add support for grouping into series #741

costaraphael
Contributor

Hey folks!

My first intention with this PR was to implement implode to allow gathering all the values of a given group into a Series. Halfway through it, I noticed I was getting some double wrapping (i.e. [[[1, 2]], [[3, 4]]] instead of [[1, 2], [3, 4]]).

Then I looked a bit further and noticed that implode doesn't really need this feature: we just need to allow plain lazy series in summarize, and it just works. That is the approach this PR follows. I also updated the docs to reflect that.
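As a rough sketch of what this enables from the user's side: the snippet below assumes an Explorer release that includes this change, reuses the letters/is_vowel data frame from the example later in this thread, and uses `Mix.install` only so the script can run standalone. The variable name `summary` is just illustrative.

```elixir
# Standalone script setup; assumes the latest Explorer from Hex includes this PR.
Mix.install([:explorer])

require Explorer.DataFrame, as: DF

df =
  DF.new(
    letters: ~w(a b c d e),
    is_vowel: [true, false, false, false, true]
  )

# A plain column reference (a lazy series with no aggregation applied)
# is passed straight to summarise; each group's values are gathered
# into one list per group.
summary =
  df
  |> DF.group_by(:is_vowel)
  |> DF.summarise(letters: letters)

IO.inspect(summary)
```

The result should have one row per group, with the `letters` column holding that group's values as a list.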

Please let me know if there's something I'm overlooking with this PR, or if this is not a wanted feature. I'm more than happy to start this over with a different approach or just scrap it altogether 🙂

@billylanchantin
Contributor

@costaraphael Thanks for the PR!

I know that after #725 we have our eye on functions like implode. Though this does seem like an elegant way to get a similar (the same?) thing!

When you say you were getting double lists, what caused it?

@costaraphael
Contributor Author

costaraphael commented Nov 26, 2023

@billylanchantin I first tried to get something like this working:

DF.new(
  letters: ~w(a b c d e),
  is_vowel: [true, false, false, false, true]
)
|> DF.group_by(:is_vowel)
|> DF.summarise(letters: implode(letters))

To get this working, I went through the normal steps (adding the function to Explorer.Series, Explorer.PolarsBackend.Series, etc.), all very straightforward, since implode is already implemented as an eager function in Series and in Expr. To keep just the relevant bits, here's the Rust implementation I added for lazy series:

#[rustler::nif]
pub fn expr_last(expr: ExExpr) -> ExExpr {
    let expr = expr.clone_inner();

    ExExpr::new(expr.implode())
}

When I had this, the summarise call above would yield the following:

#Explorer.DataFrame<
  Polars[2 x 2]
  is_vowel boolean [true, false]
  letters list[string] [[["a", "e"]], [["b", "c", "d"]]]
>

I spent some time trying to understand what was happening, and at one point I tried removing the implode call from the Rust code as a sanity check:

#[rustler::nif]
pub fn expr_last(expr: ExExpr) -> ExExpr {
    let expr = expr.clone_inner();

    ExExpr::new(expr)
}

Lo and behold, it just worked:

#Explorer.DataFrame<
  Polars[2 x 2]
  is_vowel boolean [true, false]
  letters list[string] [["a", "e"], ["b", "c", "d"]]
>

Which is when I realized that maybe we don't need the implode call at all 😅
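One way to picture why the extra call was redundant, as a toy model using plain Elixir lists rather than Explorer's or Polars' actual internals: it assumes the grouped summarise already gathers each group's values into a list, which is what the two outputs above suggest. The names `rows`, `grouped`, and `double_wrapped` are made up for this illustration.

```elixir
# Toy model only: plain Elixir data, no Explorer or Polars involved.
rows = [{"a", true}, {"b", false}, {"c", false}, {"d", false}, {"e", true}]

# Step 1: grouping already collects each group's letters into a list.
grouped =
  Enum.group_by(
    rows,
    fn {_letter, vowel?} -> vowel? end,
    fn {letter, _vowel?} -> letter end
  )

# Step 2: wrapping each group's list once more (the role implode played
# in this model) adds a second, unwanted level of nesting.
double_wrapped = Map.new(grouped, fn {key, letters} -> {key, [letters]} end)

IO.inspect(grouped, label: "grouped")
IO.inspect(double_wrapped, label: "double_wrapped")
```

In this model, skipping step 2 is all it takes to get the singly nested result.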

Member

@philss philss left a comment

This is great! :shipit:

@josevalim josevalim merged commit 3d63f49 into elixir-explorer:main Nov 27, 2023
3 checks passed
@josevalim
Member

💚 💙 💜 💛 ❤️
