WIP: initial work to add support for list dtypes #401

Closed
wants to merge 1 commit into from


philss
Member

@philss philss commented Nov 10, 2022

This is a work in progress that adds the basis for "list" dtypes.

It also adds the `to_list/1` lazy operation, which takes a series
and creates a list series containing the elements of that series.
This is useful in aggregations, when you want to capture the elements
of each group. For example:

```elixir
df = Explorer.DataFrame.new(a: [1, 1, 2, 2], b: [9, 8, 7, 6])

grouped = Explorer.DataFrame.group_by(df, :a)

Explorer.DataFrame.summarise_with(grouped, fn df -> [b_merged: Explorer.Series.to_list(df["b"])] end)
```

The result is going to be something like this:

      #Explorer.DataFrame<
        Polars[2 x 2]
        a integer [1, 2]
        b_merged list(integer) [[ 9, 8 ], [ 7, 6 ]]
      >
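
To make the intended semantics concrete, here is a minimal sketch in plain Elixir that mimics the "collect each group's values into a list" behaviour described above. It does not call Explorer at all; `GroupToListSketch` and `collect/3` are hypothetical names used only for illustration, and rows are modelled as maps, which is an assumption of this sketch rather than how the PR implements it.

```elixir
# Plain-Elixir sketch of the grouping semantics only; no Explorer calls here.
# `GroupToListSketch` and `collect/3` are hypothetical names for illustration.
defmodule GroupToListSketch do
  # Groups `rows` (a list of maps) by `key` and gathers each group's
  # `value_key` values into a list, keeping row order within a group.
  def collect(rows, key, value_key) do
    rows
    |> Enum.group_by(&Map.fetch!(&1, key), &Map.fetch!(&1, value_key))
    |> Enum.sort_by(fn {group, _values} -> group end)
  end
end

rows = [%{a: 1, b: 9}, %{a: 1, b: 8}, %{a: 2, b: 7}, %{a: 2, b: 6}]

GroupToListSketch.collect(rows, :a, :b)
# => [{1, [9, 8]}, {2, [7, 6]}]
```

Each tuple pairs a group key with the list of values captured for that group, which is the shape the `b_merged` column is meant to hold.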

Related to:
- elixir-explorer#296
- elixir-explorer#400
@philss
Member Author

philss commented Nov 10, 2022

I haven't thought much about how list series should be inspected, so suggestions are welcome 😃
I should also add the encodings and normalization for all the dtypes soon.

@@ -27,7 +27,7 @@ defmodule Explorer.Series do

@valid_dtypes Explorer.Shared.dtypes()

- @type dtype :: :integer | :float | :boolean | :string | :date | :datetime
+ @type dtype :: :integer | :float | :boolean | :string | :date | :datetime | {:list, :integer}
Member

We will probably want to make this recursive later on, so that lists can nest:

Suggested change
- @type dtype :: :integer | :float | :boolean | :string | :date | :datetime | {:list, :integer}
+ @type dtype :: :integer | :float | :boolean | :string | :date | :datetime | {:list, dtype}
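
As a rough illustration of what a recursive dtype implies, the sketch below defines a self-referencing typespec and a validator that walks nested `{:list, inner}` tuples. The module name `DtypeSketch` and the function `valid?/1` are hypothetical and are not part of Explorer; the set of primitive dtypes is copied from the diff above.

```elixir
# Hypothetical sketch; `DtypeSketch` and `valid?/1` are not Explorer APIs.
defmodule DtypeSketch do
  # Primitive dtypes as listed in the diff under review.
  @primitive [:integer, :float, :boolean, :string, :date, :datetime]

  # The type refers to itself, so lists of lists are allowed.
  @type dtype ::
          :integer | :float | :boolean | :string | :date | :datetime | {:list, dtype()}

  # A list dtype is valid when its inner dtype is valid;
  # anything else must be one of the primitives.
  @spec valid?(term()) :: boolean()
  def valid?({:list, inner}), do: valid?(inner)
  def valid?(dtype), do: dtype in @primitive
end

DtypeSketch.valid?({:list, :integer})
# => true
DtypeSketch.valid?({:list, {:list, :string}})
# => true
DtypeSketch.valid?({:list, :unknown})
# => false
```

With the non-recursive `{:list, :integer}` form, only the first of these three cases could be expressed in the type at all.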

@philss
Member Author

philss commented Nov 10, 2022

I'm closing this for now in order to focus on the next release. But anyone who wants to give it a try should feel free to cherry-pick this work :)
I should get back to it after 0.4.

@philss philss closed this Nov 10, 2022
@cigrainger cigrainger mentioned this pull request Dec 17, 2022
liamdiprose pushed a commit to liamdiprose/explorer that referenced this pull request Feb 16, 2023
@lambdaofgod

Hey @philss, how much work do you think it would take to make it work in 0.7?
I'm motivated to do this, but I'm an Elixir newbie and I don't want to get anyone's hopes up in case I get bogged down.

@philss
Member Author

philss commented Aug 30, 2023

@lambdaofgod I think representing lists is the easiest part, and that is what I did in this PR. The real problem is supporting operations on lists. We have been postponing this since last year because of the unknown complexity of maintaining list dtypes. But I think we are close to looking into this again.

> how much work do you think it would take to make it work in 0.7?

Short answer is: we don't know yet 😅

@philss philss mentioned this pull request Oct 31, 2023
15 tasks
3 participants