-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement relocate
inspired by dplyr
#619
Conversation
native/explorer/src/dataframe.rs
Outdated
.chain(first_to_relocate.into_iter()) | ||
.chain(second_to_relocate.into_iter()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be wrong, actually. This wouldn't re-insert the columns in the correct order
df = Explorer.DataFrame.new(col_a: [1, 2], col_b: [5.1, 5.2], col_c: [4,5], col_d: ["yes", "no"], col_e: [4, 1])
# ..
df2 = Explorer.DataFrame.relocate(df, [4, 0], before: 2)
> #Explorer.DataFrame<
Polars[2 x 5]
col_b float [5.1, 5.2]
col_e integer [4, 1] # order is 'e', then 'a'
col_a integer [1, 2]
col_c integer [4, 5]
col_d string ["yes", "no"]
>
Explorer.DataFrame.dump_csv(df2)
> {:ok, "col_b,col_a,col_e,col_c,col_d\n5.1,1,4,4,yes\n5.2,2,1,5,no\n"}
# ^ but here it's 'a' then 'e'
So while the implementation in Elixir orders it appropriately, it isn't ordered correctly here for the underlying dataframe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, should be addressed in 234c61c :)
lib/explorer/data_frame.ex
Outdated
defp relocate_columns({direction, :first}, df, columns_to_relocate), | ||
do: relocate_columns({direction, 0}, df, columns_to_relocate) | ||
|
||
defp relocate_columns({direction, :last}, df, columns_to_relocate), | ||
do: relocate_columns({direction, -1}, df, columns_to_relocate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this "special casing" is somewhat redundant when 0 and -1 can be provided as the index. Should I omit this in favor for consistency with other functions that don't use :first
and :last
in this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the readability of using :first | :last, so I would keep it. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there is an ambiguity if you have a column name first or last, so we should probably remove it. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's true. Although you'd be able to specify the column with a string instead. I'm leaning on removing it still, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 09d2bf7, will look at the lazy data frames now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment on first/last but it looks great to me, thank you! Also note that we have to support this operation on lazy data frames. :)
Great call on using |
@josevalim I was about to add a couple test cases for |
Feel free to submit a PR, but I would only add a simple smoke test, given it is built on top of select. :) |
My attempt at resolving #603
I've added tests and an implementation for Polars (non-lazy). Any suggestions and pointers are welcome :)