Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for DataFrame.melt #1049

Merged
merged 3 commits into from
May 2, 2024
Merged

Add support for DataFrame.melt #1049

merged 3 commits into from
May 2, 2024

Conversation

rjzamora
Copy link
Member

@rjzamora rjzamora commented May 1, 2024

Closes #1002
dask/dask component: dask/dask#11088

I'm not really sure if melt has been left out of the API for a specific reason. This PR adds basic support using map_partitions.

@rjzamora rjzamora added the enhancement New feature or request label May 1, 2024
Comment on lines 4000 to 4008
meta = make_meta(
meta_nonempty(self._meta).melt(
id_vars=id_vars,
value_vars=value_vars,
var_name=var_name,
value_name=value_name,
col_level=col_level,
)
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm this? Wouldn't want to merge it if we don't need it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay yeah. Default emulation works fine for the tests - Removing the unnecessary logic.

dict(value_vars=["s1", "s2"]),
],
)
def test_melt(kwargs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test copied from dask/dask for now.

Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you run the dask/dask melt tests on this to make sure that all of them are passing?

We'd also need a PR that enables those tests on CI

Comment on lines 4000 to 4008
meta = make_meta(
meta_nonempty(self._meta).melt(
id_vars=id_vars,
value_vars=value_vars,
var_name=var_name,
value_name=value_name,
col_level=col_level,
)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm this? Wouldn't want to merge it if we don't need it.

@rjzamora
Copy link
Member Author

rjzamora commented May 2, 2024

Thanks for the review @phofl ! Sorry - Should have marked this as a draft. I was curious why melt was never added, and didn't want to bury too much time in a PR only to find out that there are deeper issues I didn't know about. If the answer is: "none of the guiding benchmarks/workflows seemed to need it", then I'll be happy to clean this up and make test coverage is thorough both here and in dask/dask.

Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

can you add it to the api docs as well if not already done in your other PR?

@phofl phofl merged commit e82578d into dask:main May 2, 2024
6 of 7 checks passed
@phofl
Copy link
Collaborator

phofl commented May 2, 2024

thx

@rjzamora rjzamora deleted the add-melt branch May 2, 2024 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for DataFrame.melt
2 participants