
Allow cudf.concat to process a dictionary of frames #15115 #15126

Closed
wants to merge 4 commits into from

amanlai

@amanlai amanlai commented Feb 23, 2024

Closes #15115.

Unlike pandas.concat, cudf.concat doesn't accept a dictionary of objects. The following code raises an error:

import cudf

d = {
    'first': cudf.DataFrame({'A': [1, 2], 'B': [3, 4]}),
    'second': cudf.DataFrame({'A': [5, 6], 'B': [7, 8]}),
}

# Concatenate side by side (axis=1) and stacked (default axis=0).
cudf.concat(d, axis=1)
cudf.concat(d)

This PR resolves the issue.
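For reference, pandas.concat handles a mapping by using its keys as the `keys=` argument (the outer level of the resulting MultiIndex) and its values as the objects to concatenate. Below is a minimal sketch of that normalization step, written against pandas so it runs without a GPU. `normalize_concat_objs` is a hypothetical helper, not part of the cuDF or pandas API, and this is not necessarily how the PR implements the change.

```python
from collections.abc import Mapping

import pandas as pd


def normalize_concat_objs(objs, keys=None):
    """Hypothetical helper: turn a mapping of frames into (list_of_objs, keys),
    mirroring how pandas.concat treats a dict passed as ``objs``."""
    if isinstance(objs, Mapping):
        if keys is None:
            # The dict's keys (in insertion order) become the outer index level.
            keys = list(objs.keys())
        # If keys were passed explicitly, only those entries are selected.
        objs = [objs[k] for k in keys]
    return list(objs), keys


d = {
    "first": pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
    "second": pd.DataFrame({"A": [5, 6], "B": [7, 8]}),
}

objs, keys = normalize_concat_objs(d)
horizontal = pd.concat(objs, axis=1, keys=keys)  # columns: (key, column) pairs
vertical = pd.concat(objs, keys=keys)            # index: (key, row label) pairs
```

The point of the sketch is that dict support reduces to the existing list-plus-`keys=` code path, so a library only has to unpack the mapping before its usual concatenation logic runs.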

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Feb 23, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Python Affects Python cuDF API. label Feb 23, 2024
@shwina shwina added feature request New feature or request non-breaking Non-breaking change labels Feb 23, 2024
@shwina
Contributor

shwina commented Feb 23, 2024

/ok to test

@shwina
Contributor

shwina commented Feb 23, 2024

/ok to test

@shwina
Contributor

shwina commented Feb 26, 2024

/ok to test

Comment on lines +1928 to +1931
    d = {
        'first': cudf.DataFrame({'A': [1, 2], 'B': [3, 4]}),
        'second': cudf.DataFrame({'A': [5, 6], 'B': [7, 8]}),
    }
Contributor

@shwina shwina Feb 26, 2024

Can we parametrize to include a few additional test cases?

  • A single entry in d
  • A third entry in d with a different number of elements, e.g.:
    d = {
        'first': pd.DataFrame({'A': [1, 2], 'B': [3, 4]}),
        'second': pd.DataFrame({'A': [5, 6], 'B': [7, 8]}),
        'third': pd.DataFrame({'C': [1, 2, 3]})
    }
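A sketch of the parametrized test requested above, assuming pandas is used as the reference and the cuDF inputs are built with `cudf.from_pandas`. `DICT_CASES` and `test_concat_dictionary` are hypothetical names, and comparing through `to_pandas()` with `pandas.testing.assert_frame_equal` is just one option (cuDF's own test suite has its own comparison helpers). The test skips itself when cuDF is not importable.

```python
import pandas as pd
import pytest

# Hypothetical cases: one entry, two entries, and a third entry whose
# frame has a different number of rows and a different column.
DICT_CASES = [
    {"first": pd.DataFrame({"A": [1, 2], "B": [3, 4]})},
    {
        "first": pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
        "second": pd.DataFrame({"A": [5, 6], "B": [7, 8]}),
    },
    {
        "first": pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
        "second": pd.DataFrame({"A": [5, 6], "B": [7, 8]}),
        "third": pd.DataFrame({"C": [1, 2, 3]}),
    },
]


@pytest.mark.parametrize("pd_dict", DICT_CASES)
@pytest.mark.parametrize("axis", [0, 1])
def test_concat_dictionary(pd_dict, axis):
    # Skip on machines without cuDF (and therefore without a GPU setup).
    cudf = pytest.importorskip("cudf")
    expected = pd.concat(pd_dict, axis=axis)
    gdf_dict = {k: cudf.from_pandas(v) for k, v in pd_dict.items()}
    actual = cudf.concat(gdf_dict, axis=axis)
    # Compare on the pandas side; missing values in the unequal-length
    # case show up as NaN in both results.
    pd.testing.assert_frame_equal(actual.to_pandas(), expected)
```

Stacking two `parametrize` decorators yields the full cross product, so each dict case is checked for both `axis=0` and `axis=1`.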

Author

Aren't "a single entry in d" and "a third entry in d with just one key-value pair" the same thing? Are you saying there should be a test for d = {'first': cudf.DataFrame({'A': [1, 2], 'B': [3, 4]})}?

Contributor

Sorry, I mistyped! I've edited my comment.

@@ -1921,3 +1921,21 @@ def test_concat_mixed_list_types_error(s1, s2):

    with pytest.raises(NotImplementedError):
        cudf.concat([s1, s2], ignore_index=True)


def test_horizontal_concat_dictionary():
Contributor

Should we also add tests for concatenating dict-of-Series and dict-of-Index objects?

If those cases don't currently work, that's OK. We should at least raise a NotImplementedError.
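A sketch of the dict-of-Series and dict-of-Index tests discussed above. The Series case uses pandas as the reference. There is no pandas reference for the Index case, since pandas.concat only accepts Series and DataFrame objects, so that test encodes the suggested fallback of raising NotImplementedError; this is an assumption about the desired behavior, not confirmed cuDF behavior. All names here are hypothetical.

```python
import pandas as pd
import pytest

# Hypothetical input shared by the Series tests.
SERIES_DICT = {"x": pd.Series([1, 2]), "y": pd.Series([3, 4])}


@pytest.mark.parametrize("axis", [0, 1])
def test_concat_dictionary_of_series(axis):
    cudf = pytest.importorskip("cudf")
    expected = pd.concat(SERIES_DICT, axis=axis)
    actual = cudf.concat(
        {k: cudf.from_pandas(v) for k, v in SERIES_DICT.items()}, axis=axis
    )
    # axis=0 gives a Series with a (key, label) MultiIndex;
    # axis=1 gives a DataFrame whose columns are the dict keys.
    if axis == 0:
        pd.testing.assert_series_equal(actual.to_pandas(), expected)
    else:
        pd.testing.assert_frame_equal(actual.to_pandas(), expected)


def test_concat_dictionary_of_index_raises():
    cudf = pytest.importorskip("cudf")
    d = {"x": cudf.Index([1, 2]), "y": cudf.Index([3, 4])}
    # Assumed fallback suggested in review: refuse dict-of-Index explicitly.
    with pytest.raises(NotImplementedError):
        cudf.concat(d)
```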

Author

I can add tests for these.

Contributor

@shwina shwina left a comment

Overall, this is looking pretty good! Just left a comment asking for more tests.

@shwina
Contributor

shwina commented Feb 26, 2024

Looks like the PR is failing style checks. Can you run locally:

pre-commit run --all

and fix up any style issues reported?

@amanlai amanlai deleted the branch rapidsai:branch-24.04 February 26, 2024 17:43
@amanlai amanlai closed this Feb 26, 2024
@amanlai amanlai deleted the branch-24.04 branch February 26, 2024 17:43
@amanlai amanlai restored the branch-24.04 branch February 26, 2024 18:13
@amanlai
Author

amanlai commented Feb 26, 2024

@shwina I screwed up and deleted the branch by mistake, which I think closed the PR. How do I start this up again?

@shwina
Contributor

shwina commented Feb 26, 2024

Here's what I would do; the following creates a local branch called fea-concat-dict-axis-1 starting from branch-24.04, and applies your two commits on top of it:

git checkout branch-24.04
git checkout -b fea-concat-dict-axis-1
git cherry-pick --ff b16713a4022487af9e8cba162d2da4fc20309cda
git cherry-pick --ff f7ed7f98e7e4a3d27565246d8c5cb4c09fb099d7

Then you can push the branch fea-concat-dict-axis-1 and create a PR from it.

Does that help? Feel free to ask if you have any questions!

@amanlai
Author

amanlai commented Feb 27, 2024

@shwina just to clarify, do I create a new pull request?

@shwina
Contributor

shwina commented Feb 27, 2024

Yes, I think that's best.

@shwina shwina reopened this Feb 27, 2024
@shwina shwina closed this Feb 27, 2024
@amanlai
Author

amanlai commented Feb 28, 2024

@shwina I ended up making a new pull request at #15160.

Labels
feature request New feature or request non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Allow cudf.concat to process a dictionary of frames / add keys=
2 participants