Allow cudf.concat to process a dictionary of frames #15115 #15126

amanlai · 2024-02-23T03:32:23Z

Unlike pandas.concat, cudf.concat doesn't work with a dictionary of objects. The following code raises an error.

import cudf

d = {
    'first': cudf.DataFrame({'A': [1, 2], 'B': [3, 4]}),
    'second': cudf.DataFrame({'A': [5, 6], 'B': [7, 8]}),
}

cudf.concat(d, axis=1)  # column-wise concatenation
cudf.concat(d)          # row-wise concatenation (default axis=0)

This commit resolves this issue.
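For reference, here is a minimal sketch of the pandas behavior that the description compares against. It uses pandas only, so it runs without a GPU; it says nothing about how cudf implements the feature, only what result pandas.concat produces for the same dictionary.

```python
import pandas as pd

d = {
    "first": pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
    "second": pd.DataFrame({"A": [5, 6], "B": [7, 8]}),
}

# axis=1: the frames are placed side by side and the dict keys become
# the outer level of a column MultiIndex.
wide = pd.concat(d, axis=1)
print(wide.columns.tolist())
# [('first', 'A'), ('first', 'B'), ('second', 'A'), ('second', 'B')]

# axis=0 (default): the frames are stacked and the dict keys become
# the outer level of a row MultiIndex.
tall = pd.concat(d)
print(tall.index.get_level_values(0).tolist())
# ['first', 'first', 'second', 'second']
```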

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2024-02-23T03:32:26Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

shwina · 2024-02-23T18:06:12Z

/ok to test

shwina · 2024-02-23T20:39:28Z

/ok to test

shwina · 2024-02-26T11:12:35Z

/ok to test

shwina · 2024-02-26T11:34:43Z

python/cudf/cudf/tests/test_concat.py

+    d = {
+        'first': cudf.DataFrame({'A': [1, 2], 'B': [3, 4]}),
+        'second': cudf.DataFrame({'A': [5, 6], 'B': [7, 8]}),
+    }


Can we parametrize to include a few additional test cases?

A single entry in d

A third entry in d with a different number of elements, e.g.:

d = {
    'first': pd.DataFrame({'A': [1, 2], 'B': [3, 4]}),
    'second': pd.DataFrame({'A': [5, 6], 'B': [7, 8]}),
    'third': pd.DataFrame({'C': [1, 2, 3]}),
}

Aren't "a single entry in d" and "a third entry in d with just 1 key-value pair" the same thing? Are you saying there should be a test for d = {'first': cudf.DataFrame({'A': [1, 2], 'B': [3, 4]})}?

Sorry - I mistyped! I've edited my comment
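The extra cases requested in this thread could be laid out roughly as follows. This is only a sketch under stated assumptions: it uses pandas alone so it runs without a GPU, the names CASES and expected_horizontal_concat are hypothetical, and in the actual test suite the case list would presumably be fed to pytest.mark.parametrize with the result coming from cudf.concat on cudf frames.

```python
import pandas as pd

# Hypothetical case list: one entry, two entries, and three entries
# where the third frame has a different number of rows.
CASES = [
    {"first": pd.DataFrame({"A": [1, 2], "B": [3, 4]})},
    {
        "first": pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
        "second": pd.DataFrame({"A": [5, 6], "B": [7, 8]}),
    },
    {
        "first": pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
        "second": pd.DataFrame({"A": [5, 6], "B": [7, 8]}),
        "third": pd.DataFrame({"C": [1, 2, 3]}),
    },
]


def expected_horizontal_concat(d):
    """pandas reference result: the dict values concatenated with keys=."""
    return pd.concat(list(d.values()), axis=1, keys=list(d.keys()))


for d in CASES:
    # In the real test this would be the cudf result (an assumption here).
    result = pd.concat(d, axis=1)
    pd.testing.assert_frame_equal(result, expected_horizontal_concat(d))
```

The third case is the interesting one: the shorter frames are padded with missing values so that the result has three rows.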

shwina · 2024-02-26T11:36:17Z

python/cudf/cudf/tests/test_concat.py

@@ -1921,3 +1921,21 @@ def test_concat_mixed_list_types_error(s1, s2):

    with pytest.raises(NotImplementedError):
        cudf.concat([s1, s2], ignore_index=True)
+
+
+def test_horizontal_concat_dictionary():


Should we also add tests for concatenating dict-of-Series and dict-of-Index objects?

If those cases don't work currently - that's OK. We should at least raise a NotImplementedError.

I can add tests for these.
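As a point of comparison for the dict-of-Series case, this sketch shows what pandas returns; it does not show cudf behavior, which per the comment above may or may not be supported yet. The variable name series_by_key is made up for illustration.

```python
import pandas as pd

series_by_key = {
    "x": pd.Series([1, 2]),
    "y": pd.Series([3, 4]),
}

# axis=1: the result is a DataFrame and the dict keys become column names.
wide = pd.concat(series_by_key, axis=1)
print(wide.columns.tolist())  # ['x', 'y']

# axis=0 (default): the result is a Series and the dict keys become the
# outer level of its MultiIndex.
tall = pd.concat(series_by_key)
print(tall.tolist())  # [1, 2, 3, 4]
```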

shwina

Overall, this is looking pretty good! Just left a comment asking for more tests.

shwina · 2024-02-26T11:38:14Z

Looks like the PR is failing style checks. Can you run locally:

pre-commit run --all

and fix up any style issues reported?

amanlai · 2024-02-26T18:18:33Z

@shwina I screwed up and deleted the branch by mistake, which I think closed the PR. How do I start this up again?

shwina · 2024-02-26T18:28:55Z

Here's what I would do; the following creates a local branch called fea-concat-dict-axis-1 starting from branch-24.04, and applies your two commits on top of it:

git checkout branch-24.04
git checkout -b fea-concat-dict-axis-1
git cherry-pick --ff b16713a4022487af9e8cba162d2da4fc20309cda
git cherry-pick --ff f7ed7f98e7e4a3d27565246d8c5cb4c09fb099d7

Then you can push the branch fea-concat-dict-axis-1 and create a PR from it.

Does that help? Feel free to ask if you have any questions!

amanlai · 2024-02-27T06:23:33Z

@shwina just to clarify, do I create a new pull request?

shwina · 2024-02-27T11:15:49Z

Yes, I think that's best.

amanlai · 2024-02-28T10:25:45Z

@shwina I ended up making a new pull request at #15160.

Allow cudf.concat to process a dictionary of frames #15115

b16713a

github-actions bot added the Python (Affects Python cuDF API.) label Feb 23, 2024

Merge branch 'branch-24.04' into branch-24.04

586c047

shwina added the feature request (New feature or request) and non-breaking (Non-breaking change) labels Feb 23, 2024

amanlai added 2 commits February 23, 2024 11:32

axis=0 is not supported yet

f7ed7f9

Merge branch 'branch-24.04' of https://github.com/amanlai/concat-dict…
…ionary into branch-24.04

23fb675

shwina reviewed Feb 26, 2024

amanlai deleted the branch rapidsai:branch-24.04 February 26, 2024 17:43

amanlai closed this Feb 26, 2024

amanlai deleted the branch-24.04 branch February 26, 2024 17:43

amanlai restored the branch-24.04 branch February 26, 2024 18:13

shwina reopened this Feb 27, 2024

shwina closed this Feb 27, 2024
