DataFrame with namedtuples uses ._field as column names #13824

mroeschke · 2023-08-05T01:06:00Z

Description

Allow a namedtuple's _fields attribute to be mapped to DataFrame column labels, as pandas does.
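A minimal sketch of the column-inference rule described here, in plain Python with no cuDF or pandas dependency. The helper name `infer_columns` is hypothetical and is not part of either library; it only mirrors the condition used in the diff in this PR.

```python
from collections import namedtuple


def infer_columns(data, columns=None):
    """Hypothetical helper: pick column labels for a list of rows.

    If no labels are given and the first row is a namedtuple,
    its _fields are used as the labels.
    """
    if (
        columns is None
        and len(data) > 0
        and isinstance(data[0], tuple)
        and hasattr(data[0], "_fields")
    ):
        return list(data[0]._fields)
    # Explicit labels (or None for plain tuples) pass through unchanged.
    return columns


Point = namedtuple("Point", ["x", "y"])
rows = [Point(1, 2), Point(3, 4)]

inferred = infer_columns(rows)                       # labels from Point._fields
explicit = infer_columns(rows, columns=["a", "b"])   # explicit labels win
plain = infer_columns([(1, 2), (3, 4)])              # plain tuples: nothing to infer
```

Only the first row is inspected, which matches the comment in the diff about using the fields from the first tuple.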

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

galipremsagar · 2023-08-07T15:28:32Z

python/cudf/cudf/core/dataframe.py

+        elif (
+            len(data) > 0
+            and columns is None
+            and isinstance(data[0], tuple)
+            and hasattr(data[0], "_fields")
+        ):
+            # pandas behavior is to use the fields from the first tuple
+            # as the column names
+            columns = data[0]._fields
+            values = itertools.zip_longest(*data)
+            data = dict(zip(columns, values))
+            self._init_from_dict_like(
+                data, index=index, nan_as_null=nan_as_null
+            )


Suggested change

-        elif (
-            len(data) > 0
-            and columns is None
-            and isinstance(data[0], tuple)
-            and hasattr(data[0], "_fields")
-        ):
-            # pandas behavior is to use the fields from the first tuple
-            # as the column names
-            columns = data[0]._fields
-            values = itertools.zip_longest(*data)
-            data = dict(zip(columns, values))
-            self._init_from_dict_like(
-                data, index=index, nan_as_null=nan_as_null
-            )
+        if (
+            len(data) > 0
+            and columns is None
+            and isinstance(data[0], tuple)
+            and hasattr(data[0], "_fields")
+        ):
+            # pandas behavior is to use the fields from the first tuple
+            # as the column names
+            columns = data[0]._fields

Would doing just this work? That is, I'd guess the code below already handles the data construction, no? I might be missing something, but from the issue I thought we were only missing the column names.

I was curious about this too; since this code path just assigns self.columns = columns, I wasn't sure whether columns could be a tuple or needed to be a different type.

Okay, this appears to work and produces a column Index too, so I changed it to this.

wence-

I think this doesn't mimic pandas behaviour exactly, and has some edge cases where it silently drops data.
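One way the first version of the change could lose values, offered as an assumption about the kind of edge case meant here rather than as the reviewer's own example: when a later row is longer than the first, `zip(columns, values)` stops at the number of fields in the first row. The sketch below uses only the standard library and the two lines from the original diff; the `Point` and `Point3` types are illustrative.

```python
import itertools
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
Point3 = namedtuple("Point3", ["x", "y", "z"])

# The second row carries one more value than the first row has fields.
data = [Point(1, 2), Point3(3, 4, 5)]

# Same two steps as the original diff.
columns = data[0]._fields               # ('x', 'y')
values = itertools.zip_longest(*data)   # three column tuples, padded with None
result = dict(zip(columns, values))     # zip stops after two labels

# Values that survived the round trip, flattened for inspection.
kept = list(itertools.chain.from_iterable(result.values()))
```

Here the value 5 from the second row never reaches `result`, and no error is raised.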

wence- · 2023-08-09T08:55:22Z

/merge

DataFrame with namedtuples uses ._field as column names

d70d4f0

mroeschke requested a review from a team as a code owner August 5, 2023 01:06

mroeschke requested review from vyasr and galipremsagar August 5, 2023 01:06

github-actions bot added the Python (Affects Python cuDF API) label Aug 5, 2023

mroeschke added the bug (Something isn't working), non-breaking (Non-breaking change), and python labels and removed the Python (Affects Python cuDF API) label Aug 5, 2023

galipremsagar reviewed Aug 7, 2023

wence- requested changes Aug 7, 2023

Matthew Roeschke added 3 commits August 7, 2023 17:30

Merge remote-tracking branch 'upstream/branch-23.10' into bug/namedtu…

97c9149

…ples

Simplify code path, add test for raising

4476ab5

Merge remote-tracking branch 'upstream/branch-23.10' into bug/namedtu…

421807e

…ples

github-actions bot added the Python (Affects Python cuDF API) label Aug 9, 2023

Matthew Roeschke added 2 commits August 8, 2023 18:14

Remove unused keyword

d4feb46

Reduce diff further

414d7a1

galipremsagar approved these changes Aug 9, 2023

wence- approved these changes Aug 9, 2023

rapids-bot bot merged commit da6ac73 into rapidsai:branch-23.10 Aug 9, 2023

mroeschke deleted the bug/namedtuples branch August 9, 2023 15:23
