DataFrame with namedtuples uses ._field as column names #13824

Merged
merged 6 commits into from
Aug 9, 2023


mroeschke
Contributor

Description

Allow a namedtuple's _fields attribute to be mapped to DataFrame column labels, as pandas does.

closes #13823
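The labelling rule in the description can be sketched in plain Python. This is a minimal, hypothetical helper (infer_column_labels is not a cuDF or pandas function). It assumes the rule stated in the PR: when no columns are passed and the first row is a namedtuple, that row's _fields become the column labels.

```python
from collections import namedtuple
from typing import Any, List, Optional, Sequence


def infer_column_labels(
    data: Sequence[Any], columns: Optional[Sequence[Any]] = None
) -> Optional[List[Any]]:
    """Hypothetical helper: choose column labels for row-oriented input.

    Returns None when the caller should fall back to default positional
    labels (0, 1, 2, ...).
    """
    if columns is not None:
        # Explicitly passed labels always take precedence.
        return list(columns)
    first = data[0] if len(data) > 0 else None
    # A namedtuple is a tuple subclass that also exposes a _fields attribute.
    if isinstance(first, tuple) and hasattr(first, "_fields"):
        # Only the first row is inspected, as described in the PR.
        return list(first._fields)
    return None


Point = namedtuple("Point", ["x", "y"])
rows = [Point(1, 2), Point(3, 4)]
print(infer_column_labels(rows))              # ['x', 'y']
print(infer_column_labels(rows, ["a", "b"]))  # ['a', 'b']
print(infer_column_labels([(1, 2), (3, 4)]))  # None
```

Plain tuples have no _fields, so they keep the default positional labels.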

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke requested a review from a team as a code owner August 5, 2023 01:06
@github-actions bot added the Python (Affects Python cuDF API.) label Aug 5, 2023
@mroeschke added the bug (Something isn't working), non-breaking (Non-breaking change), and python labels and removed the Python (Affects Python cuDF API.) label Aug 5, 2023
Comment on lines 851 to 864
elif (
    len(data) > 0
    and columns is None
    and isinstance(data[0], tuple)
    and hasattr(data[0], "_fields")
):
    # pandas behavior is to use the fields from the first tuple
    # as the column names
    columns = data[0]._fields
    values = itertools.zip_longest(*data)
    data = dict(zip(columns, values))
    self._init_from_dict_like(
        data, index=index, nan_as_null=nan_as_null
    )
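The reviewed block builds a column mapping from row tuples in two steps: itertools.zip_longest(*data) transposes the rows, and zip pairs each label with a column. A standalone sketch of just that step follows. The function name rows_to_column_dict is hypothetical, and the cuDF-specific _init_from_dict_like call is left out.

```python
import itertools
from collections import namedtuple


def rows_to_column_dict(data):
    """Hypothetical standalone version of the transpose step under review."""
    # Labels come from the first row only.
    columns = data[0]._fields
    # zip_longest(*rows) yields one tuple per column position; rows shorter
    # than the longest row are padded with None.
    values = itertools.zip_longest(*data)
    # Pair each label with its column of values.
    return dict(zip(columns, values))


Point = namedtuple("Point", ["x", "y"])
print(rows_to_column_dict([Point(1, 2), Point(3, 4)]))  # {'x': (1, 3), 'y': (2, 4)}
print(rows_to_column_dict([Point(1, 2), (3,)]))         # {'x': (1, 3), 'y': (2, None)}
```

The second call shows the padding behaviour: a short row yields None for its missing positions rather than raising.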
Contributor

@galipremsagar Aug 7, 2023


Suggested change
if (
    len(data) > 0
    and columns is None
    and isinstance(data[0], tuple)
    and hasattr(data[0], "_fields")
):
    # pandas behavior is to use the fields from the first tuple
    # as the column names
    columns = data[0]._fields

Would just doing this work? That is, I'd guess the code below already handles the data construction, no? I might be missing something, but from the issue I thought we were only missing the column names.
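The reasoning behind the suggestion is that a namedtuple is an ordinary tuple to a row-oriented construction path, so only the labels need special handling. The actual cuDF row-handling code that follows this branch is not shown in the thread. Here build_from_rows is a hypothetical stand-in for it, and construct is a hypothetical wrapper that applies the suggested change.

```python
import itertools
from collections import namedtuple


def build_from_rows(data, columns=None):
    """Hypothetical stand-in for a generic row-oriented construction path."""
    # Transpose rows into per-position column tuples (short rows padded with None).
    cols = list(itertools.zip_longest(*data))
    # Use the given labels, or default positional labels 0, 1, 2, ...
    labels = list(columns) if columns is not None else list(range(len(cols)))
    return dict(zip(labels, cols))


def construct(data, columns=None):
    """Hypothetical constructor following the suggested change: only infer labels."""
    if (
        len(data) > 0
        and columns is None
        and isinstance(data[0], tuple)
        and hasattr(data[0], "_fields")
    ):
        columns = data[0]._fields
    # The namedtuple rows themselves go through the same path as plain tuples.
    return build_from_rows(data, columns)


Point = namedtuple("Point", ["x", "y"])
print(construct([Point(1, 2), Point(3, 4)]))  # {'x': (1, 3), 'y': (2, 4)}
print(construct([(1, 2), (3, 4)]))            # {0: (1, 3), 1: (2, 4)}
```

Keeping label inference separate means the data-building logic stays in one place instead of being duplicated in the namedtuple branch.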

Contributor Author


I was curious about this too: since this code path just assigns self.columns = columns, I wasn't sure whether it was okay for columns to be a tuple or whether it needed to be a different type.

Contributor Author


Okay, this seems to work and produces a column Index too, so I changed it to this.

Contributor

@wence- left a comment


I think this doesn't mimic pandas behaviour exactly, and has some edge cases where it silently drops data.

python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/tests/test_dataframe.py (resolved)
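The review does not spell out which edge cases it means. One case consistent with the "silently drops data" concern follows from the reviewed code itself: zip stops at its shorter input. If a later row is longer than the first row's _fields, the extra column vanishes without an error. The input below is a hypothetical example, and the snippet reproduces only the two reviewed lines.

```python
import itertools
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
Point3 = namedtuple("Point3", ["x", "y", "z"])

# Hypothetical input: the first row has fewer fields than a later row.
rows = [Point(1, 2), Point3(3, 4, 5)]

columns = rows[0]._fields                    # ('x', 'y'): only the first row is consulted
values = list(itertools.zip_longest(*rows))  # [(1, 3), (2, 4), (None, 5)]
data = dict(zip(columns, values))            # zip stops at the shorter input

print(data)                                  # {'x': (1, 3), 'y': (2, 4)}
dropped = len(values) - len(data)
print(f"{dropped} column(s) dropped without an error")
```

The third column, holding (None, 5), is discarded because there is no third label to pair it with.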
@github-actions bot added the Python (Affects Python cuDF API.) label Aug 9, 2023
@wence-
Contributor

wence- commented Aug 9, 2023

/merge

@rapids-bot bot merged commit da6ac73 into rapidsai:branch-23.10 Aug 9, 2023
@mroeschke deleted the bug/namedtuples branch August 9, 2023 15:23
Labels
bug (Something isn't working), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: cudf.DataFrame does not map namedtuple field names to column names like pandas
3 participants