-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataFrame with namedtuples uses ._field as column names #13824
Conversation
python/cudf/cudf/core/dataframe.py
Outdated
elif ( | ||
len(data) > 0 | ||
and columns is None | ||
and isinstance(data[0], tuple) | ||
and hasattr(data[0], "_fields") | ||
): | ||
# pandas behavior is to use the fields from the first tuple | ||
# as the column names | ||
columns = data[0]._fields | ||
values = itertools.zip_longest(*data) | ||
data = dict(zip(columns, values)) | ||
self._init_from_dict_like( | ||
data, index=index, nan_as_null=nan_as_null | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif ( | |
len(data) > 0 | |
and columns is None | |
and isinstance(data[0], tuple) | |
and hasattr(data[0], "_fields") | |
): | |
# pandas behavior is to use the fields from the first tuple | |
# as the column names | |
columns = data[0]._fields | |
values = itertools.zip_longest(*data) | |
data = dict(zip(columns, values)) | |
self._init_from_dict_like( | |
data, index=index, nan_as_null=nan_as_null | |
) | |
if ( | |
len(data) > 0 | |
and columns is None | |
and isinstance(data[0], tuple) | |
and hasattr(data[0], "_fields") | |
): | |
# pandas behavior is to use the fields from the first tuple | |
# as the column names | |
columns = data[0]._fields |
Will just doing this work? i.e., I guess the code below would be handling the data construction no? I might be missing something but from the issue I thought we were only missing out on the column names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious about this too; since this code path just assigns self.columns = columns
I wasn't sure if it was okay if columns
was a tuple or needed to be a different type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay this looks to work and produces a column Index too so I changed it to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesn't mimic pandas behaviour exactly, and has some edge cases where it silently drops data.
/merge |
Description
Allow namedtuple's
_field
attribute to be mapped to DataFrame column labels like pandascloses #13823
Checklist