BUG: retain extension dtypes in transpose #28048
This is pretty much an edge case, but can you check the output of
In [15]: df = pd.DataFrame({"A": pd.Categorical([1, 2]), "B": pd.Categorical([2, 1])})
In [16]: df.dtypes
Out[16]:
A category
B category
dtype: object
especially check the values? Those are considered the same dtype since the categories are unordered.
It looks like df.dtypes[0] == df.dtypes[1] == df.T.dtypes[0] == df.T.dtypes[1]. Is this the desired behavior?
I believe so.
Could you add a test case for unordered categories with differing orders? #28048 (review)
will do
pandas/core/generic.py
Outdated
        # TODO: this can be made cleaner if/when (N, 1) EA are allowed
        dtype = self.dtypes.iloc[0]
        for col in result.columns:
            result[col] = result[col].astype(dtype)
This pattern is very unusual; we normally don't use repeated setting like this. Rather, we create an array or dict in a comprehension and then construct. For EAs this might be OK (as we have a single EA per block), but I would like not to use this pattern anywhere.
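As a rough illustration of the "build in a comprehension, then construct" pattern described in this comment (a sketch only, not the code in the PR; the example frame and variable names are assumptions):

```python
import pandas as pd

df = pd.DataFrame(
    {"A": pd.Categorical(["x", "y"]), "B": pd.Categorical(["y", "x"])}
)
dtype = df.dtypes.iloc[0]
transposed = df.T

# Instead of assigning result[col] = ... in a loop, collect the converted
# columns in a dict comprehension and construct the frame once.
converted = {col: transposed[col].astype(dtype) for col in transposed.columns}
result = pd.DataFrame(converted, index=transposed.index)

print(result.dtypes)
```

Constructing once avoids mutating the frame column by column, which is the concern raised above.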
I would also override transpose in frame.py rather than do it here (and maybe just move the current one to Series) as they don't really share code.
Agreed this is sub-optimal. Ideally this will be made unnecessary by 2D EAs before too long.
Does result[col] = result[col].astype(dtype) not work? I don't recall what we do there when there are multiple blocks.
I don't understand the question; result[col] = result[col].astype(dtype) is exactly what this PR does.
Whoops, I meant result = result.astype(dtype)
good idea. updated
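The whole-frame variant suggested here, result = result.astype(dtype), might look like the sketch below. The nullable `Int64` example data is an assumption chosen for illustration; it is not taken from the PR.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.array([1, None], dtype="Int64"),
        "B": pd.array([3, 4], dtype="Int64"),
    }
)
dtype = df.dtypes.iloc[0]

# One astype call on the transposed frame replaces the per-column loop.
result = df.T.astype(dtype)

print(result.dtypes)
```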
Also, see if you can find any issues this solves (likely transposing datetime-with-timezone data; we have a couple of issues for that, IIRC). This also needs a whatsnew note.
Closing to clear the queue. The Right Way to do this will be to allow 2D EA.
Reopened and rebased.
        return self._constructor(new_values, **new_axes).__finalize__(self)
        result = self._constructor(new_values, **new_axes).__finalize__(self)

        if self.ndim == 2 and self._is_homogeneous_type and len(self.columns):
I think it's worthwhile to make a method to encapsulate this, maybe
def _homogeneous_dtype(self):
    # return the single dtype if homogeneous, None if not
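One possible shape for such a helper, written as a standalone function rather than a DataFrame method. This is a hypothetical sketch: the name `homogeneous_dtype` and its implementation are assumptions, not what pandas ships.

```python
import pandas as pd


def homogeneous_dtype(frame):
    """Return the single dtype shared by all columns, or None otherwise.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    # Dtypes are hashable, so a set collapses equal dtypes into one entry.
    unique_dtypes = set(frame.dtypes)
    if len(unique_dtypes) == 1:
        return unique_dtypes.pop()
    # Covers both mixed dtypes and frames with no columns.
    return None


same = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
empty = pd.DataFrame()

print(homogeneous_dtype(same), homogeneous_dtype(mixed), homogeneous_dtype(empty))
```

Returning None for a frame without columns mirrors the `len(self.columns)` guard in the diff above.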
Closing in favor of #30091.
I'll have to look through the issues to see what this closes.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff