[BUG] Column level names lost when constructing DataFrame #13740

Closed
shwina opened this issue Jul 24, 2023 · 3 comments
Assignees
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@shwina
Contributor

shwina commented Jul 24, 2023

When passing a MultiIndex with level names as the columns= argument to the DataFrame constructor, the level names are lost:

import numpy as np
import cudf

columns = cudf.MultiIndex.from_tuples(
    [
        ("A", "cat"),
        ("B", "dog"),
        ("B", "cat"),
        ("A", "dog"),
    ],
    names=["exp", "animal"],
)
df = cudf.DataFrame(np.random.randn(8, 4), columns=columns)
print(columns.names)
print(df.columns.names)

Output:

['exp', 'animal']
[None]
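
For comparison, here is a minimal sketch of the same construction in pandas (assuming pandas and NumPy are installed). pandas keeps the level names, which is presumably the behaviour cuDF should match:

```python
import numpy as np
import pandas as pd

# Same column labels and level names as in the cuDF reproducer above.
columns = pd.MultiIndex.from_tuples(
    [
        ("A", "cat"),
        ("B", "dog"),
        ("B", "cat"),
        ("A", "dog"),
    ],
    names=["exp", "animal"],
)
df = pd.DataFrame(np.random.randn(8, 4), columns=columns)

# pandas carries the level names through to the DataFrame's columns.
names = list(df.columns.names)
print(names)
```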
@shwina shwina added bug Something isn't working Needs Triage Need team to review and classify Python Affects Python cuDF API. labels Jul 24, 2023
@wence-
Contributor

wence- commented Jul 26, 2023

Possibly like so:

diff --git a/python/cudf/cudf/core/dataframe.py b/python/cudf/cudf/core/dataframe.py
index 0fe8949090..318f66b62c 100644
--- a/python/cudf/cudf/core/dataframe.py
+++ b/python/cudf/cudf/core/dataframe.py
@@ -722,6 +722,10 @@ class DataFrame(IndexedFrame, Serializable, GetAttrGetItemMixin):
 
         if dtype:
             self._data = self.astype(dtype)._data
+        # Fix up
+        if isinstance(columns, pd.MultiIndex):
+            self._data.multiindex = True
+            self._data._level_names = tuple(columns.names)
 
     @_cudf_nvtx_annotate
     def _init_from_series_list(self, data, columns, index):

This pokes at private attributes of self._data from inside the constructor (inappropriate intimacy), so it is not a good fix. The underlying problem is that the DataFrame constructor handles the columns argument ad hoc, in ways that depend heavily on the type of the data argument: some code paths inspect columns properly and produce the appropriate ColumnAccessor, others don't.
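
One way to picture the alternative is to normalise columns once, in a single place, and have every construction path go through it. The following is only a toy sketch in plain Python, not cuDF code: `ColumnSpec`, `normalize_columns`, and `FakeMultiIndex` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical container for everything the constructor needs to know."""
    labels: Tuple[Any, ...]
    level_names: Tuple[Optional[str], ...]
    multiindex: bool


def normalize_columns(columns) -> ColumnSpec:
    """Hypothetical helper: derive labels and level names from any index-like."""
    labels = tuple(columns)
    # Index-like objects expose `names`; a plain list has neither `names`
    # nor `name`, so it falls back to a single unnamed level.
    names = getattr(columns, "names", None)
    if names is None:
        names = (getattr(columns, "name", None),)
    names = tuple(names)
    return ColumnSpec(labels, names, multiindex=len(names) > 1)


@dataclass
class FakeMultiIndex:
    """Stand-in for a MultiIndex, for illustration only."""
    tuples: Sequence[tuple]
    names: Sequence[str]

    def __iter__(self):
        return iter(self.tuples)


# Every code path (array data, dict data, ...) would call normalize_columns,
# so none of them can forget the level names.
spec = normalize_columns(
    FakeMultiIndex([("A", "cat"), ("B", "dog")], names=["exp", "animal"])
)
plain = normalize_columns(["x", "y"])
print(spec.level_names, spec.multiindex)
print(plain.level_names, plain.multiindex)
```

With a single entry point like this, the level names are computed before any data-type-specific branch runs, rather than being patched in afterwards.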

@galipremsagar galipremsagar self-assigned this Jul 26, 2023
@galipremsagar galipremsagar removed the Needs Triage Need team to review and classify label Jul 26, 2023
@galipremsagar
Contributor

One thing to note: this is also an issue with a regular Index in various code paths. I have a PR coming up soon to cover them.
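
For reference, the single-level case looks like this in pandas (a minimal sketch, assuming pandas and NumPy are installed; the name "letters" is made up for the example). The name of a regular Index passed as columns= survives construction:

```python
import numpy as np
import pandas as pd

# A regular (single-level) Index that carries a name.
columns = pd.Index(["a", "b"], name="letters")
df = pd.DataFrame(np.zeros((2, 2)), columns=columns)

# pandas keeps the Index name on the resulting columns.
column_name = df.columns.name
print(column_name)
```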

rapids-bot bot pushed a commit that referenced this issue Jul 28, 2023
This PR preserves column names in various APIs by retaining `self._data._level_names` and also calculating when to preserve the column names.
Fixes: #13741, #13740

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ashwin Srinath (https://github.com/shwina)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #13772
@galipremsagar
Contributor

Fixed by #13772
