BUG: when using id_vars in .melt(), the id_vars does not recognize the column name in a multiIndex dataframe #60018
Comments
breaking change : #55948 |
It does not look like this is supported. Two things -
Therefore, the following might be the correct way to go about this -
data.columns.names = ['Attribute', 'Ticker']
data.melt(id_vars=[('Date', '')]).rename(columns={('Date', ''): 'Date'})
Output -
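For reference, a minimal self-contained sketch of this workaround on synthetic data. The tickers `AAA` and `BBB`, the prices, and the variable names are made up for illustration; only the `melt`, `rename`, and `pivot_table` calls come from the thread.

```python
import pandas as pd

# Synthetic stand-in for the downloaded price data (all values are invented).
dates = pd.date_range("2024-01-01", periods=2, name="Date")
columns = pd.MultiIndex.from_product([["Open", "Close"], ["AAA", "BBB"]])
data = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    index=dates,
    columns=columns,
)

# reset_index() turns the index into the column ('Date', '').
data = data.reset_index()
data.columns.names = ["Attribute", "Ticker"]

# Refer to the id column by its full tuple label, then flatten that label.
melted = data.melt(id_vars=[("Date", "")]).rename(columns={("Date", ""): "Date"})

# The pivot step from the original report.
pivoted = melted.pivot_table(
    index=["Date", "Ticker"], columns="Attribute", values="value", aggfunc="first"
)
print(pivoted)
```

Passing the tuple `('Date', '')` avoids relying on a bare first-level name being resolved against the MultiIndex, which is the step that fails in the report.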
|
@ivan-marroquin - can you post a reproducible example that does not depend on |
@rhshadrach I could reproduce it like this. Worked on pandas 2.1.4, but not on 2.2.0
Output in 2.1.4:
Output in 2.2.0:
|
In various places, we do allow
df = pd.DataFrame({"x": ["a","a","b","b"], "y": ["c","d","c","d"], "z": [1,2,3,4]}).set_index("x")
df.columns = pd.MultiIndex.from_tuples([("y", "i"), ("y", "j")])
df = df.reset_index()
print(df.columns.get_indexer(pd.Index(["x"])))
# [-1]
If we returned
cc @mroeschke |
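The contrast this comment seems to point at can be sketched as below. Note that the comment text is truncated here, so reading "we do allow" as label-based access such as `df["x"]` is an assumption, not something the thread states.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": ["a", "a", "b", "b"], "y": ["c", "d", "c", "d"], "z": [1, 2, 3, 4]}
).set_index("x")
df.columns = pd.MultiIndex.from_tuples([("y", "i"), ("y", "j")])
df = df.reset_index()  # columns: ('x', ''), ('y', 'i'), ('y', 'j')

# Label-based access accepts the bare first-level name...
by_label = df["x"]

# ...while get_indexer only finds the column when given the full tuple.
bare = df.columns.get_indexer(pd.Index(["x"]))
full = df.columns.get_indexer(pd.MultiIndex.from_tuples([("x", "")]))
print(list(by_label), bare, full)
```

Since `melt` reports labels whose indexer is -1 as missing (as the error message in the report shows), the bare name is rejected even though `df["x"]` resolves.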
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
KeyError Traceback (most recent call last)
Cell In [19], line 15
11 data= data.reset_index()
13 # melt the DataFrame to make it long format where each row is a
14 # unique combination of Date, Ticker, and attributes
---> 15 data_melted= data.melt(id_vars= ['Date'], var_name= ['Attribute', 'Ticker'])
17 # pivot the melted DataFrame to have the attributes (Open, High, Low, etc.) as columns
18 data_pivoted= data_melted.pivot_table(index= ['Date', 'Ticker'],
19 columns= 'Attribute', values= 'value',
20 aggfunc= 'first')
File ~/python_3.9.0/lib/python3.9/site-packages/pandas/core/frame.py:9942, in DataFrame.melt(self, id_vars, value_vars, var_name, value_name, col_level, ignore_index)
9932 @appender(_shared_docs["melt"] % {"caller": "df.melt(", "other": "melt"})
9933 def melt(
9934 self,
(...)
9940 ignore_index: bool = True,
9941 ) -> DataFrame:
-> 9942 return melt(
9943 self,
9944 id_vars=id_vars,
9945 value_vars=value_vars,
9946 var_name=var_name,
9947 value_name=value_name,
...
77 )
78 if value_vars_was_not_none:
79 frame = frame.iloc[:, algos.unique(idx)]
KeyError: "The following id_vars or value_vars are not present in the DataFrame: ['Date']"
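A sketch of a reproduction that does not need any external data source. The frame below is synthetic (tickers `AAA`/`BBB` and all values are invented), and `try_melt` is a hypothetical helper, not part of pandas. According to the thread the call worked on pandas 2.1.4 and raises on 2.2.0, so the helper reports the outcome rather than assuming one.

```python
import pandas as pd

# Synthetic frame with two column levels and a 'Date' index (values invented).
dates = pd.date_range("2024-01-01", periods=2, name="Date")
columns = pd.MultiIndex.from_product([["Open", "Close"], ["AAA", "BBB"]])
data = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], index=dates, columns=columns
).reset_index()


def try_melt(frame: pd.DataFrame) -> str:
    """Return 'ok' if melting on the bare label succeeds, else the KeyError text."""
    try:
        frame.melt(id_vars=["Date"], var_name=["Attribute", "Ticker"])
    except KeyError as exc:
        return f"KeyError: {exc}"
    return "ok"


outcome = try_melt(data)
print(pd.__version__, outcome)
```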
Expected Behavior
The melt function should produce a long-format DataFrame in which each row is a unique combination of Date, Ticker, and attribute.
Installed Versions
pandas : 2.2.2
numpy : 1.23.3
pytz : 2024.2
dateutil : 2.8.2
setuptools : 59.8.0
pip : 22.3.1
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.1
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.6.1
scipy : 1.13.1
sqlalchemy : 2.0.31
tables : None
tabulate : 0.9.0
xarray : 2024.6.0
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
Python 3.9.0