Failure in pandas TestDataFrameToXArray.test_to_xarray_index_types #9661
Here's the error message from pandas's
cc @ilan-gold
On it! More generally, @shoyer, given all this extension-array work, I would be happy to have a Zoom call to go over what the various pandas adapters in the codebase do (I think they can be cut down somewhat, since a lot of the code deals with NumPy conversion) and/or to sound out running the pandas integration tests in this repo. We are doing that now in https://github.com/scverse/integration-testing/pull/1/files, where we check out everyone's repo and then test it against the core data structure on
@shoyer This issue is too tied up with datetimes; see #9618. I will need to redo what I've done so that it works off that branch. The issue is that pandas>2.0 handles datetimes as extension arrays, so if we start letting categorical indices into our indexing adapter, we let every extension array in, which breaks almost all of the datetime conversion.
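To make the concern above concrete, here is a minimal pandas-only sketch comparing a broad gate that admits any index whose dtype is a pandas `ExtensionDtype` with a narrow gate that admits only `CategoricalDtype`. The helper names `has_extension_dtype` and `is_categorical_index` are hypothetical and are not xarray's actual indexing adapter. Note that a naive `DatetimeIndex` still reports a NumPy `datetime64[ns]` dtype, even though its `.array` is a `DatetimeArray`, which is itself an `ExtensionArray` subclass; so a gate that inspected the array type instead of the dtype would admit even naive datetimes.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def has_extension_dtype(index: pd.Index) -> bool:
    """Hypothetical broad gate: True for any pandas ExtensionDtype (not a NumPy dtype)."""
    return isinstance(index.dtype, ExtensionDtype)


def is_categorical_index(index: pd.Index) -> bool:
    """Hypothetical narrow gate: only categorical indexes pass."""
    return isinstance(index.dtype, pd.CategoricalDtype)


dates = ["2022-08-15", "2022-08-22", "2022-08-29"]
examples = {
    "int": pd.Index([1, 2, 3]),
    "categorical": pd.CategoricalIndex(["a", "b", "c"]),
    "datetime (naive)": pd.DatetimeIndex(dates),
    "datetime (tz-aware)": pd.DatetimeIndex(dates, tz="UTC"),
    "period": pd.period_range("2022-08", periods=3, freq="M"),
    "interval": pd.interval_range(start=0, end=3),
}

for name, idx in examples.items():
    # Print which gate each index type would pass.
    print(f"{name:20} extension={has_extension_dtype(idx)!s:5} categorical={is_categorical_index(idx)}")
```

In this sketch the broad gate also admits tz-aware datetimes, periods, and intervals, which in a design like the one described above would then need their own handling; the narrow gate admits only the categorical case.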
Can we explicitly cast
This is definitely causing problems on v2024.10.0: I'm now getting an error when going from DataFrame -> Dataset -> DataArray (the error occurs in the Dataset -> DataArray step). I'm starting with a DataFrame with a datetime index and about 20 columns. Relevant parts of the error trace:
df = pd.DataFrame(
    {
        "sin_order_1_year": [-0.7418799885470463, -0.8171209666969853, -0.8805057639294221],
        "date": [
            Timestamp("2022-08-15 00:00:00"),
            Timestamp("2022-08-22 00:00:00"),
            Timestamp("2022-08-29 00:00:00"),
        ],
    },
)
df = df.astype("Float64", errors="ignore")
mydataarray = xr.Dataset.from_dataframe(df.set_index("date")).to_array()

The above works in
I was just testing the above with xarray==2024.11.0 and realised it's not as easily reproducible as it could be, so try this:

import pandas as pd
import xarray as xr
from pandas import Timestamp


def main():
    df = pd.DataFrame(
        {
            "sin_order_1_year": [-0.7418799885470463, -0.8171209666969853, -0.8805057639294221],
            "date": [
                Timestamp("2022-08-15 00:00:00"),
                Timestamp("2022-08-22 00:00:00"),
                Timestamp("2022-08-29 00:00:00"),
            ],
        },
    )
    df = df.astype("Float64", errors="ignore")
    mydataarray = xr.Dataset.from_dataframe(df.set_index("date")).to_array()


if __name__ == "__main__":
    main()

Still failing in
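As a possible stopgap while this regression is open (our assumption, not a fix proposed in this thread), the nullable `"Float64"` columns produced by `df.astype("Float64", ...)` could be converted back to NumPy `float64` before calling `xr.Dataset.from_dataframe`, so that xarray only receives NumPy-backed data. The helper `masked_floats_to_numpy` below is hypothetical and uses only pandas and NumPy; it has not been checked against every affected xarray version.

```python
import numpy as np
import pandas as pd


def masked_floats_to_numpy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of ``df`` with nullable Float32/Float64
    columns converted to NumPy float64 (pd.NA -> np.nan); other columns untouched."""
    out = df.copy()
    for name in out.columns:
        if isinstance(out[name].dtype, (pd.Float32Dtype, pd.Float64Dtype)):
            # Series.to_numpy materialises a plain ndarray, mapping pd.NA to NaN.
            out[name] = out[name].to_numpy(dtype="float64", na_value=np.nan)
    return out


if __name__ == "__main__":
    df = pd.DataFrame(
        {
            "sin_order_1_year": pd.array([-0.74, None, -0.88], dtype="Float64"),
            "date": pd.to_datetime(["2022-08-15", "2022-08-22", "2022-08-29"]),
        }
    )
    # The datetime column is left as-is; only the Float64 column changes dtype.
    print(masked_floats_to_numpy(df).dtypes)
```

For the reproducer, this would mean calling `xr.Dataset.from_dataframe(masked_floats_to_numpy(df).set_index("date"))`. The trade-off is losing `pd.NA` semantics, since missing values become `NaN`.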
It appears that #9520 may have broken some upstream pandas tests, specifically those testing round-trips with various index types:
https://github.com/pandas-dev/pandas/blob/e78ebd3f845c086af1d71c0604701ec49df97228/pandas/tests/generic/test_to_xarray.py#L32
Here's a minimal test case:
I'm not sure if this is a pandas or xarray issue, but it's one or the other!
(My guess is that most of these tests in pandas should probably live in xarray instead, given that we implement all the conversion logic.)
Originally posted by @shoyer in #9520 (comment)