[CI][Python] Tests with pandas nightlies have been failing for the last few days with NotImplementedError on S3 dtype #39437
Comments
cc @AlenkaF @jorisvandenbossche I am unsure if this was opened before but I haven't been able to find a related issue. |
This is a pandas regression known from version
arrow/python/pyarrow/tests/test_pandas.py Lines 3098 to 3101 in de3130e
Will update the |
The NumPy string dtype error is fixed with #39498, but there are still some issues:
I think there was an open issue for the deprecation warning, but am not sure. Will look for it now. |
Didn't find anything. Will look for the issue and a fix. |
The deprecation warning can be ignored for now (that's still being discussed on the pandas side), although we could update the test so it does not fail on the warning and our CI can otherwise be green. The other two items seem like things we should investigate further. |
👍 |
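For reference, one way a test could tolerate the deprecation warning while still failing on anything else is sketched below. This is only an illustration using the standard library `warnings` module. The helper name `run_tolerating` and the two demo functions are hypothetical, and the warning class that pandas actually emits is an assumption (it may be `FutureWarning` rather than `DeprecationWarning`), so the category is left as a parameter.

```python
import warnings


def run_tolerating(func, category=DeprecationWarning):
    """Call func with all warnings turned into errors, except `category`.

    Hypothetical helper: `category` is an assumption and should be set to
    whatever warning class the failing test actually sees.
    """
    with warnings.catch_warnings():
        # Escalate every warning to an exception first ...
        warnings.simplefilter("error")
        # ... then put an "ignore" rule in front of it for the tolerated
        # category (simplefilter inserts at the front of the filter list).
        warnings.simplefilter("ignore", category)
        return func()


def noisy_conversion():
    # Stand-in for a conversion that emits the tolerated warning.
    warnings.warn("hypothetical deprecation", DeprecationWarning)
    return "converted"


def other_warning():
    # Stand-in for a call that emits an unrelated warning.
    warnings.warn("something else", UserWarning)
```

With this, `run_tolerating(noisy_conversion)` returns normally, while `run_tolerating(other_warning)` raises `UserWarning`, so unrelated warnings would still fail the build.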
The error in the
>>> import pandas as pd
>>> import pyarrow as pa
>>> df1 = pd.DataFrame({"x": ["foo", "bar", "foo"]}, dtype="string[pyarrow]")
>>> df1 = df1.astype("category")
>>> df2 = pd.DataFrame({"x": ["foo", "bar", "foo"]})
>>> df2 = df2.astype("category")
>>> pa.array(df1["x"]).type
DictionaryType(dictionary<values=string, indices=int8, ordered=0>)
>>> pa.array(df2["x"]).type
DictionaryType(dictionary<values=string, indices=int8, ordered=0>)
and for the dev version I get:
>>> pa.array(df1["x"]).type
DictionaryType(dictionary<values=large_string, indices=int8, ordered=0>)
>>> pa.array(df2["x"]).type
DictionaryType(dictionary<values=string, indices=int8, ordered=0>)
meaning that with pandas dev the Arrow-backed string dtype is converted to large_string, which is what raises the error. The PR that caused the change on the pandas side: pandas-dev/pandas#56220
Will update the test to reflect the change. |
For the Parquet failure (old metadata), this seems a regression on the pandas side in how tz-aware data gets converted from arrow to pandas -> pandas-dev/pandas#56775 |
@AlenkaF @jorisvandenbossche , do you think this is a blocker? |
For CI purposes, I think it would be good to have the currently open PR in the release. But nothing in that PR is critical, it's only test changes. |
…CI build (#39498)
Update version checks and assertions of pyarrow array equality for pandas failing tests on the CI: [test-conda-python-3.10-pandas-nightly](https://github.com/ursacomputing/crossbow/actions/runs/7391976015/job/20109720695)
* Closes: #39437
Lead-authored-by: AlenkaF <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Describe the bug, including details regarding any error messages, version, and platform.
The following nightly job with the pandas nightlies has been failing for the last couple of weeks:
There seems to be an issue with S3:
Component(s)
Continuous Integration, Python