Pandas major version 2 support #601

Open
bartbroere opened this issue Sep 14, 2023 · 6 comments
Labels
topic:dataframe Issue or PR about eland.DataFrame

Comments

@bartbroere
Contributor

In April 2023, pandas released version 2.0.0, which introduced many breaking changes. I have been submitting some pull requests here (#596 #595 #593 #592) that fix minor things in preparation for supporting pandas>=2.0.0. None of the fixes so far break pandas==1.5.0 support.

However, there are also some issues that are harder to fix for version 2 without breaking some of the existing functionality.

One such example is the fact that in aggregations such as groupby, pandas has ignored the sort parameter for a long time. Tests that compare the column order between eland and pandas will fail for either pandas 1.5.0 or pandas 2.0.0.

Is the Eland project planning a major release when starting to support pandas 2? Or will it support pandas 2 by implementing different behaviour based on runtime checks of pandas' version?
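
The second option, runtime checks on the pandas version, could look roughly like the sketch below. It is only an illustration: `major_version` and `is_pandas_2_or_later` are hypothetical helper names, not part of Eland or pandas, and the sketch takes the version as a string so that it runs without pandas installed. In real use the string would come from `pandas.__version__`.

```python
import re


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.0rc1'.

    Hypothetical helper for illustration; not an Eland or pandas API.
    """
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group())


def is_pandas_2_or_later(version: str) -> bool:
    """True when the given pandas version string has major version >= 2."""
    return major_version(version) >= 2


# Example of branching on the check. The strings below are placeholders
# standing in for pandas.__version__.
for candidate in ("1.5.0", "2.0.0"):
    if is_pandas_2_or_later(candidate):
        branch = "pandas 2 code path"
    else:
        branch = "pandas 1.x code path"
    print(candidate, "->", branch)
```

A check like this keeps one code base working for both major versions, at the cost of carrying two code paths until pandas 1.x support is dropped.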

@pquentin
Member

Ideally we should support both versions as Pandas 1.x is still generally more popular than 2.x. Thanks for all the pull requests that are moving us in the right direction. We'll have to decide when we hit more thorny issues.

@davidkyle
Member

davidkyle commented Nov 16, 2023

Pandas requires NumPy 1.22.4 as a minimum version: https://pandas.pydata.org/docs/dev/getting_started/install.html#dependencies

Because Shap is incompatible with NumPy >= 1.24 (#539), we will have to pin NumPy to the range numpy>=1.22.4,<1.24 when upgrading Pandas.
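
To make the bounds of that pin concrete, here is a minimal sketch that checks whether a NumPy version string falls inside `>=1.22.4,<1.24`. The names `parse_release` and `numpy_in_pinned_range` are hypothetical, and the parser assumes plain numeric release strings (no `rc` or `dev` suffixes).

```python
# Bounds taken from the pin discussed above: numpy>=1.22.4,<1.24
NUMPY_MIN = (1, 22, 4)  # inclusive lower bound
NUMPY_MAX = (1, 24)     # exclusive upper bound


def parse_release(version: str) -> tuple:
    """Turn '1.23.5' into (1, 23, 5). Assumes a purely numeric release string."""
    return tuple(int(part) for part in version.split("."))


def numpy_in_pinned_range(version: str) -> bool:
    """True when the version satisfies >=1.22.4 and <1.24 (tuple comparison)."""
    return NUMPY_MIN <= parse_release(version) < NUMPY_MAX
```

With this sketch, `numpy_in_pinned_range("1.23.5")` is accepted, while `"1.22.3"` and `"1.24.0"` are rejected.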

@pquentin
Member

Looks like Shap is in a better shape now :) shap/shap#2943. We could probably remove the numpy pin when CI is fixed. I opened #636 for this.

@ksapkal-bmc

Hello, is there a development timeline for pandas 2.x.x compatibility, or a planned update/release that might fix this issue?

@pquentin
Member

Hello! We can't share any timeline, but this is still something we would like to do. Thanks @bartbroere for all the contributions so far here.

@DaiZack

DaiZack commented Sep 11, 2024

Upvoting this issue: pandas 2 performance is much better than that of the older version. Looking forward to the upgrade.
