
requirements: use pandas >= 2 #128

Open: wants to merge 1 commit into master


@clbarnes (Collaborator) commented Nov 3, 2023

If pandas 2 works without much effort, we should require it, as it can be much more efficient and saner with the Arrow backend. If it doesn't work, we should constrain the install requirements to exclude it.

@clbarnes (Collaborator, Author) commented Nov 3, 2023

Looks like there's only 1 test failure, in fairly recent code, and er... >2000 warnings, probably from hitting the same code over and over.
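One way to confirm that the >2000 warnings come from a few call sites being hit repeatedly is to tally them by origin. The sketch below uses only the standard library `warnings` module; `legacy_op` and `count_warnings_by_origin` are hypothetical names for illustration, not navis or pandas code.

```python
import warnings
from collections import Counter


def legacy_op(x):
    # Hypothetical stand-in for a library call that emits a FutureWarning;
    # the default stacklevel=1 attributes every warning to this line
    warnings.warn("legacy_op is deprecated", FutureWarning)
    return x * 2


def count_warnings_by_origin(func, inputs):
    """Call func once per input and tally the recorded warnings by source location."""
    with warnings.catch_warnings(record=True) as caught:
        # The default filters show a repeated warning only once per location;
        # "always" records every single emission instead
        warnings.simplefilter("always")
        for value in inputs:
            func(value)
    return Counter(
        (w.category.__name__, w.filename, w.lineno, str(w.message)) for w in caught
    )


counts = count_warnings_by_origin(legacy_op, range(2500))
for (category, filename, lineno, message), n in counts.most_common():
    print(f"{n} x {category} at {filename}:{lineno}: {message}")
```

If the tally collapses to a handful of locations, fixing those few call sites would silence most of the warnings.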

@clbarnes marked this pull request as ready for review November 14, 2023 14:43

@clbarnes (Collaborator, Author) commented Nov 14, 2023

At present, navis works with both pandas 1 and 2. Requiring pandas 2 would enable some optimisations (e.g. the Arrow backend has some speed and memory advantages), but could break some downstream users' workflows. Of course, they are welcome to keep using an older navis version (the last pandas 1-compatible version should be clearly signposted).

Implementing the optimisations conditionally, based on which pandas version is installed, would be a headache.
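To illustrate why version-conditional optimisations get messy, here is a minimal sketch of what a single gated call site might look like. It assumes the only difference is the `dtype_backend="pyarrow"` option that pandas 2 added to `read_csv`; `pandas_major` and `read_csv_kwargs` are hypothetical helpers, and real code would need a branch like this at every optimised call site.

```python
def pandas_major(version: str) -> int:
    """Extract the major version from a string such as '2.1.3' or '1.5.3'."""
    return int(version.split(".", 1)[0])


def read_csv_kwargs(pandas_version: str) -> dict:
    """Hypothetical helper: choose read_csv options depending on the pandas version.

    pandas >= 2.0 accepts dtype_backend="pyarrow" (which also needs pyarrow
    installed); pandas 1.x does not know that argument, so it must be left out.
    """
    if pandas_major(pandas_version) >= 2:
        return {"dtype_backend": "pyarrow"}
    return {}


print(read_csv_kwargs("1.5.3"))
print(read_csv_kwargs("2.1.3"))
```

In use, this would be something like `pd.read_csv(path, **read_csv_kwargs(pd.__version__))`, where `path` is a placeholder; the helper is kept free of pandas imports here so the branching logic itself is easy to see.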

@schlegelp (Collaborator) commented

I don't have any strong opinions in this case. I've been using pandas >= 2 for a while and haven't had any major issues, either with navis or with other adjacent packages. All things being equal, I'd lean towards being flexible, but I guess the last 1.x.x release of pandas is now almost a year old.

Did you have anything specific in mind re speed and memory advantages?

@clbarnes (Collaborator, Author) commented

I suppose there's no rush on this while everything still works, but if an incompatibility were to arise, I think pandas 1 should probably be put on the chopping block before too much effort goes into supporting both.
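If pandas 1 is dropped, the main mechanism would be the install requirement itself (e.g. `pandas>=2`). As a hedged sketch, a runtime guard could additionally fail fast with a message that signposts the older release; `check_pandas` and `MIN_PANDAS_MAJOR` are hypothetical names, and the error text is illustrative rather than anything navis actually emits.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Hypothetical policy constant: the oldest supported pandas major version
MIN_PANDAS_MAJOR = 2


def check_pandas(installed: Optional[str] = None) -> str:
    """Raise ImportError unless pandas >= MIN_PANDAS_MAJOR is available.

    `installed` can be passed explicitly (handy for testing); otherwise the
    version is read from the installed package metadata.
    """
    if installed is None:
        try:
            installed = version("pandas")
        except PackageNotFoundError:
            raise ImportError("pandas is required but not installed") from None
    major = int(installed.split(".", 1)[0])
    if major < MIN_PANDAS_MAJOR:
        raise ImportError(
            f"pandas >= {MIN_PANDAS_MAJOR} is required, but {installed} is installed; "
            "use an older release for pandas 1 support"
        )
    return installed


print(check_pandas("2.1.3"))
```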

Pandas' integration with Arrow is a work in progress, but it is already faster for a number of operations. As far as I can tell, pandas currently just uses Arrow arrays instead of NumPy arrays to back individual columns, rather than directly wrapping an Arrow table. I also think it doesn't yet directly support Arrow's map, struct, and list types (it just turns them into Python dicts, dicts, and lists, respectively).

For now, the advantage is probably more theoretical than practical.

2 participants