How should distributed deal with mismatches between Python or package versions? #7017

Open
hendrikmakait opened this issue Sep 7, 2022 · 1 comment
Labels
discussion Discussing a topic with no specific actions yet

Member

hendrikmakait commented Sep 7, 2022

There are currently a few issues with the way we handle mismatches between Python or package versions:

This raises the question of how distributed should deal with version mismatches in general and how tolerant it should be, if at all. My current view is that we should again be strict about Python, required packages, and optional packages, due to #7016 and the fragility of mixing different versions of dask-related packages. I don't know whether we should necessarily raise by default in Client._ensure_connected, but we should at least make it possible to raise via a flag (similar to Client.get_versions(check=True)).
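To make the "warn by default, raise via a flag" idea concrete, here is a minimal sketch. It does not reflect how distributed implements its checks: the names find_mismatches, check_versions, VersionMismatchError, and the strict flag are all hypothetical, and the version dictionaries are assumed to be simple {package: version} mappings.

```python
import warnings


class VersionMismatchError(RuntimeError):
    """Hypothetical error raised in strict mode when versions differ."""


def find_mismatches(client, worker):
    """Return {package: (client_version, worker_version)} for every difference."""
    mismatches = {}
    for name in sorted(set(client) | set(worker)):
        ours, theirs = client.get(name), worker.get(name)
        if ours != theirs:
            mismatches[name] = (ours, theirs)
    return mismatches


def check_versions(client, worker, strict=False):
    """Warn about mismatches by default; raise when strict=True."""
    mismatches = find_mismatches(client, worker)
    if not mismatches:
        return mismatches
    details = ", ".join(
        f"{name}: client={ours} worker={theirs}"
        for name, (ours, theirs) in mismatches.items()
    )
    if strict:
        raise VersionMismatchError(f"Version mismatch: {details}")
    warnings.warn(f"Version mismatch: {details}")
    return mismatches
```

A package missing on one side shows up with None as its version, so optional packages installed on only one node are reported as well.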

Notes

Cloudpickle can only be used to send objects between the exact same version of Python.

https://github.com/cloudpipe/cloudpickle
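One way to turn that constraint into an explicit error rather than an obscure deserialization failure is to tag each payload with the sender's Python version and check it before unpickling the body. The sketch below is only an illustration: it uses the standard-library pickle in place of cloudpickle so that it is self-contained, the names dumps_tagged and loads_checked are hypothetical, and comparing the full (major, minor, micro) tuple is an assumption about how strict "exact same version" should be.

```python
import pickle
import sys


def dumps_tagged(obj):
    """Serialize obj inside an envelope that records the sender's Python version."""
    body = pickle.dumps(obj)
    # The envelope holds only ints and bytes; protocol 3 is available on all Python 3.
    return pickle.dumps((tuple(sys.version_info[:3]), body), protocol=3)


def loads_checked(payload, local_version=None):
    """Deserialize a tagged payload, failing early if the Python versions differ."""
    if local_version is None:
        local_version = tuple(sys.version_info[:3])
    sender_version, body = pickle.loads(payload)
    if tuple(sender_version) != tuple(local_version):
        raise RuntimeError(
            f"Python version mismatch: sender={sender_version}, "
            f"receiver={tuple(local_version)}"
        )
    # Only touch the real payload once the versions are known to match.
    return pickle.loads(body)
```

Keeping the body as opaque bytes inside the envelope means the version check runs before any version-sensitive deserialization is attempted.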

Contributor

shughes-uk commented Sep 21, 2022

I'd be happy if everything exploded with a sensible error on mismatches for critical packages (and python). The errors that can creep in otherwise are a nightmare to debug if you haven't encountered them before. The more devious ones can be damn near impossible for anyone.

I suspect most people on the Dask dev team can recognize the usual suspects near instantly (a deserialization exception is usually a dead giveaway), but for a new Dask user they can be totally inscrutable.

I'm also very happy to walk people through the internals of the Coiled package sync feature and how it does package version detection, too; it can be non-trivial. I believe right now dask is kinda cheating by using things like <pkg>.__version__ and a lot of try: except:.
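For illustration, here is a minimal sketch of version detection that prefers installed distribution metadata and only falls back to <pkg>.__version__. It is not how dask or Coiled package sync actually do it; detect_version is a hypothetical helper built on the standard-library importlib.metadata.

```python
import importlib
from importlib import metadata


def detect_version(name):
    """Return the installed version of `name` as a string, or None if unknown."""
    # Preferred: read the version from the installed distribution's metadata.
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        pass
    # Fallback: import the module and look for a __version__ attribute.
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    return None if version is None else str(version)
```

One of the non-trivial parts is that metadata.version expects the distribution name, which can differ from the import name (for example scikit-learn versus sklearn), so a real implementation needs a mapping between the two.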
