Training with ray DaskEngine to_parquet fails with TypeError: cannot pickle 'pickle5.PickleBuffer' object #1710
Comments
Would you happen to know what engine dask is using to read parquet (fastparquet vs pyarrow)? I ran into errors with pyarrow 5.0.0 and ludwig, but that was resolved after upgrading to pyarrow 6.0.1.
I have pyarrow==6.0.1 installed. No fastparquet. Just to be sure, I downgraded pyarrow to 5.0.0 and upgraded again. No change; I'm still getting the same bug.
@dantreiman What version of dask are you using? There seem to be pickle5 errors resolved with
My environment has dask==2022.01.0. I tried downgrading to dask 2021.4.0 and dask 2021.10.0, with no change. My ray version is 1.9.2. I think your original idea about pyarrow is on the right track. I changed dask.py:84 to
and this saves successfully. Perhaps I have an incompatible pair of dask and pyarrow versions installed.
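Since several package versions are being compared in this thread, a small script that prints them in one place may help when reporting an environment. This is only a sketch: the package list is taken from the names mentioned here, the helper name `installed_versions` is made up for illustration, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is installed on Python 3.7.
    import importlib_metadata as metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages discussed in this issue; adjust the list as needed.
    names = ["ludwig", "ray", "dask", "pyarrow", "fastparquet", "pickle5"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the output into a comment makes it easier to spot an incompatible pair of versions.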
I'm experiencing this same issue when I run my tests. It fails specifically at the from_dask() function in the ray dataset constructor. I suspect the issue is not parquet related, since I'm not writing to parquet at all and am still getting this error.
Apparently uninstalling pickle5 fixes this.
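To see whether an environment is exposed to this, one option is to check whether pickle5 is importable at all. The snippet below is a minimal sketch using only the standard library; `module_available` is a hypothetical helper name, and the check says nothing about whether the error will actually occur, only whether the package is present.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("pickle5"):
        print("pickle5 is installed; uninstalling it is the workaround reported in this thread.")
    else:
        print("pickle5 was not found in this environment.")
```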
Caused by: ray-project/ray#22562. Fixed by: #1763.
Reproducible by training any model, e.g.
examples/titanic/simple_model_training.py
If I uninstall ray or otherwise force Ludwig to use PandasEngine, training works.
With ray and dask installed, Ludwig uses DaskEngine and fails with error:
TypeError: cannot pickle 'pickle5.PickleBuffer' object
Stack trace: