Speed considerations #44

Closed
shirish93 opened this issue Dec 9, 2015 · 3 comments

Comments

@shirish93

Hey cloudpipe team!

I'm doing an exploratory analysis for the gensim library to see whether it could use cloudpickle (here's the discussion), and noticed that 'regular' cloudpickle is consistently ~8x slower than Python's pickle module for pretty much all the data structures I threw at it.

Is this the expected behavior, or am I doing something wrong in my tests? I'm using Python 2.7/3.4 on Windows, without a C compiler (so not using the optimized versions, if there are any).

Would you guys have any ideas on whether we could selectively modify the module for certain tasks to improve performance on the most-used features?

@rgbkrk
Member

rgbkrk commented Dec 9, 2015

If you have optimizations or fixes, send patches right along!

This code base largely comes from the original cloud module, broken out, relicensed, and patched by both pyspark contributors and the folks who have contributed directly here.

@ogrisel
Contributor

ogrisel commented Jan 12, 2016

It's going to be slower than the pickle implementation of Python 3 or the cPickle of Python 2, as they are implemented in C and can be 10x faster on large Python objects composed of many sub-objects (e.g. a Python dict or list with millions of entries).

On small Python objects with few sub-objects (e.g. a tuple with a couple of big strings or large numpy arrays), you should not see a significant difference in speed.
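
To check this on your own data, here is a rough sketch using only the standard library. It assumes Python 3 and uses the pure-Python pickler (`pickle._dumps`, a private helper in CPython's `pickle.py`) as a stand-in for a pickler that runs in Python rather than C; it does not measure cloudpickle itself. The helper name `time_dumps` and the object sizes are made up for illustration.

```python
import pickle
import timeit


def time_dumps(obj, repeat=3):
    """Return best-of-`repeat` wall-clock seconds for each pickler."""
    # pickle.dumps is the C-accelerated version when the _pickle
    # extension is available (the usual case on CPython).
    c_time = min(timeit.repeat(
        lambda: pickle.dumps(obj, protocol=2), number=1, repeat=repeat))
    # pickle._dumps is the pure-Python fallback (private, may change).
    py_time = min(timeit.repeat(
        lambda: pickle._dumps(obj, protocol=2), number=1, repeat=repeat))
    return {"c": c_time, "pure_python": py_time}


# Many small sub-objects: every key and value is dispatched individually.
many_small = {i: str(i) for i in range(200000)}

# Few large sub-objects: most of the time goes into copying two big buffers.
few_large = (b"x" * 10000000, b"y" * 10000000)

if __name__ == "__main__":
    for name, obj in [("many_small", many_small), ("few_large", few_large)]:
        t = time_dumps(obj)
        print("%s: C %.4fs, pure Python %.4fs"
              % (name, t["c"], t["pure_python"]))
```

The exact numbers depend on the machine and Python version, so treat the output as a way to see the trend rather than as a benchmark result.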

@ogrisel
Contributor

ogrisel commented Jan 12, 2016

If you have specific speed improvements, please feel free to open PRs, but let's close this issue as there is no easy general resolution.
