Use cramjam and py3 speed for string decode #580
Nice! 🚀 I suppose the README may need a small update for the available algorithms and which are included as standard? Also, I don't know if you'll hit it in the future, but Anyway, if that happens I think it's a good use case for supporting something like a
Also slightly faster, I imagine. I have not yet decided whether this PR should press further to get the uncompress_into benefit or keep it simple at first. I'll update the docs when merging.
OK, it will take more time than I have right now to plumb in correct use of decompress_into.
If you're using "raw" (assuming this is synonymous with "block") in fastparquet, then cramjam would need an update to support that. Currently it uses the framed format by default, but raw/block can be exposed the same way raw is for snappy.
I expect you are correct, but it should take just ten minutes of my time, whenever I can find it, to check.
Yep, it seems that fastparquet and cramjam don't use the same lz4, which must mean the former is the "raw" variant.
Fastparquet is calling
Also, compression passes
Nice, thanks for the info. I'll try to have a PR to cramjam later this evening or tomorrow with the block format for lz4.
This used to be the done thing ... before wheels! Now it's just bloat.
@milesgranger, this just touches the top-level cramjam functions. The best speedup would come from deciding when decompress_into can be used (since the target arrays are always pre-allocated). As it stands, I don't expect much speed difference compared to the main branch.
Also, I took the chance to finally update UTF8 reading, doing the conversion to Python strings in one step instead of reading bytes and then converting. This gives a 2x speedup.
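For readers unfamiliar with the one-step idea, here is a minimal sketch, not the code in this PR. It assumes Parquet's PLAIN layout for byte arrays (a 4-byte little-endian length followed by the payload) and decodes each value straight from a view of the page buffer, so no intermediate `bytes` object is created per value. The function name `read_plain_utf8` is hypothetical and only for illustration.

```python
import struct


def read_plain_utf8(buf, count):
    """Decode `count` length-prefixed UTF8 values from `buf` into Python str.

    Hypothetical helper for illustration only. Assumes each value is stored
    as a 4-byte little-endian length followed by that many bytes.
    """
    view = memoryview(buf)  # zero-copy window over the page data
    out = []
    pos = 0
    for _ in range(count):
        (n,) = struct.unpack_from("<i", view, pos)
        pos += 4
        # str(bytes-like, encoding) decodes directly from the slice,
        # skipping the "make bytes, then .decode()" intermediate step.
        out.append(str(view[pos:pos + n], "utf8"))
        pos += n
    return out


if __name__ == "__main__":
    values = ["spam", "", "naïve"]
    encoded = [v.encode("utf8") for v in values]
    page = b"".join(struct.pack("<i", len(e)) + e for e in encoded)
    print(read_plain_utf8(page, len(values)))
```

The actual speedup will depend on how the loop is implemented (for example, compiled versus pure Python), so the 2x figure above should not be read as a property of this sketch.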