Moving zstd and zstd_no_dict compression codecs out of experimental #7805
Comments
@sarthakaggarwal97 thanks a lot for publishing the compression gains. Could you please share the CPU / memory profiles as well? Thank you.
+1. Latency is just one dimension; we need to understand whether there are trade-offs in the form of more cycles spent on compress/decompress.
Some benchmarks were also performed here https://issues.apache.org/jira/browse/LUCENE-8739, although the pull request never made it into Apache Lucene.
@reta @Bukhtawar here is the percentage CPU utilization during indexing. The profiles were taken at a 5-minute interval during active indexing with the nyc_taxis dataset.
Summary of Compression Overheads across Codecs
The numbers denote the percentage of CPU utilized for compression. Please let me know if I can help with more information regarding the experiments.
@reta I think the reason the PR never made it into Lucene is that they were looking for a pure Java implementation and didn't want to use libraries with JNI bindings in the lucene-core build.
Thanks @sarthakaggarwal97, do you have Java heap (memory) stats?
With respect to the implementation for this issue, there are two possible approaches I can see.
I would request the community to review the implementation approaches and indicate which one would be preferable. Please share any other approaches that could be taken into consideration.
I like (2) of course because it's more generic and extensible, but I think I'd also merge (1) if it gets the feature to users earlier.
I would go with 2nd option, |
The quickest approach to make it accessible to users is to move it from |
I agree option 2 is the right long-term solution. Can it be done in two steps, with option 1 done first, followed by option 2?
Making the custom codecs pluggable would require some discussion about design and implementation.
Is your feature request related to a problem? Please describe.
Currently, we have experimental support for the zstd and zstd_no_dict compression codecs, as mentioned in #3354. The request is to move the feature out of the sandbox so that users can create an index using the new codecs.
Describe the solution you'd like
The idea is to introduce the new compression codecs to users by moving the current implementation out of the sandbox. With that, we will leverage the current index.codec setting, which can be used to specify zstd and zstd_no_dict upon index creation. We will continue to support the existing zlib and lz4 codecs, with the default as lz4 or BEST_SPEED.
These are the outcomes of the benchmarks with these new codecs:
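As an aside on the index.codec setting described above, here is a minimal sketch of what a create-index request body selecting one of the new codecs might look like. The function name and the validation set are hypothetical and exist only for illustration; the codec names zstd and zstd_no_dict come from this issue, and the final accepted values depend on how the feature ships.

```python
import json

# Codec names taken from the issue text; this set is illustrative only and
# is not the authoritative list of values accepted by index.codec.
NEW_CODECS = {"zstd", "zstd_no_dict"}


def build_create_index_body(codec: str) -> dict:
    """Return a create-index request body that sets index.codec (hypothetical helper)."""
    if codec not in NEW_CODECS:
        raise ValueError(f"unsupported codec for this sketch: {codec!r}")
    # index.codec is set under the index settings at creation time.
    return {"settings": {"index": {"codec": codec}}}


if __name__ == "__main__":
    # Print the JSON body for an index that uses zstd without a dictionary.
    print(json.dumps(build_create_index_body("zstd_no_dict"), indent=2))
```

The resulting JSON would be sent as the body of the create-index request (PUT on the index name) using whichever HTTP client is at hand.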
cc: @mulugetam @backslasht @mgodwan