HBASE-26405 IntegrationTestLoadSmallValues #3802
base: master
🎊 +1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
Oops, fixed the missing license header issue.
So what does the tool actually verify? The correctness or the performance of a compression algorithm?
This was used to simulate a use case with small integer values in order to prove a compression improvement. The value here, as such, is the simulation of a use case with a potentially very large set of rows comprised of small integer values / small cells. We don't need to accept this into the suite if there isn't enough value there. I have no strong opinion either way.
This integration test emulates a use case that stores many small values in a table that would likely be heavily indexed (ROW_INDEX_V1, small blocks, etc.): an application that crowdsources weather (temperature) observation data. This IT can be used to test and optimize compression settings for such cases. It comes with a companion utility, HFileBlockExtracter, which extracts block data from HFiles into a set of local files for use in training external compression dictionaries, perhaps with ZStandard's zstd utility.
Run like:
You can also split the Loader and Verify stages:
Load with:
Verify with:
Use HFileExtractor like so:
Where options are:
You might train ZStandard dictionaries on the extracted block files like so:
(Assumes outputDir given to HFileExtractor was 't'.)
Or:
This was used to test the changes on HBASE-26353.
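For intuition on why a trained dictionary matters for small blocks of small values, here is a minimal, self-contained sketch. It is not part of this PR and does not use ZStandard or HBase: it swaps in the JDK's `java.util.zip.Deflater` preset-dictionary support as a stand-in, and the class name, record layout, and `makeBlock` helper are all hypothetical. The idea is the same, though: a tiny block has little internal redundancy, so priming the compressor with representative data lets it find matches from the first byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Illustration only: Deflate with a preset dictionary stands in for a
// zstd-trained dictionary. Names and record format are hypothetical.
class SmallBlockDictionaryDemo {

  // Builds a block of made-up temperature observations, one per line.
  static byte[] makeBlock(long seed, int rows) {
    Random rnd = new Random(seed);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < rows; i++) {
      int station = rnd.nextInt(1000);      // hypothetical station id
      int temperature = rnd.nextInt(60) - 20; // small integer value
      sb.append("station-").append(station)
        .append(":temperature:celsius=").append(temperature).append('\n');
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Returns the Deflate-compressed size of input, optionally primed with a
  // preset dictionary (pass null for no dictionary).
  static int compressedSize(byte[] input, byte[] dictionary) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
    try {
      if (dictionary != null) {
        // Must be set before any input is compressed.
        deflater.setDictionary(dictionary);
      }
      deflater.setInput(input);
      deflater.finish();
      byte[] buffer = new byte[1024];
      int total = 0;
      while (!deflater.finished()) {
        total += deflater.deflate(buffer);
      }
      return total;
    } finally {
      deflater.end(); // release native resources
    }
  }

  public static void main(String[] args) {
    // "Training" data: a larger sample that resembles real blocks.
    byte[] dictionary = makeBlock(1L, 200);
    // A very small block, as a table with small block sizes would produce.
    byte[] block = makeBlock(2L, 4);

    System.out.println("raw bytes:          " + block.length);
    System.out.println("without dictionary: " + compressedSize(block, null));
    System.out.println("with dictionary:    " + compressedSize(block, dictionary));
  }
}
```

With real HFile blocks, the files written by the extraction utility would play the role of the sample data here, and zstd's own dictionary trainer would replace the naive "use a sample as the dictionary" step.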