feat: BitPackedCompressor allows signed arrays #1699
Most of the work to support signed integers has been done in BitPackedArray. This PR removes some assertions and branches in the compressor to make it possible to bit-pack an array of signed ints.
Is it worth lumping in the change to disallow a no-op FoR array too? Or better yet, disallow FoR when it has minimal impact on bitpacking width.
I've changed the condition in […]. I did not change FoRCompressor to determine how many bits of savings are enough to make up for the extra unfor step. I don't have a good intuition for what that would be.
I can live with excluding FoR if min and shift are both zero (as a way to short-circuit), but it's a bit unnecessary. The sampling compressor already penalizes extra array depth, so FoR(0,0) + Bitpacking on signed ints will be strictly worse than just Bitpacking and won't be chosen anyway.
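To make the FoR(0,0) case concrete, here is a minimal sketch of the check being discussed. It is not the FoRCompressor code: `ForPlan`, `plan_for`, and `bits_needed` are hypothetical names, and the sketch assumes FoR subtracts the minimum and then drops the trailing zero bits shared by all offsets.

```rust
/// Bits needed to store `v` as an unsigned integer (0 needs 0 bits).
fn bits_needed(v: u64) -> u32 {
    u64::BITS - v.leading_zeros()
}

/// Hypothetical summary of what a frame-of-reference step would do.
#[derive(Debug, PartialEq)]
struct ForPlan {
    min: i64,
    shift: u32,
    /// Width if bit-packing directly; 64 when negatives are present.
    width_before: u32,
    /// Width after subtracting `min` and shifting right by `shift`.
    width_after: u32,
}

impl ForPlan {
    /// FoR with min == 0 and shift == 0 leaves every value unchanged.
    fn is_noop(&self) -> bool {
        self.min == 0 && self.shift == 0
    }

    fn bits_saved(&self) -> u32 {
        self.width_before.saturating_sub(self.width_after)
    }
}

/// Returns `None` for an empty slice.
fn plan_for(values: &[i64]) -> Option<ForPlan> {
    let min = *values.iter().min()?;
    let max = *values.iter().max()?;
    // OR of all offsets from the minimum; offsets always fit in u64.
    let mut combined = 0u64;
    for &v in values {
        combined |= v.wrapping_sub(min) as u64;
    }
    // Trailing zero bits shared by every offset (0 if all offsets are 0).
    let shift = if combined == 0 { 0 } else { combined.trailing_zeros() };
    let range = max.wrapping_sub(min) as u64;
    let width_before = if min < 0 { 64 } else { bits_needed(max as u64) };
    Some(ForPlan {
        min,
        shift,
        width_before,
        width_after: bits_needed(range >> shift),
    })
}
```

For `[0, 1, 5, 7]` the plan is a no-op with zero bits saved, while `[1000, 1004, 1008]` goes from 10 bits to 2. Choosing a threshold on `bits_saved()` is exactly the open question raised above; the sketch does not answer it.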
By this logic we shouldn't exclude any compressors based on metrics, e.g. run-end encoding should ignore the avg_run_length.
Yeah, I think that's correct. More importantly, we should also filter codecs based on whether […] E.g., if we theoretically implemented BitPackedArray::can_compress for signed ints by checking if […]
…es (#1705)
Following up from #1699. In the previous PR we allowed signed arrays to be bit-packed directly, but we did not explicitly reject arrays with negative values. We **need** to do this because it is critical for fast `search_sorted` over BitPacked data with patches: only when the patches sort to the right-most side of the array can we do an efficient binary search. I've added explicit preconditions that values are non-negative, and made the BitPackedArray constructor unsafe to make it clear to callers that they must check this themselves (the recommended safe way to create a BPA is via the `BPA::encode()` method, which returns an Error if there are negative values).
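A small sketch of why negative values break the "patches sort to the right" property. This is an illustration, not the BitPackedArray implementation: the function names are hypothetical, and it assumes a value needs a patch whenever its bit pattern, reinterpreted as unsigned, does not fit in the packed width.

```rust
/// Indices of values whose bit pattern, read as unsigned, does not fit in
/// `width` bits. In this sketch those are the values that need patches.
fn patch_indices(values: &[i64], width: u32) -> Vec<usize> {
    let limit = match 1u64.checked_shl(width) {
        Some(limit) => limit,
        None => return Vec::new(), // width >= 64: everything fits
    };
    let mut out = Vec::new();
    for (i, &v) in values.iter().enumerate() {
        // A negative i64 has its high bits set, so it always exceeds `limit`.
        if (v as u64) >= limit {
            out.push(i);
        }
    }
    out
}

/// True if `indices` are exactly the last `indices.len()` positions of an
/// array of length `len`, i.e. all patches sit at the right-most side.
fn patches_form_suffix(indices: &[usize], len: usize) -> bool {
    match len.checked_sub(indices.len()) {
        Some(start) => indices.iter().enumerate().all(|(k, &i)| i == start + k),
        None => false,
    }
}

/// Safe entry point in the spirit of an `encode()` that rejects negatives
/// up front instead of leaving the check to the caller.
fn checked_patch_indices(values: &[i64], width: u32) -> Result<Vec<usize>, String> {
    if let Some(pos) = values.iter().position(|&v| v < 0) {
        return Err(format!("negative value at index {pos}"));
    }
    Ok(patch_indices(values, width))
}
```

With a 4-bit width, the sorted array `[1, 2, 3, 200, 300]` patches only indices 3 and 4, a suffix. The sorted array `[-5, -1, 2, 3, 300]` would patch indices 0, 1, and 4, so patches land on both ends, which is the case the non-negative precondition rules out.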