
feat: BitPackedCompressor allows signed arrays #1699

Merged · 7 commits · Dec 17, 2024

Conversation

a10y (Contributor) commented Dec 16, 2024

Most of the work to support signed integers has been done in BitPackedArray.

This PR removes some assertions and branches in the compressor to make it possible to bit-pack an array of signed ints.
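A minimal sketch of the relaxed check (illustrative names, not the actual vortex API): after this change the compressor no longer bails out just because the dtype is signed; it only needs the values themselves to fit in the packed width, which still rules out negatives when packing directly.

```rust
/// Hedged sketch: bit width needed to pack a slice of signed ints directly.
/// Signedness of the *type* is fine; negative *values* are what cannot be
/// bit-packed without a FoR shift.
fn packed_bit_width(values: &[i64]) -> Option<u32> {
    if values.iter().any(|&v| v < 0) {
        return None; // negative values still cannot be bit-packed directly
    }
    let max = *values.iter().max()?;
    // 64 minus the leading zeros of the max gives the minimal width
    // (0 for an all-zero array).
    Some(64 - (max as u64).leading_zeros())
}

fn main() {
    assert_eq!(packed_bit_width(&[1i64, 5, 7]), Some(3));
    assert_eq!(packed_bit_width(&[0i64, 0]), Some(0));
    assert_eq!(packed_bit_width(&[-1i64, 3]), None);
}
```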

a10y added 2 commits December 16, 2024 18:13
gatesn (Contributor) commented Dec 16, 2024

Is it worth lumping in the change to disallow a no-op FoR array too?

Or better yet, disallow FoR when it has minimal impact on bitpacking width.

a10y (Contributor, Author) commented Dec 17, 2024

I've changed the condition in FoRCompressor::can_compress to always reject when min=0 and shift=0. Previously, it would not bail if the type was signed.

I did not change FoRCompressor to determine how many bits of savings are enough to make up for the extra unfor step; I don't have a good intuition for what that threshold should be.
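The new short-circuit described above can be sketched as follows (struct and field names are assumptions for illustration, not the actual vortex FoRCompressor API):

```rust
/// Hypothetical summary statistics the compressor consults.
struct Stats {
    min: i64,
    shift: u8,
}

/// FoR is only worth attempting when it would actually change the values:
/// a zero minimum and zero shift make FoR a no-op regardless of signedness.
fn for_can_compress(stats: &Stats) -> bool {
    !(stats.min == 0 && stats.shift == 0)
}

fn main() {
    assert!(!for_can_compress(&Stats { min: 0, shift: 0 })); // no-op: reject
    assert!(for_can_compress(&Stats { min: -4, shift: 0 })); // shifts negatives up
    assert!(for_can_compress(&Stats { min: 0, shift: 3 })); // shift alone is useful
}
```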

a10y marked this pull request as ready for review December 17, 2024 00:43
lwwmanning (Member) commented:

> I've changed the condition in FoRCompressor::can_compress to now always reject if min=0 and shift=0. Previously if the type was signed it would not bail.
>
> I did not change FoRCompressor to determine how many bits of savings is enough to make up for the extra unfor step. I don't have a good intuition on what that would be.

I can live with excluding if min and shift are zero (as a way to short-circuit), but it's a bit unnecessary. The sampling compressor already penalizes extra array depth. So FoR(0,0) + Bitpacking on signed ints will be strictly worse than just Bitpacking and won't be chosen anyway
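The depth-penalty argument above can be illustrated with a toy cost model (all numbers invented; the real sampling compressor's scoring is more involved): if each level of array nesting carries a fixed overhead, FoR(0,0) + BitPacking can never beat plain BitPacking, because the no-op FoR wrapper adds a level without shrinking the packed payload.

```rust
// Hypothetical per-wrapper overhead, in bytes, charged per nesting level.
const DEPTH_PENALTY: usize = 16;

/// Toy score: smaller is better.
fn score(payload_bytes: usize, depth: usize) -> usize {
    payload_bytes + depth * DEPTH_PENALTY
}

fn main() {
    // FoR(0, 0) changes no values, so the bit-packed payload is identical.
    let packed = 1000;
    let plain = score(packed, 1); // BitPacked only
    let wrapped = score(packed, 2); // FoR(0,0) wrapping BitPacked
    assert!(plain < wrapped); // plain bitpacking always wins
}
```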

gatesn (Contributor) commented Dec 17, 2024

>> I've changed the condition in FoRCompressor::can_compress to now always reject if min=0 and shift=0. Previously if the type was signed it would not bail.
>> I did not change FoRCompressor to determine how many bits of savings is enough to make up for the extra unfor step. I don't have a good intuition on what that would be.
>
> I can live with excluding if min and shift are zero (as a way to short-circuit), but it's a bit unnecessary. The sampling compressor already penalizes extra array depth. So FoR(0,0) + Bitpacking on signed ints will be strictly worse than just Bitpacking and won't be chosen anyway

By this logic we shouldn't exclude any compressors based on metrics, e.g. run-end encoding should ignore the avg_run_length.

lwwmanning (Member) commented Dec 17, 2024

>>> I've changed the condition in FoRCompressor::can_compress to now always reject if min=0 and shift=0. Previously if the type was signed it would not bail.
>>> I did not change FoRCompressor to determine how many bits of savings is enough to make up for the extra unfor step. I don't have a good intuition on what that would be.
>
>> I can live with excluding if min and shift are zero (as a way to short-circuit), but it's a bit unnecessary. The sampling compressor already penalizes extra array depth. So FoR(0,0) + Bitpacking on signed ints will be strictly worse than just Bitpacking and won't be chosen anyway
>
> By this logic we shouldn't exclude any compressors based on metrics, e.g. run-end encoding should ignore the avg_run_length.

Yeah, I think that's correct. More importantly, we should also filter codecs based on whether they can_compress the full array, THEN do the sampling compression, then compress the full array like the best scheme found on the sample. Right now, we check can_compress on the sample, which may pass on the sample yet fail on the full array, causing a resample... which could fail arbitrarily many times.

E.g., if we theoretically implemented BitPackedArray::can_compress for signed ints by checking min >= 0 on an array with 1 million integers, only 1 of which was negative, we'd expect it to have to resample ~1000 times on average before successfully compressing.
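A back-of-envelope check of that "~1000 resamples" estimate, under the assumption of a uniformly drawn sample of about 1024 rows from 1 million values containing a single negative (both numbers are illustrative, not the actual sampling parameters):

```rust
fn main() {
    let n = 1_000_000.0_f64; // total values
    let sample = 1024.0_f64; // assumed sample size
    // Probability a given sample misses the single negative value, in which
    // case can_compress passes on the sample but compressing the full array
    // fails, forcing a resample.
    let p_miss = 1.0 - sample / n;
    // Resampling until the sample *catches* the negative value is a geometric
    // process: expected tries = 1 / (1 - p_miss) = n / sample.
    let expected_resamples = 1.0 / (1.0 - p_miss);
    assert!((expected_resamples - n / sample).abs() < 1e-6);
    println!("expected resamples ~ {expected_resamples:.0}"); // on the order of 1000
}
```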

lwwmanning enabled auto-merge (squash) December 17, 2024 16:41
lwwmanning merged commit 8e0e25c into develop Dec 17, 2024
19 checks passed
lwwmanning deleted the aduffy/bitpacked-pos-signed branch December 17, 2024 16:57
a10y added a commit that referenced this pull request Dec 18, 2024
…es (#1705)

Following up from #1699.

In the previous PR we allowed signed arrays to be bit-packed directly.
However, we did not explicitly reject arrays with negative values. We
**need** to do this because it is critical for fast `search_sorted` over
BitPacked data with patches: only when the patches sort to the right-most
side of the array can we do efficient binary search.

I've added explicit preconditions that values are non-negative, and made
the BitPackedArray constructor unsafe to make it clear to callers that
they must explicitly check this themselves (the recommended safe way to
create a BPA is via the `BPA::encode()` method, which returns an Error
if there are negative values).
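The constructor pattern described above can be sketched as follows (a simplified illustration of the safety contract; the real BitPackedArray carries buffers, patches, and dtype metadata, and its signatures differ):

```rust
#[derive(Debug)]
struct NegativeValueError;

#[allow(dead_code)]
struct BitPackedArray {
    values: Vec<i64>,
    bit_width: u32,
}

impl BitPackedArray {
    /// Safe entry point: validates the non-negativity precondition and
    /// returns an error instead of constructing an invalid array.
    fn encode(values: Vec<i64>, bit_width: u32) -> Result<Self, NegativeValueError> {
        if values.iter().any(|&v| v < 0) {
            return Err(NegativeValueError);
        }
        // SAFETY: we just checked that all values are non-negative.
        Ok(unsafe { Self::new_unchecked(values, bit_width) })
    }

    /// Unsafe constructor: the caller must guarantee all values are >= 0 so
    /// that patches sort to the right and binary search stays correct.
    unsafe fn new_unchecked(values: Vec<i64>, bit_width: u32) -> Self {
        Self { values, bit_width }
    }
}

fn main() {
    assert!(BitPackedArray::encode(vec![1, 2, 3], 2).is_ok());
    assert!(BitPackedArray::encode(vec![-1, 2], 2).is_err());
}
```

Marking the raw constructor `unsafe` pushes the precondition into the type system: any caller bypassing `encode()` must write an explicit `unsafe` block and justify why the invariant holds.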