GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED #37940

Merged
merged 1 commit into from
Oct 3, 2023

Conversation

etseidl
Contributor

@etseidl etseidl commented Sep 28, 2023

Closes #37939.

What changes are included in this PR?

This PR changes values used in the DELTA_BINARY_PACKED encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain.

Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common).

Are these changes tested?

I've included two tests that result in overflow.

Are there any user-facing changes?

No

@etseidl etseidl requested a review from wgtmac as a code owner September 28, 2023 18:12
@github-actions

⚠️ GitHub issue #37939 has been automatically assigned in GitHub to PR creator.

Member

@rok rok left a comment

This seems like a good idea!

Comment on lines 2188 to 2192
private:
constexpr T SafeSubtract(T a, T b) const {
constexpr WT mask = static_cast<WT>(static_cast<UT>(static_cast<T>(-1)));
return static_cast<T>((static_cast<WT>(a) - static_cast<WT>(b)) & mask);
}
Member

Would this maybe better fit in int_util_overflow.h?

Contributor Author

@etseidl etseidl Sep 28, 2023

Yes, that sounds like a better location. Thanks!

Edit: since it needs ::arrow::internal::int128_t I don't think it can go in int_util_overflow.h :(

@github-actions github-actions bot added awaiting changes, awaiting change review and removed awaiting review, awaiting changes labels Sep 28, 2023
Member

@mapleFU mapleFU left a comment

Generally I think we don't need an extra SafeSignedSubtractSigned for values that were originally unsigned. But this patch is great. I'll check out the parquet-mr and arrow-rs implementations here.

// making subtraction operations well-defined and correct even in case of overflow.
// Encoded integers will wrap back around on decoding.
// See http://en.wikipedia.org/wiki/Modular_arithmetic#Integers_modulo_n
- deltas_[values_current_block_] = value - current_value_;
+ deltas_[values_current_block_] = SafeSignedSubtractSigned(value, current_value_);
Member

Would a static_cast<T>(value - current_value_); be OK here? Originally, value and current_value_ were unsigned, so doing the operation in the unsigned domain would not require a wider type?

Contributor Author

Yes, you're right, we don't need a new safe subtract. The existing SafeSignedSubtract does the subtraction with unsigned to get the correct wrapping behavior, and then casts back to signed. No need for promotion to the larger type. I've changed to SafeSignedSubtract everywhere.
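
To illustrate the point above, here is a minimal sketch of a wrapping subtract that works in the unsigned domain and then casts back to signed. `WrappingSubtract` is a hypothetical name used only for illustration; it is not Arrow's actual `SafeSignedSubtract` implementation, just the technique as described in this thread.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the helper discussed above (not Arrow's code).
// Unsigned subtraction wraps modulo 2^N, so it is well defined even when
// the mathematically exact difference does not fit in T.
template <typename T>
constexpr T WrappingSubtract(T a, T b) {
  static_assert(std::is_signed<T>::value, "T must be a signed integer type");
  using UT = typename std::make_unsigned<T>::type;
  // The cast back to T is two's-complement on mainstream compilers
  // (and guaranteed to be so since C++20).
  return static_cast<T>(static_cast<UT>(a) - static_cast<UT>(b));
}
```

For example, `WrappingSubtract<int32_t>(INT32_MAX, INT32_MIN)` wraps to -1 rather than triggering signed-overflow undefined behavior, and no promotion to a wider type is involved.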

Comment on lines 2258 to 2278
- const auto bit_width = bit_width_data[i] =
- bit_util::NumRequiredBits(max_delta - min_delta);
+ const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
+ static_cast<UT>(SafeSignedSubtractSigned(max_delta, min_delta)));
Member

Actually, I don't fully understand this. Suppose the sequence is {1, 0, -1, 0, 1, 0, -1, 0, 1}.

Previously, the value grew large because of max_delta - min_delta. The max_delta should be 1; however, it turned out to be a huge unsigned int, and the min_delta was 1. The result is correct, but it wastes a huge amount of space?

Contributor Author

@etseidl etseidl Sep 29, 2023

Yes, you are correct. Leaving everything unsigned, we have (unsigned)(0 - 1) = 0xffffffffU. min_delta == 1, max_delta == 0xffffffffU, so max_delta - min_delta == 0xfffffffeU, which requires the full 32 bits to encode.

The end result is correct, but uses more space than it needs to.
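
The arithmetic above can be checked with a small sketch. It compares the bit width chosen when min/max deltas are tracked as unsigned values against the width chosen when they are tracked as signed values. All names here (`NumBits`, `BitWidthUnsigned`, `BitWidthSigned`, `kZigZag`) are hypothetical, and the sketch ignores the real encoder's block and miniblock layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bits needed to represent x (0 for x == 0).
constexpr int NumBits(uint32_t x) {
  int bits = 0;
  while (x != 0) {
    ++bits;
    x >>= 1;
  }
  return bits;
}

// Min/max deltas compared as unsigned values (the old behaviour as
// described in this thread): a delta of -1 looks like 0xffffffff.
template <size_t N>
constexpr int BitWidthUnsigned(const int32_t (&v)[N]) {
  uint32_t min_d = UINT32_MAX, max_d = 0;
  for (size_t i = 1; i < N; ++i) {
    const uint32_t d = static_cast<uint32_t>(v[i]) - static_cast<uint32_t>(v[i - 1]);
    if (d < min_d) min_d = d;
    if (d > max_d) max_d = d;
  }
  return NumBits(max_d - min_d);
}

// Min/max deltas compared as signed values; the subtraction itself is
// still done in the unsigned domain so that it wraps safely.
template <size_t N>
constexpr int BitWidthSigned(const int32_t (&v)[N]) {
  int32_t min_d = INT32_MAX, max_d = INT32_MIN;
  for (size_t i = 1; i < N; ++i) {
    const int32_t d = static_cast<int32_t>(static_cast<uint32_t>(v[i]) -
                                           static_cast<uint32_t>(v[i - 1]));
    if (d < min_d) min_d = d;
    if (d > max_d) max_d = d;
  }
  return NumBits(static_cast<uint32_t>(max_d) - static_cast<uint32_t>(min_d));
}

// The sequence from the review comment above.
constexpr int32_t kZigZag[] = {1, 0, -1, 0, 1, 0, -1, 0, 1};
```

With `kZigZag`, the unsigned comparison yields 32 bits per delta, while the signed comparison yields 2 bits (min_delta == -1, max_delta == 1, range == 2).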

@mapleFU
Member

mapleFU commented Sep 29, 2023

It seems CI failed because of some weird problem. Would you mind retrying to trigger the CI again?

@mapleFU
Member

mapleFU commented Sep 29, 2023

Oh, after checking arrow-rs ( https://github.com/apache/arrow-rs/blob/471f6dd2911d8328ca56efe2f685e08c0a3fb8c8/parquet/src/encodings/encoding/mod.rs#L275 ) and parquet-mr (see DeltaBinaryPackingValuesWriterForLong.flushBlockBuffer), we can just move forward, great.

Member

@mapleFU mapleFU left a comment

Generally looks good to me, it would be great to introduce this! Also @pitrou @wgtmac for review.

- const auto bit_width = bit_width_data[i] =
- bit_util::NumRequiredBits(max_delta - min_delta);
+ const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
+ static_cast<UT>(SafeSignedSubtract(max_delta, min_delta)));
Member

nit: We can add a DCHECK for SafeSignedSubtract(max_delta, min_delta) >= 0

Contributor Author

I don't think we would want that. It's ok for the subtraction to wrap around and become negative, we just cast it back to unsigned. On the decoding side it will wrap around again and yield the correct result. The only reason to use SafeSignedSubtract is to get well defined wrapping behavior, which signed subtraction does not give us.

Member

Yeah you're right, {0, 0, INT32_MIN, INT32_MAX} might get SafeSignedSubtract(max_delta, min_delta) < 0.
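
A small sketch can make this concrete. It computes the wrapped range for the sequence mentioned above and checks that encoding each delta relative to min_delta, then decoding with the same wrap-around arithmetic, recovers the original values. `U`, `WrappedRange`, `RoundTrips` and `kExtremes` are hypothetical names, and bit-packing itself is left out; this is an illustration of the modular arithmetic, not the Arrow encoder.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical shorthand for a cast into the unsigned domain.
constexpr uint32_t U(int32_t x) { return static_cast<uint32_t>(x); }

// Signed view of (max_delta - min_delta), computed with wrap-around.
template <size_t N>
constexpr int32_t WrappedRange(const int32_t (&v)[N]) {
  int32_t min_d = INT32_MAX, max_d = INT32_MIN;
  for (size_t i = 1; i < N; ++i) {
    const int32_t d = static_cast<int32_t>(U(v[i]) - U(v[i - 1]));
    if (d < min_d) min_d = d;
    if (d > max_d) max_d = d;
  }
  return static_cast<int32_t>(U(max_d) - U(min_d));
}

// Encode every delta relative to min_delta, then decode it again.
template <size_t N>
constexpr bool RoundTrips(const int32_t (&v)[N]) {
  int32_t min_d = INT32_MAX;
  for (size_t i = 1; i < N; ++i) {
    const int32_t d = static_cast<int32_t>(U(v[i]) - U(v[i - 1]));
    if (d < min_d) min_d = d;
  }
  uint32_t prev = U(v[0]);
  for (size_t i = 1; i < N; ++i) {
    // Encoder side: the value that would be bit-packed.
    const uint32_t stored = (U(v[i]) - U(v[i - 1])) - U(min_d);
    // Decoder side: wraps around again and lands on the original value.
    const uint32_t decoded = prev + U(min_d) + stored;
    if (static_cast<int32_t>(decoded) != v[i]) return false;
    prev = decoded;
  }
  return true;
}

constexpr int32_t kExtremes[] = {0, 0, INT32_MIN, INT32_MAX};
```

For `kExtremes` the deltas are 0, INT32_MIN and -1 (after wrapping), so max_delta - min_delta wraps to a negative value when viewed as signed, which is why a `>= 0` check would fire, yet the round trip still succeeds.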

@@ -1413,6 +1413,68 @@ TEST_F(TestLargeStringParquetIO, Basics) {
this->RoundTripSingleColumn(large_array, large_array, arrow_properties);
}

using TestDeltaBinaryPacked32ParquetIO = TestParquetIO<::arrow::Int32Type>;

TEST_F(TestDeltaBinaryPacked32ParquetIO, DeltaBinaryPacked) {
Member

The unit test looks good to me, but can we move them just to encoding_test.cc?

Contributor Author

@etseidl etseidl Oct 2, 2023

Sure. Since the tests are identical except for the typing, is it worth the effort to make this a typed test suite, but for just two types?

Oh, I see there is one in encoding_test already. Thanks for pointing me there @mapleFU

@mapleFU
Member

mapleFU commented Oct 3, 2023

(This looks OK to me now, but it may be better to rebase, since lots of flaky unit tests have already been fixed.)

Member

@pitrou pitrou left a comment

Thanks a lot for noticing and fixing this! LGTM on the principle, here are a few more comments.

ASSERT_EQ(num_values, values_decoded);
ASSERT_NO_FATAL_FAILURE(
VerifyResults<T>(decoded.data(), int_values.data(), num_values));
}
Member

@pitrou pitrou Oct 3, 2023

Since this PR also fixes the encoded data size, can you add a test for that? For example check that the encoded buffer size is equal to a certain value given data that would have triggered the bug, such as the data in #37939 (comment)

Comment on lines 2261 to 2262
const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
static_cast<UT>(SafeSignedSubtract(max_delta, min_delta)));
Member

Note that SafeSignedSubtract simply does the subtraction in the unsigned domain, so you could also write static_cast<UT>(max_delta) - static_cast<UT>(min_delta).

Contributor Author

True. Nice catch

@@ -1634,6 +1634,41 @@ TYPED_TEST(TestDeltaBitPackEncoding, NonZeroPaddedMiniblockBitWidth) {
}
}

TYPED_TEST(TestDeltaBitPackEncoding, DeltaBitPackedWrapping) {
using T = typename TypeParam::c_type;
Member

Can you add a comment referring to the GH issue?

@pitrou
Member

pitrou commented Oct 3, 2023

@etseidl Did you try to read data generated by this patch using other Parquet implementations such as parquet-mr?

@etseidl
Contributor Author

etseidl commented Oct 3, 2023

> @etseidl Did you try to read data generated by this patch using other Parquet implementations such as parquet-mr?

The data I posted in the issue was readable by arrow-rs and parquet-mr, but I can do a bigger test of that with more varied data.

@pitrou
Member

pitrou commented Oct 3, 2023

> The data I posted in the issue was readable by arrow-rs and parquet-mr, but I can do a bigger test of that with more varied data.

No need to. Thanks for the answer!

@etseidl
Contributor Author

etseidl commented Oct 3, 2023

Thank you all for your help with this! Should I rebase and force push now? And should I update the description to match the current state of the patch (as it no longer promotes to a larger bit width)? @mapleFU @pitrou

@pitrou
Member

pitrou commented Oct 3, 2023

Yes, please rebase and update the description to match the contents!

@mapleFU
Member

mapleFU commented Oct 3, 2023

I'm interested in the performance change this brings. I ran the encoding and decoding benchmarks on macOS (M1 Pro) with RelWithDebInfo and O2.

Encoding doesn't change much. For decoding, I guess we previously tended to use 32 and 64 bits, which wastes space but benefits decoding; the smaller size just makes decoding a bit slower. However, I think we should merge this patch: it can make DELTA_BINARY_PACKED much smaller in most cases, and much better in some really "delta" cases.

Encode

Before:

BM_DeltaBitPackingEncode_Int32_Fixed/1024         7388 ns         6727 ns        93419 bytes_per_second=580.658M/s items_per_second=152.216M/s
BM_DeltaBitPackingEncode_Int32_Fixed/4096        46325 ns        27672 ns        27080 bytes_per_second=564.655M/s items_per_second=148.021M/s
BM_DeltaBitPackingEncode_Int32_Fixed/32768      221001 ns       205395 ns         3243 bytes_per_second=608.584M/s items_per_second=159.537M/s
BM_DeltaBitPackingEncode_Int32_Fixed/65536      531356 ns       423160 ns         1679 bytes_per_second=590.793M/s items_per_second=154.873M/s
BM_DeltaBitPackingEncode_Int64_Fixed/1024         8327 ns         6813 ns       105758 bytes_per_second=1.11987G/s items_per_second=150.306M/s
BM_DeltaBitPackingEncode_Int64_Fixed/4096        32330 ns        27128 ns        25970 bytes_per_second=1.12495G/s items_per_second=150.988M/s
BM_DeltaBitPackingEncode_Int64_Fixed/32768      239290 ns       210625 ns         3289 bytes_per_second=1.15913G/s items_per_second=155.575M/s
BM_DeltaBitPackingEncode_Int64_Fixed/65536      417320 ns       405689 ns         1687 bytes_per_second=1.20359G/s items_per_second=161.543M/s
BM_DeltaBitPackingEncode_Int32_Narrow/1024        8008 ns         7857 ns        91353 bytes_per_second=497.153M/s items_per_second=130.326M/s
BM_DeltaBitPackingEncode_Int32_Narrow/4096       29652 ns        29567 ns        23353 bytes_per_second=528.467M/s items_per_second=138.534M/s
BM_DeltaBitPackingEncode_Int32_Narrow/32768     274176 ns       258310 ns         2694 bytes_per_second=483.915M/s items_per_second=126.856M/s
BM_DeltaBitPackingEncode_Int32_Narrow/65536     600979 ns       557093 ns         1000 bytes_per_second=448.758M/s items_per_second=117.639M/s
BM_DeltaBitPackingEncode_Int64_Narrow/1024       10400 ns        10082 ns        68428 bytes_per_second=774.872M/s items_per_second=101.564M/s
BM_DeltaBitPackingEncode_Int64_Narrow/4096       48600 ns        46734 ns        14683 bytes_per_second=668.678M/s items_per_second=87.645M/s
BM_DeltaBitPackingEncode_Int64_Narrow/32768     372571 ns       358108 ns         1978 bytes_per_second=698.113M/s items_per_second=91.5031M/s
BM_DeltaBitPackingEncode_Int64_Narrow/65536     693363 ns       687021 ns         1029 bytes_per_second=727.779M/s items_per_second=95.3915M/s
BM_DeltaBitPackingEncode_Int32_Wide/1024          8086 ns         7889 ns        90200 bytes_per_second=495.166M/s items_per_second=129.805M/s
BM_DeltaBitPackingEncode_Int32_Wide/4096         31668 ns        30423 ns        23291 bytes_per_second=513.592M/s items_per_second=134.635M/s
BM_DeltaBitPackingEncode_Int32_Wide/32768       269229 ns       262281 ns         2667 bytes_per_second=476.588M/s items_per_second=124.935M/s
BM_DeltaBitPackingEncode_Int32_Wide/65536       517646 ns       506281 ns         1395 bytes_per_second=493.797M/s items_per_second=129.446M/s
BM_DeltaBitPackingEncode_Int64_Wide/1024         10090 ns        10087 ns        69206 bytes_per_second=774.544M/s items_per_second=101.521M/s
BM_DeltaBitPackingEncode_Int64_Wide/4096         46402 ns        46005 ns        15212 bytes_per_second=679.276M/s items_per_second=89.0341M/s
BM_DeltaBitPackingEncode_Int64_Wide/32768       361227 ns       356360 ns         1967 bytes_per_second=701.538M/s items_per_second=91.952M/s
BM_DeltaBitPackingEncode_Int64_Wide/65536       687265 ns       687060 ns         1013 bytes_per_second=727.738M/s items_per_second=95.3861M/s

After:

BM_DeltaBitPackingEncode_Int32_Fixed/1024         6746 ns         6622 ns       107996 bytes_per_second=589.889M/s items_per_second=154.636M/s
BM_DeltaBitPackingEncode_Int32_Fixed/4096        26207 ns        25429 ns        27415 bytes_per_second=614.466M/s items_per_second=161.078M/s
BM_DeltaBitPackingEncode_Int32_Fixed/32768      236115 ns       207114 ns         3471 bytes_per_second=603.534M/s items_per_second=158.213M/s
BM_DeltaBitPackingEncode_Int32_Fixed/65536      450868 ns       416761 ns         1668 bytes_per_second=599.864M/s items_per_second=157.251M/s
BM_DeltaBitPackingEncode_Int64_Fixed/1024         6489 ns         6484 ns       108361 bytes_per_second=1.17666G/s items_per_second=157.929M/s
BM_DeltaBitPackingEncode_Int64_Fixed/4096        25210 ns        25206 ns        27800 bytes_per_second=1.21072G/s items_per_second=162.5M/s
BM_DeltaBitPackingEncode_Int64_Fixed/32768      202326 ns       202064 ns         3460 bytes_per_second=1.20823G/s items_per_second=162.166M/s
BM_DeltaBitPackingEncode_Int64_Fixed/65536      403463 ns       403353 ns         1743 bytes_per_second=1.21056G/s items_per_second=162.478M/s
BM_DeltaBitPackingEncode_Int32_Narrow/1024        7066 ns         7062 ns        99590 bytes_per_second=553.105M/s items_per_second=144.993M/s
BM_DeltaBitPackingEncode_Int32_Narrow/4096       26993 ns        26980 ns        26047 bytes_per_second=579.125M/s items_per_second=151.814M/s
BM_DeltaBitPackingEncode_Int32_Narrow/32768     232130 ns       227611 ns         3087 bytes_per_second=549.182M/s items_per_second=143.965M/s
BM_DeltaBitPackingEncode_Int32_Narrow/65536     445752 ns       444218 ns         1574 bytes_per_second=562.787M/s items_per_second=147.531M/s
BM_DeltaBitPackingEncode_Int64_Narrow/1024        6998 ns         6994 ns       100485 bytes_per_second=1116.97M/s items_per_second=146.404M/s
BM_DeltaBitPackingEncode_Int64_Narrow/4096       26963 ns        26955 ns        25345 bytes_per_second=1.13219G/s items_per_second=151.96M/s
BM_DeltaBitPackingEncode_Int64_Narrow/32768     224846 ns       223845 ns         3141 bytes_per_second=1116.85M/s items_per_second=146.387M/s
BM_DeltaBitPackingEncode_Int64_Narrow/65536     440290 ns       440284 ns         1591 bytes_per_second=1.10901G/s items_per_second=148.849M/s
BM_DeltaBitPackingEncode_Int32_Wide/1024          7925 ns         7923 ns        87495 bytes_per_second=493.032M/s items_per_second=129.245M/s
BM_DeltaBitPackingEncode_Int32_Wide/4096         30254 ns        30251 ns        23146 bytes_per_second=516.518M/s items_per_second=135.402M/s
BM_DeltaBitPackingEncode_Int32_Wide/32768       256295 ns       256292 ns         2714 bytes_per_second=487.725M/s items_per_second=127.854M/s
BM_DeltaBitPackingEncode_Int32_Wide/65536       507129 ns       500402 ns         1399 bytes_per_second=499.598M/s items_per_second=130.967M/s
BM_DeltaBitPackingEncode_Int64_Wide/1024         10406 ns        10405 ns        67149 bytes_per_second=750.859M/s items_per_second=98.4165M/s
BM_DeltaBitPackingEncode_Int64_Wide/4096         45677 ns        45439 ns        15300 bytes_per_second=687.734M/s items_per_second=90.1427M/s
BM_DeltaBitPackingEncode_Int64_Wide/32768       346803 ns       346780 ns         2019 bytes_per_second=720.919M/s items_per_second=94.4923M/s
BM_DeltaBitPackingEncode_Int64_Wide/65536       710268 ns       705693 ns          964 bytes_per_second=708.523M/s items_per_second=92.8676M/s

Decode

Before:

Run on (10 X 24.1205 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 11.76, 8.38, 5.93
------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
BM_DeltaBitPackingDecode_Int32_Fixed/1024         1170 ns         1140 ns       617551 bytes_per_second=3.34478G/s items_per_second=897.857M/s
BM_DeltaBitPackingDecode_Int32_Fixed/4096         4223 ns         4111 ns       171441 bytes_per_second=3.71136G/s items_per_second=996.261M/s
BM_DeltaBitPackingDecode_Int32_Fixed/32768       33916 ns        32218 ns        22054 bytes_per_second=3.78883G/s items_per_second=1017.06M/s
BM_DeltaBitPackingDecode_Int32_Fixed/65536       65659 ns        64014 ns        11133 bytes_per_second=3.81385G/s items_per_second=1023.77M/s
BM_DeltaBitPackingDecode_Int64_Fixed/1024         1170 ns         1124 ns       630523 bytes_per_second=6.7877G/s items_per_second=911.029M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096         3725 ns         3723 ns       188036 bytes_per_second=8.19693G/s items_per_second=1.10017G/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768       29321 ns        29308 ns        23828 bytes_per_second=8.33016G/s items_per_second=1.11805G/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536       58346 ns        58318 ns        11880 bytes_per_second=8.37271G/s items_per_second=1.12377G/s
BM_DeltaBitPackingDecode_Int32_Narrow/1024        1068 ns         1068 ns       664976 bytes_per_second=3.5725G/s items_per_second=958.986M/s
BM_DeltaBitPackingDecode_Int32_Narrow/4096        4082 ns         4028 ns       176849 bytes_per_second=3.78799G/s items_per_second=1016.83M/s
BM_DeltaBitPackingDecode_Int32_Narrow/32768      31969 ns        31932 ns        22023 bytes_per_second=3.82279G/s items_per_second=1026.17M/s
BM_DeltaBitPackingDecode_Int32_Narrow/65536      63682 ns        63643 ns        11016 bytes_per_second=3.83611G/s items_per_second=1029.75M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024         934 ns          932 ns       748351 bytes_per_second=8.18632G/s items_per_second=1098.75M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096        3404 ns         3403 ns       202131 bytes_per_second=8.96808G/s items_per_second=1.20367G/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768      29199 ns        29196 ns        24017 bytes_per_second=8.36203G/s items_per_second=1.12233G/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536      57770 ns        57768 ns        12077 bytes_per_second=8.45249G/s items_per_second=1.13447G/s
BM_DeltaBitPackingDecode_Int32_Wide/1024          1087 ns         1087 ns       643548 bytes_per_second=3.50993G/s items_per_second=942.189M/s
BM_DeltaBitPackingDecode_Int32_Wide/4096          4087 ns         4087 ns       172062 bytes_per_second=3.73328G/s items_per_second=1002.15M/s
BM_DeltaBitPackingDecode_Int32_Wide/32768        32799 ns        32797 ns        21498 bytes_per_second=3.72199G/s items_per_second=999.114M/s
BM_DeltaBitPackingDecode_Int32_Wide/65536        65459 ns        65453 ns        10717 bytes_per_second=3.73004G/s items_per_second=1001.27M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024          1016 ns         1016 ns       687198 bytes_per_second=7.5105G/s items_per_second=1008.04M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096          3742 ns         3742 ns       187931 bytes_per_second=8.15518G/s items_per_second=1094.57M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768        31511 ns        31509 ns        22198 bytes_per_second=7.74827G/s items_per_second=1039.96M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536        62441 ns        62433 ns        11171 bytes_per_second=7.82092G/s items_per_second=1049.71M/s

After:

------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
BM_DeltaBitPackingDecode_Int32_Fixed/1024         1524 ns         1210 ns       582799 bytes_per_second=3.15193G/s items_per_second=846.089M/s
BM_DeltaBitPackingDecode_Int32_Fixed/4096         6155 ns         4283 ns       154236 bytes_per_second=3.56298G/s items_per_second=956.43M/s
BM_DeltaBitPackingDecode_Int32_Fixed/32768       40811 ns        32998 ns        20873 bytes_per_second=3.69928G/s items_per_second=993.019M/s
BM_DeltaBitPackingDecode_Int32_Fixed/65536       73049 ns        64550 ns        10534 bytes_per_second=3.78218G/s items_per_second=1015.27M/s
BM_DeltaBitPackingDecode_Int64_Fixed/1024         1164 ns         1119 ns       628761 bytes_per_second=6.81583G/s items_per_second=914.806M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096         3824 ns         3780 ns       186756 bytes_per_second=8.07281G/s items_per_second=1083.51M/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768       30175 ns        29516 ns        24098 bytes_per_second=8.2714G/s items_per_second=1.11017G/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536       60017 ns        59018 ns        11983 bytes_per_second=8.27339G/s items_per_second=1.11044G/s
BM_DeltaBitPackingDecode_Int32_Narrow/1024        1381 ns         1378 ns       501986 bytes_per_second=2.76793G/s items_per_second=743.01M/s
BM_DeltaBitPackingDecode_Int32_Narrow/4096        5404 ns         5369 ns       131067 bytes_per_second=2.84204G/s items_per_second=762.903M/s
BM_DeltaBitPackingDecode_Int32_Narrow/32768      45844 ns        43339 ns        16220 bytes_per_second=2.81666G/s items_per_second=756.091M/s
BM_DeltaBitPackingDecode_Int32_Narrow/65536      86916 ns        84916 ns         8257 bytes_per_second=2.87509G/s items_per_second=771.777M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024        1248 ns         1159 ns       615866 bytes_per_second=6.58504G/s items_per_second=883.829M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096        4298 ns         4296 ns       162309 bytes_per_second=7.10393G/s items_per_second=953.473M/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768      35427 ns        35378 ns        19834 bytes_per_second=6.90082G/s items_per_second=926.212M/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536      70880 ns        70862 ns         9877 bytes_per_second=6.89062G/s items_per_second=924.844M/s
BM_DeltaBitPackingDecode_Int32_Wide/1024          1360 ns         1359 ns       515479 bytes_per_second=2.80606G/s items_per_second=753.246M/s
BM_DeltaBitPackingDecode_Int32_Wide/4096          5124 ns         5121 ns       136309 bytes_per_second=2.97938G/s items_per_second=799.772M/s
BM_DeltaBitPackingDecode_Int32_Wide/32768        40943 ns        40924 ns        16860 bytes_per_second=2.98282G/s items_per_second=800.695M/s
BM_DeltaBitPackingDecode_Int32_Wide/65536        82019 ns        81988 ns         8431 bytes_per_second=2.97777G/s items_per_second=799.34M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024          1278 ns         1278 ns       548551 bytes_per_second=5.97075G/s items_per_second=801.38M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096          4778 ns         4777 ns       146543 bytes_per_second=6.38874G/s items_per_second=857.482M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768        38399 ns        38390 ns        18267 bytes_per_second=6.35954G/s items_per_second=853.563M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536        76813 ns        76764 ns         9079 bytes_per_second=6.36081G/s items_per_second=853.734M/s

@pitrou
Member

pitrou commented Oct 3, 2023

I've changed the PR description and will merge if CI passes.

@pitrou
Member

pitrou commented Oct 3, 2023

The compression ratio in the DELTA_BYTE_ARRAY benchmarks is now much better as well.

@mapleFU
Member

mapleFU commented Oct 3, 2023

> The compression ratio in the DELTA_BYTE_ARRAY benchmarks is now much better as well.

Yeah, but I guess you mean DELTA_BINARY_PACKED, and not DELTA_BYTE_ARRAY...

@etseidl
Contributor Author

etseidl commented Oct 3, 2023

> > The compression ratio in the DELTA_BYTE_ARRAY benchmarks is now much better as well.
>
> Yeah, but I guess you mean DELTA_BINARY_PACKED, and not DELTA_BYTE_ARRAY...

There are 2 DELTA_BINARY_PACKED streams in DELTA_BYTE_ARRAY, so if the deltas are small and varying +/-, this could still be a big benefit. In fact, I discovered this problem while implementing a DELTA_LENGTH_BYTE_ARRAY decoder. :)
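
As a rough illustration of why the length streams matter: DELTA_BYTE_ARRAY stores, for each value, the length of the prefix shared with the previous value plus the remaining suffix, and both the prefix lengths and the suffix lengths end up as integer sequences. The sketch below only computes those two sequences for a tiny hand-picked input; the helper names and the `kWords` sample are hypothetical, and no actual Parquet encoding is performed.

```cpp
#include <cstddef>
#include <cstdint>

// Length of a NUL-terminated string.
constexpr int32_t Length(const char* s) {
  int32_t n = 0;
  while (s[n] != '\0') ++n;
  return n;
}

// Number of leading bytes shared by a and b.
constexpr int32_t CommonPrefix(const char* a, const char* b) {
  int32_t n = 0;
  while (a[n] != '\0' && a[n] == b[n]) ++n;
  return n;
}

// Prefix length of entry i: bytes shared with the previous entry
// (0 for the first entry, which has no predecessor).
template <size_t N>
constexpr int32_t PrefixLength(const char* const (&v)[N], size_t i) {
  return i == 0 ? 0 : CommonPrefix(v[i - 1], v[i]);
}

// Suffix length of entry i: whatever remains after the shared prefix.
template <size_t N>
constexpr int32_t SuffixLength(const char* const (&v)[N], size_t i) {
  return Length(v[i]) - PrefixLength(v, i);
}

// Hypothetical sample input.
constexpr const char* kWords[] = {"apple", "applesauce", "apply", "banana"};
```

For `kWords` the prefix lengths are 0, 5, 4, 0 and the suffix lengths are 5, 5, 1, 6, so the deltas between consecutive lengths are small and change sign, which is the situation where signed min/max tracking saves bits.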

@pitrou pitrou merged commit 5514b22 into apache:main Oct 3, 2023
30 checks passed
@pitrou pitrou removed the awaiting change review label Oct 3, 2023
@etseidl etseidl deleted the delta_binary_FoR branch October 3, 2023 19:15
@pitrou
Member

pitrou commented Oct 3, 2023

Thanks a lot for this @etseidl. It was embarrassing not to get any space-saving benefits from the encoding...

@etseidl
Contributor Author

etseidl commented Oct 3, 2023

Thanks again @pitrou @mapleFU @rok for shepherding this PR through. It wound up much better than when I started :)

@mapleFU
Member

mapleFU commented Oct 3, 2023

> There are 2 DELTA_BINARY_PACKED streams in DELTA_BYTE_ARRAY, so if the deltas are small and varying +/-, this could still be a big benefit. In fact, I discovered this problem while implementing a DELTA_LENGTH_BYTE_ARRAY decoder. :)

Oops, I forgot about that. It should have a length. So this can benefit several encodings.

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 5514b22.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 40 possible false positives for unstable benchmarks that are known to sometimes produce them.

@mapleFU
Member

mapleFU commented Oct 7, 2023

lol, I tried this on my own dataset: the page size after encoding becomes smaller, but the size after compression even grows to twice what it was before 😂

@etseidl
Contributor Author

etseidl commented Oct 7, 2023

> lol, I tried this on my own dataset: the page size after encoding becomes smaller, but the size after compression even grows to twice what it was before 😂

Oh no 😮

@mapleFU
Member

mapleFU commented Oct 7, 2023

I generated the data using the code below, and used zstd with default settings to compress 10000 values.

if i % 4 == 0:
  return i * -400
return i * 4

The plain size is 40000 bytes; after compression it's about 15000 bytes.

With the previous code, the delta bit-packed size is a bit greater than 40000 bytes, but after compression it's 10000 bytes. After the change, the size before compression is 28000 bytes, and after compression it's 27000 bytes.

However, I think the patch is great; at least we fixed a bad problem here... 🤔
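
The roughly 28000-byte figure can be sanity-checked with a back-of-the-envelope sketch. It assumes blocks of 128 deltas split into miniblocks of 32 (an assumption about the encoder's layout, stated here only for the estimate), one min_delta per block, one bit width per miniblock, and it ignores all header bytes. `Gen`, `NumBits` and `EstimatePackedBytes` are hypothetical names.

```cpp
#include <cstdint>

// The generator from the comment above.
constexpr int32_t Gen(int32_t i) { return i % 4 == 0 ? i * -400 : i * 4; }

// Hypothetical helper: number of bits needed to represent x (0 for x == 0).
constexpr int NumBits(uint32_t x) {
  int bits = 0;
  while (x != 0) {
    ++bits;
    x >>= 1;
  }
  return bits;
}

// Rough payload estimate with signed min/max deltas. Block and miniblock
// sizes are assumptions; headers and the first value are not counted.
constexpr int64_t EstimatePackedBytes(int32_t num_values) {
  constexpr int32_t kBlock = 128;  // assumed deltas per block
  constexpr int32_t kMini = 32;    // assumed deltas per miniblock
  int64_t bits = 0;
  for (int32_t start = 1; start < num_values; start += kBlock) {
    const int32_t end = start + kBlock < num_values ? start + kBlock : num_values;
    // One frame of reference (min_delta) per block.
    int32_t min_d = INT32_MAX;
    for (int32_t i = start; i < end; ++i) {
      const int32_t d = Gen(i) - Gen(i - 1);
      if (d < min_d) min_d = d;
    }
    // One bit width per miniblock; a partial miniblock is padded to kMini.
    for (int32_t m = start; m < end; m += kMini) {
      const int32_t m_end = m + kMini < end ? m + kMini : end;
      int32_t max_d = INT32_MIN;
      for (int32_t i = m; i < m_end; ++i) {
        const int32_t d = Gen(i) - Gen(i - 1);
        if (d > max_d) max_d = d;
      }
      bits += int64_t{kMini} * NumBits(static_cast<uint32_t>(max_d - min_d));
    }
  }
  return bits / 8;
}
```

Under these assumptions `EstimatePackedBytes(10000)` comes out a little under 28000 bytes, in line with the size reported above. The deltas alternate between large negative and large positive values, so each miniblock needs up to 23 bits per delta and the packed bytes leave little regularity for a general-purpose compressor to exploit.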

@pitrou
Member

pitrou commented Oct 7, 2023

@mapleFU Interesting, thank you. I think this shows that the DELTA encodings should be used with care, only if the data is very well-suited to them (for example integers with a small range of values). Otherwise, generic and fast compressors such as Lz4 and Zstd are probably a better choice.

@wgtmac
Member

wgtmac commented Oct 7, 2023

Sorry for the late reply due to a long holiday vacation.

IIRC, the DELTA_BINARY_PACKED encoding was inspired by FastPFor. However, FastPFor was designed mainly for positive integers. DELTA_BINARY_PACKED does not even follow what FastPFor does for exception values (like the negative numbers in the example from @mapleFU).

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…en encoding DELTA_BINARY_PACKED (apache#37940)

Closes apache#37939. 

### What changes are included in this PR?

This PR changes values used in the `DELTA_BINARY_PACKED` encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain.

Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common).

### Are these changes tested?
I've included two tests that result in overflow.

### Are there any user-facing changes?
No

* Closes: apache#37939

Authored-by: seidl <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Successfully merging this pull request may close these issues.

[C++] Use signed arithmetic for frame of reference in DeltaBitPackEncoder