GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED #37940

etseidl · 2023-09-28T18:12:19Z

What changes are included in this PR?

This PR changes values used in the DELTA_BINARY_PACKED encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain.

Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common).

Are these changes tested?

I've included two tests that result in overflow.

Are there any user-facing changes?

No

Closes: [C++] Use signed arithmetic for frame of reference in DeltaBitPackEncoder #37939

github-actions · 2023-09-28T18:12:50Z

⚠️ GitHub issue #37939 has been automatically assigned in GitHub to PR creator.

rok

This seems like a good idea!

rok · 2023-09-28T18:27:01Z

cpp/src/parquet/encoding.cc

+ private:
+  constexpr T SafeSubtract(T a, T b) const {
+    constexpr WT mask = static_cast<WT>(static_cast<UT>(static_cast<T>(-1)));
+    return static_cast<T>((static_cast<WT>(a) - static_cast<WT>(b)) & mask);
+  }


Would this maybe better fit in int_util_overflow.h?

Yes, that sounds like a better location. Thanks!

Edit: since it needs ::arrow::internal::int128_t I don't think it can go in int_util_overflow.h :(

mapleFU

Generally I think we don't need an extra SafeSignedSubtractSigned for values that were originally unsigned. But this patch is great. I'll check the parquet-mr and arrow-rs implementations here.

mapleFU · 2023-09-29T09:07:15Z

cpp/src/parquet/encoding.cc

    // making subtraction operations well-defined and correct even in case of overflow.
    // Encoded integers will wrap back around on decoding.
    // See http://en.wikipedia.org/wiki/Modular_arithmetic#Integers_modulo_n
-    deltas_[values_current_block_] = value - current_value_;
+    deltas_[values_current_block_] = SafeSignedSubtractSigned(value, current_value_);


Would a static_cast<T>(value - current_value_); be OK here? Originally, value and current_value_ are unsigned, so doing the operation in unsigned arithmetic would not require a wider type?

Yes, you're right, we don't need a new safe subtract. The existing SafeSignedSubtract does the subtraction with unsigned to get the correct wrapping behavior, and then casts back to signed. No need for promotion to the larger type. I've changed to SafeSignedSubtract everywhere.
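The point about not needing a wider type can be sketched as follows. This is a minimal illustration, not Arrow's actual code: `WrappingSubtract` is a hypothetical name standing in for the helper discussed above. It assumes 32-bit or 64-bit signed types and two's-complement conversion back to signed (guaranteed from C++20, and what mainstream compilers do before that).

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the helper discussed in this thread.
// The subtraction happens in the unsigned domain, where overflow wraps
// modulo 2^N, and the result is then converted back to the signed type.
// No promotion to a wider integer type is required.
template <typename T>
constexpr T WrappingSubtract(T a, T b) {
  static_assert(std::is_signed_v<T>, "T must be a signed integer type");
  using UT = std::make_unsigned_t<T>;
  return static_cast<T>(static_cast<UT>(a) - static_cast<UT>(b));
}
```

For example, `WrappingSubtract<int32_t>(INT32_MAX, INT32_MIN)` does not trigger signed overflow; it wraps to -1, and adding that delta back with wrapping arithmetic on the decoding side recovers the original value.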

mapleFU · 2023-09-29T09:26:01Z

cpp/src/parquet/encoding.cc

-    const auto bit_width = bit_width_data[i] =
-        bit_util::NumRequiredBits(max_delta - min_delta);
+    const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
+        static_cast<UT>(SafeSignedSubtractSigned(max_delta, min_delta)));


Actually, I don't fully understand this. Suppose the sequence is {1, 0, -1, 0, 1, 0, -1, 0, 1}.

Previously, the bit width grew large because of max_delta - min_delta. The max_delta should be 1; however, it turns out to be a huge unsigned int, while min_delta is 1. The result is correct, but it wastes a lot of space?

Yes, you are correct. Leaving everything unsigned, we have (unsigned)(0 - 1) = 0xffffffffU. min_delta == 1, max_delta == 0xffffffffU, so max_delta - min_delta == 0xfffffffeU, which requires the full 32 bits to encode.

The end result is correct, but uses more space than it needs to.
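The difference in bit width for that sequence can be reproduced with a small sketch. The function names here are hypothetical, and `NumBits` is only a stand-in for `bit_util::NumRequiredBits`; the code compares taking the min and max of the deltas as unsigned values (the old behaviour) with taking them as signed values (the new behaviour). Both functions assume a non-empty delta vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to represent x (0 for x == 0).
// Hypothetical stand-in for bit_util::NumRequiredBits.
inline int NumBits(uint32_t x) {
  int bits = 0;
  while (x != 0) {
    ++bits;
    x >>= 1;
  }
  return bits;
}

// Deltas between consecutive values, computed with wrapping unsigned arithmetic.
inline std::vector<uint32_t> WrappedDeltas(const std::vector<int32_t>& values) {
  std::vector<uint32_t> deltas;
  for (std::size_t i = 1; i < values.size(); ++i) {
    deltas.push_back(static_cast<uint32_t>(values[i]) -
                     static_cast<uint32_t>(values[i - 1]));
  }
  return deltas;
}

// Old behaviour: min and max are chosen by comparing deltas as unsigned.
inline int BitWidthUnsigned(const std::vector<uint32_t>& deltas) {
  auto [min_it, max_it] = std::minmax_element(deltas.begin(), deltas.end());
  return NumBits(*max_it - *min_it);
}

// New behaviour: min and max are chosen by comparing deltas as signed;
// the range itself is still computed with wrapping unsigned subtraction.
inline int BitWidthSigned(const std::vector<uint32_t>& deltas) {
  auto signed_less = [](uint32_t a, uint32_t b) {
    return static_cast<int32_t>(a) < static_cast<int32_t>(b);
  };
  auto [min_it, max_it] =
      std::minmax_element(deltas.begin(), deltas.end(), signed_less);
  return NumBits(*max_it - *min_it);
}
```

For {1, 0, -1, 0, 1, 0, -1, 0, 1} the deltas are -1 and 1 only, so the unsigned comparison yields a range of 0xfffffffe (32 bits) while the signed comparison yields a range of 2 (2 bits).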

mapleFU · 2023-09-29T09:33:23Z

It seems CI failed because of some weird problem. Would you mind retrying to re-trigger CI?

mapleFU · 2023-09-29T10:06:19Z

Oh, after checking arrow-rs ( https://github.com/apache/arrow-rs/blob/471f6dd2911d8328ca56efe2f685e08c0a3fb8c8/parquet/src/encodings/encoding/mod.rs#L275 ) and parquet-mr (see DeltaBinaryPackingValuesWriterForLong.flushBlockBuffer), it looks like we can just move forward, great.

mapleFU

Generally looks good to me; it would be great to get this in! Also cc @pitrou @wgtmac for review.

mapleFU · 2023-10-02T15:40:14Z

cpp/src/parquet/encoding.cc

-    const auto bit_width = bit_width_data[i] =
-        bit_util::NumRequiredBits(max_delta - min_delta);
+    const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
+        static_cast<UT>(SafeSignedSubtract(max_delta, min_delta)));


nit: We can add a DCHECK for SafeSignedSubtract(max_delta, min_delta) >= 0

I don't think we would want that. It's ok for the subtraction to wrap around and become negative, we just cast it back to unsigned. On the decoding side it will wrap around again and yield the correct result. The only reason to use SafeSignedSubtract is to get well defined wrapping behavior, which signed subtraction does not give us.

Yeah you're right, {0, 0, INT32_MIN, INT32_MAX} might get SafeSignedSubtract(max_delta, min_delta) < 0.
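The wrap-around case can be checked with a simplified round trip. This is only a sketch under stated assumptions: `EncodedBlock`, `EncodeBlock`, and `DecodeBlock` are hypothetical names, the model handles a single block of int32 values, and it ignores the real format's header, miniblocks, and bit packing. It shows that the adjusted deltas decode correctly even when max_delta - min_delta, viewed as signed, is negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical single-block model of DELTA_BINARY_PACKED.
struct EncodedBlock {
  int32_t first_value = 0;
  int32_t min_delta = 0;           // frame of reference, chosen in the signed domain
  std::vector<uint32_t> adjusted;  // (delta - min_delta), wrapped modulo 2^32
};

inline EncodedBlock EncodeBlock(const std::vector<int32_t>& values) {
  assert(!values.empty());  // precondition of this sketch
  EncodedBlock block;
  block.first_value = values[0];

  // Deltas are computed with wrapping unsigned subtraction.
  std::vector<int32_t> deltas;
  for (std::size_t i = 1; i < values.size(); ++i) {
    const uint32_t d = static_cast<uint32_t>(values[i]) -
                       static_cast<uint32_t>(values[i - 1]);
    deltas.push_back(static_cast<int32_t>(d));
  }
  if (deltas.empty()) return block;

  // The minimum is taken over signed deltas.
  block.min_delta = *std::min_element(deltas.begin(), deltas.end());
  for (int32_t d : deltas) {
    block.adjusted.push_back(static_cast<uint32_t>(d) -
                             static_cast<uint32_t>(block.min_delta));
  }
  return block;
}

inline std::vector<int32_t> DecodeBlock(const EncodedBlock& block) {
  std::vector<int32_t> values{block.first_value};
  uint32_t current = static_cast<uint32_t>(block.first_value);
  for (uint32_t adj : block.adjusted) {
    // Wrapping addition undoes the wrapping subtraction done by the encoder.
    current += static_cast<uint32_t>(block.min_delta) + adj;
    values.push_back(static_cast<int32_t>(current));
  }
  return values;
}
```

With {0, 0, INT32_MIN, INT32_MAX} the largest adjusted delta is 0x80000000, which is negative when reinterpreted as int32_t, yet decoding still returns the original values. That is why a DCHECK on the sign would fire on valid input.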

mapleFU · 2023-10-02T15:40:59Z

cpp/src/parquet/arrow/arrow_reader_writer_test.cc

@@ -1413,6 +1413,68 @@ TEST_F(TestLargeStringParquetIO, Basics) {
  this->RoundTripSingleColumn(large_array, large_array, arrow_properties);
 }

+using TestDeltaBinaryPacked32ParquetIO = TestParquetIO<::arrow::Int32Type>;
+
+TEST_F(TestDeltaBinaryPacked32ParquetIO, DeltaBinaryPacked) {


The unit tests look good to me, but can we just move them to encoding_test.cc?

Sure. Since the tests are identical except for the typing, is it worth the effort to make this a typed test suite, but for just two types?

Oh, I see there is one in encoding_test already. Thanks for pointing me there @mapleFU

mapleFU · 2023-10-03T13:45:39Z

(This looks OK to me now, but it may be better to rebase, since a lot of flaky unit tests have already been fixed.)

pitrou

Thanks a lot for noticing and fixing this! LGTM on the principle, here are a few more comments.

pitrou · 2023-10-03T14:29:17Z

cpp/src/parquet/encoding_test.cc

+  ASSERT_EQ(num_values, values_decoded);
+  ASSERT_NO_FATAL_FAILURE(
+      VerifyResults<T>(decoded.data(), int_values.data(), num_values));
+}


Since this PR also fixes the encoded data size, can you add a test for that? For example check that the encoded buffer size is equal to a certain value given data that would have triggered the bug, such as the data in #37939 (comment)

pitrou · 2023-10-03T14:30:55Z

cpp/src/parquet/encoding.cc

+    const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
+        static_cast<UT>(SafeSignedSubtract(max_delta, min_delta)));


Note that SafeSignedSubtract simply does the subtraction in the unsigned domain, so you could also write static_cast<UT>(max_delta) - static_cast<UT>(min_delta).

True. Nice catch

pitrou · 2023-10-03T14:32:47Z

cpp/src/parquet/encoding_test.cc

@@ -1634,6 +1634,41 @@ TYPED_TEST(TestDeltaBitPackEncoding, NonZeroPaddedMiniblockBitWidth) {
  }
 }

+TYPED_TEST(TestDeltaBitPackEncoding, DeltaBitPackedWrapping) {
+  using T = typename TypeParam::c_type;


Can you add a comment referring to the GH issue?

pitrou · 2023-10-03T14:35:02Z

@etseidl Did you try to read data generated by this patch using other Parquet implementations such as parquet-mr?

etseidl · 2023-10-03T14:48:11Z

@etseidl Did you try to read data generated by this patch using other Parquet implementations such as parquet-mr?

The data I posted in the issue was readable by arrow-rs and parquet-mr, but I can do a bigger test of that with more varied data.

pitrou · 2023-10-03T14:49:28Z

The data I posted in the issue was readable by arrow-rs and parquet-mr, but I can do a bigger test of that with more varied data.

No need to. Thanks for the answer!

etseidl · 2023-10-03T17:07:13Z

Thank you all for your help with this! Should I rebase and force push now? And should I update the description to match the current state of the patch (as it no longer promotes to a larger bit width)? @mapleFU @pitrou

pitrou · 2023-10-03T17:29:14Z

Yes, please rebase and update the description to match the contents!

mapleFU · 2023-10-03T17:44:40Z

I'm interested in the performance impact this can bring. I checked the benchmarks on macOS (M1 Pro) with RelWithDebInfo and O2.

Encoding doesn't change much. For decoding, I guess we previously tended to use 32-bit and 64-bit widths, which wastes space but benefits decoding. The smaller bit widths just make decoding a bit slower. However, I think we should merge this patch: it can make DELTA_BINARY_PACKED output much smaller in most cases, and much better in some really "delta" cases.

Encode

Before:

BM_DeltaBitPackingEncode_Int32_Fixed/1024         7388 ns         6727 ns        93419 bytes_per_second=580.658M/s items_per_second=152.216M/s
BM_DeltaBitPackingEncode_Int32_Fixed/4096        46325 ns        27672 ns        27080 bytes_per_second=564.655M/s items_per_second=148.021M/s
BM_DeltaBitPackingEncode_Int32_Fixed/32768      221001 ns       205395 ns         3243 bytes_per_second=608.584M/s items_per_second=159.537M/s
BM_DeltaBitPackingEncode_Int32_Fixed/65536      531356 ns       423160 ns         1679 bytes_per_second=590.793M/s items_per_second=154.873M/s
BM_DeltaBitPackingEncode_Int64_Fixed/1024         8327 ns         6813 ns       105758 bytes_per_second=1.11987G/s items_per_second=150.306M/s
BM_DeltaBitPackingEncode_Int64_Fixed/4096        32330 ns        27128 ns        25970 bytes_per_second=1.12495G/s items_per_second=150.988M/s
BM_DeltaBitPackingEncode_Int64_Fixed/32768      239290 ns       210625 ns         3289 bytes_per_second=1.15913G/s items_per_second=155.575M/s
BM_DeltaBitPackingEncode_Int64_Fixed/65536      417320 ns       405689 ns         1687 bytes_per_second=1.20359G/s items_per_second=161.543M/s
BM_DeltaBitPackingEncode_Int32_Narrow/1024        8008 ns         7857 ns        91353 bytes_per_second=497.153M/s items_per_second=130.326M/s
BM_DeltaBitPackingEncode_Int32_Narrow/4096       29652 ns        29567 ns        23353 bytes_per_second=528.467M/s items_per_second=138.534M/s
BM_DeltaBitPackingEncode_Int32_Narrow/32768     274176 ns       258310 ns         2694 bytes_per_second=483.915M/s items_per_second=126.856M/s
BM_DeltaBitPackingEncode_Int32_Narrow/65536     600979 ns       557093 ns         1000 bytes_per_second=448.758M/s items_per_second=117.639M/s
BM_DeltaBitPackingEncode_Int64_Narrow/1024       10400 ns        10082 ns        68428 bytes_per_second=774.872M/s items_per_second=101.564M/s
BM_DeltaBitPackingEncode_Int64_Narrow/4096       48600 ns        46734 ns        14683 bytes_per_second=668.678M/s items_per_second=87.645M/s
BM_DeltaBitPackingEncode_Int64_Narrow/32768     372571 ns       358108 ns         1978 bytes_per_second=698.113M/s items_per_second=91.5031M/s
BM_DeltaBitPackingEncode_Int64_Narrow/65536     693363 ns       687021 ns         1029 bytes_per_second=727.779M/s items_per_second=95.3915M/s
BM_DeltaBitPackingEncode_Int32_Wide/1024          8086 ns         7889 ns        90200 bytes_per_second=495.166M/s items_per_second=129.805M/s
BM_DeltaBitPackingEncode_Int32_Wide/4096         31668 ns        30423 ns        23291 bytes_per_second=513.592M/s items_per_second=134.635M/s
BM_DeltaBitPackingEncode_Int32_Wide/32768       269229 ns       262281 ns         2667 bytes_per_second=476.588M/s items_per_second=124.935M/s
BM_DeltaBitPackingEncode_Int32_Wide/65536       517646 ns       506281 ns         1395 bytes_per_second=493.797M/s items_per_second=129.446M/s
BM_DeltaBitPackingEncode_Int64_Wide/1024         10090 ns        10087 ns        69206 bytes_per_second=774.544M/s items_per_second=101.521M/s
BM_DeltaBitPackingEncode_Int64_Wide/4096         46402 ns        46005 ns        15212 bytes_per_second=679.276M/s items_per_second=89.0341M/s
BM_DeltaBitPackingEncode_Int64_Wide/32768       361227 ns       356360 ns         1967 bytes_per_second=701.538M/s items_per_second=91.952M/s
BM_DeltaBitPackingEncode_Int64_Wide/65536       687265 ns       687060 ns         1013 bytes_per_second=727.738M/s items_per_second=95.3861M/s

After:

BM_DeltaBitPackingEncode_Int32_Fixed/1024         6746 ns         6622 ns       107996 bytes_per_second=589.889M/s items_per_second=154.636M/s
BM_DeltaBitPackingEncode_Int32_Fixed/4096        26207 ns        25429 ns        27415 bytes_per_second=614.466M/s items_per_second=161.078M/s
BM_DeltaBitPackingEncode_Int32_Fixed/32768      236115 ns       207114 ns         3471 bytes_per_second=603.534M/s items_per_second=158.213M/s
BM_DeltaBitPackingEncode_Int32_Fixed/65536      450868 ns       416761 ns         1668 bytes_per_second=599.864M/s items_per_second=157.251M/s
BM_DeltaBitPackingEncode_Int64_Fixed/1024         6489 ns         6484 ns       108361 bytes_per_second=1.17666G/s items_per_second=157.929M/s
BM_DeltaBitPackingEncode_Int64_Fixed/4096        25210 ns        25206 ns        27800 bytes_per_second=1.21072G/s items_per_second=162.5M/s
BM_DeltaBitPackingEncode_Int64_Fixed/32768      202326 ns       202064 ns         3460 bytes_per_second=1.20823G/s items_per_second=162.166M/s
BM_DeltaBitPackingEncode_Int64_Fixed/65536      403463 ns       403353 ns         1743 bytes_per_second=1.21056G/s items_per_second=162.478M/s
BM_DeltaBitPackingEncode_Int32_Narrow/1024        7066 ns         7062 ns        99590 bytes_per_second=553.105M/s items_per_second=144.993M/s
BM_DeltaBitPackingEncode_Int32_Narrow/4096       26993 ns        26980 ns        26047 bytes_per_second=579.125M/s items_per_second=151.814M/s
BM_DeltaBitPackingEncode_Int32_Narrow/32768     232130 ns       227611 ns         3087 bytes_per_second=549.182M/s items_per_second=143.965M/s
BM_DeltaBitPackingEncode_Int32_Narrow/65536     445752 ns       444218 ns         1574 bytes_per_second=562.787M/s items_per_second=147.531M/s
BM_DeltaBitPackingEncode_Int64_Narrow/1024        6998 ns         6994 ns       100485 bytes_per_second=1116.97M/s items_per_second=146.404M/s
BM_DeltaBitPackingEncode_Int64_Narrow/4096       26963 ns        26955 ns        25345 bytes_per_second=1.13219G/s items_per_second=151.96M/s
BM_DeltaBitPackingEncode_Int64_Narrow/32768     224846 ns       223845 ns         3141 bytes_per_second=1116.85M/s items_per_second=146.387M/s
BM_DeltaBitPackingEncode_Int64_Narrow/65536     440290 ns       440284 ns         1591 bytes_per_second=1.10901G/s items_per_second=148.849M/s
BM_DeltaBitPackingEncode_Int32_Wide/1024          7925 ns         7923 ns        87495 bytes_per_second=493.032M/s items_per_second=129.245M/s
BM_DeltaBitPackingEncode_Int32_Wide/4096         30254 ns        30251 ns        23146 bytes_per_second=516.518M/s items_per_second=135.402M/s
BM_DeltaBitPackingEncode_Int32_Wide/32768       256295 ns       256292 ns         2714 bytes_per_second=487.725M/s items_per_second=127.854M/s
BM_DeltaBitPackingEncode_Int32_Wide/65536       507129 ns       500402 ns         1399 bytes_per_second=499.598M/s items_per_second=130.967M/s
BM_DeltaBitPackingEncode_Int64_Wide/1024         10406 ns        10405 ns        67149 bytes_per_second=750.859M/s items_per_second=98.4165M/s
BM_DeltaBitPackingEncode_Int64_Wide/4096         45677 ns        45439 ns        15300 bytes_per_second=687.734M/s items_per_second=90.1427M/s
BM_DeltaBitPackingEncode_Int64_Wide/32768       346803 ns       346780 ns         2019 bytes_per_second=720.919M/s items_per_second=94.4923M/s
BM_DeltaBitPackingEncode_Int64_Wide/65536       710268 ns       705693 ns          964 bytes_per_second=708.523M/s items_per_second=92.8676M/s

Decode

Before:

Run on (10 X 24.1205 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 11.76, 8.38, 5.93
------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
BM_DeltaBitPackingDecode_Int32_Fixed/1024         1170 ns         1140 ns       617551 bytes_per_second=3.34478G/s items_per_second=897.857M/s
BM_DeltaBitPackingDecode_Int32_Fixed/4096         4223 ns         4111 ns       171441 bytes_per_second=3.71136G/s items_per_second=996.261M/s
BM_DeltaBitPackingDecode_Int32_Fixed/32768       33916 ns        32218 ns        22054 bytes_per_second=3.78883G/s items_per_second=1017.06M/s
BM_DeltaBitPackingDecode_Int32_Fixed/65536       65659 ns        64014 ns        11133 bytes_per_second=3.81385G/s items_per_second=1023.77M/s
BM_DeltaBitPackingDecode_Int64_Fixed/1024         1170 ns         1124 ns       630523 bytes_per_second=6.7877G/s items_per_second=911.029M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096         3725 ns         3723 ns       188036 bytes_per_second=8.19693G/s items_per_second=1.10017G/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768       29321 ns        29308 ns        23828 bytes_per_second=8.33016G/s items_per_second=1.11805G/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536       58346 ns        58318 ns        11880 bytes_per_second=8.37271G/s items_per_second=1.12377G/s
BM_DeltaBitPackingDecode_Int32_Narrow/1024        1068 ns         1068 ns       664976 bytes_per_second=3.5725G/s items_per_second=958.986M/s
BM_DeltaBitPackingDecode_Int32_Narrow/4096        4082 ns         4028 ns       176849 bytes_per_second=3.78799G/s items_per_second=1016.83M/s
BM_DeltaBitPackingDecode_Int32_Narrow/32768      31969 ns        31932 ns        22023 bytes_per_second=3.82279G/s items_per_second=1026.17M/s
BM_DeltaBitPackingDecode_Int32_Narrow/65536      63682 ns        63643 ns        11016 bytes_per_second=3.83611G/s items_per_second=1029.75M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024         934 ns          932 ns       748351 bytes_per_second=8.18632G/s items_per_second=1098.75M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096        3404 ns         3403 ns       202131 bytes_per_second=8.96808G/s items_per_second=1.20367G/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768      29199 ns        29196 ns        24017 bytes_per_second=8.36203G/s items_per_second=1.12233G/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536      57770 ns        57768 ns        12077 bytes_per_second=8.45249G/s items_per_second=1.13447G/s
BM_DeltaBitPackingDecode_Int32_Wide/1024          1087 ns         1087 ns       643548 bytes_per_second=3.50993G/s items_per_second=942.189M/s
BM_DeltaBitPackingDecode_Int32_Wide/4096          4087 ns         4087 ns       172062 bytes_per_second=3.73328G/s items_per_second=1002.15M/s
BM_DeltaBitPackingDecode_Int32_Wide/32768        32799 ns        32797 ns        21498 bytes_per_second=3.72199G/s items_per_second=999.114M/s
BM_DeltaBitPackingDecode_Int32_Wide/65536        65459 ns        65453 ns        10717 bytes_per_second=3.73004G/s items_per_second=1001.27M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024          1016 ns         1016 ns       687198 bytes_per_second=7.5105G/s items_per_second=1008.04M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096          3742 ns         3742 ns       187931 bytes_per_second=8.15518G/s items_per_second=1094.57M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768        31511 ns        31509 ns        22198 bytes_per_second=7.74827G/s items_per_second=1039.96M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536        62441 ns        62433 ns        11171 bytes_per_second=7.82092G/s items_per_second=1049.71M/s

After:

------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
BM_DeltaBitPackingDecode_Int32_Fixed/1024         1524 ns         1210 ns       582799 bytes_per_second=3.15193G/s items_per_second=846.089M/s
BM_DeltaBitPackingDecode_Int32_Fixed/4096         6155 ns         4283 ns       154236 bytes_per_second=3.56298G/s items_per_second=956.43M/s
BM_DeltaBitPackingDecode_Int32_Fixed/32768       40811 ns        32998 ns        20873 bytes_per_second=3.69928G/s items_per_second=993.019M/s
BM_DeltaBitPackingDecode_Int32_Fixed/65536       73049 ns        64550 ns        10534 bytes_per_second=3.78218G/s items_per_second=1015.27M/s
BM_DeltaBitPackingDecode_Int64_Fixed/1024         1164 ns         1119 ns       628761 bytes_per_second=6.81583G/s items_per_second=914.806M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096         3824 ns         3780 ns       186756 bytes_per_second=8.07281G/s items_per_second=1083.51M/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768       30175 ns        29516 ns        24098 bytes_per_second=8.2714G/s items_per_second=1.11017G/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536       60017 ns        59018 ns        11983 bytes_per_second=8.27339G/s items_per_second=1.11044G/s
BM_DeltaBitPackingDecode_Int32_Narrow/1024        1381 ns         1378 ns       501986 bytes_per_second=2.76793G/s items_per_second=743.01M/s
BM_DeltaBitPackingDecode_Int32_Narrow/4096        5404 ns         5369 ns       131067 bytes_per_second=2.84204G/s items_per_second=762.903M/s
BM_DeltaBitPackingDecode_Int32_Narrow/32768      45844 ns        43339 ns        16220 bytes_per_second=2.81666G/s items_per_second=756.091M/s
BM_DeltaBitPackingDecode_Int32_Narrow/65536      86916 ns        84916 ns         8257 bytes_per_second=2.87509G/s items_per_second=771.777M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024        1248 ns         1159 ns       615866 bytes_per_second=6.58504G/s items_per_second=883.829M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096        4298 ns         4296 ns       162309 bytes_per_second=7.10393G/s items_per_second=953.473M/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768      35427 ns        35378 ns        19834 bytes_per_second=6.90082G/s items_per_second=926.212M/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536      70880 ns        70862 ns         9877 bytes_per_second=6.89062G/s items_per_second=924.844M/s
BM_DeltaBitPackingDecode_Int32_Wide/1024          1360 ns         1359 ns       515479 bytes_per_second=2.80606G/s items_per_second=753.246M/s
BM_DeltaBitPackingDecode_Int32_Wide/4096          5124 ns         5121 ns       136309 bytes_per_second=2.97938G/s items_per_second=799.772M/s
BM_DeltaBitPackingDecode_Int32_Wide/32768        40943 ns        40924 ns        16860 bytes_per_second=2.98282G/s items_per_second=800.695M/s
BM_DeltaBitPackingDecode_Int32_Wide/65536        82019 ns        81988 ns         8431 bytes_per_second=2.97777G/s items_per_second=799.34M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024          1278 ns         1278 ns       548551 bytes_per_second=5.97075G/s items_per_second=801.38M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096          4778 ns         4777 ns       146543 bytes_per_second=6.38874G/s items_per_second=857.482M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768        38399 ns        38390 ns        18267 bytes_per_second=6.35954G/s items_per_second=853.563M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536        76813 ns        76764 ns         9079 bytes_per_second=6.36081G/s items_per_second=853.734M/s

pitrou · 2023-10-03T18:00:57Z

I've changed the PR description and will merge if CI passes.

pitrou · 2023-10-03T18:12:26Z

The compression ratio in the DELTA_BYTE_ARRAY benchmarks is now much better as well.

mapleFU · 2023-10-03T18:40:36Z

The compression ratio in the DELTA_BYTE_ARRAY benchmarks is now much better as well.

Yeah, but I guess you mean DELTA_BINARY_PACKED, and not DELTA_BYTE_ARRAY...

etseidl · 2023-10-03T18:45:50Z

The compression ratio in the DELTA_BYTE_ARRAY benchmarks is now much better as well.

Yeah, but I guess you mean DELTA_BINARY_PACKED, and not DELTA_BYTE_ARRAY...

There are 2 DELTA_BINARY_PACKED streams in DELTA_BYTE_ARRAY, so if the deltas are small and varying +/-, this could still be a big benefit. In fact, I discovered this problem while implementing a DELTA_LENGTH_BYTE_ARRAY decoder. :)
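To illustrate where those two integer streams come from, here is a rough sketch of the lengths a DELTA_BYTE_ARRAY writer derives from its input: a prefix length shared with the previous value, and the length of the remaining suffix. `DeltaByteArrayLengths` and `ComputeLengths` are hypothetical names for illustration, not part of any Parquet implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the two integer sequences that end up
// DELTA_BINARY_PACKED-encoded inside a DELTA_BYTE_ARRAY page.
struct DeltaByteArrayLengths {
  std::vector<int32_t> prefix_lengths;  // bytes shared with the previous value
  std::vector<int32_t> suffix_lengths;  // bytes remaining after the shared prefix
};

inline DeltaByteArrayLengths ComputeLengths(const std::vector<std::string>& values) {
  DeltaByteArrayLengths out;
  std::string previous;  // the first value has no predecessor, so its prefix is 0
  for (const std::string& value : values) {
    const std::size_t limit = std::min(previous.size(), value.size());
    std::size_t prefix = 0;
    while (prefix < limit && previous[prefix] == value[prefix]) ++prefix;
    out.prefix_lengths.push_back(static_cast<int32_t>(prefix));
    out.suffix_lengths.push_back(static_cast<int32_t>(value.size() - prefix));
    previous = value;
  }
  return out;
}
```

For sorted-ish strings such as {"apple", "apply", "apricot", "banana"} the prefix lengths are {0, 4, 2, 0}, so their deltas are small and change sign, which is the pattern that previously forced a full-width encoding.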

pitrou · 2023-10-03T19:16:15Z

Thanks a lot for this @etseidl . It was embarrassing not to get any space-saving benefits from the encoding...

etseidl · 2023-10-03T19:19:35Z

Thanks again @pitrou @mapleFU @rok for shepherding this PR through. It wound up much better than when I started :)

mapleFU · 2023-10-03T19:54:58Z

There are 2 DELTA_BINARY_PACKED streams in DELTA_BYTE_ARRAY, so if the deltas are small and varying +/-, this could still be a big benefit. In fact, I discovered this problem while implementing a DELTA_LENGTH_BYTE_ARRAY decoder. :)

Oops, I forgot about that. It should have a length. So this can benefit several encodings.

conbench-apache-arrow · 2023-10-04T01:06:05Z

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 5514b22.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 40 possible false positives for unstable benchmarks that are known to sometimes produce them.

mapleFU · 2023-10-07T03:40:18Z

lol, I tried this on my own dataset: the page size after encoding becomes smaller, but the size after compression even grows to twice what it was before😂

etseidl · 2023-10-07T05:46:25Z

lol, I tried this on my own dataset: the page size after encoding becomes smaller, but the size after compression even grows to twice what it was before😂

Oh no 😮

mapleFU · 2023-10-07T05:55:09Z

I generated the data using the code below (10000 values) and compressed it with zstd using default settings.

def generate(i):
    if i % 4 == 0:
        return i * -400
    return i * 4

The plain size is 40000 bytes; after compression it's about 15000 bytes.

With the previous code, the delta bit-packed size is a bit greater than 40000 bytes, but after compression it's 10000 bytes. After the change, the size before compression is 28000 bytes and the size after compression is 27000 bytes.

However I think the patch is great, at least we fix a bad problem here...🤔
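A quick way to see why this data set packs poorly is to look at the range of its deltas. The sketch below regenerates the values and computes how many bits the delta range needs over the whole run. `GenerateValues` and `DeltaRangeBits` are hypothetical helpers; unlike the encoder, they use 64-bit arithmetic instead of wrapping (these values never overflow) and ignore the per-miniblock bit widths of the real format.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reproduces the generator described above: every fourth value is a large
// negative number, the others grow slowly. Assumes count >= 0.
inline std::vector<int32_t> GenerateValues(int32_t count) {
  std::vector<int32_t> values;
  values.reserve(static_cast<std::size_t>(count));
  for (int32_t i = 0; i < count; ++i) {
    values.push_back(i % 4 == 0 ? i * -400 : i * 4);
  }
  return values;
}

// Bits needed to cover (max_delta - min_delta) across the whole input.
// Uses int64_t so the subtraction cannot overflow for int32_t inputs.
inline int DeltaRangeBits(const std::vector<int32_t>& values) {
  if (values.size() < 2) return 0;
  int64_t min_delta = INT64_MAX;
  int64_t max_delta = INT64_MIN;
  for (std::size_t i = 1; i < values.size(); ++i) {
    const int64_t delta =
        static_cast<int64_t>(values[i]) - static_cast<int64_t>(values[i - 1]);
    min_delta = std::min(min_delta, delta);
    max_delta = std::max(max_delta, delta);
  }
  uint64_t range = static_cast<uint64_t>(max_delta - min_delta);
  int bits = 0;
  while (range != 0) {
    ++bits;
    range >>= 1;
  }
  return bits;
}
```

For 10000 values the deltas swing between large negative and large positive numbers, and the overall range needs 23 bits, which is in line with the roughly 28000 bytes reported after the change. The packed output stays wide, so there is little left for the delta encoding to save on this data.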

pitrou · 2023-10-07T08:42:10Z

@mapleFU Interesting, thank you. I think this shows that the DELTA encodings should be used with care, only if the data is very well-suited to them (for example integers with a small range of values). Otherwise, generic and fast compressors such as Lz4 and Zstd are probably a better choice.

wgtmac · 2023-10-07T15:46:53Z

Sorry for the late reply due to a long holiday vacation.

IIRC, DELTA_BINARY_PACKED encoding was inspired by FastPFor. However, FastPFor was designed mainly for positive integers. DELTA_BINARY_PACKED encoding does not even follow what FastPFor does for exception values (like the negative numbers exemplified by @mapleFU).

…en encoding DELTA_BINARY_PACKED (apache#37940)

Closes apache#37939.

### What changes are included in this PR?

This PR changes values used in the `DELTA_BINARY_PACKED` encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain.

Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common).

### Are these changes tested?

I've included two tests that result in overflow.

### Are there any user-facing changes?

No

* Closes: apache#37939

Authored-by: seidl <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>

etseidl requested a review from wgtmac as a code owner September 28, 2023 18:12

github-actions bot added the Component: Parquet, Component: C++, and awaiting review labels Sep 28, 2023

rok reviewed Sep 28, 2023

github-actions bot added the awaiting changes and awaiting change review labels and removed the awaiting review and awaiting changes labels Sep 28, 2023

mapleFU reviewed Sep 29, 2023

mapleFU reviewed Oct 2, 2023

pitrou requested changes Oct 3, 2023

mapleFU approved these changes Oct 3, 2023

pitrou approved these changes Oct 3, 2023

change delta binary packed to use signed types

19b304b

etseidl force-pushed the delta_binary_FoR branch from b981be6 to 19b304b Compare October 3, 2023 18:00

pitrou merged commit 5514b22 into apache:main Oct 3, 2023
30 checks passed

pitrou removed the awaiting change review label Oct 3, 2023

etseidl deleted the delta_binary_FoR branch October 3, 2023 19:15

mapleFU mentioned this pull request Nov 19, 2023

[Python] [Parquet] Compression degradation when column type changed from INT64 to INT32 #35726

Open

mapleFU mentioned this pull request Feb 19, 2024

[Parquet] DELTA_BINARY_PACKED constraint on num_bits is too restrict? #20374

Closed

etseidl mentioned this pull request Feb 26, 2024

PARQUET-2435: Clarify behavior of DELTA_BINARY_PACKED encoding apache/parquet-format#231

Merged

3 tasks
