Add compression codec configurations for data in exchanges and spills #20274

hackeryang · 2024-01-04T04:54:52Z

Description

Our community knows that ZSTD compression is better about speed and compression ratio in many scenarios, for example: #10058
Many blogs and test reports on the internet also mentioned that ZSTD is better than ZLIB in most cases: https://databento.com/blog/zstd-vs-zlib

We found that if page serialization and deserialization also use ZSTD, the compression and decompression performance is also better, and Fault Tolerant Execution also uses compressed pages: #14957

The benchmark result is shown as below, It seems that if compression is enabled, Zstd is about 10X faster than Lz4.

Because ZSTD may not be a best choice for all scenarios, and we want to take into account the demands of different users, we added a session property and configuration to switch between several compresion codecs, and still use Lz4 by default when compression is enabled.

Additional context and related issues

Lz4 Compression

Benchmark                        (compressed)  (encrypted)  (randomSeed)   Mode  Cnt     Score     Error  Units
BenchmarkPagesSerde.deserialize          true         true          1000  thrpt   10   104.194 ±   4.508  ops/s
BenchmarkPagesSerde.deserialize          true        false          1000  thrpt   10  4770.274 ± 106.577  ops/s
BenchmarkPagesSerde.deserialize         false         true          1000  thrpt   10    83.668 ±   3.172  ops/s
BenchmarkPagesSerde.deserialize         false        false          1000  thrpt   10  8175.870 ± 423.283  ops/s
BenchmarkPagesSerde.serialize            true         true          1000  thrpt   10    90.786 ±  18.355  ops/s
BenchmarkPagesSerde.serialize            true        false          1000  thrpt   10   952.958 ±  31.424  ops/s
BenchmarkPagesSerde.serialize           false         true          1000  thrpt   10    85.247 ±   1.277  ops/s
BenchmarkPagesSerde.serialize           false        false          1000  thrpt   10  8060.584 ± 119.043  ops/s

Zstd Compression

Benchmark                        (compressed)  (encrypted)  (randomSeed)   Mode  Cnt      Score      Error  Units
BenchmarkPagesSerde.deserialize          true         true          1000  thrpt   10    102.974 ±    1.746  ops/s
BenchmarkPagesSerde.deserialize          true        false          1000  thrpt   10    373.737 ±   55.635  ops/s
BenchmarkPagesSerde.deserialize         false         true          1000  thrpt   10     85.315 ±    0.897  ops/s
BenchmarkPagesSerde.deserialize         false        false          1000  thrpt   10  10393.729 ± 1296.839  ops/s
BenchmarkPagesSerde.serialize            true         true          1000  thrpt   10     44.524 ±    0.103  ops/s
BenchmarkPagesSerde.serialize            true        false          1000  thrpt   10     65.594 ±    0.373  ops/s
BenchmarkPagesSerde.serialize           false         true          1000  thrpt   10     83.790 ±    2.491  ops/s
BenchmarkPagesSerde.serialize           false        false          1000  thrpt   10   7675.937 ±  347.427  ops/s

Benchmarks of exchanges between nodes in a real cluster on AWS, thanks to Mr Raunaq: #20274 (comment)
Our discussion about when to use ZSTD: https://trinodb.slack.com/archives/CP1MUNEUX/p1704978337669429
Snappy VS ZSTD: https://www.percona.com/blog/compression-methods-in-mongodb-snappy-vs-zstd/

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# General
* The configuration property `exchange.compression-enabled` is now replaced by `exchange.compression-codec` which allows specifying `NONE`, `LZ4` or `ZSTD` as the compression codec for data exchanged between workers. The corresponding system session property `exchange_compression` is replaced by `exchange_compression_codec`. ({issue}`20274`)
* The configuration property `spill-compression-enabled` is now replaced by `spill-compression-codec` which allows specifying `NONE`, `LZ4` or `ZSTD` as the compression codec for data spilled to disk. The already deprecated configuration property `experimental.spill-compression-enabled` is now defunct and must be removed from server configurations. ({issue}`20274`)

hackeryang · 2024-01-04T05:55:24Z

Hello @findepi @wendigo @losipiuk , can you please take a review when you have time, thank you~

wendigo · 2024-01-04T07:05:30Z

I'm concerned that zstd is using jni which could affect gc @hackeryang

wendigo · 2024-01-04T07:14:30Z

I don't think that we should switch by default. We could make it configurable

findepi · 2024-01-04T08:41:43Z

cc @raunaqmorarka

raunaqmorarka · 2024-01-04T10:06:51Z

I'm concerned that zstd is using jni which could affect gc @hackeryang

This is using the airlift implementation which is pure java code rather than jni unless I'm mistaken about that

raunaqmorarka · 2024-01-04T10:17:17Z

Our community knows that ZSTD compression is better about speed and compression ratio in most scenarios, for example: #10058

That's a comparison between ZSTD and GZIP, both these prioritise achieving higher compression over CPU efficiency. ZSTD is better between these two, that does not automatically make it the better choice for every scenario compared to lighter compression schemes like SNAPPY and LZ4.
ZSTD makes sense for data files or exchange data stored on S3. But it's not clear that it would be the best choice for data exchanged over the network. With a high speed and bandwidth network we're better off using less CPU heavy compression. The existing LZ4 exchange compression wasn't turned on by default for pipelined execution because its impact is workload and setup dependent, it makes some queries faster and some slower.
This needs benchmarking in an actual Trino cluster rather than JMH or generic compression/decompression speed tests to assess properly.

wendigo · 2024-01-04T10:26:27Z

@raunaqmorarka I've assumed that zstd codec is using zstd-jni underneath, my mistake

hackeryang · 2024-01-04T10:54:30Z

Our community knows that ZSTD compression is better about speed and compression ratio in most scenarios, for example: #10058

That's a comparison between ZSTD and GZIP, both these prioritise achieving higher compression over CPU efficiency. ZSTD is better between these two, that does not automatically make it the better choice for every scenario compared to lighter compression schemes like SNAPPY and LZ4. ZSTD makes sense for data files or exchange data stored on S3. But it's not clear that it would be the best choice for data exchanged over the network. With a high speed and bandwidth network we're better off using less CPU heavy compression. The existing LZ4 exchange compression wasn't turned on by default for pipelined execution because its impact is workload and setup dependent, it makes some queries faster and some slower. This needs benchmarking in an actual Trino cluster rather than JMH or generic compression/decompression speed tests to assess properly.

@wendigo @raunaqmorarka you are right, zstd may not be a silver bullet for all situations~

I have modified the pr description, and added a configuration with a session property to switch between different codecs, please cc again, it seems that the CICD errors are not related to our PR

core/trino-main/src/main/java/io/trino/FeaturesConfig.java

core/trino-main/src/main/java/io/trino/SystemSessionProperties.java

core/trino-main/src/main/java/io/trino/execution/buffer/CompressionKind.java

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java

core/trino-main/src/test/java/io/trino/execution/buffer/TestPagesSerde.java

core/trino-main/src/main/java/io/trino/execution/buffer/PageDeserializer.java

core/trino-main/src/main/java/io/trino/execution/buffer/PageSerializer.java

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java

core/trino-main/src/main/java/io/trino/execution/buffer/CompressionCodec.java

core/trino-main/src/main/java/io/trino/execution/buffer/PageSerializer.java

losipiuk

Nice

hackeryang · 2024-01-05T08:32:42Z

@wendigo @losipiuk Thank you for your advice~ I have modified some codes, and the 2 test errors seemed not related to our PR, please cc again

core/trino-main/src/main/java/io/trino/spiller/NodeSpillConfig.java

core/trino-main/src/main/java/io/trino/execution/buffer/CompressionCodec.java

core/trino-main/src/main/java/io/trino/FeaturesConfig.java

raunaqmorarka · 2024-01-05T08:56:06Z

core/trino-main/src/main/java/io/trino/FeaturesConfig.java

@@ -81,6 +83,7 @@ public class FeaturesConfig
     * default value is overwritten for fault tolerant execution in {@link #applyFaultTolerantExecutionDefaults()}}
     */
    private boolean exchangeCompressionEnabled;
+    private CompressionCodec exchangeCompressionCodec = LZ4;


I ran TPCDS sf1000 benchmark on unpartitioned hive tables with ZSTD exchange compression in pipelined execution mode. The setup is a 6 node cluster of r6g.8xlarge machines on AWS.

TPCDS unpartitioned parquet zstd exchange.pdf

Even though there is a big reduction in internal network data exchanged, there are still regressions in latency due to the high CPU cost of ZSTD compression.
I think this trade-off of more aggressive compression at high CPU cost doesn't make much sense for pipelined execution mode where we exchange data over internal network directly.
This trade off might make sense for filesystem based exchange in fault tolerant execution mode, but that also needs to be assessed with a benchmark.
@losipiuk can we do these changes in way where this config becomes specific to filesystem exchange in FTE ?

Benchmark results with FTE
FTE unpartitioned hive parquet sf1k zstd compression.pdf

FTE is using LZ4 exchange compression by default already. Switching to ZSTD improves some queries but it's significantly worse overall. So I'm not seeing a compelling reason to add this option even for FTE.

Benchmark results with FTE FTE unpartitioned hive parquet sf1k zstd compression.pdf
FTE is using LZ4 exchange compression by default already. Switching to ZSTD improves some queries but it's significantly worse overall. So I'm not seeing a compelling reason to add this option even for FTE.

You are right, according to the benchmark results, ZSTD should only be used in clusters with little network bandwidth.

We found that some gaming analytics customers of our company have clusters with strong CPU and poor network cards, maybe in this condition we can use it~

I saw that Huawei OpenLookeng(originated from Trino 350) uses ZSTD in exchanges, but it was for their datacenter connector:

core/trino-main/src/main/java/io/trino/FeaturesConfig.java

core/trino-main/src/main/java/io/trino/SystemSessionProperties.java

core/trino-main/src/main/java/io/trino/execution/buffer/CompressionCodec.java

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java

docs/src/main/sphinx/admin/properties-exchange.md

core/trino-main/src/main/java/io/trino/spiller/NodeSpillConfig.java

docs/src/main/sphinx/admin/properties-spilling.md

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java

hackeryang · 2024-01-17T09:41:34Z

It seems CI has been passed(except the irrelevant errors), @raunaqmorarka can you please help merge, thanks~

raunaqmorarka · 2024-01-18T03:31:01Z

It seems CI has been passed(except the irrelevant errors), @raunaqmorarka can you please help merge, thanks~

Please fix the compile error on the first commit

hackeryang · 2024-01-18T08:02:35Z

It seems CI has been passed(except the irrelevant errors), @raunaqmorarka can you please help merge, thanks~

Please fix the compile error on the first commit

@raunaqmorarka I have fixed the error, please cc again~

raunaqmorarka · 2024-01-18T08:20:00Z

It seems CI has been passed(except the irrelevant errors), @raunaqmorarka can you please help merge, thanks~

Please fix the compile error on the first commit

@raunaqmorarka I have fixed the error, please cc again~

We still need a fix to the failing test TestJdbcConnection.testSession

hackeryang · 2024-01-18T09:56:02Z

It seems CI has been passed(except the irrelevant errors), @raunaqmorarka can you please help merge, thanks~

Please fix the compile error on the first commit

@raunaqmorarka I have fixed the error, please cc again~

We still need a fix to the failing test TestJdbcConnection.testSession

@raunaqmorarka Thanks for pointing this out~ I have fixed this test failure, please review again~

The error this time is about Kudu, seems not related to our PR

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java

wendigo

LGTM % two comments

cla-bot bot added the cla-signed label Jan 4, 2024

hackeryang self-assigned this Jan 4, 2024

hackeryang added the performance label Jan 4, 2024

hackeryang requested review from findepi, wendigo and losipiuk January 4, 2024 04:55

hackeryang force-pushed the use_zstd_in_serialization branch from 4cf6a5f to f0ee5f4 Compare January 4, 2024 09:37

hackeryang changed the title ~~Use ZSTD in page serialization and deserialization to improve performance~~ Add compression kind configuration for page serializations, exchanges and spills Jan 4, 2024

hackeryang requested a review from raunaqmorarka January 4, 2024 10:47

wendigo requested changes Jan 4, 2024

View reviewed changes

hackeryang changed the title ~~Add compression kind configuration for page serializations, exchanges and spills~~ Add compression codec configurations for data in exchanges and spills Jan 4, 2024

hackeryang force-pushed the use_zstd_in_serialization branch from f0ee5f4 to 3a271b5 Compare January 4, 2024 12:21

hackeryang requested a review from wendigo January 4, 2024 15:52

wendigo requested changes Jan 4, 2024

View reviewed changes

losipiuk reviewed Jan 4, 2024

View reviewed changes

core/trino-main/src/main/java/io/trino/execution/buffer/CompressionCodec.java Outdated Show resolved Hide resolved

losipiuk reviewed Jan 4, 2024

View reviewed changes

core/trino-main/src/main/java/io/trino/execution/buffer/PageSerializer.java Outdated Show resolved Hide resolved

losipiuk reviewed Jan 4, 2024

View reviewed changes

hackeryang force-pushed the use_zstd_in_serialization branch from 3a271b5 to 1052860 Compare January 5, 2024 07:19

hackeryang requested review from losipiuk and wendigo January 5, 2024 08:30

raunaqmorarka reviewed Jan 5, 2024

View reviewed changes

hackeryang requested a review from findinpath January 12, 2024 07:38

raunaqmorarka reviewed Jan 15, 2024

View reviewed changes

hackeryang force-pushed the use_zstd_in_serialization branch 4 times, most recently from cb42cb6 to 83c07ef Compare January 16, 2024 10:39

hackeryang requested a review from raunaqmorarka January 17, 2024 03:53

raunaqmorarka approved these changes Jan 17, 2024

View reviewed changes

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java Outdated Show resolved Hide resolved

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java Outdated Show resolved Hide resolved

hackeryang force-pushed the use_zstd_in_serialization branch from 83c07ef to 0df8a6b Compare January 17, 2024 06:05

hackeryang force-pushed the use_zstd_in_serialization branch from 0df8a6b to 09c1c4f Compare January 18, 2024 06:43

hackeryang force-pushed the use_zstd_in_serialization branch from 09c1c4f to 52b50a7 Compare January 18, 2024 08:41

github-actions bot added the jdbc Relates to Trino JDBC driver label Jan 18, 2024

hackeryang force-pushed the use_zstd_in_serialization branch from 52b50a7 to 3b10b19 Compare January 18, 2024 09:13

wendigo reviewed Jan 18, 2024

View reviewed changes

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java Outdated Show resolved Hide resolved

wendigo reviewed Jan 18, 2024

View reviewed changes

core/trino-main/src/main/java/io/trino/execution/buffer/PagesSerdeFactory.java Outdated Show resolved Hide resolved

wendigo approved these changes Jan 18, 2024

View reviewed changes

hackeryang added 2 commits January 19, 2024 10:17

Add compression codec configurations for data in exchanges

c3c13e0

Add compression codec configuration for data in spills

5938b7a

hackeryang force-pushed the use_zstd_in_serialization branch from 3b10b19 to 5938b7a Compare January 19, 2024 02:18

raunaqmorarka merged commit 1bb096b into trinodb:master Jan 19, 2024
91 checks passed

github-actions bot added this to the 437 milestone Jan 19, 2024

hackeryang deleted the use_zstd_in_serialization branch January 19, 2024 04:53

colebow mentioned this pull request Jan 24, 2024

Add Trino 437 release notes #20404

Merged

mosabua mentioned this pull request Jan 24, 2024

Fix docs from compression codec use #20458

Merged

soenkeliebau mentioned this pull request Jan 31, 2024

[NEW VERSION] Trino 438 stackabletech/trino-operator#524

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add compression codec configurations for data in exchanges and spills #20274

Add compression codec configurations for data in exchanges and spills #20274

hackeryang commented Jan 4, 2024 •

edited by raunaqmorarka

Loading

hackeryang commented Jan 4, 2024

wendigo commented Jan 4, 2024

wendigo commented Jan 4, 2024

findepi commented Jan 4, 2024

raunaqmorarka commented Jan 4, 2024

raunaqmorarka commented Jan 4, 2024 •

edited

Loading

wendigo commented Jan 4, 2024

hackeryang commented Jan 4, 2024 •

edited

Loading

losipiuk left a comment

hackeryang commented Jan 5, 2024 •

edited

Loading

raunaqmorarka Jan 5, 2024

raunaqmorarka Jan 9, 2024

hackeryang Jan 9, 2024

hackeryang commented Jan 17, 2024

raunaqmorarka commented Jan 18, 2024

hackeryang commented Jan 18, 2024

raunaqmorarka commented Jan 18, 2024

hackeryang commented Jan 18, 2024

wendigo left a comment

Add compression codec configurations for data in exchanges and spills #20274

Add compression codec configurations for data in exchanges and spills #20274

Conversation

hackeryang commented Jan 4, 2024 • edited by raunaqmorarka Loading

Description

Additional context and related issues

Lz4 Compression

Zstd Compression

Release notes

hackeryang commented Jan 4, 2024

wendigo commented Jan 4, 2024

wendigo commented Jan 4, 2024

findepi commented Jan 4, 2024

raunaqmorarka commented Jan 4, 2024

raunaqmorarka commented Jan 4, 2024 • edited Loading

wendigo commented Jan 4, 2024

hackeryang commented Jan 4, 2024 • edited Loading

losipiuk left a comment

Choose a reason for hiding this comment

hackeryang commented Jan 5, 2024 • edited Loading

raunaqmorarka Jan 5, 2024

Choose a reason for hiding this comment

raunaqmorarka Jan 9, 2024

Choose a reason for hiding this comment

hackeryang Jan 9, 2024

Choose a reason for hiding this comment

hackeryang commented Jan 17, 2024

raunaqmorarka commented Jan 18, 2024

hackeryang commented Jan 18, 2024

raunaqmorarka commented Jan 18, 2024

hackeryang commented Jan 18, 2024

wendigo left a comment

Choose a reason for hiding this comment

hackeryang commented Jan 4, 2024 •

edited by raunaqmorarka

Loading

raunaqmorarka commented Jan 4, 2024 •

edited

Loading

hackeryang commented Jan 4, 2024 •

edited

Loading

hackeryang commented Jan 5, 2024 •

edited

Loading