Reuse compressionBuffer when serializing pages #13232

Merged
merged 1 commit into prestodb:master from the avoidCopy branch on Sep 3, 2019

Conversation


@yingsu00 yingsu00 commented Aug 15, 2019

BenchmarkPartitionedOutputOperator shows 8% gain in elapsed time:

Always allocate compressionBuffer (existing behavior):
Benchmark                                   Mode  Cnt     Score     Error  Units
BenchmarkPartitionedOutputOperator.addPage  avgt   20  4749.447 ± 380.463  ms/op

Reuse compressionBuffer (current commit):
Benchmark                                   Mode  Cnt     Score     Error  Units
BenchmarkPartitionedOutputOperator.addPage  avgt   30  4380.171 ± 148.011  ms/op

BenchmarkCompressToByteBuffer shows that using a ByteBuffer and a byte[] as the compression
buffer gives similar performance. We used byte[] because the code is cleaner.

Benchmark                                            Mode  Cnt    Score   Error  Units
BenchmarkCompressToByteBuffer.compressToByteArray    avgt   60  181.677 ± 9.927  us/op
BenchmarkCompressToByteBuffer.compressToByteBuffer   avgt   60  189.371 ± 3.747  us/op

== RELEASE NOTES ==
General Changes
* Reuse compression buffer in PagesSerde when serializing pages.
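In outline, the reuse works roughly like this (simplified sketch of the idea, not the exact diff; the real PagesSerde keeps additional state that is elided here):

    private byte[] compressionBuffer;   // reused across serialize() calls

    private void ensureCompressionBufferCapacity(int capacity)
    {
        // Allocate only when the current buffer is missing or too small;
        // otherwise keep reusing the existing array as scratch space.
        if (compressionBuffer == null || compressionBuffer.length < capacity) {
            compressionBuffer = new byte[capacity];
        }
    }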

@yingsu00 yingsu00 requested a review from a team August 15, 2019 07:57
@mbasmanova mbasmanova self-assigned this Aug 15, 2019
@mbasmanova mbasmanova added the aria Presto Aria performance improvements label Aug 15, 2019
Contributor

@mbasmanova mbasmanova left a comment

typo in commit message: elpased -> elapsed

Contributor

@mbasmanova mbasmanova left a comment

if ((((double) compressionBuffer.remaining()) / uncompressedSize) <= MINIMUM_COMPRESSION_RATIO) {
slice = Slices.wrappedBuffer(compressionBuffer);
int maxCompressedSize = compressor.get().maxCompressedLength(uncompressedSize);
ensureCompressionBufferCapacity(maxCompressedSize);
Contributor

would you use Arrays.ensureCapacity?

Contributor Author

Hi Masha, I couldn't find this method in Arrays. Did you mean ArrayList.ensureCapacity? We're not using ArrayList here, though.

Contributor

You may need to rebase to latest master. The class is com.facebook.presto.orc.reader.Arrays.

Contributor Author

Oh, I see. But this class is in presto-orc, and PagesSerde is in presto-main, which can't import that class. It seems this class could be moved to presto-spi or an airlift-equivalent package, since other operators might be able to use its methods.

Contributor

That's a good point. How would you feel about moving it to presto-arrays?

Contributor Author

Actually, presto-arrays looks like a better place.

Contributor Author

Moved it to presto-arrays. #13284
Once it's in, I'll change the ensureCompressionBufferCapacity() call to use the Arrays class.

Contributor Author

Updated the PR to use Arrays.ensureCapacity()
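For readers unfamiliar with the helper, its behavior is roughly the following (illustrative re-creation only, not the actual presto-arrays source; the real ensureCapacity may grow by a factor and does not necessarily preserve existing contents):

    public static byte[] ensureCapacity(byte[] buffer, int capacity)
    {
        if (buffer == null || buffer.length < capacity) {
            // contents are not carried over; callers treat the buffer as scratch space
            return new byte[capacity];
        }
        return buffer;
    }

    // usage in PagesSerde (sketch):
    // compressionBuffer = ensureCapacity(compressionBuffer, maxCompressedSize);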

@mbasmanova
Contributor

@yingsu00 I feel like this change is worth mentioning in the release notes. Would you add a line in the PR's description?

@mbasmanova mbasmanova requested review from a team and highker August 15, 2019 09:07
@Setup
public void setup()
{
    RANDOM.nextBytes(byteValues);
Contributor

@pettyjamesm pettyjamesm Aug 15, 2019

It's worth noting that using random bytes for the test data will trigger LZ4's "uncompressible fast-path", so it's likely your benchmark isn't accurately reflecting normal compression performance characteristics.

Regarding byte[] vs ByteBuffer, I also found the performance to be comparable between the two and using either should be fine.

Contributor Author

@yingsu00 yingsu00 Aug 16, 2019

Thanks for pointing out the "uncompressible fast-path". I changed the test data to interleaved runs of random values and 0s, and now the benchmark results are:

Benchmark                                            Mode  Cnt    Score   Error  Units
BenchmarkCompressToByteBuffer.compressToByteArray    avgt   60  181.677 ± 9.927  us/op
BenchmarkCompressToByteBuffer.compressToByteBuffer   avgt   60  189.371 ± 3.747  us/op
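For reference, one way to build that kind of input (sketch only; the run length, seed, and layout here are assumptions, not the benchmark's actual values):

    import java.util.Random;

    public final class InterleavedTestData
    {
        private static final Random RANDOM = new Random(42);

        // Alternate runs of random bytes with runs of zeros so the compressor does
        // real matching work instead of hitting LZ4's incompressible fast path.
        public static byte[] create(int size, int runLength)
        {
            byte[] data = new byte[size];
            byte[] randomRun = new byte[runLength];
            for (int offset = 0; offset < size; offset += 2 * runLength) {
                RANDOM.nextBytes(randomRun);
                System.arraycopy(randomRun, 0, data, offset, Math.min(runLength, size - offset));
                // the following runLength bytes stay zero (array default)
            }
            return data;
        }
    }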

slice = Slices.wrappedBuffer(compressionBuffer);
int maxCompressedSize = compressor.get().maxCompressedLength(uncompressedSize);
ensureCompressionBufferCapacity(maxCompressedSize);
int compressedSize = compressor.get().compress((byte[]) slice.getBase(), 0, uncompressedSize, compressionBuffer, 0, maxCompressedSize);
Contributor

While using 0 as the offset may be safe in this particular context, you might consider changing it to this code so it's more resilient to future changes or someone copy/pasting this code without understanding it.

@@ -47,6 +47,8 @@
private final Optional<Decompressor> decompressor;
private final Optional<SpillCipher> spillCipher;

private byte[] compressionBuffer;
Contributor

@pettyjamesm pettyjamesm Aug 15, 2019

My worry about this approach is basically the same as I mentioned on a previous PR: this buffer will grow to the compressor.maxCompressedLength(uncompressedSize) of the largest input it receives and will not be eligible for garbage collection until the whole PagesSerde instance is. This will be true for all PagesSerde instances, so while adding this buffer reduces GC pressure by allocating less (and spending less time zeroing memory), it could risk increasing overall memory usage and causing out-of-memory problems when heap space is constrained.

This is less of a concern if all pages serialized are small, but serializing one large page will have a lasting effect on the untracked memory footprint of a PagesSerde.

I see a few options to mitigate this problem:

  • Restrict the maximum size of serialized pages to set an upper bound on memory pressure
  • Restrict the maximum size of the compressionBuffer. If maxCompressedLength is greater than this limit, a new byte array is created and not stored for future use.
  • Add a method that allows callers to signal that the buffer should be released. This will require code changes for all usages of PagesSerde but would make it easier to account for query memory tracking.
  • Pass a shared buffer pool into the PagesSerde#serialize method. This would allow buffer reuse between instances (even fewer total allocations) and decouple the lifespan of the buffer from the lifespan of the PagesSerde.

@mbasmanova thoughts on the options mentioned above?

Contributor

This is a legit point. "Restrict the maximum size" sounds like a reasonable approach. Can we profile the distribution of maxCompressedLength in production? That would give us the insight to decide the max size we need.
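As an illustration, the "restrict the maximum size" option could look roughly like this (sketch only; MAX_RETAINED_COMPRESSION_BUFFER_SIZE is a hypothetical constant, not something defined in this PR):

    private static final int MAX_RETAINED_COMPRESSION_BUFFER_SIZE = 1024 * 1024; // hypothetical cap

    private byte[] compressionBuffer;

    private byte[] getCompressionBuffer(int maxCompressedSize)
    {
        if (maxCompressedSize > MAX_RETAINED_COMPRESSION_BUFFER_SIZE) {
            // too large to retain: use a one-off buffer that becomes collectable
            // as soon as this serialization finishes
            return new byte[maxCompressedSize];
        }
        if (compressionBuffer == null || compressionBuffer.length < maxCompressedSize) {
            compressionBuffer = new byte[maxCompressedSize];
        }
        return compressionBuffer;
    }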

Contributor

@pettyjamesm @highker I'm seeing that we limit the size of the pages being serialized to 1MB. Perhaps we could add a checkArgument(page.getSizeInBytes() <= DEFAULT_MAX_PAGE_SIZE_IN_BYTES) to the serialize method. What do you think?

public static final int DEFAULT_MAX_PAGE_SIZE_IN_BYTES = 1024 * 1024;

        List<SerializedPage> serializedPages = splitPage(page, DEFAULT_MAX_PAGE_SIZE_IN_BYTES).stream()
                .map(serde::serialize)
                .collect(toImmutableList());

Contributor Author

@yingsu00 yingsu00 Aug 16, 2019

Hi all, thanks for your inputs.

Currently PagesSerde objects are used by PartitionedOutputOperator, TaskOutputOperator, ExchangeOperator, MergeOperator and the Query class. The first two operators deal with serialization, while the last three deal with deserialization and are not in the scope of this PR. Looking at the use cases of PartitionedOutputOperator and TaskOutputOperator, I don't think keeping the compression buffer as a member of PagesSerde is a big problem:

  1. There is only one PagesSerde object per operator, not per table column or per destination partition. And there's only one compressionBuffer per PagesSerde.

  2. The sizes of the SerializedPage being compressed are pretty much uniform, close to DEFAULT_MAX_PAGE_SIZE_IN_BYTES (1MB by default). For the new OptimizedPartitionedOutputOperator this size is almost always close to DEFAULT_MAX_PAGE_SIZE_IN_BYTES. The old PartitionedOutputOperator has slightly higher variance, but the size is still quite predictable. This is because we always accumulate data up to this size before serialization and flushing happen, except for the last page. And the compressed size should be less than or equal to the uncompressed size, which is bounded by 1MB almost all the time. There are only a fixed number of threads on each worker node, so the overhead would be at the level of hundreds of MB per node.

  3. The PagesSerde objects only live through these operators, not the global lifetime of the Presto process. The PagesSerde objects and the compression buffer will be collected after the lifetime of these operators (stage execution) ends. If we don't keep the compression buffer as a member, the operators would nonetheless keep allocating this 1MB buffer for every SerializedPage over and over again throughout the lifetime of the operator, while the proposed change only maintains one 1MB buffer during the same period of time. Sure, the compression buffer becomes long-lived memory and can't be GCed after the compression is done and before the operator dies, but the overhead is small and predictable, as discussed in the previous two points. I would expect the thousands or even millions of allocations of this buffer to contribute more to OOM problems than materializing this buffer does.

I agree we should track the memory in systemMemoryContext. I will make the change in the next iteration.
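A rough sketch of what that tracking could look like (how the LocalMemoryContext reaches PagesSerde is an assumption here; only its setBytes(long) method is relied on):

    private final LocalMemoryContext systemMemoryContext; // supplied by the owning operator (assumption)
    private byte[] compressionBuffer;

    private void ensureCompressionBufferCapacity(int capacity)
    {
        if (compressionBuffer == null || compressionBuffer.length < capacity) {
            compressionBuffer = new byte[capacity];
            // report the retained buffer so it shows up in query memory accounting
            systemMemoryContext.setBytes(compressionBuffer.length);
        }
    }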

Contributor

With the limit on size and the addition of memory tracking, I’m no longer especially concerned with this approach.

I’d rather see a buffer reuse abstraction passed in, but that can come later. I’m still in favor of that strategy for a couple of reasons:

  • it can easily extend the benefit to encryption as well as compression (assuming the encryption implementation in prestosql is ported back over)
  • it can reduce the total allocation rate by sharing and recycling buffers between instances, as well as centrally manage the amount of memory used
  • some deployments may be less interested in the performance benefit and more averse to the (admittedly low in this case) risk of cross-query data access. Those deployments could opt to use a “never reuse” pool.

Contributor Author

Hi @pettyjamesm, by "buffer reuse abstraction passed", did you mean passing a shared buffer pool into the PagesSerde#serialize method? We have some work on a global array pool, and hopefully we'll have time to pick it up this half. For this PR, I can create a config property "experimental.repartition_reuse_compression_buffer" and default it to true.

However, for this PR there is no cross-query data access from reusing the compression buffer. The buffer is owned by the PagesSerde object, which is in turn owned by the operator, and the operator is created per query and not reused across queries.
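To make the discussion concrete, one shape such an abstraction could take (entirely hypothetical interface, not part of this PR or the existing codebase):

    public interface SerdeBufferAllocator
    {
        // returns a buffer of at least minCapacity bytes, freshly allocated or recycled
        byte[] acquire(int minCapacity);

        // hands the buffer back once its contents are no longer needed
        void release(byte[] buffer);
    }

    // A "never reuse" implementation for deployments that prefer isolation over
    // allocation savings:
    final class NeverReuseAllocator
            implements SerdeBufferAllocator
    {
        @Override
        public byte[] acquire(int minCapacity)
        {
            return new byte[minCapacity];
        }

        @Override
        public void release(byte[] buffer)
        {
            // intentionally a no-op: every acquire() returns a fresh array
        }
    }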

Contributor

Yeah, that’s correct. I don’t think adding the config flag is necessary in this PR, since the PagesSerde instance already has an implicit per-query lifespan, but the caller clearly has more context about the lifespan of buffers than the PagesSerde does when the buffer is embedded as a field, as it is here. For instance, with a shared compression buffer the spiller wouldn’t need to make a copy out of the shared buffer when spilling to disk, since the spiller knows the buffer can be released after being written to disk, but the PagesSerde doesn’t have that context.

Contributor Author

Thanks @pettyjamesm. With the global array pool things will make more sense, but that requires much more careful work. Back to this PR, do you think we need to do anything else?

Contributor

Nope, I’m satisfied with the current state of tracking the buffer memory and enforcing an upper bound on its size.

Contributor

@pettyjamesm pettyjamesm left a comment

Left some small comments; the only major concern is the memory hazard this shared buffer may pose.

Contributor

@highker highker left a comment

one comment only

@yingsu00 yingsu00 force-pushed the avoidCopy branch 3 times, most recently from f6039bb to e8429f7 Compare August 20, 2019 17:40
@yingsu00
Contributor Author

@mbasmanova Masha, I have rebased and updated this PR to use Arrays.ensureCapacity. Will you please take a look and merge it if you think it looks good? Thank you!

@mbasmanova mbasmanova merged commit 60a5a5c into prestodb:master Sep 3, 2019
@mbasmanova
Contributor

@yingsu00 Thank you.
