Reuse compressionBuffer when serializing pages #13232
Merged
presto-main/src/test/java/com/facebook/presto/operator/BenchmarkCompressToByteBuffer.java (100 changes: 100 additions & 0 deletions)
@@ -0,0 +1,100 @@
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.facebook.presto.operator;

import io.airlift.compress.Compressor;
import io.airlift.compress.lz4.Lz4Compressor;
import io.airlift.slice.Slice;
import io.airlift.slice.Slices;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.VerboseMode;

import java.nio.ByteBuffer;
import java.util.Random;

import static java.util.concurrent.TimeUnit.MICROSECONDS;
import static java.util.concurrent.TimeUnit.MILLISECONDS;

@State(Scope.Thread)
@OutputTimeUnit(MICROSECONDS)
@Fork(3)
@Warmup(iterations = 20, time = 500, timeUnit = MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class BenchmarkCompressToByteBuffer
{
    @Benchmark
    public void compressToByteBuffer(BenchmarkData data)
    {
        data.byteBuffer.mark();
        data.COMPRESSOR.compress(data.slice.toByteBuffer(), data.byteBuffer);
        data.byteBuffer.reset();
    }

    @Benchmark
    public void compressToByteArray(BenchmarkData data)
    {
        data.COMPRESSOR.compress((byte[]) data.slice.getBase(), 0, data.slice.length(), data.bytes, 0, data.MAX_COMPRESSED_SIZE);
    }

    @State(Scope.Thread)
    public static class BenchmarkData
    {
        private static final Random RANDOM = new Random(0);
        private static final Compressor COMPRESSOR = new Lz4Compressor();
        private static final int UNCOMPRESSED_SIZE = 1_000_000;
        private static final int MAX_COMPRESSED_SIZE = COMPRESSOR.maxCompressedLength(UNCOMPRESSED_SIZE);

        private final byte[] byteValues = new byte[UNCOMPRESSED_SIZE];
        private final Slice slice = Slices.wrappedBuffer(byteValues);

        private final byte[] bytes = new byte[MAX_COMPRESSED_SIZE];
        private final ByteBuffer byteBuffer = ByteBuffer.allocate(MAX_COMPRESSED_SIZE);

        @Setup
        public void setup()
        {
            // Generate alternating runs of random values and zeros so that LZ4
            // does not take its incompressible fast path.
            int runLength = UNCOMPRESSED_SIZE / 10;
            byte[] randomBytes = new byte[runLength];
            for (int i = 0; i < 10; i += 2) {
                RANDOM.nextBytes(randomBytes);
                System.arraycopy(randomBytes, 0, byteValues, i * runLength, runLength);
            }
        }
    }

    public static void main(String[] args)
            throws RunnerException
    {
        Options options = new OptionsBuilder()
                .verbosity(VerboseMode.NORMAL)
                .include(".*" + BenchmarkCompressToByteBuffer.class.getSimpleName() + ".*")
                .build();
        new Runner(options).run();
    }
}
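For context on the change under review: the PR keeps a compression buffer as a field of PagesSerde and reuses it across serialize calls instead of allocating a fresh array for every page. The sketch below only illustrates that pattern; apart from the airlift Compressor and Slice APIs (which the benchmark above also uses), the class and method names are illustrative, not the actual Presto code.

import io.airlift.compress.Compressor;
import io.airlift.compress.lz4.Lz4Compressor;
import io.airlift.slice.Slice;
import io.airlift.slice.Slices;

// Illustrative sketch of the buffer-reuse pattern discussed in this PR;
// the real PagesSerde#serialize has additional responsibilities.
public class CompressionBufferReuseSketch
{
    private final Compressor compressor = new Lz4Compressor();
    // Reused across calls; grows to the largest maxCompressedLength seen so far.
    private byte[] compressionBuffer = new byte[0];

    public Slice compressPage(Slice uncompressed)
    {
        int maxCompressedLength = compressor.maxCompressedLength(uncompressed.length());
        if (compressionBuffer.length < maxCompressedLength) {
            compressionBuffer = new byte[maxCompressedLength];
        }
        // Assumes the slice wraps a heap byte array, as in the benchmark above.
        int compressedSize = compressor.compress(
                (byte[]) uncompressed.getBase(), 0, uncompressed.length(),
                compressionBuffer, 0, maxCompressedLength);
        // Copy out of the reused buffer so the returned slice does not alias it.
        Slice compressed = Slices.allocate(compressedSize);
        compressed.setBytes(0, compressionBuffer, 0, compressedSize);
        return compressed;
    }
}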
My worry about this approach is basically the same as I mentioned on a previous PR: this buffer will grow to the compressor.maxCompressedLength(uncompressedSize) of the largest input it receives and will not be eligible for garbage collection until the whole PagesSerde instance is. This will be true for all PagesSerde instances, so while adding this buffer reduces GC pressure by allocating less (and spending less time zeroing memory), it risks increasing overall memory usage and causing out-of-memory problems when heap space is constrained. This is less of a concern if all serialized pages are small, but serializing one large page will have a lasting effect on the untracked memory footprint of a PagesSerde.
I see a few options to mitigate this problem:
1. Restrict the maximum size of the compressionBuffer. If maxCompressedLength is greater than this limit, a new byte array is created and not stored for future use. (A sketch of this option follows the list.)
2. Track the buffer's memory in the operator's memory context. This wouldn't shrink the memory retained by PagesSerde, but it would make it easier to account for it in query memory tracking.
3. Pass the buffer (or a buffer reuse abstraction) into the PagesSerde#serialize method. This would allow buffer reuse between instances (even fewer total allocations) and decouple the lifespan of the buffer from the lifespan of the PagesSerde.

@mbasmanova thoughts on the options mentioned above?
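A minimal sketch of the first option; the size limit and the class and method names here are illustrative, not actual Presto code.

// Illustrative sketch: cap the retained buffer size; oversized requests get a
// throwaway array that is not kept for future use.
public class BoundedCompressionBuffer
{
    // Hypothetical limit; the discussion below settles on sizing relative to the 1MB page limit.
    private static final int MAX_RETAINED_BUFFER_BYTES = 4 * 1024 * 1024;

    private byte[] compressionBuffer = new byte[0];

    public byte[] getBuffer(int maxCompressedLength)
    {
        if (maxCompressedLength > MAX_RETAINED_BUFFER_BYTES) {
            // Too large to retain: allocate a one-off buffer and let it be garbage collected.
            return new byte[maxCompressedLength];
        }
        if (compressionBuffer.length < maxCompressedLength) {
            compressionBuffer = new byte[maxCompressedLength];
        }
        return compressionBuffer;
    }
}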
This is a legit point. "Restrict the maximum size" sounds like a reasonable approach. Can we profile the distribution of maxCompressedLength in production? That would give us the insight to decide the maximum size we need.
@pettyjamesm @highker I'm seeing that we limit the size of the pages being serialized to 1MB. Perhaps we could add a checkArgument(page.getSizeInBytes() <= DEFAULT_MAX_PAGE_SIZE_IN_BYTES) to the serialize method. What do you think?
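A sketch of the suggested guard; the constant value reflects the 1MB limit mentioned above, and wrapping it in a separate helper class is only for illustration.

import static com.google.common.base.Preconditions.checkArgument;

// Illustrative helper showing the proposed check; in the PR it would live
// inside PagesSerde#serialize rather than in a separate class.
public final class PageSizeCheck
{
    private static final int DEFAULT_MAX_PAGE_SIZE_IN_BYTES = 1024 * 1024;

    private PageSizeCheck() {}

    public static void checkPageSize(long pageSizeInBytes)
    {
        checkArgument(
                pageSizeInBytes <= DEFAULT_MAX_PAGE_SIZE_IN_BYTES,
                "page size %s exceeds maximum %s",
                pageSizeInBytes,
                DEFAULT_MAX_PAGE_SIZE_IN_BYTES);
    }
}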
Hi all, thanks for your input.

Currently, PagesSerde objects are used by PartitionedOutputOperator, TaskOutputOperator, ExchangeOperator, MergeOperator, and the Query class. The first two operators handle serialization, while the last three handle deserialization and are not in the scope of this PR. Looking at the use cases of PartitionedOutputOperator and TaskOutputOperator, I don't think keeping the compression buffer as a member of PagesSerde is a big problem:

1. There is only one PagesSerde object per operator, not per table column or per destination partition, and there is only one compressionBuffer per PagesSerde.
2. The sizes of the SerializedPages being compressed are fairly uniform, close to DEFAULT_MAX_PAGE_SIZE_IN_BYTES (1MB by default). For the new OptimizedPartitionedOutputOperator the size is almost always close to DEFAULT_MAX_PAGE_SIZE_IN_BYTES; the old PartitionedOutputOperator has slightly higher variance, but the size is still quite predictable, because we always accumulate data up to this size before serializing and flushing, except for the last page. The compressed size should be less than or equal to the uncompressed size, which is bounded by 1MB almost all the time. There are only a fixed number of threads on each worker node, so the overhead would be at the level of hundreds of MB per node.
3. The PagesSerde objects live only as long as these operators do, not for the global lifetime of the Presto process; the PagesSerde objects and the compression buffer will be collected once the lifetime of these operators (stage execution) ends. If we don't keep the compression buffer as a member, the operators would nonetheless keep allocating this 1MB buffer for every SerializedPage, over and over, throughout the lifetime of the operator, while the proposed change maintains a single 1MB buffer for the same period. Granted, the compression buffer becomes long-lived memory that can't be GCed between the end of a compression and the death of the operator, but the overhead is small and predictable, as discussed in the previous two points. I would expect the thousands or even millions of allocations of this buffer to contribute more to OOM problems than materializing this one buffer does.

I agree we should track the memory in systemMemoryContext. I will make the change in the next iteration.
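A hedged sketch of that memory tracking; the exact wiring to the operator's system memory context may differ in the PR, and the wrapper class here is illustrative.

import com.facebook.presto.memory.context.LocalMemoryContext;

// Illustrative sketch: report the retained compression buffer size to a
// LocalMemoryContext whenever the buffer grows, so it shows up in query memory accounting.
public class TrackedCompressionBuffer
{
    private final LocalMemoryContext memoryContext;
    private byte[] compressionBuffer = new byte[0];

    public TrackedCompressionBuffer(LocalMemoryContext memoryContext)
    {
        this.memoryContext = memoryContext;
    }

    public byte[] ensureCapacity(int maxCompressedLength)
    {
        if (compressionBuffer.length < maxCompressedLength) {
            compressionBuffer = new byte[maxCompressedLength];
            // Reflect the new retained size in the memory context.
            memoryContext.setBytes(compressionBuffer.length);
        }
        return compressionBuffer;
    }
}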
With the limitation on size and the addition of memory tracking, I'm no longer especially concerned with the approach.
I'd rather see a buffer reuse abstraction passed in, but that can come later. I'm still in favor of that strategy for a couple of reasons.
Hi @pettyjamesm, by "buffer reuse abstraction passed", did you mean passing a shared buffer pool into the PagesSerde#serialize method? We have some work planned on a global array pool, and hopefully we'll have time to pick it up this half. For this PR, I can create a config property "experimental.repartition_reuse_compression_buffer" and default it to true.

However, in this PR there is no cross-query data access from reusing the compression buffer: the buffer is owned by the PagesSerde object, which is in turn owned by the operator, and the operator is created per query and not reused across queries.
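To make the "pass a buffer reuse abstraction into serialize" idea concrete, here is a hypothetical interface sketch; none of these names exist in Presto, and the global array pool mentioned above would presumably back a shared implementation.

// Hypothetical abstraction: the caller owns buffer lifetimes and lends the
// serializer scratch space, decoupling the buffer's lifespan from PagesSerde.
public interface CompressionBufferAllocator
{
    byte[] borrow(int minimumSize);

    void release(byte[] buffer);
}

// Trivial caller-owned implementation that reuses a single buffer, e.g. one per operator.
class SingleBufferAllocator
        implements CompressionBufferAllocator
{
    private byte[] buffer = new byte[0];

    @Override
    public byte[] borrow(int minimumSize)
    {
        if (buffer.length < minimumSize) {
            buffer = new byte[minimumSize];
        }
        return buffer;
    }

    @Override
    public void release(byte[] buffer)
    {
        // No-op here; a pooled implementation could return the buffer to a shared pool.
    }
}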
Yeah, that's correct. I don't think adding the config flag is necessary in this PR, since the PagesSerde instance already has an implicit per-query lifespan, but it seems obvious that the caller has more context about the lifespan of buffers than PagesSerde can have with the buffer embedded as a field, as you've done here. For instance, in the case of, say, spill to disk, the serializer needn't make a copy out of a shared buffer, since the spiller knows the buffer can be released after being written to disk, but the PagesSerde doesn't know that context.
Thanks @pettyjamesm. With the global array pool, things will make more sense, but that requires much more careful work. Back to this PR: do you think we need to do anything else?
Nope, I’m satisfied with the current state of tracking the buffer memory and enforcing an upper bound on its size.