This repository has been archived by the owner on Jan 24, 2024. It is now read-only.
Use pooled direct memory allocator when decoding Pulsar entry to Kafka records #673
Merged: BewareMyPower merged 5 commits into streamnative:master from BewareMyPower:bewaremypower/direct-buffer-decode on Aug 24, 2021
BewareMyPower added the type/enhancement label (Indicates an improvement to an existing feature) on Aug 23, 2021
BewareMyPower changed the title from "Use pooled direct memory allocator when decoding Pulsar entry to Kafka records" to "[WIP] Use pooled direct memory allocator when decoding Pulsar entry to Kafka records" on Aug 23, 2021
BewareMyPower changed the title from "[WIP] Use pooled direct memory allocator when decoding Pulsar entry to Kafka records" to "Use pooled direct memory allocator when decoding Pulsar entry to Kafka records" on Aug 23, 2021
Demogorgon314 approved these changes on Aug 23, 2021
BewareMyPower added a commit that referenced this pull request on Aug 25, 2021
…a records (#673)

### Motivation

When a Pulsar entry is decoded to Kafka records in `ByteBufUtils#decodePulsarEntryToKafkaRecords`, an NIO buffer with an initial capacity of 1 MB is allocated from heap memory. Each entry read therefore allocates 1 MB of heap memory, so heap usage grows very quickly and GC happens frequently.

Kafka's `MemoryRecordsBuilder` uses its underlying `ByteBufferOutputStream` field as the internal buffer, whose capacity can be increased in the `write` method. Even if a direct buffer were allocated by Netty's pooled direct memory allocator and its underlying `ByteBuffer` were passed to `ByteBufferOutputStream`'s constructor, a reallocation could still allocate the new buffer from heap memory.

### Modification

This PR adds a `DirectBufferOutputStream` class that inherits from `ByteBufferOutputStream` and overrides some methods that can be called in `MemoryRecordsBuilder`. This class uses Pulsar's default `ByteBufAllocator` to allocate memory. The other methods behave the same as in `ByteBufferOutputStream`.

A unit test is added to verify that `MemoryRecordsBuilder` builds the same records regardless of whether the underlying stream is a `ByteBufferOutputStream` or a `DirectBufferOutputStream`. Three cases are tested:

1. The initial capacity is less than the size of the records header. In this case, the `position(int)` method is called to increase the capacity.
2. The initial capacity is greater than both the size of the records header and the total size of the records.
3. The initial capacity is greater than the size of the records header but less than the total size of the records. In this case, the `write()` method increases the capacity automatically.

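The reallocation concern described under Motivation can be illustrated with plain `java.nio`. The following is only a minimal sketch under stated assumptions: `GrowableBufferSketch` is a hypothetical, simplified stand-in, not Kafka's `ByteBufferOutputStream` and not the `DirectBufferOutputStream` added by this PR (which allocates through Netty's `ByteBufAllocator` rather than `ByteBuffer.allocateDirect`). It shows that the kind of memory used after growth depends on the growth policy, not on how the initial buffer was allocated.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

/** Hypothetical, simplified growable buffer; not the Kafka or KoP class. */
final class GrowableBufferSketch {
    // Assumption for illustration: the allocator is consulted only when growing.
    private final IntFunction<ByteBuffer> growthAllocator;
    private ByteBuffer buffer;

    GrowableBufferSketch(ByteBuffer initial, IntFunction<ByteBuffer> growthAllocator) {
        this.buffer = initial;
        this.growthAllocator = growthAllocator;
    }

    void write(byte[] bytes) {
        if (buffer.remaining() < bytes.length) {
            grow(buffer.position() + bytes.length);
        }
        buffer.put(bytes);
    }

    private void grow(int minCapacity) {
        int newCapacity = Math.max(buffer.capacity() * 2, minCapacity);
        ByteBuffer bigger = growthAllocator.apply(newCapacity);
        buffer.flip();      // switch to read mode so the written bytes can be copied
        bigger.put(buffer); // copy everything written so far
        buffer = bigger;    // the original buffer is dropped; its origin no longer matters
    }

    ByteBuffer buffer() {
        return buffer;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[64];

        // Direct initial buffer, but growth falls back to heap memory.
        GrowableBufferSketch heapGrowth =
            new GrowableBufferSketch(ByteBuffer.allocateDirect(8), ByteBuffer::allocate);
        heapGrowth.write(payload);

        // Direct initial buffer, and growth also stays in direct memory.
        GrowableBufferSketch directGrowth =
            new GrowableBufferSketch(ByteBuffer.allocateDirect(8), ByteBuffer::allocateDirect);
        directGrowth.write(payload);

        System.out.println("heap growth policy, isDirect: " + heapGrowth.buffer().isDirect());     // false
        System.out.println("direct growth policy, isDirect: " + directGrowth.buffer().isDirect()); // true
    }
}
```

Overriding the growth path, as the sketch does through the injected allocator, is the reason a subclass is needed at all: passing a direct buffer to the constructor only controls the first allocation.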
Then, a `DirectBufferOutputStream` instance is passed to `MemoryRecordsBuilder`'s constructor in `ByteBufUtils#decodePulsarEntryToKafkaRecords` and the return value's type is changed to `DecodeResult` because we need to release the `ByteBuf` later.
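The need to return a result object that can be released later can be sketched as follows. This is an illustration only, assuming a hypothetical `TinyPool` in place of a pooled allocator and a hypothetical `DecodeResultSketch` in place of the PR's `DecodeResult`, whose actual fields and methods are not shown in this description. The idea is that the decoded records are a view over pooled memory, so the caller must hand that memory back once the records have been consumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical holder pairing decoded bytes with the pooled buffer backing them. */
final class DecodeResultSketch implements AutoCloseable {

    /** Hypothetical minimal pool standing in for a pooled direct memory allocator. */
    static final class TinyPool {
        private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
        private int outstanding;

        ByteBuffer acquire(int capacity) {
            ByteBuffer b = free.poll();
            if (b == null || b.capacity() < capacity) {
                // Nothing reusable (a too-small pooled buffer is simply discarded).
                b = ByteBuffer.allocateDirect(capacity);
            }
            b.clear();
            outstanding++;
            return b;
        }

        void release(ByteBuffer b) {
            outstanding--;
            free.push(b);
        }

        int outstanding() {
            return outstanding;
        }
    }

    private final ByteBuffer records;
    private final TinyPool pool;
    private final AtomicBoolean released = new AtomicBoolean();

    private DecodeResultSketch(ByteBuffer records, TinyPool pool) {
        this.records = records;
        this.pool = pool;
    }

    /** Stand-in for decoding: copies the payload into a pooled buffer. */
    static DecodeResultSketch decode(TinyPool pool, String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = pool.acquire(bytes.length);
        buf.put(bytes);
        buf.flip(); // make the written bytes readable
        return new DecodeResultSketch(buf, pool);
    }

    ByteBuffer records() {
        return records;
    }

    /** Returns the buffer to the pool; safe to call more than once. */
    @Override
    public void close() {
        if (released.compareAndSet(false, true)) {
            pool.release(records);
        }
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool();
        try (DecodeResultSketch result = decode(pool, "hello")) {
            // Consume the records while the buffer is still owned by the result.
            System.out.println(StandardCharsets.UTF_8.decode(result.records()));
            System.out.println("outstanding while in use: " + pool.outstanding());
        }
        System.out.println("outstanding after release: " + pool.outstanding());
    }
}
```

Returning only the records would leave the caller with no handle for giving the buffer back, which is why a wrapper type is a natural fit here.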