
[improve][ml] Do not switch thread to execute asyncAddEntry's core logic #23940

Conversation

BewareMyPower (Contributor)

Motivation

In a protocol handler's implementation, it sometimes needs to record the current base offset before the actual asyncAddEntry call (or the wrapped PersistentTopic#publishMessages call), for example:

    private final PersistentTopic persistentTopic;
    private final Set<Long> pendingBaseOffsets = ConcurrentHashMap.newKeySet();

    private synchronized void add(ByteBuf buffer, int batchSize) {
        final var ml = (ManagedLedgerImpl) persistentTopic.getManagedLedger();
        final var interceptor = (ManagedLedgerInterceptorImpl) ml.getManagedLedgerInterceptor();
        final var baseOffset = interceptor.getIndex(); // the base offset of the next batch to write
        pendingBaseOffsets.add(baseOffset);
        ml.asyncAddEntry(buffer, batchSize, new AsyncCallbacks.AddEntryCallback() {

            @Override
            public void addComplete(Position position, ByteBuf entryData, Object ctx) {
                if (!pendingBaseOffsets.remove(baseOffset)) {
                    log.error("Failed to remove {}", baseOffset);
                }
            }
            // addFailed omitted for brevity
        }, null);
    }

However, the existing implementation does not work as expected, even if the downstream protocol handler guarantees that asyncAddEntry is called from a single thread. See:

// Jump to specific thread to avoid contention from writers writing from different threads
executor.execute(() -> {
    OpAddEntry addOperation = OpAddEntry.createNoRetainBuffer(this, buffer, numberOfMessages, callback, ctx,
            currentLedgerTimeoutTriggered);
    internalAsyncAddEntry(addOperation);
});

First, the code above does not make sense because internalAsyncAddEntry itself is already synchronized.

Second, to guarantee thread safety for concurrent add operations, switching to another thread is no more efficient than synchronizing the method call. In most cases it is less efficient, because execute involves at least two method calls (offer and poll) on a BlockingQueue, as well as some other operations (like CAS). Taking ArrayBlockingQueue as an example, both its offer and poll implementations need to acquire the internal lock.
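As a rough illustration of this cost difference (a hypothetical micro-sketch, not a rigorous benchmark and not Pulsar code — absolute numbers vary by machine), compare a direct synchronized call against a per-call handoff to a single-threaded executor:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Compares the two thread-safety strategies: a direct synchronized call on the
// caller thread vs. handing each operation to a single-threaded executor, which
// pays for a queue offer/poll (plus wake-ups) on every call.
public class HandoffVsSynchronized {
    private static long counter;

    private static synchronized void syncAdd() {
        counter++;
    }

    public static void main(String[] args) throws Exception {
        final int n = 100_000;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            syncAdd(); // direct synchronized call, no thread switch
        }
        long syncNanos = System.nanoTime() - t0;

        ExecutorService executor = Executors.newSingleThreadExecutor();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            executor.execute(() -> counter++); // per-call queue handoff
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        long handoffNanos = System.nanoTime() - t1;

        System.out.println("counter=" + counter);
        System.out.printf("synchronized: %d ms, executor handoff: %d ms%n",
                syncNanos / 1_000_000, handoffNanos / 1_000_000);
    }
}
```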

Third, switching to another thread to execute the core logic is counter-intuitive, especially when modifying fields that represent the current state. For example, to achieve the goal described at the beginning, I would have to write code like:

        ml.getExecutor().execute(() -> {
            pendingBaseOffsets.add(baseOffset);
            ml.asyncAddEntry(buffer, batchSize, callback, ctx);
        });

Modifications

Split internalAsyncAddEntry into two methods, beforeAddEntryToQueue and afterAddEntryToQueue, that are called before and after adding the operation to pendingAddEntries. Then simplify the code by throwing an exception and passing it to the callback. This is also more efficient because OpAddEntry#failed won't be called inside the synchronized block.
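The shape of this change can be sketched roughly as follows (a simplified model with hypothetical types `Op` and a plain deque standing in for the real ManagedLedgerImpl internals — not the actual patch): only the check-and-enqueue step stays under the lock, while both the failure callback and the actual write initiation run outside it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Minimal sketch of the before/after split: beforeAddEntryToQueue validates state
// under the lock and may throw; the failure callback and the post-enqueue work
// both run outside the synchronized block.
public class AddEntrySketch {
    record Op(byte[] data, Consumer<Throwable> callback) {}

    private final Deque<Op> pendingAddEntries = new ArrayDeque<>();
    private boolean fenced; // example state that must be checked under the lock

    public void asyncAddEntry(Op op) {
        try {
            synchronized (this) {
                beforeAddEntryToQueue(op); // may throw; nothing is enqueued in that case
                pendingAddEntries.add(op);
            }
        } catch (IllegalStateException e) {
            op.callback().accept(e); // failure callback runs outside the synchronized block
            return;
        }
        afterAddEntryToQueue(op); // the write is initiated outside the lock
    }

    private void beforeAddEntryToQueue(Op op) {
        if (fenced) {
            throw new IllegalStateException("managed ledger is fenced");
        }
    }

    private void afterAddEntryToQueue(Op op) {
        op.callback().accept(null); // pretend the write succeeded
    }

    public static void main(String[] args) {
        var ml = new AddEntrySketch();
        ml.asyncAddEntry(new Op(new byte[]{1}, t -> System.out.println("result=" + t)));
        ml.fenced = true;
        ml.asyncAddEntry(new Op(new byte[]{2}, t -> System.out.println("result=" + t)));
        System.out.println("pending=" + ml.pendingAddEntries.size());
    }
}
```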

Add testBeforeAddEntry to protect the change.

Documentation

  - [ ] doc
  - [ ] doc-required
  - [x] doc-not-needed
  - [ ] doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 6, 2025
@BewareMyPower BewareMyPower self-assigned this Feb 6, 2025
@BewareMyPower BewareMyPower added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/broker release/4.0.3 labels Feb 6, 2025
@dao-jun (Member) commented Feb 7, 2025

> In protocol handler's implementation, sometimes it needs to record the current base offset before the actual asyncAddEntry call (or the wrapped PersistentTopic#publishMessages call)

I don't understand this part, could you please explain it more clearly?

@BewareMyPower (Contributor, Author) commented Feb 7, 2025

@dao-jun I found this issue when I implemented the Kafka transaction. In Kafka, analyzeAndValidateProducerState is called before writing messages; this method updates a map that maps the next message's offset (key) to an object, and it removes the key after the record batch is written.

For example, assuming there are 3 ongoing writes, the flow looks like the following pseudo code:

for (int i = 0; i < 3; i++) {
    long nextOffset = interceptor.getIndex() + 1; // LEO
    ongoingTxns.put(nextOffset, txn); // record the ongoing txn of this record batch
    asyncAddEntry().thenAccept(__ -> ongoingTxns.remove(nextOffset));
}

Assuming each record batch has only 1 message, ideally, after these 3 writes, ongoingTxns will have 3 keys (0, 1, 2). However, since asyncAddEntry switches to the ML's executor thread to call the interceptor's beforeAddEntry method, there is a chance that nextOffset is always 0 in all 3 loops, so ongoingTxns.remove(nextOffset) will take effect only once. Then it could hit here in the actual code, which is much different from the OSS KoP.

The details above are beyond the scope of this PR but could help you understand the motivation.
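The stale read described above can be reproduced with a small self-contained demo (simplified stand-ins for interceptor.getIndex() and the ongoingTxns map, not KoP code; a latch blocks the executor so the interleaving is deterministic):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Deterministic reproduction of the stale-offset read: the "index" is advanced
// only on the ML executor thread, so all three caller-side reads see the same value.
public class StaleIndexDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService mlExecutor = Executors.newSingleThreadExecutor();
        CountDownLatch gate = new CountDownLatch(1);
        long[] index = {-1}; // advanced only on the ML executor thread
        Set<Long> ongoing = ConcurrentHashMap.newKeySet();

        // Block the executor so every caller-side read happens before any write runs,
        // mimicking three asyncAddEntry calls racing ahead of the thread switch.
        mlExecutor.execute(() -> {
            try {
                gate.await();
            } catch (InterruptedException ignored) {
            }
        });

        for (int i = 0; i < 3; i++) {
            long nextOffset = index[0] + 1; // stale: reads 0 in every iteration
            ongoing.add(nextOffset);
            mlExecutor.execute(() -> index[0]++);
        }

        // Ideally there would be 3 pending offsets (0, 1, 2); the stale reads collapse them.
        System.out.println("pendingBeforeWrite=" + ongoing.size());

        gate.countDown();
        mlExecutor.shutdown();
        mlExecutor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```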

@dao-jun (Member) commented Feb 7, 2025

> @dao-jun I found this issue when I implemented the Kafka transaction. In Kafka, analyzeAndValidateProducerState is called before writing messages, this method will update a map that maps the next message's offset (key) to an object and it will remove the key after the record batch is written.
>
> For example, assuming there are 3 ongoing writes, the flow looks like the following pseudo code:
>
> for (int i = 0; i < 3; i++) {
>     long nextOffset = interceptor.getIndex() + 1; // LEO
>     ongoingTxns.put(nextOffset, txn); // record the ongoing txn of this record batch
>     asyncAddEntry().thenAccept(__ -> ongoingTxns.remove(nextOffset));
> }
>
> Assuming each record batch has only 1 message, ideally, after these 3 writes, ongoingTxns will have 3 keys (0, 1, 2). However, since asyncAddEntry switches thread to the ML's executor to call interceptor's beforeAddEntry method, there is a chance that nextOffset is always 0 in these 3 loops and ongoingTxns.remove(nextOffset) will take effect only once. Then it could hit here in the actual code, which is much different from the OSS KoP.
>
> The details above are beyond the scope of this PR but could help you understand the motivation.

Thanks for your explanation!

@BewareMyPower BewareMyPower marked this pull request as draft February 8, 2025 06:52
@BewareMyPower (Contributor, Author)

Some shadow topic related tests failed; I will fix them.

@BewareMyPower BewareMyPower marked this pull request as ready for review February 8, 2025 07:47
@codecov-commenter commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 84.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 74.19%. Comparing base (bbc6224) to head (4f28483).
Report is 890 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...okkeeper/mledger/impl/ShadowManagedLedgerImpl.java | 25.00% | 3 Missing ⚠️ |
| ...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java | 95.23% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@             Coverage Diff              @@
##             master   #23940      +/-   ##
============================================
+ Coverage     73.57%   74.19%   +0.61%     
+ Complexity    32624    32236     -388     
============================================
  Files          1877     1853      -24     
  Lines        139502   143739    +4237     
  Branches      15299    16332    +1033     
============================================
+ Hits         102638   106643    +4005     
+ Misses        28908    28692     -216     
- Partials       7956     8404     +448     
| Flag | Coverage Δ |
| --- | --- |
| inttests | 26.71% <52.00%> (+2.13%) ⬆️ |
| systests | 23.20% <44.00%> (-1.12%) ⬇️ |
| unittests | 73.71% <84.00%> (+0.86%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| ...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java | 81.87% <95.23%> | (+1.21%) ⬆️ |
| ...okkeeper/mledger/impl/ShadowManagedLedgerImpl.java | 59.70% <25.00%> | (+4.04%) ⬆️ |

... and 1032 files with indirect coverage changes

@BewareMyPower BewareMyPower merged commit 215b36d into apache:master Feb 10, 2025
52 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/fix-async-add-entry-safety branch February 10, 2025 03:19
BewareMyPower added a commit that referenced this pull request Feb 10, 2025
hanmz pushed a commit to hanmz/pulsar that referenced this pull request Feb 12, 2025
@lhotari (Member) commented Feb 13, 2025

@BewareMyPower Please also remove the comment "Jump to specific thread to avoid contention from writers writing from different threads" that is now obsolete:

// Jump to specific thread to avoid contention from writers writing from different threads

@lhotari (Member) commented Feb 13, 2025

@BewareMyPower This logic doesn't make sense any more either after the changes:

// retain buffer in this thread
buffer.retain();
// Jump to specific thread to avoid contention from writers writing from different threads
final var addOperation = OpAddEntry.createNoRetainBuffer(this, buffer, numberOfMessages, callback, ctx,
        currentLedgerTimeoutTriggered);

@lhotari (Member) commented Feb 13, 2025

@BewareMyPower There's a high chance that this change causes performance regressions:

// Use synchronized to ensure if `addOperation` is added to queue and fails later, it will be the first
// element in `pendingAddEntries`.
synchronized (this) {

In Pulsar use cases, this could happen when there is a large number of producers producing to a topic.
Blocking the IO threads with synchronization will have a larger impact, since it affects the Netty IO of all connections sharing the same IO thread.
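This concern can be illustrated with a small sketch (not Pulsar code; a single-threaded executor stands in for a shared Netty IO thread, and a plain lock for the managed ledger's monitor): once one connection's task blocks on a contended lock, every other connection pinned to that thread stalls behind it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// One task on a shared single-threaded "IO thread" blocks on a contended lock,
// delaying an unrelated task from another connection that needs no lock at all.
public class SharedIoThreadStall {
    public static void main(String[] args) throws Exception {
        Object mlLock = new Object();
        ExecutorService ioThread = Executors.newSingleThreadExecutor(); // shared by many connections
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);

        // A writer thread holds the managed-ledger lock for a while (e.g. a slow add path).
        Thread writer = new Thread(() -> {
            synchronized (mlLock) {
                lockHeld.countDown();
                try {
                    Thread.sleep(200);
                } catch (InterruptedException ignored) {
                }
            }
        });
        writer.start();
        lockHeld.await();

        long start = System.nanoTime();
        // Connection A's task contends on the lock, occupying the IO thread...
        ioThread.execute(() -> {
            synchronized (mlLock) { /* asyncAddEntry critical section */ }
        });
        // ...so connection B's unrelated task is delayed behind it.
        ioThread.execute(() -> {
            System.out.printf("unrelated task delayed ~%d ms%n",
                    (System.nanoTime() - start) / 1_000_000);
            done.countDown();
        });
        done.await();
        boolean delayed = (System.nanoTime() - start) >= TimeUnit.MILLISECONDS.toNanos(150);
        System.out.println("delayed=" + delayed);
        ioThread.shutdown();
        writer.join();
    }
}
```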

Labels
area/broker cherry-picked/branch-4.0 doc-not-needed Your PR changes do not impact docs release/4.0.3 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages