Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat][broker][PIP-195] Implement delayed message index bucket snapshot(merge/delete) - part8 #19138

Merged
merged 7 commits into from
Jan 17, 2023

Conversation

coderzc
Copy link
Member

@coderzc coderzc commented Jan 5, 2023

PIP: #16763

Motivation

PIP: #16763

Modifications

Add DelayedIndexQueue interface to abstract CombinedSegmentDelayedIndexQueue and TripleLongPriorityDelayedIndexQueue.

Implement delayed message index bucket snapshot(merge/delete).

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

testMergeSnapshot / DelayedIndexQueueTest

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: coderzc#35

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 5, 2023
@coderzc coderzc self-assigned this Jan 5, 2023
@coderzc coderzc added this to the 2.12.0 milestone Jan 5, 2023
@coderzc coderzc added type/feature The PR added a new feature or issue requested a new feature area/broker labels Jan 5, 2023
Copy link
Contributor

@poorbarcode poorbarcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After BucketDelayedDeliveryTracker.recover, part of the delayed messages will be put into the delayed message queue, but these messages may be reread by the managed cursor. Will this cause repeated delivery?

import org.apache.pulsar.common.util.collections.TripleLongPriorityQueue;

@NotThreadSafe
public class TripleLongPriorityDelayedIndexQueue implements DelayedIndexQueue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that this class is used to make peek and pop easier to use, and it is only used by MutableBucket, should it not exist as a separate public class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove public modifier.

import org.apache.pulsar.broker.delayed.proto.DelayedMessageIndexBucketSnapshotFormat.SnapshotSegment;

@NotThreadSafe
public class CombinedSegmentDelayedIndexQueue implements DelayedIndexQueue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that this class is used to make peek and pop easier to use, and it is only used by merging two queues in BucketDelayedDeliveryTracker, should it not exist as a separate public class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @coderzc

And this comment (^_^)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this

import java.util.Objects;
import org.apache.pulsar.broker.delayed.proto.DelayedMessageIndexBucketSnapshotFormat;

public interface DelayedIndexQueue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface is only used by MutableBucket and is not universal. Is this interface not needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove public modifier.

CompletableFuture<Void> removeAFuture = bucketA.asyncDeleteBucketSnapshot();
CompletableFuture<Void> removeBFuture = bucketB.asyncDeleteBucketSnapshot();

return CompletableFuture.allOf(removeAFuture, removeBFuture).thenRun(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do save the newly merged bucket after finished delete the original buckets, it may be caused bucket loss. I think losing buckets is tolerable, right?

Copy link
Contributor

@gaoran10 gaoran10 Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can persist the newly merged bucket, then delete the original buckets. If failed to save the newly merged bucket, the tracker needs to rebuild the index, right?

Copy link
Member Author

@coderzc coderzc Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the bucket index is lost, the tracker can rebuild the index, but if there are two overlapped buckets, it will cause data confusion.

Copy link
Contributor

@gaoran10 gaoran10 Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Delete the original bucket index first, then save the newly merged index, this may lose the bucket index, but the tracker will rebuild the index.
  2. Save the newly merged index first, then delete the original bucket index, this may cause intersecting buckets, and we need to find a way to clean up the original buckets.

OK, I think there are two ways, the first way is simple and clear.

Copy link
Member

@mattisonchao mattisonchao Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions:

  • When is the index rebuilt?
  • Should we unload the topic to rebuild the index if this operation fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is the index rebuilt?

If index is lost, the tracker will rebuild index when reread by the managed cursor.

Should we unload the topic to rebuild the index if this operation fails?

Yes, we can unload the topic to rebuild the index

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mark the old segment as deleting or merging? So that we can continue the merge operation after the broker crashes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mark the old segment as deleting or merging? So that we can continue the merge operation after the broker crashes.

This will introduce more state in metadata, we can improve it when we really need it

Copy link
Member Author

@coderzc coderzc Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find a way to clean up the old overlap buckets when recovering buckets, so that we can persist the newly merged bucket first, then delete the original bucket.

The approach is as follows:
If there is range [1...10][30...40][1...40], then it will only remain [1...40]

@coderzc
Copy link
Member Author

coderzc commented Jan 11, 2023

After BucketDelayedDeliveryTracker.recover, part of the delayed messages will be put into the delayed message queue, but these messages may be reread by the managed cursor. Will this cause repeated delivery?

These messages that have already been added tracker will be skipped when reread by the managed cursor, More see: #19035

@coderzc coderzc force-pushed the bucket_delayed_merge_delete branch from aa805a1 to 621a8e8 Compare January 11, 2023 12:34
Copy link
Contributor

@gaoran10 gaoran10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! LGTM

// TODO merge bucket snapshot (synchronize operate)
try {
asyncMergeBucketSnapshot().get(AsyncOperationTimeoutSeconds, TimeUnit.SECONDS);
} catch (InterruptedException | ExecutionException | TimeoutException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please call Thread.currentThread().interrupt(); if we get InterruptedException.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mattisonchao
I wonder why we should keep the interrupted state. Is this state to be handed over to netty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. But I think we should persist in this state if we don't have any special purpose.

try {
asyncMergeBucketSnapshot().get(AsyncOperationTimeoutSeconds, TimeUnit.SECONDS);
} catch (InterruptedException | ExecutionException | TimeoutException e) {
throw new RuntimeException(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should consider the specific exception?

synchronized (immutableBuckets) {
immutableBuckets.remove(Range.closed(bucket.startLedgerId, bucket.endLedgerId));
}
bucket.asyncDeleteBucketSnapshot();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a log to show the exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add exception log in asyncDeleteBucketSnapshot

Copy link
Member

@mattisonchao mattisonchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, I need to review the whole logic after this PR to ensure I got all the context.

@coderzc coderzc force-pushed the bucket_delayed_merge_delete branch from 76a8619 to 525e30f Compare January 13, 2023 11:09
@coderzc coderzc force-pushed the bucket_delayed_merge_delete branch from 525e30f to 14b5ed2 Compare January 13, 2023 11:10
@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2023

Codecov Report

Merging #19138 (eb33c21) into master (4b8f447) will decrease coverage by 0.69%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #19138      +/-   ##
============================================
- Coverage     47.00%   46.30%   -0.70%     
- Complexity    10639    18434    +7795     
============================================
  Files           713     1625     +912     
  Lines         69672   132248   +62576     
  Branches       7482    14560    +7078     
============================================
+ Hits          32746    61234   +28488     
- Misses        33212    64735   +31523     
- Partials       3714     6279    +2565     
Flag Coverage Δ
unittests 46.30% <0.00%> (-0.70%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rg/apache/pulsar/broker/delayed/bucket/Bucket.java 0.00% <0.00%> (ø)
...r/delayed/bucket/BucketDelayedDeliveryTracker.java 0.00% <0.00%> (ø)
...layed/bucket/CombinedSegmentDelayedIndexQueue.java 0.00% <0.00%> (ø)
...ulsar/broker/delayed/bucket/DelayedIndexQueue.java 0.00% <0.00%> (ø)
.../pulsar/broker/delayed/bucket/ImmutableBucket.java 0.00% <0.00%> (ø)
...he/pulsar/broker/delayed/bucket/MutableBucket.java 0.00% <0.00%> (ø)
...ed/bucket/TripleLongPriorityDelayedIndexQueue.java 0.00% <0.00%> (ø)
...pache/pulsar/client/impl/CompactionReaderImpl.java 0.00% <0.00%> (-90.00%) ⬇️
...g/apache/pulsar/client/impl/RawBatchConverter.java 3.12% <0.00%> (-89.07%) ⬇️
.../pulsar/compaction/StrategicTwoPhaseCompactor.java 0.00% <0.00%> (-76.20%) ⬇️
... and 1066 more

@codelipenghui codelipenghui merged commit 9dc490f into apache:master Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker doc-not-needed Your PR changes do not impact docs type/feature The PR added a new feature or issue requested a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants