Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Closed topics won't be removed from the cache #23884

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shibd
Copy link
Member

@shibd shibd commented Jan 23, 2025

Motivation

When transaction component init fails, it will try to close the topic, but maybe not remove topic from cache.

time thread createTransactionBufferSystemTopicClient thread create topic
1 topic ledger open complete and new PersistentTopic object
2 new TransactionBuffer
3 Try get AbortedTxnProcessor
4 Failed to create snapshot writer and close topic
5 set topic fenced and closing
6 try remove topic from broker cache, since the topic feature is not done, will ignore it
7 Topic future success complete

Modifications

  • Make topic.close run after topicFuture complete

Verifying this change

  • You can run testNoOrphanClosedTopicIfTxnInternalFailed to reproduce this.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@shibd shibd self-assigned this Jan 23, 2025
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 23, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 48.00000% with 26 lines in your changes missing coverage. Please review.

Project coverage is 74.15%. Comparing base (bbc6224) to head (85302cb).
Report is 865 commits behind head on master.

Files with missing lines Patch % Lines
...r/impl/SnapshotSegmentAbortedTxnProcessorImpl.java 0.00% 20 Missing ⚠️
...er/impl/SingleSnapshotAbortedTxnProcessorImpl.java 66.66% 2 Missing and 1 partial ⚠️
...ransaction/buffer/impl/TopicTransactionBuffer.java 85.71% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23884      +/-   ##
============================================
+ Coverage     73.57%   74.15%   +0.58%     
+ Complexity    32624    32203     -421     
============================================
  Files          1877     1853      -24     
  Lines        139502   143659    +4157     
  Branches      15299    16315    +1016     
============================================
+ Hits         102638   106534    +3896     
+ Misses        28908    28722     -186     
- Partials       7956     8403     +447     
Flag Coverage Δ
inttests 26.66% <0.00%> (+2.07%) ⬆️
systests 23.13% <0.00%> (-1.19%) ⬇️
unittests 73.68% <48.00%> (+0.84%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...er/impl/SingleSnapshotAbortedTxnProcessorImpl.java 83.14% <66.66%> (-0.03%) ⬇️
...ransaction/buffer/impl/TopicTransactionBuffer.java 86.54% <85.71%> (-1.21%) ⬇️
...r/impl/SnapshotSegmentAbortedTxnProcessorImpl.java 78.81% <0.00%> (+0.29%) ⬆️

... and 1022 files with indirect coverage changes

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reduce code duplication

Comment on lines +60 to +70
// Don't directly use the topic object to close, because the topicFuture might not
// be completed at that time, which could leave closed topics in the cache(at BrokerService).
CompletableFuture<Optional<Topic>> topicFuture = topic.getBrokerService().getTopics().get(topic.getName());
if (topicFuture != null) {
topicFuture.thenAccept(t -> t.ifPresent(v -> {
v.close(true).exceptionally(ec -> {
log.error("Close topic {} exception", v.getName(), ec);
return null;
});
}));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is now duplicated in multiple locations. Instead of adding code duplication,
I'd suggest to create a new public method directly in BrokerService which called closeTopicForcefullyIfExists. The javadoc should describe the purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +73 to +76
BrokerService brokerService = pulsar.getBrokerService();
Field pulsarStatsField = BrokerService.class.getDeclaredField("pulsarStats");
pulsarStatsField.setAccessible(true);
PulsarStats pulsarStats = brokerService.getPulsarStats();
Copy link
Contributor

@BewareMyPower BewareMyPower Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid using reflection as much as possible in tests. It makes refactoring very hard. My experience refactoring some code of Pulsar is really terrible due to the reflection everywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you want to delay the openLedgerComplete method by sleeping 1 second in recordTopicLoadTimeValue so that topicFuture.complete will happen after the transaction failure.

But I think you can customize the TopicFactory to achieve the same goal. You can override the create method to create a PersistentTopic's subclass whose initialize() method will be delayed for some time

topic.close();
return null;
});
log.error("{} Failed to create snapshot writer", topic.getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to add ex to the log message here?

// be completed at that time, which could leave closed topics in the cache(at BrokerService).
CompletableFuture<Optional<Topic>> topicFuture = topic.getBrokerService().getTopics().get(topic.getName());
if (topicFuture != null) {
topicFuture.thenAccept(t -> t.ifPresent(v -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the value of topics is an Optional. I just have a question that when could the value be Optional.empty()? If so, an empty Optional will be cached and never removed.

if (topicFuture != null) {
topicFuture.thenAccept(t -> t.ifPresent(v -> {
v.close(true).exceptionally(ec -> {
log.error("Close topic {} exception", v.getName(), ec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If close failed, the topic will still be cached. Should you call BrokerService#removeTopicFromCache in this case?

Comment on lines +135 to +149
TopicTransactionBuffer(PersistentTopic topic, AbortedTxnProcessor snapshotAbortedTxnProcessor,
AbortedTxnProcessor.SnapshotType snapshotType) {
super(State.None);
this.topic = topic;
this.timer = topic.getBrokerService().getPulsar().getTransactionTimer();
this.takeSnapshotIntervalNumber = topic.getBrokerService().getPulsar()
.getConfiguration().getTransactionBufferSnapshotMaxTransactionCount();
this.takeSnapshotIntervalTime = topic.getBrokerService().getPulsar()
.getConfiguration().getTransactionBufferSnapshotMinTimeInMillis();
this.maxReadPosition = topic.getManagedLedger().getLastConfirmedEntry();
this.snapshotAbortedTxnProcessor = snapshotAbortedTxnProcessor;
this.snapshotType = snapshotType;
this.maxReadPositionCallBack = topic.getMaxReadPositionCallBack();
this.recover();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new constructor overload has much duplicated code with the original constructor. Please reuse the code. It only just reduces duplicated code, but also simulate the real case more likely.

Here is a bad example. Assuming there is a Demo class whose constructor is:

public Demo(Argument arg) {
    this.field1 = arg.f();
    this.field2 = arg.g();
}

If you wanted to pass a mocked field2 and added a new constructor;

Demo(Argument arg, Field2 field2) {
    this.field1 = arg.h();
    this.field2 = field2;
}

Then the new constructor will be meaningless. Because now field is not arg.f() anymore.

@shibd
Copy link
Member Author

shibd commented Jan 24, 2025

hi, @lhotari @BewareMyPower @dao-jun Thanks for details review,

After discussing with @poorbarcode , we've decided to pass createFuture into the topic and then wrap the close method. I'll push a new version later.

@shibd shibd marked this pull request as draft January 24, 2025 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test release/3.0.10 release/3.3.5 release/4.0.3 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants