[Replication] Fix race condition in stopping replicator while it is starting #13412
- there was a chance for a race condition that resulted in the replicator not disconnecting if disconnect was called when the replicator was starting.
@lhotari: Thanks for your contribution. For this PR, do we need to update docs?
@lhotari: Thanks for providing doc info!
The code in closeProducerAsync is still not correct (pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java, line 160 in a5830ae),
because we are setting "producer" to null in a different thread. Every access to the "producer" field should be inside a synchronized block. We could move producer to an AtomicReference variable, or in any case we have to rework how we are accessing that variable. Alternatively, it is better to assign a non-null value only once and never set the value back to null again.
producer is a volatile field (pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java, line 48 in a5830ae).
I guess it's intentional that synchronized isn't used.
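To make the AtomicReference suggestion above concrete, here is a minimal sketch, not the actual AbstractReplicator code. The class `ProducerHolder` and the interface `FakeProducer` are hypothetical names invented for illustration. The idea is that `getAndSet(null)` reads and clears the reference in one atomic step, so a close running on another thread can never observe a non-null producer and then dereference a null one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class ProducerHolder {
    /** Hypothetical stand-in for the replicator's producer (not a Pulsar type). */
    public interface FakeProducer {
        CompletableFuture<Void> closeAsync();
    }

    // Holds the current producer, or null when there is none.
    private final AtomicReference<FakeProducer> producer = new AtomicReference<>();

    public void set(FakeProducer p) {
        producer.set(p);
    }

    public CompletableFuture<Void> closeProducerAsync() {
        // Atomically take ownership of the producer and clear the field.
        // Only one caller can ever receive the non-null value.
        FakeProducer current = producer.getAndSet(null);
        if (current == null) {
            // Nothing to close, or another thread is already closing it.
            return CompletableFuture.completedFuture(null);
        }
        return current.closeAsync();
    }
}
```

With a plain volatile field, the same safety can be had by reading the field once into a local variable and using only the local afterwards; the hazard is reading the field twice (null check, then call).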
@lhotari you are right
this patch makes sense and it fixes your problem.
I had never read the code for this part of Pulsar and I was surprised.
LGTM.
- there was a chance for a race condition that resulted in the replicator not disconnecting if disconnect was called when the replicator was starting. upstream pull request apache#13412 (cherry picked from commit a11da0f)
Motivation
Similar motivation to #12724, "[pulsar-broker] Add stop replicator producer logic when start replicator cluster failed".
There was a chance of a race condition that resulted in the replicator not disconnecting if disconnect was called while the replicator was starting.
Modifications
Modify the logic to remove the chance of the race condition that leaves the replicator producer running.
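The race described above can be illustrated with a small state machine. This is a hedged sketch under assumptions, not the change made in this PR: the class `ReplicatorLifecycleSketch`, its methods, and the `producerRunning` flag are all hypothetical. It shows one way to guarantee that a disconnect requested during startup still closes the producer: the startup callback only moves `Starting` to `Started` with a compare-and-set, and closes the producer itself if a stop request won that race.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ReplicatorLifecycleSketch {
    public enum State { Stopped, Starting, Started, Stopping }

    private final AtomicReference<State> state = new AtomicReference<>(State.Stopped);
    private volatile boolean producerRunning; // stands in for a live producer

    public boolean start() {
        return state.compareAndSet(State.Stopped, State.Starting);
    }

    /** Invoked when the asynchronous producer creation completes. */
    public void onProducerReady() {
        producerRunning = true;
        if (!state.compareAndSet(State.Starting, State.Started)) {
            // disconnect() ran while we were starting: do not leave the producer running.
            closeProducer();
        }
    }

    public void disconnect() {
        while (true) {
            State s = state.get();
            if (s == State.Started) {
                if (state.compareAndSet(State.Started, State.Stopping)) {
                    closeProducer();
                    return;
                }
            } else if (s == State.Starting) {
                // Mark the stop request; onProducerReady() will see it and close.
                if (state.compareAndSet(State.Starting, State.Stopping)) {
                    return;
                }
            } else {
                return; // already Stopped or Stopping
            }
            // A CAS failed because the state changed concurrently: re-read and retry.
        }
    }

    private void closeProducer() {
        producerRunning = false;
        state.set(State.Stopped);
    }

    public boolean isProducerRunning() { return producerRunning; }

    public State getState() { return state.get(); }
}
```

The retry loop in `disconnect()` matters: a single check of the state followed by a single compare-and-set could miss a concurrent `Starting` to `Started` transition and leave the producer running.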