
[Replication] Fix race condition in stopping replicator while it is starting #13412

Merged

Conversation

lhotari (Member) commented Dec 20, 2021

Motivation

Modifications

Modify the logic to remove the chance of the race condition that leaves the replicator producer running.

- there was a chance for a race condition that resulted in the replicator not disconnecting
  if disconnect was called when the replicator was starting.
@lhotari added the type/bug and area/geo-replication labels Dec 20, 2021
@lhotari lhotari added this to the 2.10.0 milestone Dec 20, 2021
@lhotari lhotari self-assigned this Dec 20, 2021
github-actions (bot):

@lhotari: Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about docs, which helps others learn more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks.)

@lhotari added the doc-not-needed (Your PR changes do not impact docs) label Dec 20, 2021
github-actions (bot):

@lhotari: Thanks for providing doc info!

eolivelli (Contributor):

The code in closeProducerAsync is still not correct, because we are setting "producer" to null in a different thread. Every access to the "producer" field should be inside a synchronized block.

We could move producer to an AtomicReference variable, or in any case we have to rework how we are accessing that variable. Alternatively, it would be better to assign a non-null value only once and never set the value back to null again.
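The AtomicReference suggestion above could look roughly like this. This is an illustrative sketch, not Pulsar's actual code: the class and method names (ReplicatorSketch, tryRegisterProducer, takeProducerForClose) are invented for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the AtomicReference approach suggested above.
// Names are illustrative, not taken from the Pulsar codebase.
class ReplicatorSketch {
    private final AtomicReference<AutoCloseable> producer = new AtomicReference<>();

    // Publish the producer atomically; fails if a producer is already set
    // or a concurrent close has already claimed the slot.
    boolean tryRegisterProducer(AutoCloseable p) {
        return producer.compareAndSet(null, p);
    }

    // Atomically take ownership for closing, so exactly one thread ever
    // closes the producer and no synchronized block is needed.
    AutoCloseable takeProducerForClose() {
        return producer.getAndSet(null);
    }
}
```

With this shape, a stop request that races with startup either sees the producer and closes it, or claims the slot first so the late registration fails and the starter can close its own producer.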

lhotari (Member, author) commented Dec 20, 2021

> because we are setting "producer" to null in a different thread.
>
> every access to the "producer" field should be inside a synchronized block

producer is a volatile field:

I guess it's intentional that synchronized isn't used.
The goal of this PR was to fix the immediate issue, which I believe gets fixed when the producer != null condition is removed.
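With a volatile field, the usual way to avoid the check-then-act race being discussed is to read the field once into a local, so the null check and the close operate on the same reference even if another thread updates the field concurrently. A minimal sketch under that assumption — the names here are hypothetical, not Pulsar's actual closeProducerAsync:

```java
import java.util.concurrent.CompletableFuture;

// Illustrative sketch (not Pulsar code) of the single-volatile-read pattern:
// capture the field into a local before checking it, so a concurrent writer
// cannot null it between the check and the use.
class CloseSketch {
    private volatile AutoCloseable producer;

    void setProducer(AutoCloseable p) {
        this.producer = p;
    }

    CompletableFuture<Void> closeProducerAsync() {
        AutoCloseable p = producer;   // single volatile read
        producer = null;
        if (p == null) {
            // Nothing to close; complete immediately.
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.runAsync(() -> {
            try {
                p.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
    }
}
```

The local capture makes the close idempotent: a second call simply sees null and completes without touching an already-closed producer.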

eolivelli (Contributor) left a comment:

@lhotari you are right

this patch makes sense and it fixes your problem.

I had never read the code for this part of Pulsar and I was surprised.

lhotari added a commit to datastax/pulsar that referenced this pull request Dec 20, 2021
- there was a chance for a race condition that resulted in the replicator not disconnecting
  if disconnect was called when the replicator was starting.

upstream pull request apache#13412
michaeljmarshall (Member) left a comment:

LGTM.

@lhotari lhotari merged commit d7ef99b into apache:master Dec 21, 2021
Technoboy- pushed a commit to Technoboy-/pulsar that referenced this pull request Dec 29, 2021
…ng (apache#13412)

- there was a chance for a race condition that resulted in the replicator not disconnecting
  if disconnect was called when the replicator was starting.
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Feb 28, 2022
- there was a chance for a race condition that resulted in the replicator not disconnecting
  if disconnect was called when the replicator was starting.

upstream pull request apache#13412

(cherry picked from commit a11da0f)
Labels: area/geo-replication, doc-not-needed, type/bug
4 participants