fix local processed checkpoint update #2576

Poojita-Raj
Contributor

Signed-off-by: Poojita Raj [email protected]

Description

Previously, the local processed checkpoint was not updated when segment replication was in use. This PR fixes that and adds unit tests covering the fix.

Issues Resolved

Resolves #2358

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Poojita-Raj Poojita-Raj requested a review from a team as a code owner March 23, 2022 21:06
@Poojita-Raj Poojita-Raj force-pushed the feature/segment-replication branch 2 times, most recently from a6c6630 to 52bf531 Compare March 23, 2022 21:15
@Poojita-Raj Poojita-Raj requested review from kartg and mch2 March 23, 2022 21:20
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure e1ff2b4c54a5e6a9b8d1e93424bf3ccaf02175bc
Log 3717

Reports 3717

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure a6c66306313d64689f03b7168a7f2a345756af90
Log 3718

Reports 3718

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 52bf531bb130251fd3caf79e75ffc6cd369c292a
Log 3719

Reports 3719

@Poojita-Raj Poojita-Raj force-pushed the feature/segment-replication branch from 52bf531 to 6251b8e Compare March 23, 2022 21:43
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 6251b8effc07c20369d1d87f14a12db45471d1b7
Log 3721

Reports 3721

Comment on lines 148 to 158
assert Thread.holdsLock(this);
advanceMaxSeqNo(seqNo);
if ((seqNo > persistedCheckpoint.get()) || (seqNo <= processedCheckpoint.get())) {
    return;
}
Member

A fair amount of this code is duplicated from the existing markSeqNo method. Since the first check is simply a bounds check, we can write a shouldUpdateSeqNo method that does this and returns a boolean to decide if further processing is necessary. This can be invoked by wrapper methods that perform the update by compareAndSet (as below) or via updateCheckpoint (as with the original method)

return;
}
try {
processedCheckpoint.compareAndSet(processedCheckpoint.get(), seqNo);
Member

Shouldn't this be using updateCheckpoint to ensure that the persistedSeqNo is kept up to date?

Contributor Author

The call to updateCheckpoint to ensure persistedSeqNo is up to date still takes place when markSeqNoAsPersisted is called. In the case of segrep, we update processedSeqNo just once at the end since indexing doesn't take place on the replica - we omit the call to updateCheckpoint since it checks that all seq numbers till that point are consecutively processed.

Member

Thanks for the explanation! I think we should capture this bit in the Javadoc for fastForwardProcessedSeqNo:

In the case of segrep, we update processedSeqNo just once at the end since indexing doesn't take place on the replica - we omit the call to updateCheckpoint since it checks that all seq numbers till that point are consecutively processed.
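The distinction described above can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the code from this PR: `SketchCheckpointTracker`, its pending sets, and its getters are hypothetical stand-ins, and only the method names `markSeqNoAsProcessed`, `markSeqNoAsPersisted`, and `fastForwardProcessedSeqNo` come from this conversation. The regular path advances a checkpoint only across consecutive sequence numbers, while the fast-forward path jumps the processed checkpoint directly, bounded by the persisted checkpoint.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified tracker for illustration; not the OpenSearch class. */
class SketchCheckpointTracker {
    private final AtomicLong processedCheckpoint = new AtomicLong(-1);
    private final AtomicLong persistedCheckpoint = new AtomicLong(-1);
    // Sequence numbers seen above the checkpoint but not yet contiguous with it.
    private final Set<Long> processedPending = new HashSet<>();
    private final Set<Long> persistedPending = new HashSet<>();

    synchronized void markSeqNoAsProcessed(final long seqNo) {
        mark(seqNo, processedCheckpoint, processedPending);
    }

    synchronized void markSeqNoAsPersisted(final long seqNo) {
        mark(seqNo, persistedCheckpoint, persistedPending);
    }

    // Regular path: the checkpoint moves only while the next consecutive
    // sequence number has already been marked.
    private void mark(final long seqNo, final AtomicLong checkpoint, final Set<Long> pending) {
        if (seqNo <= checkpoint.get()) {
            return;
        }
        pending.add(seqNo);
        while (pending.remove(checkpoint.get() + 1)) {
            checkpoint.incrementAndGet();
        }
    }

    // Fast-forward path: no consecutiveness check, because operations are not
    // indexed one by one here. Bounded above by the persisted checkpoint.
    synchronized void fastForwardProcessedSeqNo(final long seqNo) {
        if (seqNo <= processedCheckpoint.get() || seqNo > persistedCheckpoint.get()) {
            return;
        }
        processedCheckpoint.set(seqNo);
        processedPending.removeIf(s -> s <= seqNo); // drop entries the jump made stale
    }

    long getProcessedCheckpoint() {
        return processedCheckpoint.get();
    }

    long getPersistedCheckpoint() {
        return persistedCheckpoint.get();
    }
}
```

In this sketch, a fast-forward past the persisted checkpoint is ignored, and the persisted checkpoint itself is never touched by the fast-forward path.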

tracker = createEmptyTracker();
}

public void testSimpleSegrepPrimaryProcessed() {
Member

These should really be 3 separate tests since you're testing different use-cases

  1. Base case
  2. Idempotency
  3. Persisted-vs-processed being out-of-sync, checkpoint update for seg-rep, and finally idempotency again
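One way the three cases listed above might be split is sketched below. This is an assumption-laden illustration: the nested `Tracker` is a hypothetical two-field stand-in with no gap tracking, the test names are invented, and plain Java checks replace whatever test framework the project uses. The exact scenarios in the PR's tests may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

class FastForwardCasesSketch {
    /** Hypothetical minimal tracker: only the two checkpoints, single-threaded use. */
    static final class Tracker {
        final AtomicLong processed = new AtomicLong(-1);
        final AtomicLong persisted = new AtomicLong(-1);

        void fastForwardProcessedSeqNo(final long seqNo) {
            if (seqNo > processed.get() && seqNo <= persisted.get()) {
                processed.set(seqNo);
            }
        }
    }

    static void check(final boolean condition, final String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // 1. Base case: a single fast-forward moves the processed checkpoint.
    static void testBaseCase() {
        Tracker t = new Tracker();
        t.persisted.set(0);
        t.fastForwardProcessedSeqNo(0);
        check(t.processed.get() == 0, "processed should be 0");
    }

    // 2. Idempotency: repeating the same call changes nothing.
    static void testIdempotency() {
        Tracker t = new Tracker();
        t.persisted.set(0);
        t.fastForwardProcessedSeqNo(0);
        t.fastForwardProcessedSeqNo(0);
        check(t.processed.get() == 0, "processed should still be 0");
    }

    // 3. Persisted ahead of processed: fast-forward closes the gap,
    //    and a repeated call is again a no-op.
    static void testPersistedAheadOfProcessed() {
        Tracker t = new Tracker();
        t.persisted.set(2);
        check(t.processed.get() == -1, "processed starts behind persisted");
        t.fastForwardProcessedSeqNo(2);
        check(t.processed.get() == 2, "processed should catch up to 2");
        t.fastForwardProcessedSeqNo(2);
        check(t.processed.get() == 2, "repeat call should be a no-op");
    }
}
```

Keeping each scenario in its own method means a failure points at one use case rather than at a long combined test.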

*
* @param seqNo the sequence number to mark as processed
*/
public synchronized void segrepMarkSeqNoAsProcessed(final long seqNo) {
Member

A suggestion here: can we name this after what it does (fast-forwarding to seqNo) rather than after the replication strategy that invokes it?

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 0b1dc1ddfba991abb32bbded801fa125b12561c2
Log 3744

Reports 3744

@Poojita-Raj Poojita-Raj force-pushed the feature/segment-replication branch from 0b1dc1d to a428afd Compare March 24, 2022 23:21
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure a428afd8d713c7ac2e86c62f475282cb1a71cbb2
Log 3745

Reports 3745

@Poojita-Raj Poojita-Raj requested review from mch2 and kartg March 24, 2022 23:49
bitSet.set(offset);
if (seqNo == checkPoint.get() + 1) {
    updateCheckpoint(checkPoint, bitSetMap);
    if (segrep) {
Member

I don't think we need this segrep-specific logic here. It can live inside fastForwardProcessedSeqNo, leaving this method as is. This class shouldn't care about segrep vs. non-segrep.

It also looks like waitForProcessedOpsToComplete is called from tests only, so I'm not sure we have a use case for it with this call path. If not, we won't need to wait/notify.

Contributor Author

Removed the wait/notify and separated out the segrep specific logic

}

private boolean shouldUpdateSeqNo(final long seqNo, boolean segrep, final AtomicLong checkpoint) {
return !((seqNo <= checkpoint.get()) || (segrep && seqNo > persistedCheckpoint.get()));
Member

I think you can put the check against the persisted checkpoint inside of fastForwardProcessedSeqNo and avoid the segrep specific flag.

Contributor Author

Changed the logic in the check to avoid segrep specific flag

}

private void markSeqNo(final long seqNo, final AtomicLong checkPoint, final LongObjectHashMap<CountedBitSet> bitSetMap) {
/**
* Updates the processed checkpoint to the provided sequence number if segment replication is enabled
Member

Nit - Since there's no check here for seg-rep, the wording of "if segment replication is enabled" is confusing. Rephrase to:

Updates the processed sequence checkpoint to the given value. This does not update the persisted checkpoint value. This method is only used for segment replication.

bitSet.set(offset);
if (seqNo == checkPoint.get() + 1) {
    updateCheckpoint(checkPoint, bitSetMap);
    if (segrep) {
Member

I realize that this was done in response to my comment, but this may be too far in the direction of unification 😄

What do you think of the following structure?

  1. Instead of branching on this boolean value, have this method return the boolean from shouldUpdateSeqNo (or in-line that method's logic here).
  2. Have a single method that executes the else part of this - both markSeqNoAsPersisted and markSeqNoAsProcessed can invoke this method. This will cover the existing code flows.
  3. Finally, move the if clause portion directly into fastForwardProcessedSeqNo - this will cover the seg-rep code flow.

Contributor Author

shouldUpdateSeqNo doesn't have the logic to return a boolean representing segrep, but I followed the other suggestions to have separate code flows.

markSeqNo(seqNo, processedCheckpoint, null, true);
}

private boolean shouldUpdateSeqNo(final long seqNo, boolean segrep, final AtomicLong checkpoint) {
Member

As @mch2 alluded to in an earlier comment - if you're having to write variables to model use-cases, then that's usually a sign of an overfit. Consider having this method signature be:

private boolean shouldUpdateSeqNo(final long seqNo, final AtomicLong lowerBound, @Nullable final AtomicLong upperBound)

Also see my other comment.
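A minimal sketch of how a body for the suggested signature might look, assuming the check is "strictly above the lower bound, and not above the upper bound when one is given". The enclosing class `SeqNoBounds` is hypothetical, the method is made static and non-private here only so the sketch is self-contained, and the `@Nullable` annotation is replaced by a comment to avoid a project-specific import.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder class for illustration only. */
class SeqNoBounds {
    /**
     * Returns true if seqNo lies strictly above lowerBound and, when an
     * upperBound is supplied, does not exceed it.
     *
     * @param upperBound may be null, meaning "no upper limit"
     */
    static boolean shouldUpdateSeqNo(final long seqNo, final AtomicLong lowerBound, final AtomicLong upperBound) {
        if (seqNo <= lowerBound.get()) {
            return false; // at or below the checkpoint being advanced
        }
        return upperBound == null || seqNo <= upperBound.get();
    }
}
```

Under this assumption, the regular paths would pass `null` as the upper bound, while the fast-forward path would pass the processed checkpoint as the lower bound and the persisted checkpoint as the upper bound, so no replication-specific flag is needed.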

Contributor Author

Followed the signature suggested

@mch2 mch2 mentioned this pull request Mar 28, 2022
5 tasks
@Poojita-Raj Poojita-Raj force-pushed the feature/segment-replication branch from a428afd to c35bac2 Compare March 31, 2022 03:02
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c35bac2
Log 3954

Reports 3954

@Poojita-Raj Poojita-Raj merged commit 997ef5a into opensearch-project:feature/segment-replication Apr 1, 2022
Poojita-Raj added a commit to Poojita-Raj/OpenSearch that referenced this pull request Apr 12, 2022
* fix local processed checkpoint update

Signed-off-by: Poojita Raj <[email protected]>

* separated tests + wrapper function

Signed-off-by: Poojita Raj <[email protected]>
Poojita-Raj added a commit that referenced this pull request Apr 13, 2022
…t replication] (#2576) (#2883)

* fix local processed checkpoint update (#2576)

Signed-off-by: Poojita Raj <[email protected]>

* separated tests + wrapper function

Signed-off-by: Poojita Raj <[email protected]>

* moved tests + compareAndSet change

Signed-off-by: Poojita Raj <[email protected]>