CCR: Optimize indexing ops using seq_no on followers #34099

Merged
merged 14 commits into from
Sep 29, 2018


dnhatn
Member

@dnhatn dnhatn commented Sep 27, 2018

This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.

/cc @martijnvg @jasontedor

@dnhatn dnhatn added the :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features label Sep 27, 2018
@dnhatn dnhatn requested review from s1monw and bleskes September 27, 2018 00:27
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn dnhatn changed the title CCR: Optimize indexing operations using seq_no on followers CCR: Optimize indexing ops using seq_no on followers Sep 27, 2018
Contributor

@bleskes bleskes left a comment

LGTM. Left some nits. Are you planning to add retry tests in a follow-up?

* Checks if the given operation has been processed in this engine or not.
* @return true if the given operation was processed; otherwise false.
*/
protected boolean containsOperation(Operation op) {
Contributor

nit: call this "hasBeenProcessedBefore"?

* 1. Indexing operations are processed concurrently in an engine. However, operations of the same docID are processed
* one by one under the docID lock.
*
* 2. An engine itself can resolve correctly if an operation is delivered multiple times. However, if an operation is
Contributor

I'm not sure this note is correct. We don't execute an operation if we see that we already processed it before, do we?

Member Author

I've updated this.

Contributor

Did you forget to push something? This statement is not correct. There is no notion of an "optimized op" (for replicas), just an op with a seq# above the MSU. Also, "However, if an operation is optimized and delivered multiple times, it will be appended into Lucene more than once." reads weird. Maybe as simple as: "Operations that are optimized using the MSU optimization may not be processed twice as this will create duplicates in Lucene. To avoid this, we check the local checkpoint tracker to see if an operation was already processed."

Member Author

I've applied your suggestion.
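
For readers following this thread, the check being discussed ("has this operation been processed before?") could be sketched roughly as below. This is only an illustration under assumptions: `ProcessedSeqNoTracker` and its method names are hypothetical, and it is not the engine's actual local checkpoint tracker. It only shows how remembering processed sequence numbers lets a redelivered operation be detected instead of being appended to Lucene a second time.

```java
import java.util.BitSet;

/** Hypothetical tracker of processed sequence numbers; not the actual Elasticsearch class. */
final class ProcessedSeqNoTracker {
    // Bits for sequence numbers processed above the checkpoint (possibly with gaps).
    private final BitSet processed = new BitSet();
    // Highest seq_no such that every seq_no <= checkpoint has been processed; -1 means none.
    private long checkpoint = -1;

    /** Records seqNo as processed and advances the checkpoint over any contiguous run. */
    synchronized void markProcessed(long seqNo) {
        requireNonNegative(seqNo);
        if (seqNo <= checkpoint) {
            return; // already covered by the checkpoint
        }
        processed.set(Math.toIntExact(seqNo));
        while (processed.get(Math.toIntExact(checkpoint + 1))) {
            checkpoint++;
        }
    }

    /** Returns true if seqNo was processed, either below the checkpoint or out of order above it. */
    synchronized boolean hasBeenProcessedBefore(long seqNo) {
        requireNonNegative(seqNo);
        return seqNo <= checkpoint || processed.get(Math.toIntExact(seqNo));
    }

    synchronized long getCheckpoint() {
        return checkpoint;
    }

    private static void requireNonNegative(long seqNo) {
        if (seqNo < 0) {
            throw new IllegalArgumentException("seqNo must be >= 0, got " + seqNo);
        }
    }
}
```

In this sketch an operation that arrives out of order (above the checkpoint) is still remembered, so a second delivery of the same sequence number is recognized even before the checkpoint catches up.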

* 4. The following proves that docID(O) does not exist on a follower when operation O is applied if MSU(O) <= LCP < seqno(O):
*
* 4.1) If such operation O' with docID(O’) = docID(O), and LCP < seqno(O’), then MSU(O) >= MSU(O') because O' was
* delivered to the follower before O. MUS(0') on the leader is at least seqno(O) or seqno(0') and both > LCP.
Contributor

0' (zero) -> O'

Contributor

Why is MUS(0') on the leader at least seqno(O) or seqno(0')?

@dnhatn
Member Author

dnhatn commented Sep 28, 2018

@bleskes Thanks for reviewing. I've addressed your comments. Would you please have another look?

@dnhatn dnhatn requested a review from bleskes September 28, 2018 01:53
Contributor

@s1monw s1monw left a comment

👍

Contributor

@bleskes bleskes left a comment

Code looks great. I left more comments.

*
* 4. The following proves that docID(O) does not exist on a follower when operation O is applied if MSU_r(O) <= LCP < seqno(O):
*
* 4.1) Given two operations O and O' with docID(O’) = docID(O) and seqno(O) < seqno(O’) then MSU_p(O') on the primary
Contributor

I think you mean MSU_p(O') must be at least seqno(O’) (as O' is an update)

* 4. The following proves that docID(O) does not exist on a follower when operation O is applied if MSU_r(O) <= LCP < seqno(O):
*
* 4.1) Given two operations O and O' with docID(O’) = docID(O) and seqno(O) < seqno(O’) then MSU_p(O') on the primary
* must be at least seqno(O). Moreover, the MSU_r on a follower >= min(seqno(O), seqno(O')) after these operations
Contributor

Moreover, the MSU_r on a follower >= min(seqno(O), seqno(O')) after these operations

I still don't follow this.

Member Author

I've broken it into two cases. I hope this is clearer now.

@dnhatn
Member Author

dnhatn commented Sep 28, 2018

@bleskes I've addressed your comments. Could you please take another look? Thank you!

@dnhatn dnhatn requested a review from bleskes September 28, 2018 15:09
@dnhatn
Member Author

dnhatn commented Sep 28, 2018

I discussed this with Boaz on another channel; we prefer to use the proof from @DaveCTurner and @ywelsch (thank you for making this 👍).

Contributor

@bleskes bleskes left a comment

LGTM. Thanks @dnhatn for the hard work.

* 2. Also MSU(O) <= MSU <= LCP < seqno(O) (we discard O if seqno(O) ≤ LCP) so the second invariant applies,
* meaning that the O' was a delete.
* <p>
* Moreover, operations that are optimized using the MSU optimization will not be processed twice as this will create duplicates
Contributor

Can we add a sentence: "Therefore, if MSU <= LCP < seqno(O), we know that O can safely be optimized and added to Lucene with addDocument. Moreover, operations"...
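
As a rough illustration of the condition in this comment, the decision could be sketched as below. The class, enum, and parameter names are hypothetical and the code is not taken from the FollowingEngine; it only encodes what the thread states: discard O if seqno(O) <= LCP (or if it was already processed), append without a lookup when MSU <= LCP < seqno(O), and otherwise fall back to the regular lookup path.

```java
/** Hypothetical sketch of the follower-side indexing decision discussed in this review. */
final class FollowerIndexingPlanSketch {

    enum Plan {
        SKIP_ALREADY_PROCESSED, // operation was delivered before; must not be appended again
        APPEND_WITHOUT_LOOKUP,  // MSU <= LCP < seqno(O): safe to add the document directly
        LOOKUP_THEN_INDEX       // the document may already exist; resolve it via a lookup first
    }

    /**
     * @param seqNo             sequence number of the incoming operation O
     * @param localCheckpoint   LCP of the follower shard
     * @param maxSeqNoOfUpdates MSU known to the follower
     * @param processedAboveLcp whether O was already processed out of order above the LCP
     */
    static Plan plan(long seqNo, long localCheckpoint, long maxSeqNoOfUpdates, boolean processedAboveLcp) {
        if (seqNo <= localCheckpoint || processedAboveLcp) {
            return Plan.SKIP_ALREADY_PROCESSED;
        }
        if (maxSeqNoOfUpdates <= localCheckpoint) {
            // Here MSU <= LCP < seqno(O) holds, which is the condition from the comment above.
            return Plan.APPEND_WITHOUT_LOOKUP;
        }
        return Plan.LOOKUP_THEN_INDEX;
    }

    public static void main(String[] args) {
        System.out.println(plan(8, 7, 3, false)); // MSU=3 <= LCP=7 < seqno=8
        System.out.println(plan(8, 7, 9, false)); // MSU=9 > LCP=7
        System.out.println(plan(5, 7, 3, false)); // seqno=5 <= LCP=7
    }
}
```

The skip check comes first on purpose: the append path is only safe if the same operation is never applied twice, which is the point of the "Moreover, operations..." sentence.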

@dnhatn
Member Author

dnhatn commented Sep 29, 2018

Thanks @bleskes and @s1monw.

@dnhatn dnhatn merged commit ad61398 into elastic:master Sep 29, 2018
@dnhatn dnhatn deleted the optimize-follower branch September 29, 2018 00:42
dnhatn added a commit that referenced this pull request Sep 29, 2018
This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.

Relates #33656
dnhatn added a commit that referenced this pull request Oct 1, 2018
Since #34099, the FollowingEngine will skip an operation which was
already processed before. With that change, it should be okay to unmute
testFollowIndexAndCloseNode.
kcm pushed a commit that referenced this pull request Oct 30, 2018
This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.

Relates #33656
kcm pushed a commit that referenced this pull request Oct 30, 2018
Since #34099, the FollowingEngine will skip an operation which was
already processed before. With that change, it should be okay to unmute
testFollowIndexAndCloseNode.
dnhatn added a commit that referenced this pull request Jul 5, 2019
This PR enables the indexing optimization using sequence numbers on
replicas. With this optimization, indexing on replicas should be faster
and use less memory as it can forgo the version lookup when possible.
This change also deactivates the append-only optimization on replicas.

Relates #34099
dnhatn added a commit that referenced this pull request Jul 6, 2019
This PR enables the indexing optimization using sequence numbers on
replicas. With this optimization, indexing on replicas should be faster
and use less memory as it can forgo the version lookup when possible.
This change also deactivates the append-only optimization on replicas.

Relates #34099
Labels
:Distributed Indexing/CCR Issues around the Cross Cluster State Replication features
4 participants