CCR: Optimize indexing ops using seq_no on followers #34099

dnhatn · 2018-09-27T00:27:08Z

This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.

/cc @martijnvg @jasontedor

elasticmachine · 2018-09-27T00:27:10Z

Pinging @elastic/es-distributed

bleskes

LGTM. Left some nits. Are you planning to add retry tests in a follow-up?

bleskes · 2018-09-27T14:44:12Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+     * Checks if the given operation has been processed in this engine or not.
+     * @return true if the given operation was processed; otherwise false.
+     */
+    protected boolean containsOperation(Operation op) {


nit: call this "hasBeenProcessedBefore"?

bleskes · 2018-09-27T14:46:03Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngine.java

+         * 1. Indexing operations are processed concurrently in an engine. However, operations of the same docID are processed
+         *    one by one under the docID lock.
+         *
+         * 2. An engine itself can resolve correctly if an operation is delivered multiple times. However, if an operation is


I'm not sure this note is correct. We don't execute an operation if we see that we already processed it before?

I've updated this.

Did you forget to push something? This statement is not correct. There is no notion of an "optimized op" (for replicas), just an op with a seq# above the MSU. Also, "However, if an operation is optimized and delivered multiple times, it will be appended into Lucene more than once." reads weird. Maybe as simple as "Operations that are optimized using the MSU optimization may not be processed twice as this will create duplicates in Lucene. To avoid it we check the local checkpoint tracker to see if an operation was already processed".

I've applied your suggestion.
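
To illustrate the check suggested above, here is a minimal sketch of a tracker that can answer whether a sequence number was already processed. The class name `ProcessedSeqNoTracker` and its methods are hypothetical and chosen for illustration only; this is not the Elasticsearch local checkpoint tracker, just one simple way such a check could work.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not Elasticsearch code.
final class ProcessedSeqNoTracker {
    // Highest seq no such that it and every lower seq no were processed; -1 means none yet.
    private long checkpoint = -1;
    // Seq nos processed out of order, i.e. above the checkpoint with gaps below them.
    private final Set<Long> processedAboveCheckpoint = new HashSet<>();

    synchronized void markProcessed(long seqNo) {
        if (seqNo <= checkpoint) {
            return; // already covered by the checkpoint
        }
        processedAboveCheckpoint.add(seqNo);
        // Advance the checkpoint over any contiguous run of processed seq nos.
        while (processedAboveCheckpoint.remove(checkpoint + 1)) {
            checkpoint++;
        }
    }

    // True if the operation with this seq no was processed before, so a redelivery can be skipped.
    synchronized boolean hasBeenProcessed(long seqNo) {
        return seqNo <= checkpoint || processedAboveCheckpoint.contains(seqNo);
    }

    synchronized long getCheckpoint() {
        return checkpoint;
    }
}
```

With a check like this in front of the indexing path, an operation that is delivered twice is recognized on its second delivery and never appended to Lucene again.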

bleskes · 2018-09-27T14:47:45Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngine.java

+         * 4. The following proves that docID(O) does not exist on a follower when operation O is applied if MSU(O) <= LCP < seqno(O):
+         *
+         *    4.1) If such operation O' with docID(O’) = docID(O), and LCP < seqno(O’), then MSU(O) >= MSU(O') because O' was
+         *         delivered to the follower before O. MUS(0') on the leader is at least seqno(O) or seqno(0') and both > LCP.


0' (zero) -> O'

Why is MUS(0') on the leader at least seqno(O) or seqno(0')?

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngine.java

.../plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngineTests.java

dnhatn · 2018-09-28T01:53:17Z

@bleskes Thanks for reviewing. I've addressed your comments. Would you please have another look?

s1monw

👍

bleskes

Code looks great. I left more comments.

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngine.java

bleskes · 2018-09-28T12:18:26Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngine.java

+         * 1. Indexing operations are processed concurrently in an engine. However, operations of the same docID are processed
+         *    one by one under the docID lock.
+         *
+         * 2. An engine itself can resolve correctly if an operation is delivered multiple times. However, if an operation is


Did you forget to push something? This statement is not correct. There is no notion of an "optimized op" (for replicas), just an op with a seq# above the MSU. Also, "However, if an operation is optimized and delivered multiple times, it will be appended into Lucene more than once." reads weird. Maybe as simple as "Operations that are optimized using the MSU optimization may not be processed twice as this will create duplicates in Lucene. To avoid it we check the local checkpoint tracker to see if an operation was already processed".

bleskes · 2018-09-28T12:20:33Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngine.java

+         *
+         * 4. The following proves that docID(O) does not exist on a follower when operation O is applied if MSU_r(O) <= LCP < seqno(O):
+         *
+         *    4.1) Given two operations O and O' with docID(O’) = docID(O) and seqno(O) < seqno(O’) then MSU_p(O') on the primary


I think you mean MSU_p(O') must be at least seqno(O’) (as O' is an update)

bleskes · 2018-09-28T12:22:40Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngine.java

+         * 4. The following proves that docID(O) does not exist on a follower when operation O is applied if MSU_r(O) <= LCP < seqno(O):
+         *
+         *    4.1) Given two operations O and O' with docID(O’) = docID(O) and seqno(O) < seqno(O’) then MSU_p(O') on the primary
+         *         must be at least seqno(O). Moreover, the MSU_r on a follower >= min(seqno(O), seqno(O')) after these operations


Moreover, the MSU_r on a follower >= min(seqno(O), seqno(O')) after these operations

I still don't follow this.

I've broken it into two cases. I hope this is clear now.

.../plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/index/engine/FollowingEngineTests.java

dnhatn · 2018-09-28T15:09:18Z

@bleskes I've addressed your comments. Could you please take another look? Thank you!

dnhatn · 2018-09-28T19:44:53Z

Discussed with Boaz on another channel; we prefer to use the proof from @DaveCTurner and @ywelsch (thank you for making this 👍).

bleskes

LGTM. Thanks @dnhatn for the hard work.

bleskes · 2018-09-28T19:46:08Z

server/src/main/java/org/elasticsearch/index/engine/Engine.java

+     * 2. Also MSU(O) <= MSU <= LCP < seqno(O) (we discard O if seqno(O) ≤ LCP) so the second invariant applies,
+     * meaning that the O' was a delete.
+     * <p>
+     * Moreover, operations that are optimized using the MSU optimization will not be processed twice as this will create duplicates


Can we add a sentence: "Therefore, if MSU <= LCP < seqno(O), we know that O can safely be optimized and added to Lucene with addDocument. Moreover, operations"...
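
The rule in the suggested sentence can be sketched as a small decision function. Everything named here (`FollowerIndexingPlanner`, `Plan`, `planIndexing`) is hypothetical and only mirrors the conditions quoted in this thread: discard O if seqno(O) <= LCP, append without a lookup if MSU <= LCP < seqno(O), and otherwise fall back to a lookup. It is not the actual FollowingEngine code.

```java
// Hypothetical sketch of the decision described in the review comment; not Elasticsearch code.
final class FollowerIndexingPlanner {
    enum Plan {
        SKIP_ALREADY_PROCESSED, // seqno(O) <= LCP: O was processed before, discard it
        APPEND_WITHOUT_LOOKUP,  // MSU <= LCP < seqno(O): safe to call addDocument directly
        LOOKUP_THEN_INDEX       // otherwise: an existing copy of the doc may need replacing
    }

    static Plan planIndexing(long seqNo, long localCheckpoint, long maxSeqNoOfUpdates) {
        if (seqNo <= localCheckpoint) {
            return Plan.SKIP_ALREADY_PROCESSED;
        }
        if (maxSeqNoOfUpdates <= localCheckpoint) {
            return Plan.APPEND_WITHOUT_LOOKUP;
        }
        return Plan.LOOKUP_THEN_INDEX;
    }
}
```

For example, with LCP = 5 and MSU = 4, an operation with seq no 7 would be appended without a lookup, while with MSU = 6 the same operation would take the lookup path. This sketch assumes the checkpoint alone identifies already processed operations, which is a simplification.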

dnhatn · 2018-09-29T00:41:45Z

Thanks @bleskes and @s1monw.

This change introduces the indexing optimization using sequence numbers in the FollowingEngine. This optimization uses the max_seq_no_updates which is tracked on the primary of the leader and replicated to replicas and followers. Relates #33656

Since #34099, the FollowingEngine will skip an operation which was already processed before. With that change, it should be okay to unmute testFollowIndexAndCloseNode.

This PR enables the indexing optimization using sequence numbers on replicas. With this optimization, indexing on replicas should be faster and use less memory as it can forgo the version lookup when possible. This change also deactivates the append-only optimization on replicas. Relates #34099

CCR: Optimize indexing ops using seqno on follower

be60aa6

This change introduces the indexing optimization using sequence numbers on the FollowingEngine. This optimization uses the max_seq_no_updates which is tracked on the primary of the leader, and replicated to replicas and followers.

dnhatn added the :Distributed Indexing/CCR label Sep 27, 2018

dnhatn requested review from s1monw and bleskes September 27, 2018 00:27

dnhatn changed the title ~~CCR: Optimize indexing operations using seq_no on followers~~ CCR: Optimize indexing ops using seq_no on followers Sep 27, 2018

dnhatn added 2 commits September 26, 2018 20:29

wording

7d25e69

Merge branch 'master' into optimize-follower

a81d1df

bleskes approved these changes Sep 27, 2018

dnhatn added 2 commits September 27, 2018 21:01

Merge branch 'master' into optimize-follower

34f6f81

feedback

6cd688e

dnhatn requested a review from bleskes September 28, 2018 01:53

wording

adb9933

s1monw approved these changes Sep 28, 2018

bleskes suggested changes Sep 28, 2018

dnhatn added 3 commits September 28, 2018 09:20

shuffle snapshot

be86b4f

Merge branch 'master' into optimize-follower

319dbb8

update note

91f419b

dnhatn requested a review from bleskes September 28, 2018 15:09

dnhatn added 3 commits September 28, 2018 15:24

Merge branch 'master' into optimize-follower

5a0f464

use proof from david and yannick

f34b447

typo

516c99e

bleskes approved these changes Sep 28, 2018

dnhatn added 2 commits September 28, 2018 15:54

add conclusion

6f807cf

stylecheck

d79d26e

dliappis mentioned this pull request Sep 28, 2018

[CCR] Re-evaluate shard follow parameter defaults #31717

Closed

dnhatn merged commit ad61398 into elastic:master Sep 29, 2018

dnhatn deleted the optimize-follower branch September 29, 2018 00:42

dnhatn added the backport pending label Sep 29, 2018

dnhatn mentioned this pull request Sep 29, 2018

Uses auto generated timestamp with soft-deletes #33656

Closed

dnhatn removed the backport pending label Sep 29, 2018

dnhatn added a commit that referenced this pull request Oct 1, 2018

TEST: Unmute testFollowIndexAndCloseNode

6091e6d

Since #34099, the FollowingEngine will skip an operation which was already processed before. With that change, it should be okay to unmute testFollowIndexAndCloseNode.

dnhatn added a commit that referenced this pull request Oct 1, 2018

TEST: Unmute testFollowIndexAndCloseNode

a02deba

Since #34099, the FollowingEngine will skip an operation which was already processed before. With that change, it should be okay to unmute testFollowIndexAndCloseNode.

dnhatn mentioned this pull request Oct 1, 2018

[CI] CCR: testFollowIndexAndCloseNode fails #33337

Closed

kcm pushed a commit that referenced this pull request Oct 30, 2018

TEST: Unmute testFollowIndexAndCloseNode

81d0ad7

Since #34099, the FollowingEngine will skip an operation which was already processed before. With that change, it should be okay to unmute testFollowIndexAndCloseNode.

dnhatn mentioned this pull request Jun 26, 2019

Enable indexing optimization using sequence numbers on replicas #43616

Merged
