Add local session timeouts to leader node #37438

Tim-Brooks · 2019-01-14T21:52:57Z

This is related to #35975. This commit adds timeout functionality to
the local session on a leader node. When a session is started, a timeout
is scheduled using a repeatable runnable. If the session is not accessed
between two consecutive runs, the session is closed. When the session is
closed, the repeating task is cancelled.

Additionally, this commit moves session UUID generation to the leader
cluster and renames PutCcrRestoreSessionRequest to
StartCcrRestoreSessionRequest to reflect that change.
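
The idle-timeout mechanism described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class `IdleTimeoutSessions` and its methods are hypothetical names, and a plain `ScheduledExecutorService` stands in for the Elasticsearch thread pool. Each run of the repeating task closes the session if it was not accessed since the previous run; closing the session cancels the task.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the leader-side session bookkeeping; not the PR's actual class.
class IdleTimeoutSessions {
    private static final class Session {
        // Opening a session counts as an access, so it survives the first run.
        volatile boolean accessedSinceLastRun = true;
        volatile ScheduledFuture<?> timeoutTask;
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler;
    private final long intervalMillis;

    IdleTimeoutSessions(ScheduledExecutorService scheduler, long intervalMillis) {
        this.scheduler = scheduler;
        this.intervalMillis = intervalMillis;
    }

    void openSession(String uuid) {
        Session session = new Session();
        sessions.put(uuid, session);
        // The repeating task: one idle check per interval.
        session.timeoutTask = scheduler.scheduleWithFixedDelay(
            () -> checkIdle(uuid), intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called whenever the session is used, e.g. when a file chunk is read.
    void touch(String uuid) {
        Session session = sessions.get(uuid);
        if (session == null) {
            throw new IllegalArgumentException("session [" + uuid + "] not found");
        }
        session.accessedSinceLastRun = true;
    }

    // One run of the repeating task: close the session if nothing touched it since the last run.
    void checkIdle(String uuid) {
        Session session = sessions.get(uuid);
        if (session == null) {
            return;
        }
        if (session.accessedSinceLastRun) {
            session.accessedSinceLastRun = false;
        } else {
            closeSession(uuid);
        }
    }

    void closeSession(String uuid) {
        Session session = sessions.remove(uuid);
        if (session != null && session.timeoutTask != null) {
            session.timeoutTask.cancel(false); // closing the session cancels the repeating task
        }
    }

    boolean isOpen(String uuid) {
        return sessions.containsKey(uuid);
    }
}
```

With this flag-based scheme, an unused session is closed somewhere between one and two intervals after its last access, which is one reason to treat the interval as an activity timeout rather than an exact deadline.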

elasticmachine · 2019-01-14T21:52:59Z

Pinging @elastic/es-distributed

Tim-Brooks · 2019-01-14T21:55:06Z

This is a POC for local leader session timeouts. It takes the approach of an "idle" timeout, which means that the session will stay alive as long as it is actively in use in between the timeout periods. An alternative would be to have a total restore timeout on the leader side.

Second, is the question about configurability. Do we want to introduce a setting for this and what do we want the setting to be labelled as? If we go with a total restore timeout, I think that we definitely want a configurable setting.

Thoughts?

ywelsch · 2019-01-15T10:47:25Z

The "idle" timeout approach makes sense to me (it's the one we use for peer recovery as well).

> Second, is the question about configurability. Do we want to introduce a setting for this and what do we want the setting to be labelled as?

Yes. For recovery we have indices.recovery.recovery_activity_timeout. I wonder if we need a special setting for CCR or whether we can just reuse the same setting. This leaves the question of the default value, though. For peer recoveries, this setting defaults to 30 minutes, which is way too high imo. That default is probably only explained by the fact that peer recoveries used to work quite differently in the ES 2.x days: the node itself (and not the master) was responsible for scheduling recoveries and making sure that not too many ran at once. That required a larger timeout, because a recovery might need to wait for other recoveries to finish before it got to execute.

> Additionally, this commit moves session uuid generation to the leader cluster. And renames the PutCcrRestoreSessionRequest to StartCcrRestoreSessionRequest to reflect that change.

Can you undo that change? It's unrelated to this PR, and I'm not even sure we should be making it.

bleskes · 2019-01-15T10:50:01Z

+1 to the same handling as recoveries, i.e., activity timeouts + settings.

jasontedor · 2019-01-15T12:18:47Z

+1

ywelsch · 2019-01-15T22:51:25Z

...plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRestoreSourceService.java


-    public CcrRestoreSourceService(Settings settings) {
+    public CcrRestoreSourceService(Settings settings, ThreadPool threadPool) {
+        this(settings, threadPool, RecoverySettings.INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING.get(settings));


It might not have been clear from my initial comment, but I would prefer to go with a separate setting for now (e.g. ccr.recovery.recovery_activity_timeout), as the default on INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING is absurdly high, and I want to investigate lowering this timeout in 7.0. We can still decide to leave the new setting undocumented for now.

I made a new setting.

ywelsch · 2019-01-15T22:53:10Z

...plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRestoreSourceService.java


-    public CcrRestoreSourceService(Settings settings) {
+    public CcrRestoreSourceService(Settings settings, ThreadPool threadPool) {
+        this(settings, threadPool, RecoverySettings.INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING.get(settings));


The new setting should also be dynamic, just like INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING, and would best live in CcrSettings, following the same model I explained in #37449.

I hooked it up to CcrSettings.
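
To illustrate what "dynamic" means for a setting like this, here is a simplified sketch in plain Java. It is not the Elasticsearch settings API and not the code from this PR: `DynamicTimeout` and `RestoreServiceSketch` are hypothetical names, and the 60-second default is an arbitrary value chosen for the example. The idea is that the service reads the value once at construction and registers a consumer so that later updates take effect without a restart.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical, simplified model of a dynamically updatable setting.
class DynamicTimeout {
    private volatile Duration value;
    private final List<Consumer<Duration>> consumers = new CopyOnWriteArrayList<>();

    DynamicTimeout(Duration defaultValue) {
        this.value = defaultValue;
    }

    Duration get() {
        return value;
    }

    // Components register here to be told about runtime updates.
    void addUpdateConsumer(Consumer<Duration> consumer) {
        consumers.add(consumer);
    }

    // Applies a new value at runtime and notifies every registered consumer.
    void update(Duration newValue) {
        if (newValue.isZero() || newValue.isNegative()) {
            throw new IllegalArgumentException("timeout must be positive, got " + newValue);
        }
        value = newValue;
        for (Consumer<Duration> consumer : consumers) {
            consumer.accept(newValue);
        }
    }
}

// Hypothetical service that keeps its own copy of the timeout in sync with the setting.
class RestoreServiceSketch {
    private volatile Duration activityTimeout;

    RestoreServiceSketch(DynamicTimeout setting) {
        this.activityTimeout = setting.get();
        setting.addUpdateConsumer(updated -> this.activityTimeout = updated);
    }

    Duration activityTimeout() {
        return activityTimeout;
    }
}
```

In a design like this, a new value would naturally apply to timeout tasks scheduled after the update; whether already-running sessions pick it up depends on how the repeating task is scheduled.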

Tim-Brooks · 2019-01-18T02:05:35Z

@ywelsch I made your suggested changes and merged in the CcrSettings work from rate limiting.

ywelsch · 2019-01-18T14:12:36Z

...n/ccr/src/test/java/org/elasticsearch/xpack/ccr/repository/CcrRestoreSourceServiceTests.java

-        restoreSourceService = new CcrRestoreSourceService();
+        Settings settings = Settings.builder().put(NODE_NAME_SETTING.getKey(), "node").build();
+        taskQueue = new DeterministicTaskQueue(settings, random());
+        Set<Setting<?>> registeredSettings = new HashSet<>(Arrays.asList(CcrSettings.INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING,


We have a convenience method, Sets.newHashSet(...) (it does not save that many characters, just so you're aware).

ywelsch · 2019-01-18T14:16:33Z

...n/ccr/src/test/java/org/elasticsearch/xpack/ccr/repository/CcrRestoreSourceServiceTests.java

+        assertFalse(taskQueue.hasDeferredTasks());
+
+        try (CcrRestoreSourceService.SessionReader reader = restoreSourceService.getSessionReader(sessionUUID)) {
+            fail("Should have timed out.");


This can be written more concisely with expectThrows. Can you also assert something on the exception message?
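
The shape of that suggestion can be sketched as follows. The `expectThrows` helper below is a simplified, hypothetical stand-in written so the example is self-contained (the Elasticsearch test framework inherits the real one from Lucene's test base class). The `getSessionReader` stub, its exception type, and its message are assumptions for illustration, not taken from this PR.

```java
// Self-contained sketch; names here are illustrative and not the test framework's classes.
class ExpectThrowsSketch {
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    // Runs the code, fails if nothing (or the wrong type) is thrown, and returns the
    // exception so the caller can assert on its message.
    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t.getClass().getName(), t);
        }
        throw new AssertionError("expected " + expected.getSimpleName() + " but nothing was thrown");
    }

    // Hypothetical stand-in for looking up a session that has already timed out.
    static Object getSessionReader(String sessionUUID) {
        throw new IllegalArgumentException("session [" + sessionUUID + "] not found");
    }
}
```

A test body then shrinks to something like `IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> getSessionReader(uuid));` followed by an assertion on `e.getMessage()`, replacing the try block with a `fail(...)` call inside it.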

Tim-Brooks · 2019-01-18T18:42:32Z

run gradle build tests 1

Tim-Brooks · 2019-01-18T19:27:17Z

run gradle build tests 1

Tim-Brooks · 2019-01-18T20:09:25Z

run gradle build tests 1

* elastic/master:
  Remove Watcher Account "unsecure" settings (elastic#36736)
  Add cache cleaning task for ML snapshot (elastic#37505)
  Update jdk used by the docker builds (elastic#37621)
  Remove an unused constant in PutMappingRequest.
  Update get users to allow unknown fields (elastic#37593)
  Do not add index event listener if CCR disabled (elastic#37432)
  Add local session timeouts to leader node (elastic#37438)

* elastic/master: (104 commits)
  Permission for restricted indices (elastic#37577)
  Remove Watcher Account "unsecure" settings (elastic#36736)
  Add cache cleaning task for ML snapshot (elastic#37505)
  Update jdk used by the docker builds (elastic#37621)
  Remove an unused constant in PutMappingRequest.
  Update get users to allow unknown fields (elastic#37593)
  Do not add index event listener if CCR disabled (elastic#37432)
  Add local session timeouts to leader node (elastic#37438)
  Add some deprecation optimizations (elastic#37597)
  refactor inner geogrid classes to own class files (elastic#37596)
  Remove obsolete deprecation checks (elastic#37510)
  ML: Add support for single bucket aggs in Datafeeds (elastic#37544)
  ML: creating ML State write alias and pointing writes there (elastic#37483)
  Deprecate types in the put mapping API. (elastic#37280)
  [ILM] Add unfollow action (elastic#36970)
  Packaging: Update marker used to allow ELASTIC_PASSWORD (elastic#37243)
  Fix setting openldap realm ssl config
  Document the need for JAVA11_HOME (elastic#37589)
  SQL: fix object extraction from sources (elastic#37502)
  Nit in settings.gradle for Eclipse
  ...

Tim-Brooks added 2 commits January 14, 2019 14:43

Fix checkstyle

9696fe3

Tim-Brooks added the >non-issue, v7.0.0, :Distributed Indexing/CCR, and v6.7.0 labels Jan 14, 2019

Tim-Brooks requested review from martijnvg, bleskes, ywelsch and jasontedor January 14, 2019 21:52

Tim-Brooks mentioned this pull request Jan 15, 2019

Implement CCR bootstrap from remote #35975

Closed

32 tasks

Tim-Brooks added 2 commits January 15, 2019 09:20

Merge remote-tracking branch 'upstream/master' into add_local_leader_…

9163523

…timeouts

Changes

b0324a4

ywelsch suggested changes Jan 15, 2019

Tim-Brooks added 5 commits January 16, 2019 17:14

Changes

c9abe5d

Merge remote-tracking branch 'upstream/master' into add_local_leader_…

28d0fc5

…timeouts

Fix

c223763

Merge remote-tracking branch 'upstream/master' into add_local_leader_…

c500923

…timeouts

Changes

1c1b8ff

Tim-Brooks requested a review from ywelsch January 18, 2019 02:04

ywelsch approved these changes Jan 18, 2019

Changes

440eb2b

Tim-Brooks merged commit cd41289 into elastic:master Jan 18, 2019

Tim-Brooks added the backport pending label Jan 18, 2019

Tim-Brooks removed the backport pending label Jan 22, 2019

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

Tim-Brooks deleted the add_local_leader_timeouts branch December 18, 2019 14:47
