-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add local session timeouts to leader node #37438
Add local session timeouts to leader node #37438
Conversation
This is related to elastic#35975. This commit adds timeout functionality to the local session on a leader node. When a session is started, a timeout is scheduled using a repeatable runnable. If the session is not accessed in between two runs the session is closed. When the sssion is closed, the repeating task is cancelled.
Pinging @elastic/es-distributed |
This is POC for local leader session timeouts. It takes an approach of an "idle" timeout. Which means that the session will stay alive as long it is actively in use in between the timeout periods. Another alternative would be to have a total restore timeout on the leader side. Second, is the question about configurability. Do we want to introduce a setting for this and what do we want the setting to be labelled as? If we going with total restore timeout I think that we definitely want a configurable setting. Thoughts? |
The "idle" timeout approach makes sense to me (it's the one we use for peer recovery as well).
yes. For recovery we have
Can you undo that change, as it's unrelated to this PR, and I'm not even sure we should be doing that change. |
+1 to the same handling as recoveries. i.e., - activity timeouts + settings. |
+1 |
|
||
public CcrRestoreSourceService(Settings settings) { | ||
public CcrRestoreSourceService(Settings settings, ThreadPool threadPool) { | ||
this(settings, threadPool, RecoverySettings.INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING.get(settings)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might not have been clear from my initial comment, but I would prefer to go with a separate setting for now (e.g. ccr.recovery.recovery_activity_timeout
), as the default on INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING
is absurdly high, and I want to investigate lowering this timeout in 7.0. We can still decide to leave the new setting undocumented for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a new setting
|
||
public CcrRestoreSourceService(Settings settings) { | ||
public CcrRestoreSourceService(Settings settings, ThreadPool threadPool) { | ||
this(settings, threadPool, RecoverySettings.INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING.get(settings)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new setting should also be dynamic, just as INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING
, and would best live in CCRSettings, following the same model as I've explained in #37449
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hooked it up to CcrSettings
...plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRestoreSourceService.java
Outdated
Show resolved
Hide resolved
...n/ccr/src/test/java/org/elasticsearch/xpack/ccr/repository/CcrRestoreSourceServiceTests.java
Outdated
Show resolved
Hide resolved
...n/ccr/src/test/java/org/elasticsearch/xpack/ccr/repository/CcrRestoreSourceServiceTests.java
Outdated
Show resolved
Hide resolved
@ywelsch I made your suggested changes and merged in the |
restoreSourceService = new CcrRestoreSourceService(); | ||
Settings settings = Settings.builder().put(NODE_NAME_SETTING.getKey(), "node").build(); | ||
taskQueue = new DeterministicTaskQueue(settings, random()); | ||
Set<Setting<?>> registeredSettings = new HashSet<>(Arrays.asList(CcrSettings.INDICES_RECOVERY_ACTIVITY_TIMEOUT_SETTING, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a convenience method Sets.newHashSet(...)
(does not save that many chars, just so you're aware)
assertFalse(taskQueue.hasDeferredTasks()); | ||
|
||
try (CcrRestoreSourceService.SessionReader reader = restoreSourceService.getSessionReader(sessionUUID)) { | ||
fail("Should have timed out."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be written more concisely with expectThrows
. Can you also assert something on the exception message?
run gradle build tests 1 |
1 similar comment
run gradle build tests 1 |
run gradle build tests 1 |
* elastic/master: Remove Watcher Account "unsecure" settings (elastic#36736) Add cache cleaning task for ML snapshot (elastic#37505) Update jdk used by the docker builds (elastic#37621) Remove an unused constant in PutMappingRequest. Update get users to allow unknown fields (elastic#37593) Do not add index event listener if CCR disabled (elastic#37432) Add local session timeouts to leader node (elastic#37438)
* elastic/master: (104 commits) Permission for restricted indices (elastic#37577) Remove Watcher Account "unsecure" settings (elastic#36736) Add cache cleaning task for ML snapshot (elastic#37505) Update jdk used by the docker builds (elastic#37621) Remove an unused constant in PutMappingRequest. Update get users to allow unknown fields (elastic#37593) Do not add index event listener if CCR disabled (elastic#37432) Add local session timeouts to leader node (elastic#37438) Add some deprecation optimizations (elastic#37597) refactor inner geogrid classes to own class files (elastic#37596) Remove obsolete deprecation checks (elastic#37510) ML: Add support for single bucket aggs in Datafeeds (elastic#37544) ML: creating ML State write alias and pointing writes there (elastic#37483) Deprecate types in the put mapping API. (elastic#37280) [ILM] Add unfollow action (elastic#36970) Packaging: Update marker used to allow ELASTIC_PASSWORD (elastic#37243) Fix setting openldap realm ssl config Document the need for JAVA11_HOME (elastic#37589) SQL: fix object extraction from sources (elastic#37502) Nit in settings.gradle for Eclipse ...
This is related to elastic#35975. This commit adds timeout functionality to the local session on a leader node. When a session is started, a timeout is scheduled using a repeatable runnable. If the session is not accessed in between two runs the session is closed. When the sssion is closed, the repeating task is cancelled. Additionally, this commit moves session uuid generation to the leader cluster. And renames the PutCcrRestoreSessionRequest to StartCcrRestoreSessionRequest to reflect that change.
This is related to #35975. This commit adds timeout functionality to
the local session on a leader node. When a session is started, a timeout
is scheduled using a repeatable runnable. If the session is not accessed
in between two runs the session is closed. When the sssion is closed,
the repeating task is cancelled.
Additionally, this commit moves session uuid generation to the leader
cluster. And renames the
PutCcrRestoreSessionRequest
toStartCcrRestoreSessionRequest
to reflect that change.