Adds plugin version sweep background job #434
Codecov Report
@@ Coverage Diff @@
## main #434 +/- ##
=========================================
Coverage 75.94% 75.95%
- Complexity 2480 2492 +12
=========================================
Files 315 316 +1
Lines 14500 14547 +47
Branches 2243 2248 +5
=========================================
+ Hits 11012 11049 +37
- Misses 2239 2246 +7
- Partials 1249 1252 +3
A general question:
- Can we disable the trigger logic in SkipExecution, since we now have this background loop?
The trigger logic I am referring to, in SkipExecution:
override fun clusterChanged(event: ClusterChangedEvent) {
    if (event.nodesChanged() || event.isNewCluster) {
        sweepISMPluginVersion()
    }
}
val SWEEP_SKIP_PERIOD: Setting<TimeValue> = Setting.timeSetting(
    "opendistro.index_state_management.coordinator.sweep_skip_period",
    TimeValue.timeValueMinutes(10),
    TimeValue.timeValueMinutes(5),
    Setting.Property.NodeScope,
    Setting.Property.Dynamic,
    Setting.Property.Deprecated
)
No need to have this if we are adding a new setting
Makes sense. Tnx!
if (!skipExecution.flag) {
    logger.info("Canceling sweep ism plugin version job")
    scheduledSkipExecution?.cancel()
} else {
    skipExecution.sweepISMPluginVersion()
}
Do we need to cancel this job or let it run forever?
…he case of version discrepancy Signed-off-by: Stevan Buzejic <[email protected]>
…r scheduling the skip execution task. Annotated tests in order to prevent thread leak error during integrational tests Signed-off-by: Stevan Buzejic <[email protected]>
027e78e
to
151fec9
Compare
private fun isIndexStateManagementEnabled(): Boolean = indexStateManagementEnabled == true

companion object {
    private const val RETRY_PERIOD_IN_MINUTES = 5L
Is this the same as sweepSkipPeriod? If so, should we use sweepSkipPeriod instead?
Good question, and you are right. I am thinking the same: the SkipExecution class should only do sweepISMPluginVersion, while the caller class is responsible for triggering the request. So my proposal is that the caller class, PluginVersionSweepCoordinator, listens for cluster changed events and is responsible for calling the sweepISM method. This class already has a scheduled job that can optionally be canceled (i.e. if the skip flag is being set to true).
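To make the proposed split of responsibilities concrete, here is a rough sketch. It is not the code from this PR: VersionSweeper, SweepCoordinatorSketch, onScheduledTick and the cancelScheduledJob callback are hypothetical names used only for illustration, and the cancel-when-flag-is-false branch simply mirrors the snippet quoted earlier in this review.

```kotlin
// Hypothetical stand-in for SkipExecution: it only knows how to sweep.
interface VersionSweeper {
    val flag: Boolean
    fun sweepISMPluginVersion()
}

// Hypothetical coordinator: decides when to sweep and owns the scheduled job.
class SweepCoordinatorSketch(
    private val sweeper: VersionSweeper,
    // Assumed handle for cancelling the scheduled job; null once cancelled.
    private var cancelScheduledJob: (() -> Unit)? = null
) {
    // The cluster-changed trigger, moved out of the sweeper into the caller.
    fun onClusterChanged(nodesChanged: Boolean, isNewCluster: Boolean) {
        if (nodesChanged || isNewCluster) sweeper.sweepISMPluginVersion()
    }

    // Body of the periodic job: cancel once the flag is clear, otherwise re-check.
    fun onScheduledTick() {
        if (!sweeper.flag) {
            cancelScheduledJob?.invoke()
            cancelScheduledJob = null
        } else {
            sweeper.sweepISMPluginVersion()
        }
    }
}
```

With this shape the sweeper has no knowledge of cluster events or scheduling, so both triggers can be tested against a fake sweeper.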
…lag up to 5 mins Signed-off-by: Stevan Buzejic <[email protected]>
85cca3c
to
47b7a24
Compare
Signed-off-by: Stevan Buzejic <[email protected]>
* [207]: Added 5 min scheduled job for sweeping ISM plugin version in the case of version discrepancy
* [207]: Created pluginVersionSweepCoordinator component responsible for scheduling the skip execution task. Annotated tests in order to prevent thread leak error during integrational tests
* [207]: Increased retry period for background job that sets the skip flag up to 5 mins
* Empty-Commit
Signed-off-by: Stevan Buzejic <[email protected]>
Co-authored-by: Stevan Buzejic <[email protected]>
(cherry picked from commit 4d844fa)
Issue #, if available:
#207
Description of changes:
Index Management currently skips all job executions when two differing versions of the Index Management plugin are present on the cluster. The plugin does this by issuing a NodesInfoRequest to get and compare plugin versions whenever a node is added or a new cluster forms, and it sets a flag, SkipExecution, to true when there are multiple plugin versions. We have seen cases where the SkipExecution flag is still set to true even though the upgrade process (early ES 7.x to later ES 7.x) has finished and all nodes on the cluster run the same, latest version of the IM plugin.
From analyzing the code, we can see race conditions that would allow multiple requests to overwrite each other in the wrong order. Though the cluster changed events would come in order, the NodesInfoRequests may actually overwrite the flag out of order.
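The overwrite can be illustrated with a small, deterministic model. This is only a sketch of the last-writer-wins behaviour described above, not the plugin's code: SkipFlag, onNodesInfoResponse and the version strings are made up for the example.

```kotlin
// Hypothetical model of the flag: whichever response is handled last decides it.
class SkipFlag {
    @Volatile
    var flag: Boolean = false
        private set

    // Called when a NodesInfoRequest-style response arrives with the plugin versions seen.
    fun onNodesInfoResponse(pluginVersions: Set<String>) {
        flag = pluginVersions.size > 1
    }
}

fun simulateOutOfOrderResponses(): Boolean {
    val skip = SkipFlag()
    // Request 1 was sent mid-upgrade, request 2 after the upgrade finished
    // (illustrative version strings).
    val midUpgrade = setOf("7.1.0", "7.9.0")
    val afterUpgrade = setOf("7.9.0")
    // The responses are handled in the opposite order to the requests.
    skip.onNodesInfoResponse(afterUpgrade)
    skip.onNodesInfoResponse(midUpgrade)
    // The stale response wins, so the flag stays true on a uniform cluster.
    return skip.flag
}
```

Nothing re-evaluates the flag after the stale write until another cluster changed event arrives, which is the gap the background job is meant to close.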
To resolve this race condition, this PR adds a background job which will run every five minutes to poll the plugin versions if the flag is currently set to true.
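A minimal sketch of that polling behaviour, assuming a plain java.util.concurrent scheduler rather than the plugin's own thread pool. VersionSweepPoller, fetchPluginVersions and pollOnce are hypothetical names, and the actual change may be structured differently.

```kotlin
import java.util.concurrent.ScheduledExecutorService
import java.util.concurrent.ScheduledFuture
import java.util.concurrent.TimeUnit

// Hypothetical poller: fetchPluginVersions stands in for the NodesInfoRequest round trip.
class VersionSweepPoller(private val fetchPluginVersions: () -> Set<String>) {
    @Volatile
    var skipFlag: Boolean = false
        private set

    // Recompute the flag from the versions currently reported by the nodes.
    fun sweep() {
        skipFlag = fetchPluginVersions().size > 1
    }

    // One tick of the background job: only poll while the flag is set.
    fun pollOnce() {
        if (skipFlag) sweep()
    }

    // Run pollOnce every periodMinutes (five by default, as in the description).
    fun start(scheduler: ScheduledExecutorService, periodMinutes: Long = 5L): ScheduledFuture<*> =
        scheduler.scheduleWithFixedDelay({ pollOnce() }, periodMinutes, periodMinutes, TimeUnit.MINUTES)
}
```

Because each tick takes a fresh reading, a stale true value is corrected within one period even if no further cluster changed event is delivered.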
This is an alternative strategy to #423. The work is entirely by Stevan Buzejic (@stevanbz); I am just raising the PR for an early review.
CheckList:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.