-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using time out in cluster state observer as we are reusing the observer #215
Conversation
Signed-off-by: Gaurav Bafna <[email protected]>
Logs after fix
Logs before fix
|
@@ -125,7 +125,7 @@ suspend fun ClusterStateObserver.waitForNextChange(reason: String, predicate: (C | |||
override fun onTimeout(timeout: TimeValue?) { | |||
cont.resumeWithException(OpenSearchTimeoutException("timed out waiting for $reason")) | |||
} | |||
}, predicate) | |||
}, predicate, TimeValue(60000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the timeout is not set, Is the default value under cluster service not taken into account?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is taken, but it is measured since startTimeMS
which is initialized only once across multiple waitForNextChange
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The global object cso
object in index replication task seems to be creating the issue. Can we check the previous cluster state tracker in this observer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct fix here would be to set startTimeMS
and timeoutTimeLeftMS
irrespective of whether the timeOutValue
is null or not. That'll require changes in OS repo though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes @ankitkala , but that will change behavior for other use cases and may break their use case as well.
@saikaranam-amazon : not sure i understand your point. will sync up offline and update it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, Concern was around the reference to the previous cluster state. Based on the logic from observer code, it is updating the latest state.
LGTM. |
…er (opensearch-project#215) Signed-off-by: Gaurav Bafna <[email protected]>
…er (opensearch-project#215) Signed-off-by: Gaurav Bafna <[email protected]>
…er (#215) (#216) Signed-off-by: Gaurav Bafna <[email protected]>
…er (opensearch-project#215) Signed-off-by: Gaurav Bafna <[email protected]>
…er (#215) (#217) Signed-off-by: Gaurav Bafna <[email protected]>
Description
This makes
waitForNextChange
wait till time out value every time it is called. Without this change, the cluster state observer doesn't updatecso.startTimeMS
. So it waits for totaltimeout
across multiple calls . For ex : 60 sec on first time and after that sincecso.startTimeMS
is not updated , thewaitForNextChange
returns immediately . This results in unnecessary CPU cycles and log flood as well.Issues Resolved
#207
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.