Using time out in cluster state observer as we are reusing the observer #215

gbbafna · 2021-10-27T04:44:26Z

Description

This change makes waitForNextChange wait for the full timeout value every time it is called. Without it, the cluster state observer never updates cso.startTimeMS, so the timeout is measured as a total across multiple calls. For example, the first call waits 60 seconds; after that, because cso.startTimeMS is not updated, waitForNextChange returns immediately. This wastes CPU cycles and floods the logs.
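
The timeout accounting described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `ObserverTimeoutModel`, `remainingWaitMs` and the injectable `clock` are hypothetical names made up for illustration, and the class is not the OpenSearch ClusterStateObserver. It assumes, as the description says, that the start time is fixed when the observer is created and is refreshed only when a timeout is passed explicitly.

```kotlin
// Simplified, hypothetical model of the timeout bookkeeping described above.
// It is NOT the OpenSearch ClusterStateObserver; all names are illustrative.
class ObserverTimeoutModel(
    private val defaultTimeoutMs: Long,
    private val clock: () -> Long // injectable clock, in milliseconds
) {
    // Set once at construction, like cso.startTimeMS in the description.
    private var startTimeMs: Long = clock()

    // Returns how long a call would wait before timing out (0 = immediate timeout).
    fun remainingWaitMs(explicitTimeoutMs: Long? = null): Long {
        if (explicitTimeoutMs != null) {
            // Explicit timeout: the start time is reset, so the call gets the full window.
            startTimeMs = clock()
            return explicitTimeoutMs
        }
        // No explicit timeout: the time left is measured from the original start time.
        val elapsed = clock() - startTimeMs
        return maxOf(0L, defaultTimeoutMs - elapsed)
    }
}

fun main() {
    var now = 0L
    val reused = ObserverTimeoutModel(defaultTimeoutMs = 60_000L) { now }

    println(reused.remainingWaitMs())        // first call: 60000
    now += 60_000L                           // the first wait runs into its timeout
    println(reused.remainingWaitMs())        // budget already spent: 0
    println(reused.remainingWaitMs(60_000L)) // explicit timeout: 60000 again
}
```

In this model, a reused observer that is called without a timeout keeps reporting an exhausted budget, which would match the repeated same-timestamp "Timed out" log lines; passing the timeout on every call restores the full wait.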

Issues Resolved

#207

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Gaurav Bafna <[email protected]>

gbbafna · 2021-10-27T04:54:13Z

Logs after fix

[2021-10-27T10:00:05,559][INFO ][o.o.r.t.i.IndexReplicationTask] [followCluster-0] [grab] In restoring state
[2021-10-27T10:00:05,569][DEBUG][o.o.c.c.PublicationTransportHandler] [followCluster-0] received diff cluster state version [10] with uuid [wGnixzedSougu_Atbb-PSQ], diff size [598]
[2021-10-27T10:00:05,789][DEBUG][o.o.c.c.C.CoordinatorPublication] [followCluster-0] publication ended successfully: Publication{term=1, version=10}
[2021-10-27T10:01:05,800][INFO ][o.o.r.t.i.IndexReplicationTask] [followCluster-0] [grab] Timed out while waiting for restore to complete.
[2021-10-27T10:02:05,803][INFO ][o.o.r.t.i.IndexReplicationTask] [followCluster-0] [grab] Timed out while waiting for restore to complete.
[2021-10-27T10:03:05,807][INFO ][o.o.r.t.i.IndexReplicationTask] [followCluster-0] [grab] Timed out while waiting for restore to complete.
[2021-10-27T10:04:05,811][INFO ][o.o.r.t.i.IndexReplicationTask] [followCluster-0] [grab] Timed out while waiting for restore to complete.

Logs before fix

[2021-10-13T07:49:33,977][INFO ][c.a.e.r.t.i.IndexReplicationTask] [a133cc96509ad479b0161b22d54d2199] [abcd] In restoring state

[2021-10-13T07:50:33,170][INFO ][c.a.e.r.t.i.IndexReplicationTask] [a133cc96509ad479b0161b22d54d2199] [abcd] Timed out while waiting for restore to complete.

[2021-10-13T07:50:33,170][INFO ][c.a.e.r.t.i.IndexReplicationTask] [a133cc96509ad479b0161b22d54d2199] [abcd] Timed out while waiting for restore to complete.
[2021-10-13T07:50:33,170][INFO ][c.a.e.r.t.i.IndexReplicationTask] [a133cc96509ad479b0161b22d54d2199] [abcd] Timed out while waiting for restore to complete.
[2021-10-13T07:50:33,170][INFO ][c.a.e.r.t.i.IndexReplicationTask] [a133cc96509ad479b0161b22d54d2199] [abcd] Timed out while waiting for restore to complete.
[2021-10-13T07:50:33,171][INFO ][c.a.e.r.t.i.IndexReplicationTask] [a133cc96509ad479b0161b22d54d2199] [abcd] Timed out while waiting for restore to complete.
[2021-10-13T07:50:33,171][INFO ][c.a.e.r.t.i.IndexReplicationTask] [a133cc96509ad479b0161b22d54d2199] [abcd] Timed out while waiting for restore to complete.
[2021-10-13T07:50:33,171][INFO ][c.a.e.r.t.i.IndexReplicationTask] [a133cc96509ad479b0161b22d54d2199] [abcd] Timed out while waiting for restore to complete.

saikaranam-amazon · 2021-10-27T05:29:19Z

src/main/kotlin/org/opensearch/replication/util/Coroutines.kt

@@ -125,7 +125,7 @@ suspend fun ClusterStateObserver.waitForNextChange(reason: String, predicate: (C
            override fun onTimeout(timeout: TimeValue?) {
                cont.resumeWithException(OpenSearchTimeoutException("timed out waiting for $reason"))
            }
-        }, predicate)
+        }, predicate,  TimeValue(60000))


If the timeout is not set, isn't the default value from the cluster service taken into account?

It is taken into account, but it is measured from startTimeMS, which is initialized only once across multiple waitForNextChange calls.

The global cso object in the index replication task seems to be causing the issue. Can we check the previous cluster state tracker in this observer?

The correct fix here would be to set startTimeMS and timeoutTimeLeftMS regardless of whether timeOutValue is null. That would require changes in the OpenSearch (OS) repo, though.

Yes @ankitkala, but that would change the behavior for other use cases and might break them as well.

@saikaranam-amazon: I'm not sure I understand your point. We will sync up offline and I will update it here.

As discussed, the concern was about the reference to the previous cluster state. Based on the logic in the observer code, it updates to the latest state.
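
The alternative raised in this thread, resetting the start time on every call whether or not a timeout is passed, can be sketched the same way. Again this is a hypothetical model (`AlwaysResetModel` and its members are invented names, not OpenSearch code). It is meant only to show why that variant could change behavior for callers that rely on one overall deadline shared across several calls.

```kotlin
// Hypothetical model of the "always reset" alternative; not OpenSearch code.
class AlwaysResetModel(
    private val defaultTimeoutMs: Long,
    private val clock: () -> Long // injectable clock, in milliseconds
) {
    private var startTimeMs: Long = clock()

    // Every call resets the start time, so the full window is always available.
    fun remainingWaitMs(explicitTimeoutMs: Long? = null): Long {
        val timeoutMs = explicitTimeoutMs ?: defaultTimeoutMs
        startTimeMs = clock()
        return timeoutMs
    }

    // Time elapsed since the most recent call, for inspection.
    fun elapsedSinceLastCallMs(): Long = clock() - startTimeMs
}

fun main() {
    var now = 0L
    val observer = AlwaysResetModel(60_000L) { now }
    var totalWaitedMs = 0L
    repeat(3) {
        val wait = observer.remainingWaitMs() // no explicit timeout passed
        now += wait                           // pretend each wait ran to its timeout
        totalWaitedMs += wait
    }
    println(totalWaitedMs) // 180000
}
```

Under this variant, a caller that expected three retries to share a single 60-second budget would instead wait up to 180 seconds in total, which is the kind of behavior change the reply above is concerned about.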

ankitkala · 2021-10-27T08:34:31Z

LGTM.

Using time out in cluster state observer as we are reusing the observer

27c3d52

Signed-off-by: Gaurav Bafna <[email protected]>

gbbafna force-pushed the cso-fix branch from 51ab270 to 27c3d52 Compare October 27, 2021 04:44

gbbafna requested review from ankitkala and saikaranam-amazon October 27, 2021 04:46

saikaranam-amazon reviewed Oct 27, 2021

ankitkala approved these changes Oct 27, 2021

saikaranam-amazon approved these changes Oct 27, 2021

gbbafna merged commit ced0d66 into opensearch-project:main Oct 27, 2021

gbbafna added a commit to gbbafna/cross-cluster-replication that referenced this pull request Oct 27, 2021

Using time out in cluster state observer as we are reusing the observ…

774a644

…er (opensearch-project#215) Signed-off-by: Gaurav Bafna <[email protected]>

gbbafna added a commit to gbbafna/cross-cluster-replication that referenced this pull request Oct 27, 2021

Using time out in cluster state observer as we are reusing the observ…

08bdce7

…er (opensearch-project#215) Signed-off-by: Gaurav Bafna <[email protected]>

gbbafna mentioned this pull request Oct 27, 2021

Using time out in cluster state observer as we are reusing the observ… #216

Merged

5 tasks

gbbafna added a commit that referenced this pull request Oct 27, 2021

Using time out in cluster state observer as we are reusing the observ…

71096cf

…er (#215) (#216) Signed-off-by: Gaurav Bafna <[email protected]>

gbbafna added a commit to gbbafna/cross-cluster-replication that referenced this pull request Oct 27, 2021

Using time out in cluster state observer as we are reusing the observ…

914f7fa

…er (opensearch-project#215) Signed-off-by: Gaurav Bafna <[email protected]>

gbbafna added a commit that referenced this pull request Oct 27, 2021

Using time out in cluster state observer as we are reusing the observ…

ee6b6c2

…er (#215) (#217) Signed-off-by: Gaurav Bafna <[email protected]>
