
add Timeout in WaitForCacheSync #894

Merged
merged 1 commit into karmada-io:master from cluster_status_bugfix on Nov 5, 2021

Conversation

mrlihanbo

Signed-off-by: lihanbo [email protected]

What type of PR is this?
/kind bug

What this PR does / why we need it:
There is a scenario in which some of the clusters joined to Karmada are unhealthy. In that case, if the karmada-controller-manager restarts, it blocks in WaitForCacheSync until the unhealthy clusters recover.
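
For illustration, a minimal sketch of the idea, assuming a dynamic shared informer factory from client-go; the function and parameter names below are illustrative and not necessarily the ones used in this PR:

package informerutil

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic/dynamicinformer"
)

// waitForCacheSyncWithTimeout bounds the wait for informer caches so that an
// unreachable member cluster cannot block controller start-up indefinitely.
// Entries that map to false did not sync before the timeout expired.
func waitForCacheSyncWithTimeout(parent context.Context, factory dynamicinformer.DynamicSharedInformerFactory, timeout time.Duration) map[schema.GroupVersionResource]bool {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	// WaitForCacheSync returns once every started informer has synced or the
	// done channel closes, i.e. after at most `timeout`.
	return factory.WaitForCacheSync(ctx.Done())
}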

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:
"NONE"

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 1, 2021
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 1, 2021
@RainbowMango
Member

/priority important-soon

@karmada-bot karmada-bot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 1, 2021
@mrlihanbo mrlihanbo force-pushed the cluster_status_bugfix branch from b873572 to e964f21 Compare November 4, 2021 12:04
@mrlihanbo
Author

/cc @Garrybest

@karmada-bot karmada-bot requested a review from Garrybest November 4, 2021 12:08
@mrlihanbo mrlihanbo force-pushed the cluster_status_bugfix branch 2 times, most recently from ed19c4f to 2cbb560 Compare November 5, 2021 03:25
@RainbowMango
Member

/assign @Garrybest

pkg/util/informermanager/single-cluster-manager.go (Outdated)
ctx, cancel := context.WithTimeout(s.ctx, cacheSyncTimeout)
defer cancel()
s.lock.Lock()
defer s.lock.Unlock()
Member

Same as the previous comment.
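
For illustration, a hedged sketch of how a caller might consume the timeout-bounded result (waitForCacheSyncWithTimeout is the illustrative helper sketched earlier; clusterName and the 30-second value are assumptions, and klog is k8s.io/klog/v2):

synced := waitForCacheSyncWithTimeout(ctx, factory, 30*time.Second)
for gvr, ok := range synced {
	if !ok {
		// Give up on this cluster for now instead of blocking start-up forever.
		klog.Warningf("cache for %v in cluster %s not synced within the timeout", gvr, clusterName)
	}
}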

@mrlihanbo mrlihanbo force-pushed the cluster_status_bugfix branch from 2cbb560 to ac3878e Compare November 5, 2021 05:58
@Garrybest
Member

/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2021
@RainbowMango
Member

/approve

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2021
@karmada-bot karmada-bot merged commit 51c911a into karmada-io:master Nov 5, 2021
@mrlihanbo mrlihanbo deleted the cluster_status_bugfix branch March 2, 2022 07:30
@@ -146,9 +136,20 @@ func (c *ClusterStatusController) syncClusterStatus(cluster *clusterv1alpha1.Clu
klog.V(2).Infof("Cluster(%s) still offline after retry, ensuring offline is set.", cluster.Name)
currentClusterStatus.Conditions = generateReadyCondition(false, false)
setTransitionTime(&cluster.Status, &currentClusterStatus)
c.InformerManager.Stop(cluster.Name)
Member

@mrlihanbo Hello Hanbo, how are you doing?

Can you recall why the informer is stopped here when a cluster goes offline?

#2930 is now trying to solve an issue caused by this change.

Also, cc @Garrybest to help recall.

Member

Maybe because we want to re-establish the informer after the apiserver is healthy? 🤔

The reason may not be convincing, because I don't remember this line either. 🤣

Member

I talked to @mrlihanbo; he said this is probably to suppress repetitive warning logs, especially for clusters that stay offline for a long time.
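
For context, the behaviour under discussion looks roughly like the sketch below; the surrounding structure is assumed from the diff above, not copied from the actual controller code:

// Offline branch of syncClusterStatus (assumed structure, not the real code).
if !online {
	// Mark the cluster NotReady.
	currentClusterStatus.Conditions = generateReadyCondition(false, false)
	setTransitionTime(&cluster.Status, &currentClusterStatus)
	// Stop the per-cluster informers so they do not keep retrying against an
	// unreachable apiserver and emitting repetitive cache-sync warnings.
	c.InformerManager.Stop(cluster.Name)
}
// When the cluster becomes healthy again, its informers are rebuilt on a later
// reconcile, re-establishing the watches (the hypothesis discussed above).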
