
add Timeout in WaitForCacheSync #894

Merged
merged 1 commit into karmada-io:master from cluster_status_bugfix on Nov 5, 2021

Conversation

mrlihanbo

Signed-off-by: lihanbo [email protected]

What type of PR is this?
/kind bug

What this PR does / why we need it:
There is a scenario in which some of the clusters joined to Karmada are unhealthy. In that case, if the karmada-controller-manager restarts, it blocks in WaitForCacheSync until the unhealthy clusters recover.
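
For illustration, a minimal sketch of the idea, assuming a dynamic shared informer factory from client-go; the function and parameter names below are illustrative and not necessarily the ones used in this PR:

package informerutil

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic/dynamicinformer"
)

// waitForCacheSyncWithTimeout bounds the wait for informer caches so that an
// unreachable member cluster cannot block controller start-up indefinitely.
// Entries that map to false did not sync before the timeout expired.
func waitForCacheSyncWithTimeout(parent context.Context, factory dynamicinformer.DynamicSharedInformerFactory, timeout time.Duration) map[schema.GroupVersionResource]bool {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	// WaitForCacheSync returns once every started informer has synced or the
	// done channel closes, i.e. after at most `timeout`.
	return factory.WaitForCacheSync(ctx.Done())
}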

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:
"NONE"

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 1, 2021
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 1, 2021
@RainbowMango
Member

/priority important-soon

@karmada-bot karmada-bot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 1, 2021
@mrlihanbo mrlihanbo force-pushed the cluster_status_bugfix branch from b873572 to e964f21 Compare November 4, 2021 12:04
@mrlihanbo
Author

/cc @Garrybest

@karmada-bot karmada-bot requested a review from Garrybest November 4, 2021 12:08
@mrlihanbo mrlihanbo force-pushed the cluster_status_bugfix branch 2 times, most recently from ed19c4f to 2cbb560 Compare November 5, 2021 03:25
@RainbowMango
Member

/assign @Garrybest

pkg/util/informermanager/single-cluster-manager.go (Outdated)
ctx, cancel := context.WithTimeout(s.ctx, cacheSyncTimeout)
defer cancel()
s.lock.Lock()
defer s.lock.Unlock()
Member

Same as the previous comment.
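
For illustration, a hedged sketch of how a caller might consume the timeout-bounded result (waitForCacheSyncWithTimeout is the illustrative helper sketched earlier; clusterName and the 30-second value are assumptions, and klog is k8s.io/klog/v2):

synced := waitForCacheSyncWithTimeout(ctx, factory, 30*time.Second)
for gvr, ok := range synced {
	if !ok {
		// Give up on this cluster for now instead of blocking start-up forever.
		klog.Warningf("cache for %v in cluster %s not synced within the timeout", gvr, clusterName)
	}
}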

@mrlihanbo mrlihanbo force-pushed the cluster_status_bugfix branch from 2cbb560 to ac3878e Compare November 5, 2021 05:58
@Garrybest
Member

/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2021
@RainbowMango
Member

/approve

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2021
@karmada-bot karmada-bot merged commit 51c911a into karmada-io:master Nov 5, 2021
@mrlihanbo mrlihanbo deleted the cluster_status_bugfix branch March 2, 2022 07:30
@@ -146,9 +136,20 @@ func (c *ClusterStatusController) syncClusterStatus(cluster *clusterv1alpha1.Clu
klog.V(2).Infof("Cluster(%s) still offline after retry, ensuring offline is set.", cluster.Name)
currentClusterStatus.Conditions = generateReadyCondition(false, false)
setTransitionTime(&cluster.Status, &currentClusterStatus)
c.InformerManager.Stop(cluster.Name)
Member

@mrlihanbo Hello Hanbo, how are you doing?

Can you recall why the informer is stopped here when a cluster goes offline?

#2930 is now trying to solve an issue caused by this change.

Also, cc @Garrybest to help recall.

Member

Maybe because we want to re-establish the informer after the apiserver is healthy? 🤔

The reason may not be convincing, because I don't remember this line either. 🤣

Member

I talked to @mrlihanbo; he said this is probably to suppress repetitive warning logs, especially for clusters that stay offline for a long time.
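
For context, the behaviour under discussion looks roughly like the sketch below; the surrounding structure is assumed from the diff above, not copied from the actual controller code:

// Offline branch of syncClusterStatus (assumed structure, not the real code).
if !online {
	// Mark the cluster NotReady.
	currentClusterStatus.Conditions = generateReadyCondition(false, false)
	setTransitionTime(&cluster.Status, &currentClusterStatus)
	// Stop the per-cluster informers so they do not keep retrying against an
	// unreachable apiserver and emitting repetitive cache-sync warnings.
	c.InformerManager.Stop(cluster.Name)
}
// When the cluster becomes healthy again, its informers are rebuilt on a later
// reconcile, re-establishing the watches (the hypothesis discussed above).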
