
Autoscaler not scaling down pods when kind of the resource is the same between controllers #5977

Closed
shamil opened this issue Jul 24, 2023 · 4 comments · Fixed by #6412
Labels
area/cluster-autoscaler, kind/bug

Comments

@shamil
Contributor

shamil commented Jul 24, 2023

  • Which component are you using?: cluster-autoscaler
  • What version of the component are you using?: 1.25.0
  • What k8s version are you using?: v1.25.10
  • What environment is this in?: kops on AWS

We are using OpenKruise and its Advanced DaemonSet.
The autoscaler seems to detect it as a regular DaemonSet and tries to find the corresponding DaemonSet for the workload, which fails. This prevents scale-down from occurring.

I'm not sure whether this is the root cause, but I suspect that the autoscaler doesn't respect the API group. The pods created by the Advanced DaemonSet have the following ownerReferences:

  ownerReferences:
  - apiVersion: apps.kruise.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: example-advanced-daemonset
    uid: 6f0f4fa8-2694-44df-9e68-8f00630f19c1

The kind is DaemonSet, but the apiVersion is apps.kruise.io/v1alpha1. It might be that the apiVersion is being ignored, and the autoscaler just looks for a regular DaemonSet named example-advanced-daemonset, which obviously doesn't exist.
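
To make the suspicion concrete, here is a minimal sketch showing that the group parsed from this apiVersion is apps.kruise.io rather than apps (schema.ParseGroupVersion is the standard apimachinery helper; the snippet itself is only illustrative):

  package main

  import (
      "fmt"

      "k8s.io/apimachinery/pkg/runtime/schema"
  )

  func main() {
      // apiVersion taken from the ownerReference above.
      gv, err := schema.ParseGroupVersion("apps.kruise.io/v1alpha1")
      if err != nil {
          panic(err)
      }
      // Prints: group=apps.kruise.io version=v1alpha1
      // The kind "DaemonSet" alone does not identify the built-in
      // apps/v1 DaemonSet controller.
      fmt.Printf("group=%s version=%s\n", gv.Group, gv.Version)
  }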

Here are the relevant logs from the autoscaler:

aws-cluster-autoscaler I0724 08:09:58.813598       1 cluster.go:170] node i-0bb670250f677c0c0 cannot be removed: daemonset for devops/example-advanced-daemonset-x44c5 is not present, err: daemonset.apps "example-advanced-daemonset" not found

Can someone advise whether it's expected that the apiVersion is ignored or not respected? Am I missing something?

@shamil added the kind/bug label Jul 24, 2023
@pohly
Contributor

pohly commented Sep 20, 2023

Your suspicion seems correct to me (not an expert, I just stumbled over this for other reasons):

  isDaemonSetPod = true
  // don't have listener for other DaemonSet kind
  // TODO: we should use a generic client for checking the reference.
  if checkReferences && refKind == "DaemonSet" {
      _, err := listers.DaemonSetLister().DaemonSets(controllerNamespace).Get(controllerRef.Name)
      if apierrors.IsNotFound(err) {
          return replicated, isDaemonSetPod, &BlockingPod{Pod: pod, Reason: ControllerNotFound}, fmt.Errorf("daemonset for %s/%s is not present, err: %v", pod.Namespace, pod.Name, err)
      } else if err != nil {
          return replicated, isDaemonSetPod, &BlockingPod{Pod: pod, Reason: UnexpectedError}, fmt.Errorf("error when trying to get daemonset for %s/%s , err: %v", pod.Namespace, pod.Name, err)
      }
  }
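
For illustration, a hedged sketch of how a guard like this could also compare the owner's API group before consulting the apps/v1 DaemonSet lister; isBuiltinDaemonSet is a hypothetical helper, not the actual change that landed in #6412:

  package main

  import (
      "fmt"

      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/apimachinery/pkg/runtime/schema"
  )

  // isBuiltinDaemonSet reports whether an ownerReference points at the
  // built-in apps-group DaemonSet, as opposed to a same-kind controller
  // from another group such as apps.kruise.io.
  func isBuiltinDaemonSet(ref metav1.OwnerReference) bool {
      gv, err := schema.ParseGroupVersion(ref.APIVersion)
      if err != nil {
          return false
      }
      return ref.Kind == "DaemonSet" && gv.Group == "apps"
  }

  func main() {
      kruise := metav1.OwnerReference{
          APIVersion: "apps.kruise.io/v1alpha1",
          Kind:       "DaemonSet",
          Name:       "example-advanced-daemonset",
      }
      builtin := metav1.OwnerReference{
          APIVersion: "apps/v1",
          Kind:       "DaemonSet",
          Name:       "node-exporter",
      }
      fmt.Println(isBuiltinDaemonSet(kruise))  // false: the lister lookup would be wrong here
      fmt.Println(isBuiltinDaemonSet(builtin)) // true: safe to use DaemonSetLister
  }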

@songminglong

Haha, you need to write custom code to avoid this situation.

@daimaxiaxie
Contributor

Maybe the skipNodesWithCustomControllerPods option in a newer version would be useful.

@hagaibarel

This won't help. The problem isn't with a custom controller; it's with a controller that has the same kind but a different group. Since the autoscaler only looks at the kind and fails to find the controller, it blocks scale-down.
