
Cluster Autoscaler panic when deleting node #6478

Closed
daimaxiaxie opened this issue Jan 30, 2024 · 0 comments · Fixed by #6485
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@daimaxiaxie
Contributor

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: v1.28.0

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:25:59Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.3-5", GitCommit:"dcc97265743078854c5328e30727147bdc5d1c37", GitTreeState:"clean", BuildDate:"2020-12-04T03:52:29Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

AWS, using the AWS cloud provider.

What did you expect to happen?:

The node is deleted without a panic.

What happened instead?:

panic:

I0111 17:19:07.763679       1 drain.go:157] All pods removed from xxxxxxxxx
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x3cbe750]

goroutine 20363521 [running]:
k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.deleteNodesFromCloudProvider(0x40ac6659a8?, {0x40b8e1c328?, 0x1, 0x1})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go:153 +0x130
k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.(*NodeDeletionBatcher).deleteNodesAndRegisterStatus(0x401a342bc0, {0x40b8e1c328?, 0x1, 0x0?}, 0x0?)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go:88 +0x3c
created by k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.(*NodeDeletionBatcher).AddNodes
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go:74 +0xbc

How to reproduce it (as minimally and precisely as possible):

After the autoscaler decides to delete a node, the node is lost on the cloud provider side before the deletion completes. This is especially likely with spot instances, which can be terminated at any time.

Anything else we need to know?:
Before the panic, I saw an instance-terminating event (due to a health check) in the AWS console. In fact, the instance had been preempted.
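The stack trace points at `deleteNodesFromCloudProvider` dereferencing a nil value, which is consistent with the node's cloud-provider lookup returning nil once the instance has already disappeared. Below is a minimal, hypothetical sketch of that failure mode and the guard that avoids it; the type names and helpers here are simplified stand-ins, not the autoscaler's real API.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeGroup is a simplified stand-in for the autoscaler's cloud-provider
// node group type.
type NodeGroup struct{ id string }

func (ng *NodeGroup) Id() string { return ng.id }

// nodeGroupForNode mimics a lookup that can return (nil, nil) when the
// instance has already vanished (e.g. a preempted spot instance), rather
// than a non-nil error.
func nodeGroupForNode(node string, known map[string]*NodeGroup) (*NodeGroup, error) {
	ng, ok := known[node]
	if !ok {
		return nil, nil // node no longer tracked by the cloud provider
	}
	return ng, nil
}

// deleteNode shows the guard that prevents the panic: check for a nil
// node group before calling methods on it.
func deleteNode(node string, known map[string]*NodeGroup) error {
	ng, err := nodeGroupForNode(node, known)
	if err != nil {
		return err
	}
	if ng == nil { // without this check, ng.Id() below would panic
		return errors.New("node group not found for " + node)
	}
	fmt.Printf("deleting %s from group %s\n", node, ng.Id())
	return nil
}

func main() {
	groups := map[string]*NodeGroup{"node-a": {id: "asg-1"}}
	fmt.Println(deleteNode("node-a", groups)) // succeeds
	fmt.Println(deleteNode("node-b", groups)) // returns an error instead of panicking
}
```

With the guard in place, a node preempted between the scale-down decision and the cloud-provider call produces a handled error rather than a SIGSEGV.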

@daimaxiaxie daimaxiaxie added the kind/bug Categorizes issue or PR as related to a bug. label Jan 30, 2024