
Cluster Autoscaler panic when deleting node #6478

Closed
daimaxiaxie opened this issue Jan 30, 2024 · 0 comments · Fixed by #6485
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@daimaxiaxie
Contributor

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: v1.28.0

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:25:59Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.3-5", GitCommit:"dcc97265743078854c5328e30727147bdc5d1c37", GitTreeState:"clean", BuildDate:"2020-12-04T03:52:29Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

AWS, using the AWS cloud provider.

What did you expect to happen?:

The node is deleted without a panic.

What happened instead?:

panic:

I0111 17:19:07.763679       1 drain.go:157] All pods removed from xxxxxxxxx
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x3cbe750]

goroutine 20363521 [running]:
k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.deleteNodesFromCloudProvider(0x40ac6659a8?, {0x40b8e1c328?, 0x1, 0x1})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go:153 +0x130
k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.(*NodeDeletionBatcher).deleteNodesAndRegisterStatus(0x401a342bc0, {0x40b8e1c328?, 0x1, 0x0?}, 0x0?)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go:88 +0x3c
created by k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.(*NodeDeletionBatcher).AddNodes
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go:74 +0xbc

How to reproduce it (as minimally and precisely as possible):

After the autoscaler decides to delete a node, the node is lost on the cloud provider side before the deletion completes. This is especially likely with spot instances, which can be terminated at any time.

Anything else we need to know?:
Before the panic, I saw an instance-terminating event (due to a health check) in the AWS console. In fact, the instance had been preempted.
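The stack trace points at `deleteNodesFromCloudProvider` dereferencing a nil value, which is consistent with the node's cloud-provider lookup returning nil once the instance has already disappeared. Below is a minimal, hypothetical sketch of that failure mode and the guard that avoids it; the type names and helpers here are simplified stand-ins, not the autoscaler's real API.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeGroup is a simplified stand-in for the autoscaler's cloud-provider
// node group type.
type NodeGroup struct{ id string }

func (ng *NodeGroup) Id() string { return ng.id }

// nodeGroupForNode mimics a lookup that can return (nil, nil) when the
// instance has already vanished (e.g. a preempted spot instance), rather
// than a non-nil error.
func nodeGroupForNode(node string, known map[string]*NodeGroup) (*NodeGroup, error) {
	ng, ok := known[node]
	if !ok {
		return nil, nil // node no longer tracked by the cloud provider
	}
	return ng, nil
}

// deleteNode shows the guard that prevents the panic: check for a nil
// node group before calling methods on it.
func deleteNode(node string, known map[string]*NodeGroup) error {
	ng, err := nodeGroupForNode(node, known)
	if err != nil {
		return err
	}
	if ng == nil { // without this check, ng.Id() below would panic
		return errors.New("node group not found for " + node)
	}
	fmt.Printf("deleting %s from group %s\n", node, ng.Id())
	return nil
}

func main() {
	groups := map[string]*NodeGroup{"node-a": {id: "asg-1"}}
	fmt.Println(deleteNode("node-a", groups)) // succeeds
	fmt.Println(deleteNode("node-b", groups)) // returns an error instead of panicking
}
```

With the guard in place, a node preempted between the scale-down decision and the cloud-provider call produces a handled error rather than a SIGSEGV.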

@daimaxiaxie daimaxiaxie added the kind/bug Categorizes issue or PR as related to a bug. label Jan 30, 2024