Warning from Hetzner about cluster-autoscaler #429

Closed
emrys90 opened this issue Aug 30, 2024 · 18 comments

Comments

@emrys90

emrys90 commented Aug 30, 2024

I received an email warning from Hetzner with the following message:

Subject: Important client information: Upcoming changes to cluster-autoscaler Hetzner provider and CX11 server type removal

Body:
Based on our API monitoring, you are using Kubernetes cluster-autoscaler in your Hetzner Cloud projects. The Hetzner provider in current versions of cluster-autoscaler has a bug and relies on the CX11 server type, which we will remove from our ordering options on 6 September 2024. You can learn more about the removal in the Cloud Changelog: https://docs.hetzner.cloud/changelog#2024-06-06-old-server-types-with-shared-intel-vcpus-are-deprecated

To prevent any disruptions for you, we will keep the CX11 server type available for your account. We will remove your access to the CX11 server type two weeks after the Kubernetes community releases new versions of cluster-autoscaler. We will announce the exact date in a follow up notification once the new versions are available.

The following versions of cluster-autoscaler are affected:

≤1.28.6 (including 1.27 and older)
≤1.29.4
≤1.30.2
≤1.31.0

To bridge the gap until the Kubernetes community releases the new versions, we published alternative container images of cluster-autoscaler that include a patch for the bug. You can use these in your deployment, but we will remove them one month after new cluster-autoscaler versions become available. We will not provide any other patch releases on this container image repository. Please switch back to the official images as soon as possible.

docker.io/hetznercloud/cluster-autoscaler:v1.28.6-hcloud1
docker.io/hetznercloud/cluster-autoscaler:v1.29.4-hcloud1
docker.io/hetznercloud/cluster-autoscaler:v1.30.2-hcloud1
docker.io/hetznercloud/cluster-autoscaler:v1.31.0-hcloud1

We will send you another notification once the new versions become available.

You can find more information at the following links:

https://docs.hetzner.cloud/changelog#2024-08-30-bug-cx11-removal-will-break-certain-versions-of-cluster-autoscaler
kubernetes/autoscaler#7210
https://docs.hetzner.cloud/changelog#2024-06-06-old-server-types-with-shared-intel-vcpus-are-deprecated

We will be happy to help you with any questions. Please write us a support request by logging onto your account on https://console.hetzner.cloud/support

Thank you for your understanding.

I am not using the CX11 server type for my clusters. Is there anything I need to do about this? Am I at risk of having my production servers shut down?
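For anyone who wants to check their own cluster against the list in the email, here is a minimal Python sketch. The version limits are copied from the email above. The function names are made up for illustration, and treating minor lines newer than 1.31 as unaffected is my assumption, since the email does not mention them.

```python
import re

# Highest affected patch release per 1.x minor line, copied from the Hetzner email.
AFFECTED_MAX_PATCH = {28: 6, 29: 4, 30: 2, 31: 0}
OLDEST_LISTED_MINOR = min(AFFECTED_MAX_PATCH)


def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse 'v1.30.2' or '1.30.2' into (1, 30, 2).

    Tags with a suffix such as '-hcloud1' are rejected on purpose, because
    those are the patched images and not part of the affected list.
    """
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognised version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_affected(tag: str) -> bool:
    """Return True if the email lists this cluster-autoscaler version as affected."""
    major, minor, patch = parse_version(tag)
    if major != 1:
        raise ValueError("the email only covers 1.x releases")
    if minor < OLDEST_LISTED_MINOR:
        return True  # "including 1.27 and older"
    if minor not in AFFECTED_MAX_PATCH:
        return False  # assumption: newer minor lines are not affected
    return patch <= AFFECTED_MAX_PATCH[minor]


if __name__ == "__main__":
    for tag in ("v1.27.9", "v1.30.2", "v1.30.3", "v1.31.0"):
        print(tag, "affected" if is_affected(tag) else "not listed as affected")
```

Note that this only tells you whether the autoscaler version is on the list. It says nothing about which server types your node pools use.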

@vitobotta
Owner

I also received it today. I'm not sure it's worth making a release to use the temporary image, since the problem only affects people who might try to use old instance types that have been deprecated for a while already.

If you are not using CX11 you don't need to worry about this.

@emrys90
Author

emrys90 commented Aug 30, 2024

Okay thanks!

emrys90 closed this as completed Aug 30, 2024

@rksm

rksm commented Aug 30, 2024

Sorry to jump in here. I have very limited knowledge of hetzner-k3s, but it seems to me that the CX11 server type is hard-coded in cluster-autoscaler to serve as the "draining-node-pool" (as mentioned in the linked GitHub issue). If so, this will affect users regardless of whether they use that server type or not.

@vitobotta
Owner

Uhm, it seems I read it too quickly. I will make a release with their temporary image for now then.

vitobotta reopened this Aug 30, 2024

@vitobotta
Owner

Can you guys please do a quick test of the autoscaler with 2.0.7?

@apricote

Hello 👋

If your account is currently using cluster-autoscaler, we will set a flag on your account so that CX11 stays available until proper releases of cluster-autoscaler are available. You will then get another notification about the timeline for the removal of CX11 from your account and the removal of the temporary images.

> Am I at risk of having my production servers shutdown?

This is only about the autoscaler doing active work. We will not shut down any servers. Just the cluster-autoscaler will throw an error anytime it tries to scale up your cluster.

@vitobotta
Owner

> Hello 👋
>
> if your account is currently using cluster-autoscaler we will set a flag to your account so cx11 stays available until proper releases of cluster-autoscaler are available. You will get another notification then about the timeline for removal of cx11 from your account and the removal of the temp images.
>
> > Am I at risk of having my production servers shutdown?
>
> This is only about the autoscaler doing active work. We will not shut down any servers. Just the cluster-autoscaler will throw an error anytime it tries to scale up your cluster.

Thanks @apricote for the clarification! I made a release with your Docker image anyway for now, just in case someone doesn't upgrade to the new, fixed version once it's out.

@vitobotta
Owner

Closing since this has been addressed.

@emrys90
Author

emrys90 commented Sep 1, 2024

Is it possible to update the autoscaler without updating k3s? I'd rather not risk my production system updating to something that may introduce issues, especially with how much changed in the 2.0 update.

@vitobotta
Owner

> Is it possible to update the autoscaler without updating k3s? I'd rather not risk my production system updating to something that may introduce issues, especially with how much changed in the 2.0 update.

I have it on my list to make the Docker image configurable, since we can currently configure the URLs of the manifests but not the image. I will do it when I have a bit more time, but the Hetzner image seems to work perfectly for me. I tested it a lot between yesterday and today and haven't seen any issues. If you upgrade, just make sure you use the very latest version of hetzner-k3s, since I fixed an issue with detection of the private network interface in autoscaled nodes.
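Until the image is configurable, one possible stopgap is to swap the image on the running deployment by hand. The Python sketch below only builds the `kubectl set image` command and prints it. The deployment name, container name, and namespace are assumptions, not something confirmed in this thread, so check them against your own cluster first. The image shown is one of the four from the Hetzner email, and which tag fits depends on your setup. A later hetzner-k3s run may also re-apply its own manifest and undo the change.

```python
import shlex


def build_set_image_command(
    image: str,
    deployment: str = "cluster-autoscaler",  # assumed deployment name
    container: str = "cluster-autoscaler",   # assumed container name
    namespace: str = "kube-system",          # assumed namespace
) -> list[str]:
    """Build the argv for `kubectl set image` without running it."""
    if not image or any(ch.isspace() for ch in image):
        raise ValueError("image must be a non-empty reference without whitespace")
    return [
        "kubectl", "set", "image",
        f"deployment/{deployment}",
        f"{container}={image}",
        "--namespace", namespace,
    ]


# One of the temporary images listed in the Hetzner email.
command = build_set_image_command(
    "docker.io/hetznercloud/cluster-autoscaler:v1.31.0-hcloud1"
)

# Print the command for review instead of executing it.
print(shlex.join(command))
```

Printing the command rather than running it is deliberate: on a production cluster you probably want to read it before anything is executed.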

@emrys90
Author

emrys90 commented Sep 1, 2024

> > Is it possible to update the autoscaler without updating k3s? I'd rather not risk my production system updating to something that may introduce issues, especially with how much changed in the 2.0 update.
>
> I have it on my list to make it possible to set the docker image since we can now configure the URLs of the manifest but not the image. I will do it when I have a bit more time but the Hetzner image seems to work perfectly for me. I tested it a lot between yesterday and today and haven't seen any issues. If you upgrade just make sure you use the very latest version of hetzner-k3s since i fixed an issue with detection of the private network interface in autoscaled nodes.

I meant that my concern is about updating hetzner-k3s. I'm on version 1.1.5, and version 2.0 requires some involved upgrade steps. I am concerned about introducing issues in my production system.

@vitobotta
Owner

If you follow the instructions correctly you should be fine. You could also replicate your current cluster as a test cluster and upgrade that one first.

@emrys90
Author

emrys90 commented Sep 1, 2024

> If you follow the instructions correctly you should be fine. You could also replicate your current cluster as test cluster and upgrade that one first

There are a lot of steps that I don't fully understand, and it would be easy to screw something up. Even if I manage to do it right on the test cluster, a typo or something similar could mess up the production cluster when I repeat the steps there.

I would rather avoid that risk of messing up my production system...

@vitobotta
Owner

> > If you follow the instructions correctly you should be fine. You could also replicate your current cluster as test cluster and upgrade that one first
>
> There's a lot of steps, that I don't fully understand, that would be easy to screw something up. Even if I manage to do it right on the test cluster, a typo or something could mess up the production cluster when I do that next.
>
> I would rather avoid that risk of messing up my production system...

Which steps do you find difficult? Most of it is about adapting the config file. There isn't much to it, really.

@jampy

jampy commented Oct 21, 2024

Is hetzner-k3s v2.0.8 using the recently released official patched autoscaler or the temporarily patched version from Hetzner, which they will remove in ~2 weeks?

@vitobotta
Owner

> Is hetzner-k3s v2.0.8 using the recently released official patched autoscaler or the temporarily patched version from Hetzner, which they will remove in ~2 weeks?

At the moment it uses the patched version from Hetzner, docker.io/hetznercloud/cluster-autoscaler:v1.31.0-hcloud1. I will try to switch to the latest official version with the next release and also make the image configurable, so that if something similar happens in the future you can just customize it in the config file.

@t33muki

t33muki commented Nov 7, 2024

Got a message today from Hetzner stating that we have less than two weeks left before the patched images are removed.

"Please switch back to the official image repositories. We will remove the alternative images on 19 November 2024. You will be unable to pull the images after that date."

@vitobotta
Owner

> Got a message today from Hetzner stating that we have less than two weeks left, before the patched images are no more.
>
> "Please switch back to the official image repositories. We will remove the alternative images on 19 November 2024. You will be unable to pull the images after that date."

I have released 2.0.9 which includes a PR from a contributor with the fix.
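For clusters that were pointed at the temporary images, a small check like the following can flag any image reference that still uses Hetzner's repository before the removal date. This is a rough Python sketch and the function names are made up for illustration. The repository name comes from the email quoted at the top of this issue. The `registry.k8s.io` reference in the example is only there to show a non-matching image, and its tag is illustrative rather than taken from this thread.

```python
# Repository of the temporary patched images, as listed in the Hetzner email.
TEMP_REPOSITORY = "docker.io/hetznercloud/cluster-autoscaler"


def is_temporary_image(image: str) -> bool:
    """Return True if the image reference points at Hetzner's temporary repository."""
    # Docker Hub references are often written without the "docker.io/" prefix.
    normalized = image if image.startswith("docker.io/") else f"docker.io/{image}"
    # Drop an optional digest ("@sha256:...") and then an optional tag (":v1.2.3").
    name = normalized.split("@", 1)[0]
    repository = name.split(":", 1)[0]
    return repository == TEMP_REPOSITORY


def find_temporary_images(images: list[str]) -> list[str]:
    """Return the image references that still use the temporary repository."""
    return [image for image in images if is_temporary_image(image)]


if __name__ == "__main__":
    in_use = [
        "docker.io/hetznercloud/cluster-autoscaler:v1.31.0-hcloud1",
        "hetznercloud/cluster-autoscaler:v1.30.2-hcloud1",
        # Illustrative tag only, not taken from this thread.
        "registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.1",
    ]
    for image in find_temporary_images(in_use):
        print("still on a temporary image:", image)
```

The list of images in use has to come from your own cluster, for example by reading the image fields of your deployments.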
