Add support for defining/using AWS ASG Lifecycle Hooks #8708
Yes, please!
Out of curiosity, is there a reason you chose to go this way and not use something like https://github.com/pusher/k8s-spot-termination-handler?
I'm actually working with @andersosthus on this one, so I'll answer. Yes, there is a good reason why we don't use the component you linked to: it only supports spot termination, and we need a solution that also works on ASG termination. We do NOT use the common Cluster Autoscaler because we have memory-bound and spiky workloads. Instead we export utilization metrics from Prometheus to CloudWatch, and then we use a combination of TargetTracking and StepScaling to modify the Autoscaling Groups' desired count. Also, the solution proposed by @andersosthus will allow attaching custom ARNs, like a Lambda function, to do arbitrary activities when machines are added to or removed from an Autoscaling group.
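For context, the metric-export half of that setup could look something like this minimal sketch. The namespace, metric name, and ASG name are hypothetical placeholders, and the actual Prometheus-to-CloudWatch exporter used by the commenters is not shown in the thread:

```python
# Hypothetical sketch: push a cluster utilization gauge to CloudWatch so a
# TargetTracking/StepScaling policy on the ASG can act on it. All names here
# (namespace, metric, ASG) are made up for illustration.

def build_metric_data(metric_name, value, asg_name):
    """Build a CloudWatch PutMetricData payload for one gauge sample,
    dimensioned by Auto Scaling group name."""
    return [{
        "MetricName": metric_name,
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Value": value,
        "Unit": "Percent",
    }]


def publish(metric_name, value, asg_name, namespace="K8s/Custom"):
    # boto3 is imported lazily so the module can be loaded without AWS deps.
    import boto3
    cw = boto3.client("cloudwatch")
    cw.put_metric_data(
        Namespace=namespace,
        MetricData=build_metric_data(metric_name, value, asg_name),
    )
```

A scaling policy (TargetTracking or StepScaling) attached to the ASG would then reference this custom metric instead of the default CPU metrics.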
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
This sounds good to me
/remove-lifecycle stale
@andersosthus are you still interested in implementing this? If not, I'm going to take a shot at it.
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale

Anyone had any luck with this issue? This is to some extent related to #7119
Currently we handle this using a custom systemd unit in the instance group, but that workaround will no longer be needed once we get this NTH implementation in place.
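The actual unit isn't shown in the thread, but a drain-on-shutdown unit along these lines is one way such a workaround is commonly structured. Every name, path, and timeout below is a hypothetical placeholder, not the commenter's unit:

```ini
# Hypothetical drain-before-shutdown unit. The script path, timeouts and
# dependencies are illustrative only.
[Unit]
Description=Drain Kubernetes node before shutdown
After=kubelet.service
Before=shutdown.target
DefaultDependencies=no

[Service]
Type=oneshot
RemainAfterExit=yes
# Nothing to do at start; the drain runs as the stop action during shutdown.
ExecStart=/bin/true
ExecStop=/usr/local/bin/drain-node.sh
TimeoutStopSec=600

[Install]
WantedBy=multi-user.target
```

Note that the unit's own `TimeoutStopSec` is not the only limit: systemd's global `DefaultTimeoutStopSec` and, on spot/ASG termination, the lifecycle hook's heartbeat timeout also cap how long the drain can run, which may explain hard cutoffs around the two-minute mark.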
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
@paalkr can you share, or give some guidance on, what the systemd unit should look like? I am trying to replicate it with increased timeouts (TimeoutStopSec, TimeoutSec), but at the two-minute mark it stops ungracefully.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
With the support for NTH in SQS mode, are the use cases covered?
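For readers finding this later: kops exposes the Node Termination Handler in the cluster spec, including its SQS (queue-processor) mode, which handles ASG lifecycle terminations and not just spot interruptions. A sketch of the relevant cluster spec fragment, assuming a kops version that supports it (field names may vary between kops releases, so check the docs for your version):

```yaml
# Hypothetical kops cluster spec excerpt enabling NTH in SQS mode.
spec:
  nodeTerminationHandler:
    enabled: true
    enableSQSTerminationDraining: true
```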
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
Closing it is
@olemarkus: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We would like support for defining AWS ASG Lifecycle hooks on an InstanceGroup.

Use case:
We use AWS ASG Lifecycle hooks with a custom script that watches for AWS Spot terminations and, if one is detected, runs a drain on the node and then sends a `COMPLETED` signal to the Lifecycle hook, which then allows the instance to be terminated. The fact that we have these Lifecycle hooks kinda "breaks" `kops rolling-update cluster`, since the instance won't be terminated until either a `COMPLETED` signal is sent or it reaches the timeout value.

Our proposed solution would look something like this:

Add an `awsAsgLifecycle` property to `InstanceGroup` where one can set the Lifecycle properties (name, transition, default result, heartbeat timeout, notification arn, role arn). When `kops` is doing a node drain, if the `InstanceGroup` has `awsAsgLifecycle` set, it should send a `COMPLETED` signal when the drain is done.

If this sounds ok, I can do the implementation (though I probably need some guidance since I'm not that familiar with the kops codebase).