Spot Instance interruption notice support #1899

Closed
Promaethius opened this issue Aug 31, 2020 · 8 comments · Fixed by #2120
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@Promaethius
Contributor

/kind feature

Describe the solution you'd like
Spot instance support landed in #1868; however, when a spot instance is terminated, its workload is dropped without notice.
Augment the spot instance support with interruption notice polling. When an instance receives the standard two-minute interruption notice, attempt to drain it using the same lifecycle process the provider uses when scaling down a pool.

Anything else you would like to add:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#spot-instance-event-types

Environment:

  • Cluster-api-provider-aws version: N/A
  • Kubernetes version: (use kubectl version): N/A
  • OS (e.g. from /etc/os-release): N/A
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 31, 2020
@rudoi
Contributor

rudoi commented Aug 31, 2020

I'm interested in the end product here, but curious about some of the effects of adding this polling into CAPA:

  • does this mean we have to reconcile AWSMachines on an interval shorter than 2m?
  • what impact would this have on our current EC2 request load?

I'm curious whether we could integrate with https://github.com/aws/aws-node-termination-handler instead. It is a DaemonSet that polls the instance metadata API, which does not count against EC2 API request rate limits.

At a minimum, Node Termination Handler will cordon a Node and apply some labels to it. I wonder if there's some kind of configurable lifecycle mechanism in CAPI that we could use to detect certain labels on Nodes. I do have some reservations about CAPI knowing about an AWS-specific Node label, though... 🤔
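A minimal sketch of the label-based check being suggested, assuming a controller that only acts on Nodes that are both cordoned (spec.unschedulable set) and carry a configurable label. The `node` type is a simplified stand-in for the Kubernetes Node object, and the label key `example.com/interruption` is hypothetical; it does not reflect the labels Node Termination Handler actually applies.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for the Kubernetes Node fields
// this check needs; it is not the real API type.
type node struct {
	Name          string
	Unschedulable bool // corresponds to spec.unschedulable, set when the Node is cordoned
	Labels        map[string]string
}

// shouldDrain reports whether a Node is cordoned and carries the configured
// label key. An empty wantValue matches any value of that key.
func shouldDrain(n node, labelKey, wantValue string) bool {
	if !n.Unschedulable {
		return false
	}
	v, ok := n.Labels[labelKey]
	if !ok {
		return false
	}
	return wantValue == "" || v == wantValue
}

func main() {
	// Hypothetical label key, made configurable so CAPI need not hard-code an AWS-specific one.
	const key = "example.com/interruption"
	n := node{Name: "ip-10-0-0-1", Unschedulable: true, Labels: map[string]string{key: "spot"}}
	fmt.Println(shouldDrain(n, key, ""))
	fmt.Println(shouldDrain(n, key, "other"))
}
```

Keeping the label key as configuration is one way to address the concern about CAPI depending on an AWS-specific label.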

@randomvariable
Member

I think integration with the node termination handler would be better. If anything, we should be trying to cut down the amount of EC2 API polling.

We also have #1871

@ncdc
Contributor

ncdc commented Sep 21, 2020

@randomvariable
Member

Upstream PR kubernetes-sigs/cluster-api#3668 discusses how we can do this in CAPI so that nothing needs to be done in CAPA.

@ncdc
Contributor

ncdc commented Oct 26, 2020

Further discussions in kubernetes-sigs/cluster-api#3668 resulted in kubernetes-sigs/cluster-api#3504 and kubernetes-sigs/cluster-api#3817. The requirement for CAPA is to add AWSMachine.status.interruptible (bool), and set it to true when AWSMachine.spec.spotMarketOptions is non-nil.
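The CAPA requirement above is small enough to sketch directly. The types below are simplified, hypothetical stand-ins for the real AWSMachine API types (only the two fields named in the requirement are modeled), and `setInterruptible` is an illustrative helper, not the code from the fixing PR.

```go
package main

import "fmt"

// SpotMarketOptions is a placeholder; the real type's fields are omitted here.
type SpotMarketOptions struct{}

// AWSMachineSpec models only the field the requirement refers to.
type AWSMachineSpec struct {
	SpotMarketOptions *SpotMarketOptions
}

// AWSMachineStatus models only the new interruptible flag.
type AWSMachineStatus struct {
	Interruptible bool
}

type AWSMachine struct {
	Spec   AWSMachineSpec
	Status AWSMachineStatus
}

// setInterruptible marks the machine interruptible exactly when spot market options are set.
func setInterruptible(m *AWSMachine) {
	m.Status.Interruptible = m.Spec.SpotMarketOptions != nil
}

func main() {
	onDemand := &AWSMachine{}
	spot := &AWSMachine{Spec: AWSMachineSpec{SpotMarketOptions: &SpotMarketOptions{}}}
	setInterruptible(onDemand)
	setInterruptible(spot)
	fmt.Println(onDemand.Status.Interruptible, spot.Status.Interruptible)
}
```

Deriving the field from the spec on every reconcile means it is also false whenever spotMarketOptions is nil, rather than only ever being set to true.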

@ncdc
Contributor

ncdc commented Oct 26, 2020

/help
/good-first-issue

@k8s-ci-robot
Contributor

@ncdc:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/help
/good-first-issue


@k8s-ci-robot k8s-ci-robot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Oct 26, 2020
@akash-gautam
Contributor

/assign

6 participants