Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give kOps CLI knowledge about ASG warm pools #11227

Merged
merged 1 commit into from
Apr 15, 2021

Conversation

olemarkus
Copy link
Member

  • kops get instances mark warm pool nodes
  • rolling update warm pool
    • Warm pool instances will terminate first
    • Warm pool instances will terminate without validation as they are not part of the cluster

Based on #11216
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/nodeup labels Apr 13, 2021
@k8s-ci-robot k8s-ci-robot requested review from drekle and liranp April 13, 2021 11:05
@k8s-ci-robot k8s-ci-robot added area/provider/alicloud Issues or PRs related to alicloud provider area/provider/aws Issues or PRs related to aws provider area/provider/azure Issues or PRs related to azure provider area/provider/digitalocean Issues or PRs related to digitalocean provider area/provider/openstack Issues or PRs related to openstack provider area/provider/spotinst Issues or PRs related to spotinst provider area/rolling-update labels Apr 13, 2021
@olemarkus olemarkus force-pushed the warm-roll branch 3 times, most recently from 9cfd739 to 69ef5bf Compare April 13, 2021 17:37
@olemarkus olemarkus requested a review from rifelpet April 13, 2021 17:46
Copy link
Member

@johngmyers johngmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This capping of maxSurge

if maxSurge > len(update) {
maxSurge = len(update)
}

needs to ignore warm instances in update, otherwise rolling update could end up detaching warm instances.

return string(i.State)
})

columns := []string{"ID", "NODE-NAME", "STATUS", "ROLES", "INTERNAL-IP", "INSTANCE-GROUP", "MACHINE-TYPE", "STATE"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think state should come after either status or roles. It's more important that machine type at least.

@@ -293,6 +293,11 @@ func (b *KubeletBuilder) buildSystemdService() *nodetasks.Service {

service.InitDefaults()

if b.ConfigurationMode == "Warming" {

service.Running = fi.Bool(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we add a test to verify this behavior?

@@ -669,12 +669,15 @@ func getNodeConfigFromServer(ctx context.Context, config *nodeup.ConfigServerOpt
}

func getAWSConfigurationMode(c *model.NodeupModelContext) (string, error) {
<<<<<<< HEAD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you squash these out of the commit stream?

klog.Infof("waiting for %v after terminating instance", sleepAfterTerminate)
time.Sleep(sleepAfterTerminate)
} else {
klog.Infof("%q is in the warm pool. Not sleeping", u.ID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a need to log anything here.

@olemarkus
Copy link
Member Author

In detachInstance, I added a check for the state, so it should just return if an instance is in the warm pool. AWS will fail a detach instance call on a warm pool instance.

@olemarkus
Copy link
Member Author

@johngmyers you were right. there were dragons in my previous approach. Now I am a bit more brutish and just iterate over all the warmpool nodes and quickly terminate them immediately in the beginning.

@johngmyers
Copy link
Member

I like simple.

@olemarkus
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 15, 2021
@olemarkus olemarkus added this to the v1.21 milestone Apr 15, 2021
@olemarkus
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 15, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 15, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2021
@olemarkus
Copy link
Member Author

/test pull-kops-e2e-kubernetes-aws

@olemarkus
Copy link
Member Author

/test pull-kops-e2e-k8s-containerd

@hakman
Copy link
Member

hakman commented Apr 15, 2021

/test all

@olemarkus
Copy link
Member Author

/retest

2 similar comments
@olemarkus
Copy link
Member Author

/retest

@hakman
Copy link
Member

hakman commented Apr 15, 2021

/retest

@k8s-ci-robot k8s-ci-robot merged commit 5aa8a31 into kubernetes:master Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/nodeup area/provider/alicloud Issues or PRs related to alicloud provider area/provider/aws Issues or PRs related to aws provider area/provider/azure Issues or PRs related to azure provider area/provider/digitalocean Issues or PRs related to digitalocean provider area/provider/openstack Issues or PRs related to openstack provider area/provider/spotinst Issues or PRs related to spotinst provider area/rolling-update cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants