Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cluster-autoscaler][clusterapi] Add support for node autodiscovery to clusterapi provider #3314

Merged
merged 1 commit into from
Sep 3, 2020

Conversation

detiber
Copy link
Member

@detiber detiber commented Jul 13, 2020

Builds on #3312

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 13, 2020
Copy link
Member Author

@detiber detiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to update README for node autodiscovery configuration

@detiber detiber force-pushed the autoDiscovery branch 4 times, most recently from d2b26da to 9c2f0a4 Compare July 14, 2020 21:00
@detiber detiber changed the title [WIP][cluster-autoscaler][clusterapi] Add support for node autodiscovery to clusterapi provider [cluster-autoscaler][clusterapi] Add support for node autodiscovery to clusterapi provider Jul 14, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2020
@elmiko
Copy link
Contributor

elmiko commented Jul 14, 2020

nice Jason! i know we talked about the node discovery stuff, but i'm a little unfamiliar with the process. i'll give this a test drive.

@detiber
Copy link
Member Author

detiber commented Jul 15, 2020

@elmiko The main goal was to try to leverage the existing functionality of node autodiscovery to be able to limit which MachineSets/MachineDeployments are allowed to be used for NodeGroups.

If they are not specified this falls back to the previous behavior of allowing all discovered MachineSets and MachineDeployments to be used, but we may want to reconsider that at somepoint in the future.

@elmiko
Copy link
Contributor

elmiko commented Jul 15, 2020

thanks @detiber, i appreciate the explanation =)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2020
@MaciekPytel
Copy link
Contributor

Close + reopen to trigger travis rerun after fixing tests at HEAD.

@detiber
Copy link
Member Author

detiber commented Aug 20, 2020

@elmiko @benmoss bumping since the unstructured PR has been merged.

@ncdc
Copy link
Member

ncdc commented Aug 20, 2020

Does this need to be rebased to drop the the commits from 3312?

@detiber
Copy link
Member Author

detiber commented Aug 20, 2020

Does this need to be rebased to drop the the commits from 3312?

Rebased

Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jason, this is looking great so far. i have a couple questions inline, but i really appreciate the thoroughness of the tests, it makes understanding this change much easier for me.

i am still doing some tests locally, but i broke my test stand when upgrading clusterapi so i probably won't finish my tests till monday.

@@ -80,7 +86,7 @@ func indexMachineByProviderID(obj interface{}) ([]string, error) {
return nil, nil
}

providerID, found, err := unstructured.NestedString(u.Object, "spec", "providerID")
providerID, found, err := unstructured.NestedString(u.UnstructuredContent(), "spec", "providerID")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question, is this change related to the autodiscovery?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly, I think I originally hit an issue while building out the tests where u.Object was nil, and it seemed like it would be good to guard against that in general.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to break this change out separately if you prefer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i'm torn. on one hand it makes it easier to follow the unstructured stuff if we keep it separate, and on the other this is a small change to pick up a few more references.

honestly, my question was just to increase my understanding of the code here. i guess i'm ok to leave it here if no one else has issues.


if found && replicas > 0 {
nodegroups = append(nodegroups, ng)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm curious about the removal of this check. i know we have other checks to ensure that we don't go over the node group size, but does this change mean we will be returning all node groups even if they are full?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size check has been pushed down into newNodeGroupFromScalableResource(), which will return nil if it does not have capacity to scale or has 0 replicas.

var results []*unstructured.Unstructured
tracker := map[string]bool{}
for _, spec := range c.autoDiscoverySpecs {
resources, err := listResources(lister.ByNamespace(spec.namespace), spec.clusterName, spec.labelSelector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not overly familiar with the client-go lister stuff, but i'm having trouble understanding something about the namespace here. if the user doesn't provide a namespace in the autodiscovery clause will this pass nil to the lister.ByNameSpace call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec.namespace is a string, so it will default to "" when not set, which is equivalent to all namespaces when passed to a lister. For additional context, if one uses metav1.NamespaceAll, it is defined as the empty string: https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#pkg-constants

return fmt.Errorf("size increase too large - desired:%d max:%d", nreplicas, r.maxSize)
case nreplicas < r.minSize:
return fmt.Errorf("size decrease too large - desired:%d min:%d", nreplicas, r.minSize)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like moving the size check here, ++

@elmiko
Copy link
Contributor

elmiko commented Aug 27, 2020

hey @detiber , i finally got my test stand working again and gave this PR a drive today. everything seemed to work as expected. i tested the namespace and label options, attempting both inclusion and exclusion.

my one comment is that the node group autodiscovery stuff took me a few to understand that the constraints its looking for are on the MachineSet/MachineDeployment resources. i think you mention scalable resources in the readme, but i wonder if we should be more specific?

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 27, 2020
@benmoss
Copy link
Member

benmoss commented Sep 3, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 3, 2020
@k8s-ci-robot k8s-ci-robot merged commit f0938fc into kubernetes:master Sep 3, 2020
benmoss pushed a commit to benmoss/autoscaler that referenced this pull request Sep 28, 2020
[cluster-autoscaler][clusterapi] Add support for node autodiscovery to clusterapi provider
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants