
[cluster-autoscaler] Support using --cloud-config for clusterapi provider #3203

Merged
merged 1 commit into kubernetes:master on Sep 21, 2020

Conversation

detiber
Member

@detiber detiber commented Jun 5, 2020

  • Leverage --cloud-config to allow for providing a separate kubeconfig for Cluster API management and workload cluster resources
  • Allow for fallback to previous behavior when --cloud-config is not specified for backward compatibility
  • Provides a --clusterapi-cloud-config-authoritative flag to disable the above fallback behavior and allow for both the management and workload cluster clients to use the in-cluster config

Fixes: #3196
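As a rough sketch of how these flags combine (the paths are placeholders and all other autoscaler flags are omitted), the description above suggests three modes of operation:

# separate configs: workload cluster via --kubeconfig, management cluster via --cloud-config
cluster-autoscaler --cloud-provider=clusterapi \
    --kubeconfig=/path/to/workload-cluster.kubeconfig \
    --cloud-config=/path/to/management-cluster.kubeconfig

# --cloud-config omitted: fall back to the previous behavior, where a single config
# (from --kubeconfig or in-cluster) is used for both clients
cluster-autoscaler --cloud-provider=clusterapi \
    --kubeconfig=/path/to/cluster.kubeconfig

# running inside the managed cluster with the fallback disabled, so both the
# management and workload clients use the in-cluster config
cluster-autoscaler --cloud-provider=clusterapi \
    --clusterapi-cloud-config-authoritative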

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 5, 2020
@k8s-ci-robot k8s-ci-robot requested review from hardikdr and losipiuk June 5, 2020 19:39
@detiber
Member Author

detiber commented Jun 5, 2020

/assign @elmiko @enxebre

PTAL, based on our earlier discussion.

@detiber detiber changed the title [cluster-autoscaler] Support using --cloud-config for clusterapi provider [WIP] [cluster-autoscaler] Support using --cloud-config for clusterapi provider Jun 5, 2020
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 5, 2020
@detiber
Member Author

detiber commented Jun 5, 2020

TODO: need to update docs for this change

@elmiko
Contributor

elmiko commented Jun 5, 2020

thanks Jason! i'll give this a test on monday and let you know how it goes.

@detiber detiber mentioned this pull request Jun 8, 2020
@MaciekPytel
Contributor

I'm happy to lgtm once this is ready.

@elmiko
Contributor

elmiko commented Jun 8, 2020

hey Jason, i've just started some testing on this patch and the first thing i tried was to repeat the expected behavior for the autoscaler. i created a joined CAPI cluster and then attempted to start the autoscaler as i have done previously; unfortunately it failed on the kubeconfig detection.

i tried the same action against master and it worked, so i am guessing there is a regression or something was missed. attaching the output.

ca.err.txt

I0608 12:24:43.524083  891727 flags.go:52] FLAG: --add-dir-header="false"
I0608 12:24:43.524141  891727 flags.go:52] FLAG: --address=":8085"
I0608 12:24:43.524146  891727 flags.go:52] FLAG: --alsologtostderr="false"
I0608 12:24:43.524149  891727 flags.go:52] FLAG: --aws-use-static-instance-list="false"
I0608 12:24:43.524153  891727 flags.go:52] FLAG: --balance-similar-node-groups="true"
I0608 12:24:43.524159  891727 flags.go:52] FLAG: --balancing-ignore-label="[]"
I0608 12:24:43.524163  891727 flags.go:52] FLAG: --cloud-config=""
I0608 12:24:43.524165  891727 flags.go:52] FLAG: --cloud-provider="clusterapi"
I0608 12:24:43.524168  891727 flags.go:52] FLAG: --cloud-provider-gce-l7lb-src-cidrs="130.211.0.0/22,35.191.0.0/16"
I0608 12:24:43.524173  891727 flags.go:52] FLAG: --cloud-provider-gce-lb-src-cidrs="130.211.0.0/22,209.85.152.0/22,209.85.204.0/22,35.191.0.0/16"
I0608 12:24:43.524178  891727 flags.go:52] FLAG: --cluster-name=""
I0608 12:24:43.524180  891727 flags.go:52] FLAG: --cores-total="0:320000"
I0608 12:24:43.524183  891727 flags.go:52] FLAG: --estimator="binpacking"
I0608 12:24:43.524186  891727 flags.go:52] FLAG: --expander="random"
I0608 12:24:43.524189  891727 flags.go:52] FLAG: --expendable-pods-priority-cutoff="-10"
I0608 12:24:43.524192  891727 flags.go:52] FLAG: --gpu-total="[]"
I0608 12:24:43.524195  891727 flags.go:52] FLAG: --ignore-daemonsets-utilization="false"
I0608 12:24:43.524198  891727 flags.go:52] FLAG: --ignore-mirror-pods-utilization="false"
I0608 12:24:43.524201  891727 flags.go:52] FLAG: --ignore-taint="[]"
I0608 12:24:43.524204  891727 flags.go:52] FLAG: --kubeconfig="/home/mike/work-cluster.kubeconfig"
I0608 12:24:43.524207  891727 flags.go:52] FLAG: --kubernetes=""
I0608 12:24:43.524210  891727 flags.go:52] FLAG: --leader-elect="true"
I0608 12:24:43.524214  891727 flags.go:52] FLAG: --leader-elect-lease-duration="15s"
I0608 12:24:43.524218  891727 flags.go:52] FLAG: --leader-elect-renew-deadline="10s"
I0608 12:24:43.524222  891727 flags.go:52] FLAG: --leader-elect-resource-lock="leases"
I0608 12:24:43.524225  891727 flags.go:52] FLAG: --leader-elect-resource-name=""
I0608 12:24:43.524228  891727 flags.go:52] FLAG: --leader-elect-resource-namespace=""
I0608 12:24:43.524231  891727 flags.go:52] FLAG: --leader-elect-retry-period="2s"
I0608 12:24:43.524234  891727 flags.go:52] FLAG: --log-backtrace-at=":0"
I0608 12:24:43.524240  891727 flags.go:52] FLAG: --log-dir=""
I0608 12:24:43.524243  891727 flags.go:52] FLAG: --log-file=""
I0608 12:24:43.524246  891727 flags.go:52] FLAG: --log-file-max-size="1800"
I0608 12:24:43.524249  891727 flags.go:52] FLAG: --logtostderr="true"
I0608 12:24:43.524252  891727 flags.go:52] FLAG: --max-autoprovisioned-node-group-count="15"
I0608 12:24:43.524255  891727 flags.go:52] FLAG: --max-bulk-soft-taint-count="10"
I0608 12:24:43.524258  891727 flags.go:52] FLAG: --max-bulk-soft-taint-time="3s"
I0608 12:24:43.524261  891727 flags.go:52] FLAG: --max-empty-bulk-delete="10"
I0608 12:24:43.524264  891727 flags.go:52] FLAG: --max-failing-time="15m0s"
I0608 12:24:43.524267  891727 flags.go:52] FLAG: --max-graceful-termination-sec="600"
I0608 12:24:43.524270  891727 flags.go:52] FLAG: --max-inactivity="10m0s"
I0608 12:24:43.524273  891727 flags.go:52] FLAG: --max-node-provision-time="10m0s"
I0608 12:24:43.524276  891727 flags.go:52] FLAG: --max-nodes-total="24"
I0608 12:24:43.524278  891727 flags.go:52] FLAG: --max-total-unready-percentage="45"
I0608 12:24:43.524282  891727 flags.go:52] FLAG: --memory-total="0:6400000"
I0608 12:24:43.524285  891727 flags.go:52] FLAG: --min-replica-count="0"
I0608 12:24:43.524288  891727 flags.go:52] FLAG: --namespace="default"
I0608 12:24:43.524295  891727 flags.go:52] FLAG: --new-pod-scale-up-delay="0s"
I0608 12:24:43.524298  891727 flags.go:52] FLAG: --node-autoprovisioning-enabled="false"
I0608 12:24:43.524301  891727 flags.go:52] FLAG: --node-deletion-delay-timeout="2m0s"
I0608 12:24:43.524304  891727 flags.go:52] FLAG: --node-group-auto-discovery="[]"
I0608 12:24:43.524307  891727 flags.go:52] FLAG: --nodes="[]"
I0608 12:24:43.524310  891727 flags.go:52] FLAG: --ok-total-unready-count="3"
I0608 12:24:43.524317  891727 flags.go:52] FLAG: --profiling="false"
I0608 12:24:43.524320  891727 flags.go:52] FLAG: --regional="false"
I0608 12:24:43.524323  891727 flags.go:52] FLAG: --scale-down-candidates-pool-min-count="50"
I0608 12:24:43.524326  891727 flags.go:52] FLAG: --scale-down-candidates-pool-ratio="0.1"
I0608 12:24:43.524329  891727 flags.go:52] FLAG: --scale-down-delay-after-add="10s"
I0608 12:24:43.524332  891727 flags.go:52] FLAG: --scale-down-delay-after-delete="10s"
I0608 12:24:43.524335  891727 flags.go:52] FLAG: --scale-down-delay-after-failure="10s"
I0608 12:24:43.524338  891727 flags.go:52] FLAG: --scale-down-enabled="true"
I0608 12:24:43.524341  891727 flags.go:52] FLAG: --scale-down-gpu-utilization-threshold="0.5"
I0608 12:24:43.524344  891727 flags.go:52] FLAG: --scale-down-non-empty-candidates-count="30"
I0608 12:24:43.524347  891727 flags.go:52] FLAG: --scale-down-unneeded-time="23s"
I0608 12:24:43.524350  891727 flags.go:52] FLAG: --scale-down-unready-time="20m0s"
I0608 12:24:43.524352  891727 flags.go:52] FLAG: --scale-down-utilization-threshold="0.5"
I0608 12:24:43.524355  891727 flags.go:52] FLAG: --scale-up-from-zero="true"
I0608 12:24:43.524360  891727 flags.go:52] FLAG: --scan-interval="10s"
I0608 12:24:43.524363  891727 flags.go:52] FLAG: --skip-headers="false"
I0608 12:24:43.524366  891727 flags.go:52] FLAG: --skip-log-headers="false"
I0608 12:24:43.524369  891727 flags.go:52] FLAG: --skip-nodes-with-local-storage="true"
I0608 12:24:43.524372  891727 flags.go:52] FLAG: --skip-nodes-with-system-pods="true"
I0608 12:24:43.524375  891727 flags.go:52] FLAG: --stderrthreshold="2"
I0608 12:24:43.524378  891727 flags.go:52] FLAG: --unremovable-node-recheck-timeout="5m0s"
I0608 12:24:43.524381  891727 flags.go:52] FLAG: --v="4"
I0608 12:24:43.524384  891727 flags.go:52] FLAG: --vmodule=""
I0608 12:24:43.524387  891727 flags.go:52] FLAG: --write-status-configmap="true"
I0608 12:24:43.524391  891727 main.go:374] Cluster Autoscaler 1.18.0
I0608 12:24:43.524400  891727 main.go:249] Using kubeconfig file: /home/mike/work-cluster.kubeconfig
I0608 12:24:43.535655  891727 leaderelection.go:242] attempting to acquire leader lease  default/cluster-autoscaler...
I0608 12:24:43.539472  891727 leaderelection.go:252] successfully acquired lease default/cluster-autoscaler
I0608 12:24:43.539577  891727 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Lease", Namespace:"default", Name:"cluster-autoscaler", UID:"c5e113e8-062d-4863-a97b-a5a4a9a8d28d", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"7060537", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' localhost.localdomain became leader
I0608 12:24:43.539780  891727 main.go:249] Using kubeconfig file: /home/mike/work-cluster.kubeconfig
I0608 12:24:43.540749  891727 main.go:249] Using kubeconfig file: /home/mike/work-cluster.kubeconfig
I0608 12:24:43.541836  891727 reflector.go:207] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:188
I0608 12:24:43.541889  891727 reflector.go:207] Starting reflector *v1.Node (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I0608 12:24:43.541862  891727 reflector.go:207] Starting reflector *v1beta1.PodDisruptionBudget (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
I0608 12:24:43.541907  891727 reflector.go:243] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I0608 12:24:43.541912  891727 reflector.go:207] Starting reflector *v1.Job (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:338
I0608 12:24:43.541913  891727 reflector.go:243] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
I0608 12:24:43.541924  891727 reflector.go:243] Listing and watching *v1.Job from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:338
I0608 12:24:43.541899  891727 reflector.go:243] Listing and watching *v1.Pod from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:188
I0608 12:24:43.542026  891727 reflector.go:207] Starting reflector *v1.ReplicationController (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:329
I0608 12:24:43.542038  891727 reflector.go:243] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:329
I0608 12:24:43.542028  891727 reflector.go:207] Starting reflector *v1.DaemonSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:320
I0608 12:24:43.542053  891727 reflector.go:243] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:320
I0608 12:24:43.542186  891727 reflector.go:207] Starting reflector *v1.StatefulSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356
I0608 12:24:43.542197  891727 reflector.go:243] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356
I0608 12:24:43.542257  891727 reflector.go:207] Starting reflector *v1.ReplicaSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:347
I0608 12:24:43.542269  891727 reflector.go:243] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:347
I0608 12:24:43.541870  891727 reflector.go:207] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212
I0608 12:24:43.542292  891727 reflector.go:243] Listing and watching *v1.Pod from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212
I0608 12:24:43.541837  891727 reflector.go:207] Starting reflector *v1.Node (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I0608 12:24:43.542415  891727 reflector.go:243] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I0608 12:24:43.561132  891727 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0608 12:24:43.561160  891727 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0608 12:24:43.561368  891727 cloud_provider_builder.go:29] Building clusterapi cloud provider.
W0608 12:24:43.561380  891727 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0608 12:24:43.561412  891727 client_config.go:557] error creating inClusterConfig, falling back to default config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
F0608 12:24:43.561522  891727 clusterapi_provider.go:148] cannot build config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

i am going to test a multicluster setup next
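Distilled from the flag dump above, the failing run amounts to roughly the following (most flags omitted). With --cloud-config left empty, the provider's management client appears to ignore --kubeconfig and tries the in-cluster config, which is not available outside a pod:

cluster-autoscaler --cloud-provider=clusterapi \
    --kubeconfig=/home/mike/work-cluster.kubeconfig
# --cloud-config is empty, so clusterapi_provider.go falls back to inClusterConfig and
# exits with "cannot build config: invalid configuration: no configuration has been provided"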

@elmiko
Contributor

elmiko commented Jun 8, 2020

quick followup: if i set both --cloud-config and --kubeconfig then i am able to operate, but if i only set one, then i get the previous behavior (the failure above).

setting both flags to the same kubeconfig is working for me on a joined cluster.
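In other words, the workaround at this stage is to point both flags at the same file, roughly (path is a placeholder):

cluster-autoscaler --cloud-provider=clusterapi \
    --kubeconfig=/path/to/joined-cluster.kubeconfig \
    --cloud-config=/path/to/joined-cluster.kubeconfig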

@detiber
Member Author

detiber commented Jun 8, 2020

@elmiko 😬 it looks like I accidentally inverted the logic for the fallback previously. It should work now.

@elmiko
Contributor

elmiko commented Jun 8, 2020

hehe, i found the same when i started to exercise the multi-cluster deployment. thanks for the update!

this is working for me when i use the autoscaler like this: cluster-autoscaler ... --kubeconfig=workload-cluster.kubeconfig --cloud-config=management-cluster.kubeconfig. scale up and down looks normal to me.

i also confirmed that it works in joined cluster mode as well.

i think we just need a docs adjustment and then i am good to approve.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 9, 2020
@detiber detiber changed the title [WIP] [cluster-autoscaler] Support using --cloud-config for clusterapi provider [cluster-autoscaler] Support using --cloud-config for clusterapi provider Jun 9, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2020
@detiber
Member Author

detiber commented Jun 9, 2020

@elmiko this should be good to go now, updated the README to cover the use of the --kubeconfig and --cloud-config flags

Contributor

@elmiko elmiko left a comment

looks great!

just need to modify the cloud-provider value to clusterapi

cluster-autoscaler/cloudprovider/clusterapi/README.md (review thread, outdated, resolved)
Contributor

@elmiko elmiko left a comment

awesome! thanks again Jason =)

/approve

@elmiko
Contributor

elmiko commented Jun 9, 2020

/area provider/clusterapi

@k8s-ci-robot
Contributor

@elmiko: The label(s) area/provider/clusterapi cannot be applied, because the repository doesn't have them

In response to this:

/area provider/clusterapi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@elmiko
Contributor

elmiko commented Jun 9, 2020

/area provider/cluster-api
/approve

@elmiko
Contributor

elmiko commented Jul 9, 2020

forgot to mention this but, as for the error handling stuff, i tend to side with not wrapping them and trying to use the base errors package.

i kinda like the whitespace that the linter suggested though.

@detiber detiber changed the title [WIP][cluster-autoscaler] Support using --cloud-config for clusterapi provider [cluster-autoscaler] Support using --cloud-config for clusterapi provider Jul 14, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2020
@detiber
Member Author

detiber commented Jul 14, 2020

Rebased on top of #3312 and #3314, both of which should be considered prerequisites for this change now.

@benmoss
Member

benmoss commented Jul 15, 2020

happy to test drive this once it's ready

I just got this running on my kind cluster using my other branch; this is all I needed to add to get it to work managing a cluster called bmo: 5b3a8a4

😸

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2020
@benmoss
Member

benmoss commented Sep 16, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Sep 16, 2020
[cluster-autoscaler] Support using --cloud-config for clusterapi provider

- Leverage --cloud-config to allow for providing a separate kubeconfig for Cluster API management and workload cluster resources
- Allow for fallback to previous behavior when --cloud-config is not specified for backward compatibility
- Provides a --clusterapi-cloud-config-authoritative flag to disable the above fallback behavior and allow for both the management and workload cluster clients to use the in-cluster config
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 21, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko, MaciekPytel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@detiber
Member Author

detiber commented Sep 21, 2020

Rebased, apologies about the delays.

@benmoss
Member

benmoss commented Sep 21, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2020
@k8s-ci-robot k8s-ci-robot merged commit 2103c16 into kubernetes:master Sep 21, 2020
benmoss pushed a commit to benmoss/autoscaler that referenced this pull request Sep 25, 2020
[cluster-autoscaler] Support using --cloud-config for clusterapi provider
benmoss pushed a commit to benmoss/autoscaler that referenced this pull request Sep 28, 2020
[cluster-autoscaler] Support using --cloud-config for clusterapi provider
Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/provider/cluster-api - Issues or PRs related to Cluster API provider
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Cluster Autoscaler CAPI provider only supports merged clusters