New expander: priority expander #1801
Conversation
cluster-autoscaler/FAQ.md (Outdated)
@@ -606,6 +606,26 @@ after scale-up. This is useful when you have different classes of nodes, for example:
would match the cluster size. This expander is described in more details
[HERE](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/pricing.md). Currently it works only for GCE and GKE (patches welcome.)

* `priority` - selects the node group that has the highest priority assigned by the user. The priority configuration is based on the values stored in a ConfigMap. This ConfigMap has to be created before cluster autoscaler with priority expander can be started. The ConfigMap must be named `cluster-autoscaler-priority-expander` and it must be placed in the same namespace as cluster autoscaler pod. The format of the [ConfigMap](expander/priority/priority-expander-configmap.yaml) is as follows:
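For readers of this thread: the ConfigMap maps integer priorities to lists of regular expressions matched against node group names. A minimal sketch of what such a ConfigMap could look like (the namespace and the regex values are illustrative; see the linked priority-expander-configmap.yaml for the authoritative example):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system   # assumption: CA runs in kube-system; must match the CA pod's namespace
data:
  priorities: |-
    10:
      - .*cheap-node-group.*
    50:
      - .*preferred-node-group.*
```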
Can you merge the proposal and this text into a readme.md and put it into the expander directory?
Do you want it only in expander dir and linked here?
Just a brief description and a link, please.
@@ -0,0 +1,31 @@
# Priority based expander for cluster-autoscaler
Make it expander documentation.
		continue
	}
	if err := res.parsePrioritiesYAMLString(prioString); err != nil {
		klog.Warningf("Wrong configuration for priority expander: %v. Ignoring update.", err)
I would also send an event, update the status config map, etc. Be extremely noisy in case of misconfiguration.
@mwielgus Changes are in place. I added events only, as the status map did not seem like a good fit to me: it keeps CAS-wide info only, no matter which expander you use. And if I'm not wrong, no other expander keeps its status there. Still, if you think it would be beneficial, I can work on it.
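As an illustration of the "be noisy" suggestion, a warning event for a bad config can be attached to the ConfigMap itself via client-go's event recorder. This is only a sketch of the idea, not the exact code in this PR; the helper name and recorder wiring are assumptions:

```go
import (
	apiv1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// reportBadConfig is a hypothetical helper: it attaches a Warning event to the
// priority expander ConfigMap so a misconfiguration is visible via
// `kubectl describe configmap cluster-autoscaler-priority-expander`.
func reportBadConfig(recorder record.EventRecorder, cm *apiv1.ConfigMap, err error) {
	recorder.Event(cm, apiv1.EventTypeWarning, "PriorityConfigMapInvalid",
		"Wrong configuration for priority expander: "+err.Error()+". Ignoring update.")
}
```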
You have too many (21) commits in this PR. Please squash them to no more than 3 commits.
Force-pushed from 4a80adc to 8a5aee6 (compare)
"merge with master" commit has misleading/ambiguous name. Please rename or squash with "priority expander". |
Force-pushed from 8a5aee6 to c5ba4b3 (compare)
rebased on master
@@ -63,8 +63,8 @@ rules:
     verbs: ["create"]
   - apiGroups: [""]
     resources: ["configmaps"]
-    resourceNames: ["cluster-autoscaler-status"]
+    resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
     verbs: ["delete","get","update"]
Once this merges we should also update https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/rbac/cluster-autoscaler/cluster-autoscaler-rbac.yaml#L54.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mwielgus. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@mwielgus Thanks for the review! One question though: what is the best approach to backport it? We would need it back in the Kubernetes 1.11 (Cluster Autoscaler 1.3) release.
Sorry I'm late to the party, but I have some comments on the implementation. In particular, I think it will fail after the watch channel is abruptly closed after a few hours - we've already had this problem in both HPA and VPA. Can you create another PR addressing the comments?
return "", nil, errors.New(errMsg) | ||
} | ||
|
||
watcher, err := maps.Watch(metav1.ListOptions{ |
Listers already handle the watch cache and all necessary error handling, and we use those for all other objects. I don't think we should re-implement this functionality. In particular, if you want to use watch directly you need to handle channel closing - in both HPA and VPA we've already run into issues caused by the watch channel abruptly closing, and we needed to implement retry logic for this case. I don't see such retry logic in this PR.
That being said - unless you have a very strong reason, I strongly recommend using a lister or an informer rather than directly using the watch API.
I didn't know about any problems using watch API - when is the channel closed? Can you point me to some docs?
I don't know if there is any documentation on the details. IIRC @jbartosik once ran into a similar issue, so he may provide more details. The general recommendation is to always use the higher-level abstractions provided by the client-go library, rather than the raw client.
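To make the suggestion concrete, a shared informer handles the re-list/re-watch on its own when the underlying watch channel closes. A minimal sketch of wiring one up for ConfigMaps (the function name and namespace handling are assumptions, not code from this PR):

```go
import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	v1listers "k8s.io/client-go/listers/core/v1"
)

// newConfigMapLister is a sketch: it starts a shared informer factory scoped to
// one namespace and returns a lister backed by the informer's local cache.
// Reconnecting a closed watch and periodic resync are handled by client-go.
func newConfigMapLister(client kubernetes.Interface, namespace string, stopCh <-chan struct{}) v1listers.ConfigMapNamespaceLister {
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 10*time.Minute, informers.WithNamespace(namespace))
	lister := factory.Core().V1().ConfigMaps().Lister().ConfigMaps(namespace)
	factory.Start(stopCh)            // runs the watches in background goroutines
	factory.WaitForCacheSync(stopCh) // block until the local cache is populated
	return lister
}
```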
}

// EventRecorder is an interface to abstract kubernetes event recording.
type EventRecorder interface {
Why are you redefining this here?
For testability - check unit tests. Have I missed exactly the same interface being already defined elsewhere?
After taking a closer look at unittests - they're very inconsistent with existing unittests. We have utilities we use for making fake nodegroups (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/test/test_cloud_provider.go) and there is an existing utility for faking event recording (used for example here: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/status/eventing_scale_up_processor_test.go#L87).
Please use existing tooling for your unittests.
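For context on the event side: client-go's record package already ships a fake recorder whose emitted events land on a channel, so tests can assert on them without defining a new interface. A standalone illustration (not the autoscaler helper linked above):

```go
import (
	"strings"
	"testing"

	"k8s.io/client-go/tools/record"
)

func TestWarningEventIsRecorded(t *testing.T) {
	// FakeRecorder buffers formatted "<type> <reason> <message>" strings on Events.
	recorder := record.NewFakeRecorder(10)
	recorder.Event(nil, "Warning", "PriorityConfigMapInvalid", "bad priorities YAML")

	select {
	case ev := <-recorder.Events:
		if !strings.Contains(ev, "PriorityConfigMapInvalid") {
			t.Errorf("unexpected event: %q", ev)
		}
	default:
		t.Error("expected an event to be recorded")
	}
}
```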
for event := range priorityChangesChan {
	cm, ok := event.Object.(*apiv1.ConfigMap)
	if !ok {
		klog.Exit("Unexpected object type received on the configmap update channel in priority expander")
You should return an appropriate error (API or Internal), not crash the application.
OK.
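A sketch of the shape being asked for - propagating an error instead of killing the process - with the channel type and handler as placeholders rather than the PR's actual signatures:

```go
import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/watch"
)

// consumeConfigMapEvents returns an error on an unexpected object instead of
// calling klog.Exit, leaving the decision about how to react to the caller.
func consumeConfigMapEvents(events <-chan watch.Event, apply func(*apiv1.ConfigMap) error) error {
	for event := range events {
		cm, ok := event.Object.(*apiv1.ConfigMap)
		if !ok {
			return fmt.Errorf("unexpected object type %T on the configmap update channel in priority expander", event.Object)
		}
		if err := apply(cm); err != nil {
			return err
		}
	}
	return nil
}
```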
	fallbackStrategy expander.Strategy
	changesChan      <-chan watch.Event
	priorities       map[int][]*regexp.Regexp
	padlock          sync.RWMutex
Expander is called once per loop, at most 1 call / 10s (more likely 1 per 15-20s), after a ton of complex calculations done all over CA code. Do we really need async config update? Feels like we could just get configmap from watch cache every time and synchronously check if the config changed - the performance impact should be negligible and the complexity of this expander will greatly reduce.
Hmm, I agree - if there's no plan to make the updates/loops more frequent in the future, we can probably drop the watch and go with fetching the map every time. I'm just not sure about caching - can you point me to some docs? How is stuff cached, for how long, and what is your default caching toolset for CAS?
By 'watch cache' I mean using listers or informers. Those are a higher level abstraction for Kubernetes client provided by client-go. They provide a local cache that is backed by a watch on apiserver and updated on any change. So you get fresh data, except you don't have to care about creating goroutines doing the watch and error handling - it's all provided.
The general recommendation is to always use lister or informer over raw client, unless there is a very good reason not to.
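Putting the two suggestions together, the expander could read the current config synchronously from the lister cache on each call and re-parse only when it changed. A rough sketch; the `parse` callback stands in for the PR's parsePrioritiesYAMLString, and the `priorities` data key follows the example ConfigMap linked from the FAQ:

```go
import (
	v1listers "k8s.io/client-go/listers/core/v1"
	"k8s.io/klog"
)

const priorityConfigMapName = "cluster-autoscaler-priority-expander"

// refreshPriorities reads the ConfigMap from the informer-backed cache and
// re-parses it only when the raw content changed. It returns the raw config
// now in effect; on any problem it keeps the previous one.
func refreshPriorities(lister v1listers.ConfigMapNamespaceLister, last string, parse func(string) error) string {
	cm, err := lister.Get(priorityConfigMapName)
	if err != nil {
		klog.Warningf("Priority expander config not available: %v. Keeping previous priorities.", err)
		return last
	}
	prioString := cm.Data["priorities"]
	if prioString == last {
		return last // nothing changed, skip re-parsing
	}
	if err := parse(prioString); err != nil {
		klog.Warningf("Wrong configuration for priority expander: %v. Ignoring update.", err)
		return last
	}
	return prioString
}
```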
I created this PR to update RBAC YAML in kubernetes: kubernetes/kubernetes#75814
They should've been fixed in kubernetes/autoscaler#1801, kubernetes/autoscaler#1920
The PR
This commit introduces code, tests and docs for a new expander called 'priority'. The motivation is described in this proposal. The FAQ is updated accordingly.
Testing procedure
Full automated kubernetes e2e tests were not done. The testing procedure was as follows:
Unit tests
go test ./expander/priority/...
Manual e2e tests