Add KEP for random ReplicaSet downscale #2233
Conversation
/cc @alculquicondor @ahg-g @kubernetes/sig-apps-feature-requests
Force-pushed from 2522b5f to 0ed4d54
Don't mark this as "fixes". The issue remains open until graduation to GA.
@alculquicondor thanks, updated
- Unit and e2e tests

Beta (v1.22):
- Enable RandomReplicaSetDownscale feature gate by default
We should have some user feedback criteria.
Would silent consensus be enough?
Asking for user feedback is probably a good idea, but otherwise I think no news is good news
Force-pushed from 0ed4d54 to c1bd2f7
For example, let's assume the base 10 is used, then we have the following
mapping for different durations:

| Duration | Scale |
perhaps clarify that scale translates to the pod rank.
rank as defined in the code? It doesn't
Not literally as defined in the code, but that it translates to the order.
It does not. It just changes how timestamps are compared, which is the last criterion for Pod comparisons.
OK, maybe I am being too brief since I am commenting from my phone (you also need to give me the benefit of the doubt :)). What I meant is that this is the new sorting key.
Hopefully this gets explained with the other comment.
It can be seen as a sorting key, but saying that would create confusion, as it is not the only criterion.
there are multiple sorting keys, not just one.
creation timestamp.

Instead of directly comparing timestamps, the algorithm compares the elapsed
times since the timestamp until the current time but in a logarithmic scale,
"since the timestamp"
Which timestamp? creation timestamp?
there are 2: creation and ready
so we have two sorting keys?
I'm confused by this now too, I thought this was just referring to creationTimestamp (picked this part up from Aldo's draft)
Diving into implementation details here:
There is not a single sorting key. There are sorting criteria:
https://github.com/kubernetes/kubernetes/blob/cac9339/pkg/controller/controller_utils.go#L784-L809
Two of these criteria are ready time and creation time. We should affect both.
Feel free to include some of these details in the KEP, @damemi
right, so criteria 5 and 7 each will be updated so that the key for each is not the timestamp, but the "scale".
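To make the "scale" idea concrete, here is a minimal Go sketch (not the controller's actual code; `logScale` and `compareByScale` are hypothetical helpers) of comparing two timestamps by the base-10 order of magnitude of their elapsed time rather than by the raw values. Timestamps that fall into the same bucket compare as equal, so the next criterion decides the order.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// logScale buckets an elapsed duration by its base-10 order of magnitude
// in seconds: 0-9s -> 0, 10-99s -> 1, 100-999s -> 2, and so on.
func logScale(elapsed time.Duration) int {
	seconds := elapsed.Seconds()
	if seconds < 1 {
		return 0
	}
	return int(math.Floor(math.Log10(seconds)))
}

// compareByScale returns -1, 0 or 1 depending on which timestamp is
// "younger" on the logarithmic scale. A result of 0 means the two pods
// are equivalent for this criterion and the next criterion applies.
func compareByScale(now, t1, t2 time.Time) int {
	s1, s2 := logScale(now.Sub(t1)), logScale(now.Sub(t2))
	switch {
	case s1 < s2:
		return -1
	case s1 > s2:
		return 1
	default:
		return 0
	}
}

func main() {
	now := time.Now()
	a := now.Add(-200 * time.Second) // bucket 2
	b := now.Add(-500 * time.Second) // bucket 2 as well
	c := now.Add(-30 * time.Second)  // bucket 1
	fmt.Println(compareByScale(now, a, b)) // 0: same bucket, fall through to the next criterion
	fmt.Println(compareByScale(now, c, a)) // -1: c is in a younger bucket
}
```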
We propose a randomized approach to the algorithm for Pod victim selection
during ReplicaSet downscale:

1. Do a random shuffle of ReplicaSet Pods.
why is this necessary?
I'm not sure if just quick sort, with the random pivot, provides the guarantees that any order is equally likely.
If it doesn't, then we would favor removing Pods in the order provided by the lister.
the last sorting key can be a random number or the pods' uuid
uid sounds good. The random number would have to be produced before calling sort (and not in the Less function); otherwise it leads to undefined behavior.
Yes, the random numbers need to be assigned beforehand, hence the uuid suggestion.
Do we want to sort the pods before or after grouping them into their scale buckets? I thought the goal was to select randomly from the youngest bucket
The goal is that Pods that belong to the same scale bucket are in a random order.
Maybe we don't need to get into the implementation detail of how we achieve that in the KEP, but what is important to note is the precedence of the buckets with regard to the other sorting criteria: https://github.com/kubernetes/kubernetes/blob/cac9339/pkg/controller/controller_utils.go#L784-L809
> The goal is that Pods that belong to the same scale bucket are in a random order.
> Maybe we don't need to get into the implementation detail

Agree, just making sure I understood the intent.
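To illustrate the point about when the randomization has to happen, here is a sketch under assumptions (not the actual implementation; `fakePod` and its fields are made up): any random tie-break key has to be attached to each Pod before sort is called, since computing it inside the `Less` function would make the comparison inconsistent; using the Pod UID gives the same effect deterministically.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// fakePod is a stand-in for the real Pod type, used only for illustration.
type fakePod struct {
	name   string
	bucket int     // e.g. log-scale bucket of elapsed ready time
	key    float64 // random key assigned once, before sorting
}

func main() {
	pods := []fakePod{
		{name: "a", bucket: 2},
		{name: "b", bucket: 1},
		{name: "c", bucket: 2},
	}
	// Assign the random tie-break key up front. Calling rand.Float64()
	// inside the Less function would violate sort's requirement of a
	// consistent ordering. The Pod UID is a deterministic alternative.
	for i := range pods {
		pods[i].key = rand.Float64()
	}
	sort.Slice(pods, func(i, j int) bool {
		if pods[i].bucket != pods[j].bucket {
			return pods[i].bucket < pods[j].bucket // younger bucket first
		}
		return pods[i].key < pods[j].key // random order within a bucket
	})
	fmt.Println(pods) // "b" first; "a" and "c" in random relative order
}
```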
### User Stories

#### Story 1
focusing on the upgrade story is more convincing in my opinion
True, this story could probably just be shortened to the last 2 steps (5,6): where a new domain is added and upscaled to 3N, containing all the youngest pods. Then downscaled to 2N and all the pods from the new domain are removed. Is that what you mean?
Updated to just focus on an upgrade, please check my wording :)
Not sure if that's what Abdullah meant, but thanks for the simplification.
For completeness, I would say:
A deployment could become imbalanced after:
- An entire failure domain fails and becomes unavailable
- A failure domain gets torn down for maintenance or upgrade
- A new failure domain is added to a cluster
The imbalance cycle goes as follows:
This sounds reasonable, but the upgrade case I had in mind is not necessarily about adding a new failure domain, but about a rolling upgrade of nodes: consider a region with three zones, and the nodes get upgraded one zone at a time. As Aldo mentioned, you can say that all those cases can lead to imbalance.
Note that this proposal is complementary to #1828
Force-pushed from c1bd2f7 to db57fbd
Tried to address the feedback so far in a new commit (that I intend to squash)
Force-pushed from 074f9c7 to 16d8044
status: implementable
creation-date: 2020-12-15
reviewers:
- "@janetkuo"
Add me as a reviewer.
1. Sort ReplicaSet pods by pod UUID.
2. Obtain wall time, and add it to [`ActivePodsWithRanks`](https://github.com/kubernetes/kubernetes/blob/dc39ab2417bfddcec37be4011131c59921fdbe98/pkg/controller/controller_utils.go#L815)
3. Call sorting algorithm with a modified time comparison for start and
Did you consider an algorithm which does 2 as described above but still includes node affinity like it's currently implemented? That is, instead of the initial sort, extend the algorithm so that it randomly picks from a bucket if len(bucket) > 1, which would limit the amount of sorting needed.
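If I follow, that alternative would look roughly like the sketch below (the names are invented for illustration; this is not the controller's code): keep the existing ranking buckets and only randomize the choice within a bucket that holds more than one candidate.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickVictim chooses the pod to delete from a bucket of equally-ranked
// candidates. When the bucket has more than one pod, pick one at random
// instead of relying on a full sort of all pods.
func pickVictim(bucket []string) string {
	if len(bucket) == 0 {
		return ""
	}
	return bucket[rand.Intn(len(bucket))]
}

func main() {
	// Pods already grouped by the existing ranking criteria
	// (node affinity, readiness, restarts, ...).
	bucket := []string{"web-7d4f9-abc12", "web-7d4f9-def34", "web-7d4f9-ghi56"}
	fmt.Println("victim:", pickVictim(bucket))
}
```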
An additional problem I'm seeing here is that, since you're modifying the start and creation timestamps, you'll end up copying the pod resources, which will increase memory consumption during this phase.
We are not copying the timestamp. We just compare it differently.
You can see the proof-of-concept implementation in kubernetes/kubernetes#96898
We propose a randomized approach to the algorithm for Pod victim selection
during ReplicaSet downscale:

1. Sort ReplicaSet pods by pod UUID.
You could state the purpose of this: to obtain a pseudo-random shuffle.
Also, it doesn't have to be the first step. It can just be another comparison criterion for ActivePodsWithRanks.Less after comparing timestamps.
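Tying the earlier sketches together, a hedged sketch of that suggestion (again with made-up types, not the real `ActivePodsWithRanks`): the elapsed-time scale is applied where the ready-time and creation-time criteria already sit, and the UID comparison is appended as the final tie-break.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// fakePod stands in for the real Pod type, for illustration only.
type fakePod struct {
	name              string
	readyTime         time.Time
	creationTimestamp time.Time
	uid               string
}

// scale buckets elapsed time by order of magnitude in minutes
// (purely illustrative; the base is a tunable detail).
func scale(elapsed time.Duration) int {
	m, s := int(elapsed.Minutes()), 0
	for m >= 10 {
		m /= 10
		s++
	}
	return s
}

// less chains the comparison criteria: ready time and creation time are
// compared on the logarithmic scale, and the UID is the final tie-break
// so pods equal on every other criterion end up in a pseudo-random order.
func less(now time.Time, a, b fakePod) bool {
	if sa, sb := scale(now.Sub(a.readyTime)), scale(now.Sub(b.readyTime)); sa != sb {
		return sa < sb
	}
	if sa, sb := scale(now.Sub(a.creationTimestamp)), scale(now.Sub(b.creationTimestamp)); sa != sb {
		return sa < sb
	}
	return a.uid < b.uid
}

func main() {
	now := time.Now()
	pods := []fakePod{
		{"a", now.Add(-20 * time.Minute), now.Add(-25 * time.Minute), "9f1..."},
		{"b", now.Add(-2 * time.Minute), now.Add(-3 * time.Minute), "4c7..."},
		{"c", now.Add(-30 * time.Minute), now.Add(-40 * time.Minute), "1ab..."},
	}
	sort.Slice(pods, func(i, j int) bool { return less(now, pods[i], pods[j]) })
	// "b" sorts first (youngest scale); "a" and "c" tie on scale and fall back to UID.
	fmt.Println(pods)
}
```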
Force-pushed from 83bf8bf to fb205f5
Updated to add @soltysh to reviewers. Was there anything else I missed?
Force-pushed from fb205f5 to d891ea6
Bumping this for @kubernetes/sig-apps-pr-reviews as we approach KEP freeze

/assign @wojtek-t
PRR itself looks fine, but I would like to see SIG approval first.
Add summary, motivation, detailed design and alternatives.

Signed-off-by: Aldo Culquicondor <[email protected]>
Force-pushed from d891ea6 to 123b9d6
Nits.
/lgtm
### Risks and Mitigations

Certain users might be relaying in the existing downscaling heuristic. However,
Suggested change:
- Certain users might be relaying in the existing downscaling heuristic. However,
+ Certain users might be relaying on the existing downscaling heuristic. However,
/approve
/approve for Alpha PRR (will require more for beta)
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damemi, kow3ns, soltysh, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.
This introduces the KEP for random ReplicaSet downscale (as previously discussed in kubernetes/kubernetes#96748). It currently addresses the sections needed for Alpha designation; looking for feedback from sig-apps and sig-scheduling.
re: #2185
/sig apps
/sig scheduling