Change behaviour of Garbage Collector #4425

piotrnosek · 2021-10-27T13:06:11Z

Only remove AggregateCollectionStates which don't have an existing corresponding controller (e.g. Deployment).

piotrnosek · 2021-10-29T11:35:47Z

/cc @kgolab

k8s-ci-robot · 2021-10-29T11:35:48Z

@piotrnosek: GitHub didn't allow me to request PR reviews from the following users: kgolab.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @kgolab

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jbartosik · 2021-10-29T16:24:29Z

vertical-pod-autoscaler/pkg/recommender/model/cluster.go

+// 1) It has no samples and there are no more active pods that can contribute,
+// 2) The last sample is too old to give meaningful recommendation (>8 days),
+// 3) There are no samples and the aggregate state was created >8 days ago.
+func (cluster *ClusterState) garbageCollectAggregateCollectionStates(now time.Time, controllerFetcher controllerfetcher.ControllerFetcher) {


I think this description is no longer true? Now we remove aggregates only if the controller is terminated and it has no live pods inside?

Actually, all points are still true; only the definition of an active pod changed. Before, an inactive pod was a pod in a terminal state (succeeded/failed). Now, an inactive pod is a pod which is both in a terminal state and doesn't have an existing controller. I've added a comment to reflect that.

This doesn't change the logic for old samples (>8 days old) and old aggregates.

Saying pod is active when its phase is not one of {PodSucceeded, PodFailed} makes sense to me.

Saying pod is active when its phase is not one of {PodSucceeded, PodFailed} or there is a controller for it looks unintuitive to me.

Please:

update this change to keep the previous definition of active and add having a controller as a separate condition, or

pick a new word for the concept "has a controller or isn't in a terminal phase".
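A minimal sketch of the first option, assuming simplified local types: `isActive` keeps its phase-only meaning, and the controller check lives in a separately named predicate. `podPhase` mirrors the string values of Kubernetes pod phases but is a hypothetical local type, and `canContribute` is only an illustrative name for "has a controller or isn't in a terminal phase".

```go
package main

// podPhase is a hypothetical local type mirroring Kubernetes pod phase strings.
type podPhase string

const (
	podRunning   podPhase = "Running"
	podSucceeded podPhase = "Succeeded"
	podFailed    podPhase = "Failed"
)

// isActive keeps the original meaning: the pod is not in a terminal phase.
func isActive(phase podPhase) bool {
	return phase != podSucceeded && phase != podFailed
}

// canContribute is the combined concept under its own name: an active pod,
// or any pod whose controller still exists.
func canContribute(phase podPhase, hasController bool) bool {
	return isActive(phase) || hasController
}
```

Keeping the two predicates apart means callers that only care about the pod phase are not affected by the controller lookup.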

jbartosik · 2021-10-29T16:26:12Z

vertical-pod-autoscaler/pkg/recommender/model/cluster.go

@@ -433,6 +441,35 @@ func (cluster *ClusterState) GetMatchingPods(vpa *Vpa) []PodID {
 	return matchingPods
 }

+// GetControllerForPod returns controller associated with given Pod.


This doesn't sound right. This function will return nil for a pod which has a controller but doesn't have a VPA for that controller.

True, good point, though I believe there is currently no good way to get a controller for a Pod without going through the VPA object controlling that Pod. I've updated the name and comment to reflect that.
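The limitation being discussed can be illustrated with a small sketch: the controller is found only by matching the pod against VPA selectors, so a pod whose controller has no VPA yields nil. All names here (`controllerKey`, `vpaInfo`, `controllerForPodViaVpa`) are hypothetical, since the thread does not give the renamed function, and the selector is reduced to exact label matching.

```go
package main

// controllerKey identifies a workload controller (hypothetical, simplified).
type controllerKey struct {
	Namespace, Kind, Name string
}

// vpaInfo is a hypothetical VPA: a namespace, an exact-match label selector
// and the controller it targets.
type vpaInfo struct {
	Namespace string
	Selector  map[string]string
	Target    *controllerKey
}

// podInfo is a hypothetical, simplified pod.
type podInfo struct {
	Namespace string
	Labels    map[string]string
}

// selectorMatches reports whether every selector key/value is present on the pod.
func selectorMatches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// controllerForPodViaVpa returns the target of the first matching VPA, or nil
// when no VPA matches, even if the pod does have an owning controller.
func controllerForPodViaVpa(pod podInfo, vpas []vpaInfo) *controllerKey {
	for _, vpa := range vpas {
		if vpa.Namespace == pod.Namespace && selectorMatches(vpa.Selector, pod.Labels) {
			return vpa.Target
		}
	}
	return nil
}
```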

This change updates the logic for the clusterapi autoscaler provider so that the `CAPI_GROUP` environment variable also affects the annotation keys for minimum and maximum node group size, the machine annotation, machine deletion, and the cluster name label. It also adds unit tests and an update to the readme.

This change adds the aforementioned label to the list of ignored labels in the AWS nodegroupset processor. This change is being made in response to the addition of this label by the aws-ebs-csi-driver. This label will eventually be deprecated by the driver, but its use will prevent AWS users from properly balancing similar nodes. Also adds unit test for the AWS processor. ref: kubernetes#3230 ref: kubernetes-sigs/aws-ebs-csi-driver#729
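A minimal sketch of the idea: when deciding whether two node groups are similar enough to balance, drop a set of ignored label keys before comparing labels. `ignoredLabels` and `labelsSimilar` are hypothetical names; the real nodegroupset processors compare more than labels.

```go
package main

// ignoredLabels lists label keys that must not make two node groups look different.
var ignoredLabels = map[string]bool{
	"topology.ebs.csi.aws.com/zone": true,
}

// labelsSimilar reports whether a and b are equal after dropping ignored keys.
func labelsSimilar(a, b map[string]string) bool {
	for k, v := range a {
		if ignoredLabels[k] {
			continue
		}
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	// Catch keys present only in b.
	for k := range b {
		if ignoredLabels[k] {
			continue
		}
		if _, ok := a[k]; !ok {
			return false
		}
	}
	return true
}
```

Groups that differ only in their EBS CSI zone label are treated as similar, while any other label difference still separates them.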

Tests are flaky, with VPA sometimes generating recommendations higher than 1000 mCPU. I think this is reasonable behavior: we're asking the resource consumer to use 1800 mCPU across 3 pods, and if the load is distributed unevenly, some pods can end up using 1000 mCPU.
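The arithmetic behind that argument, with an illustrative uneven split (the 1000/400/400 numbers are an example, not measured data):

$$
\frac{1800\ \text{mCPU}}{3\ \text{pods}} = 600\ \text{mCPU per pod (even split)}, \qquad 1000 + 400 + 400 = 1800\ \text{mCPU (uneven split)}
$$

Both splits satisfy the same total, so a single pod reaching 1000 mCPU is consistent with what the test asks of the resource consumer.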

This change adds ASCII diagrams to help illustrate the differences between the various authentication configurations for the clusterapi provider. Because Cluster API is distributed and can manage several Kubernetes clusters from a central location, its kubeconfig authentication options are slightly more complex than those of other providers.

k8s-ci-robot · 2021-11-30T15:42:39Z

@piotrnosek: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jbartosik · 2021-12-02T08:28:34Z

@piotrnosek please rebase this PR on top of current master, it looks like it has a lot of changes that shouldn't be here.

piotrnosek · 2021-12-02T13:42:27Z

Closing this PR because I ran into rebase hell with git; the desired changes are in a separate PR: #4488.

k8s-ci-robot added the cncf-cla: yes, needs-rebase, and size/M labels Oct 27, 2021

k8s-ci-robot requested review from jbartosik and krzysied October 27, 2021 13:06

piotrnosek force-pushed the vpa-gc-controller branch from 23cfc03 to 48677f4 October 27, 2021 13:07

k8s-ci-robot added the size/L label and removed the needs-rebase and size/M labels Oct 27, 2021

piotrnosek force-pushed the vpa-gc-controller branch from 40b0c65 to 48677f4 October 27, 2021 13:14

k8s-ci-robot added the needs-rebase and size/M labels and removed the size/L label Oct 27, 2021

piotrnosek force-pushed the vpa-gc-controller branch from 48677f4 to 9b63423 October 27, 2021 13:24

k8s-ci-robot removed the needs-rebase label Oct 27, 2021

piotrnosek force-pushed the vpa-gc-controller branch from 9b63423 to 70d6e7a October 27, 2021 13:27

piotrnosek marked this pull request as draft October 27, 2021 14:05

k8s-ci-robot added the do-not-merge/work-in-progress label Oct 27, 2021

piotrnosek force-pushed the vpa-gc-controller branch from 70d6e7a to 249f49d October 29, 2021 10:43

k8s-ci-robot added the size/L label and removed the size/M label Oct 29, 2021

piotrnosek marked this pull request as ready for review October 29, 2021 11:33

k8s-ci-robot removed the do-not-merge/work-in-progress label Oct 29, 2021

jbartosik reviewed Oct 29, 2021

jbartosik added the area/vertical-pod-autoscaler label Nov 2, 2021

piotrnosek force-pushed the vpa-gc-controller branch 2 times, most recently from e5d939a to c954e2a November 2, 2021 17:11

piotrnosek requested a review from jbartosik November 3, 2021 16:40

BigDarkClown and others added 23 commits November 30, 2021 15:37

Remove obsolete MigInstanceTemplatesProvider

3c699f1

Add MigInfoProvider tests

7374a47

Make GCE instance template labels & taints getters public

8c4e1f2

Mention Packet for supporting price expander

19b780c

Signed-off-by: Shivam Sandbhor <[email protected]>

Add gjtempleton to OWNERS

06a17b9

update readme and examples to keep it consistent with the community version

1d48a5a

removes deprecated CAPI annotations

4523efd

add ClusterAPI nodegroupset processor

51f11b4

This allows the ClusterAPI provider to ignore the `topology.ebs.csi.aws.com/zone` label by adding a custom nodegroupset processor. It also adds unit tests to exercise the new processor.

CA - Update gofmt of CAPI_nodegroup.go

bed8c1f

CA - AWS - Update StaticListLastUpdateTime on re-generating instance list

b1ef97b

Also add g5 instance type

Added changes to support alternative recommender

1099f7c

implement GetOptions for AWS

a7edc2b

Support per-ASG (scaledown) settings as permitted by the cloudprovider's interface GetOptions() method.

Register packet provider in all builder

f5cd148

Signed-off-by: Shivam Sandbhor <[email protected]>

use docker buildx to build multi-arch image (kubernetes#4407)

4aafed6

Separate limits scaling between CPU & memory

93e7a94

Treating them both the same would cause issues when the ratio between the requests and the limits is a floating-point value, suggesting a millivalue as the limit for memory.

Improve ScaledUpGroup event info to include current & made nodes

ad381e4

Signed-off-by: GitHub <[email protected]>

Fix typo in FAQ

fc1ac98

Changed the logging level for extractAutoscalerVarFromKubeEnv not found in gce cloud provider

366db01

Change garbage collector behaviour, by only removing AggregateCollectionsStates for which corresponding owner controller doesn't exist anymore.

2b896ab

piotrnosek force-pushed the vpa-gc-controller branch from 4529db4 to 2b896ab November 30, 2021 15:42

k8s-ci-robot added the needs-rebase label Nov 30, 2021

piotrnosek closed this Dec 2, 2021

piotrnosek mentioned this pull request Dec 2, 2021

Change the behaviour of Garbage Collector of AggregateCollectionStates #4488

Merged
