Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change behaviour of Garbage Collector #4425

Closed
wants to merge 1,043 commits into from

Conversation

piotrnosek
Copy link
Contributor

Only remove AggregateCollectionStates which don't have an existing corresponding controller (e.g. Deployment).

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 27, 2021
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 27, 2021
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 27, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2021
@piotrnosek piotrnosek marked this pull request as draft October 27, 2021 14:05
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 27, 2021
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 29, 2021
@piotrnosek piotrnosek marked this pull request as ready for review October 29, 2021 11:33
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 29, 2021
@piotrnosek
Copy link
Contributor Author

/cc @kgolab

@k8s-ci-robot
Copy link
Contributor

@piotrnosek: GitHub didn't allow me to request PR reviews from the following users: kgolab.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @kgolab

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

// 1) It has no samples and there are no more active pods that can contribute,
// 2) The last sample is too old to give meaningful recommendation (>8 days),
// 3) There are no samples and the aggregate state was created >8 days ago.
func (cluster *ClusterState) garbageCollectAggregateCollectionStates(now time.Time, controllerFetcher controllerfetcher.ControllerFetcher) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this description is no longer true? Now we remove aggregates only if controller is terminated and it has o live pods inside?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually all points are still true, just the definition of active pod changed. Before an inactive pod would be a pod that is in a terminal state (succeeded/failed). Right now, an inactive pod is a pod which is both in a terminal state and doesn't have an existing controller. I've added a comment to reflect that.

This doesn't change the logic for old samples (>8 days old) and old aggregates.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saying pod is active when its phase is not one of {PodSucceeded, PodFailed} makes sense to me.

Saying pod is active when its phase is not one of {PodSucceeded, PodFailed} or there is a controller for it looks unintuitive to me.

Please:

  • update this change to keep previous definition of active and add having a controller as a separate condition, or
  • pick a new word for the concept "has a controller or isn't in a terminal phase".

@@ -433,6 +441,35 @@ func (cluster *ClusterState) GetMatchingPods(vpa *Vpa) []PodID {
return matchingPods
}

// GetControllerForPod returns controller associated with given Pod.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't sound right. This function will return nil for a pod which has a controller but doesn't have VPA for the controller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, good point, though I believe for now there is no good way for getting a controller for Pod without going through VPA object controlling that Pod. I've updated name and comment to reflect that.

@piotrnosek piotrnosek force-pushed the vpa-gc-controller branch 2 times, most recently from e5d939a to c954e2a Compare November 2, 2021 17:11
@piotrnosek piotrnosek requested a review from jbartosik November 3, 2021 16:40
BigDarkClown and others added 23 commits November 30, 2021 15:37
This change updates the logic for the clusterapi autoscaler provider so
that the `CAPI_GROUP` environment variable will also affect the
annotations keys for minimum and maximum node group size, the machine
annotation, machine deletion, and the cluster name label. It also addes
unit tests and an update to the readme.
This change adds the aforementioned label to the list of ignored labels
in the AWS nodegroupset processor. This change is being made in response
to the addition of this label by the aws-ebs-csi-driver. This label will
eventually be deprecated by the driver, but its use will prevent AWS
users from properly balancing similar nodes. Also adds unit test for the
AWS processor.

ref: kubernetes#3230
ref: kubernetes-sigs/aws-ebs-csi-driver#729
This allows the ClusterAPI provider to ignore the
`topology.ebs.csi.aws.com/zone` label by adding a custom nodegroupset
processor. It also adds unit tests to exercise the new processor.
Support per-ASG (scaledown) settings as permited by the
cloudprovider's interface GetOptions() method.
Tests are flaky with VPA sometimes generating recommendations higher
than 1000 mCPU.

I think this is a reasonable behavior - we're asking resoirce consumer
to use 1800 mCPU between 3 pods, if it gets unevenly distributed we can
end up with some pods using 1000 mCPU.
Treating them both the same would cause issues when the ratio
between the requests and the limits is a floating-point value,
suggesting a millivalue as the limit for memory.
This change adds ascii diagrams to help illustrate the differences
between the various authentication configurations for the clusterapi
provider. Due to the distributed nature of Cluster API and its ability
to have several Kubernetes clusters managed from a central location, the
kubeconfig authentication options for it are slightly more complex than
other providers.
AggregateCollectionsStates for which corresponding owner controller
doesn't exist anymore.
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2021
@k8s-ci-robot
Copy link
Contributor

@piotrnosek: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jbartosik
Copy link
Collaborator

@piotrnosek please rebase this PR on top of current master, it looks like it has a lot of changes that shouldn't be here.

@piotrnosek
Copy link
Contributor Author

Closing this PR due to running into rebase hell with git, desired changes are on a separate PR: #4488.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.