Change behaviour of Garbage Collector #4425
/cc @kgolab
@piotrnosek: GitHub didn't allow me to request PR reviews from the following users: kgolab. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
// 1) It has no samples and there are no more active pods that can contribute,
// 2) The last sample is too old to give meaningful recommendation (>8 days),
// 3) There are no samples and the aggregate state was created >8 days ago.
func (cluster *ClusterState) garbageCollectAggregateCollectionStates(now time.Time, controllerFetcher controllerfetcher.ControllerFetcher) {
I think this description is no longer true? Now we remove aggregates only if the controller is terminated and it has no live pods inside?
Actually all points are still true, just the definition of active pod changed. Before an inactive pod would be a pod that is in a terminal state (succeeded/failed). Right now, an inactive pod is a pod which is both in a terminal state and doesn't have an existing controller. I've added a comment to reflect that.
This doesn't change the logic for old samples (>8 days old) and old aggregates.
Saying a pod is active when its phase is not one of {PodSucceeded, PodFailed} makes sense to me.
Saying a pod is active when its phase is not one of {PodSucceeded, PodFailed} or there is a controller for it looks unintuitive to me.
Please:
- update this change to keep the previous definition of active and add having a controller as a separate condition, or
- pick a new word for the concept "has a controller or isn't in a terminal phase".
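To make the two options concrete, here is a rough sketch that keeps "active" as a pure phase check and gives the combined concept its own name. The types `podPhase` and `podInfo` and the function names `isActive` and `canContribute` are made up for this illustration and do not come from the PR.

```go
package main

import "fmt"

// podPhase is a hypothetical stand-in for the Kubernetes pod phase type.
type podPhase string

const (
	podRunning   podPhase = "Running"
	podSucceeded podPhase = "Succeeded"
	podFailed    podPhase = "Failed"
)

// podInfo is a simplified, hypothetical view of a pod.
type podInfo struct {
	phase            podPhase
	controllerExists bool
}

// isActive keeps the previous definition: the pod is not in a terminal phase.
func isActive(p podInfo) bool {
	return p.phase != podSucceeded && p.phase != podFailed
}

// canContribute names the new concept separately:
// "has a controller or isn't in a terminal phase".
func canContribute(p podInfo) bool {
	return isActive(p) || p.controllerExists
}

func main() {
	finishedButOwned := podInfo{phase: podSucceeded, controllerExists: true}
	fmt.Println("active:", isActive(finishedButOwned))
	fmt.Println("can contribute:", canContribute(finishedButOwned))
}
```

With this split, a finished pod whose controller still exists is not active but can still block garbage collection of its aggregate.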
@@ -433,6 +441,35 @@ func (cluster *ClusterState) GetMatchingPods(vpa *Vpa) []PodID {
	return matchingPods
}

// GetControllerForPod returns controller associated with given Pod.
This doesn't sound right. This function will return nil for a pod which has a controller but doesn't have a VPA for the controller.
True, good point, though I believe for now there is no good way of getting a controller for a Pod without going through the VPA object controlling that Pod. I've updated the name and comment to reflect that.
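A simplified sketch of the behaviour discussed here: the controller is only reachable through a VPA whose selector matches the pod, so a pod with a controller but no matching VPA yields nil. All names below (`controllerKey`, `vpa`, `controllerForPodUnderVPA`) are hypothetical and the label matching is deliberately naive; this is not the code from the PR.

```go
package main

import "fmt"

// controllerKey is a hypothetical identifier for a controller such as a Deployment.
type controllerKey struct {
	Namespace, Kind, Name string
}

// vpa is a simplified, hypothetical VPA: a label selector plus the
// controller it targets.
type vpa struct {
	selector   map[string]string
	controller *controllerKey
}

// matches reports whether every selector label is present on the pod.
// In this sketch an empty selector matches every pod.
func matches(selector, podLabels map[string]string) bool {
	for k, want := range selector {
		if got, ok := podLabels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// controllerForPodUnderVPA returns the controller of the first VPA that
// matches the pod, or nil when no VPA matches, even if the pod does
// have a controller in the cluster.
func controllerForPodUnderVPA(podLabels map[string]string, vpas []vpa) *controllerKey {
	for _, v := range vpas {
		if matches(v.selector, podLabels) {
			return v.controller
		}
	}
	return nil
}

func main() {
	vpas := []vpa{{
		selector:   map[string]string{"app": "web"},
		controller: &controllerKey{Namespace: "default", Kind: "Deployment", Name: "web"},
	}}
	fmt.Println(controllerForPodUnderVPA(map[string]string{"app": "web"}, vpas) != nil)
	fmt.Println(controllerForPodUnderVPA(map[string]string{"app": "db"}, vpas) != nil)
}
```

This is why a name and comment that mention the VPA describe the function more accurately than "returns controller associated with given Pod".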
Signed-off-by: Shivam Sandbhor <[email protected]>
This change updates the logic for the clusterapi autoscaler provider so that the `CAPI_GROUP` environment variable will also affect the annotation keys for minimum and maximum node group size, the machine annotation, machine deletion, and the cluster name label. It also adds unit tests and an update to the readme.
This change adds the aforementioned label to the list of ignored labels in the AWS nodegroupset processor. This change is being made in response to the addition of this label by the aws-ebs-csi-driver. This label will eventually be deprecated by the driver, but its use will prevent AWS users from properly balancing similar nodes. Also adds unit test for the AWS processor. ref: kubernetes#3230 ref: kubernetes-sigs/aws-ebs-csi-driver#729
This allows the ClusterAPI provider to ignore the `topology.ebs.csi.aws.com/zone` label by adding a custom nodegroupset processor. It also adds unit tests to exercise the new processor.
…list Also add g5 instance type
Support per-ASG (scaledown) settings as permitted by the cloudprovider's interface GetOptions() method.
Signed-off-by: Shivam Sandbhor <[email protected]>
Tests are flaky with VPA sometimes generating recommendations higher than 1000 mCPU. I think this is reasonable behavior: we're asking the resource consumer to use 1800 mCPU between 3 pods, and if the load gets unevenly distributed we can end up with some pods using 1000 mCPU.
Treating them both the same would cause issues when the ratio between the requests and the limits is a floating-point value, suggesting a millivalue as the limit for memory.
Signed-off-by: GitHub <[email protected]>
This change adds ascii diagrams to help illustrate the differences between the various authentication configurations for the clusterapi provider. Due to the distributed nature of Cluster API and its ability to have several Kubernetes clusters managed from a central location, the kubeconfig authentication options for it are slightly more complex than other providers.
…ound in gce cloud provider
AggregateCollectionsStates for which corresponding owner controller doesn't exist anymore.
@piotrnosek: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@piotrnosek please rebase this PR on top of current master, it looks like it has a lot of changes that shouldn't be here.
Closing this PR due to running into rebase hell with git, desired changes are on a separate PR: #4488.
Only remove AggregateCollectionStates which don't have an existing corresponding controller (e.g. Deployment).