Rebase to upstream/cluster-autoscaler-release-1.16 #119
Conversation
Use print() function in both Python 2 and Python 3
Run VPA e2e tests for both apis even if one fails
add test case for controller_fetcher; correct comment typo
add test case for controller_fetcher and cluster_feeder
Currently we read certificate files into a 5000-byte buffer. This sets a maximum size for our certificates; above this max we get invalid certs, most likely leading to runtime bad-certificate errors.
Use ioutil for certificate file reading.
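For context, a minimal Go sketch of the fix described above, assuming the bytes end up in a crypto/tls key pair; loadCertificate and the file paths are illustrative, not the actual function touched by the commit:

```go
package main

import (
	"crypto/tls"
	"io/ioutil"
	"log"
)

// loadCertificate reads the whole certificate and key files rather than
// copying them into a fixed 5000-byte buffer, so certificates larger than
// that no longer end up truncated and invalid.
func loadCertificate(certFile, keyFile string) (tls.Certificate, error) {
	certPEM, err := ioutil.ReadFile(certFile) // reads the entire file
	if err != nil {
		return tls.Certificate{}, err
	}
	keyPEM, err := ioutil.ReadFile(keyFile)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.X509KeyPair(certPEM, keyPEM)
}

func main() {
	// Paths are placeholders for wherever the component stores its TLS material.
	if _, err := loadCertificate("tls.crt", "tls.key"); err != nil {
		log.Fatal(err)
	}
}
```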
- SubscriptionID placeholder does not match documentation or surrounding placeholders
Add support for cronjobs in VPA
Update c5,i3en,m5,r5 instance types
xrange() was removed in Python 3 in favor of range()
Fix typo in autoscaler azure example
…rash check nodeGroup IsNil to avoid crashes
…erver-over-heapster Use Metrics Server instead of Heapster in the FAQ
Update VPA dependencies to k8s 1.14.3
Switch VPA examples to use apps/v1 API for Deployment
…milarity_rules fix: ignore agentpool label when looking for similar node groups with Azure provider
…onfig_readme Add multinode config in readme.
"delete node" has a specific meaning in the context of the Kubernetes API: deleting the node object. However, the cluster-autoscaler never does that; it terminates the underlying instance and expects the [cloud-node-controller](https://kubernetes.io/docs/concepts/architecture/cloud-controller/#node-controller) to remove the corresponding node from Kubernetes. Replace all mentions of "deleting" a node with "terminating" to disambiguate this. Signed-off-by: Matthias Rampke <[email protected]>
This had me stumped for the better part of a day. While the cluster-autoscaler "deletes" nodes, it does not actually delete the Node object from the Kubernetes API. In normal operations, with a well-configured cluster, this is a minor point; however, when debugging why nodes do not get deleted, the inconsistent terminology can be a major headache. This FAQ entry should clarify the difference for anyone who needs to know. Signed-off-by: Matthias Rampke <[email protected]>
…examples Add required selector to VPA deployment examples for apps/v1
…up when there was a multizonal pool with a number of nodes exceeding the limit for one zone.
Be less specific as to when the event is generated. The qualifying criterion is that adding a node would reach the specified --max-nodes-total.
We previously only considered a MachineSet or MachineDeployment suitable for scaling if it had positive scaling bounds. For example:

```yaml
annotations:
  machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "1"
  machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
```

On us-west-2 we have:

```console
$ kubectl get machinesets
NAME                                  DESIRED   CURRENT   READY   AVAILABLE   AGE
amcdermo-ca-5ws2b-worker-us-west-2a   1         1         1       1           4h23m
amcdermo-ca-5ws2b-worker-us-west-2b   1         1         1       1           4h23m
amcdermo-ca-5ws2b-worker-us-west-2c   1         1         1       1           4h23m
amcdermo-ca-5ws2b-worker-us-west-2d   0         0                             4h23m
```

but the autoscaler will fail to scale up any of these because the last MachineSet has a replica count of 0. This change modifies the criteria so that we only return a node group (i.e., machine set) that has:
- positive scaling bounds, and
- an underlying resource with a replica count greater than 0.

A rough sketch of this check follows the list.
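A hedged sketch of the new suitability criteria under assumed types; scalableResource and suitableForScaling are illustrative stand-ins for the real machine-API objects, with only the annotation keys taken from the commit message above:

```go
package main

import (
	"fmt"
	"strconv"
)

// scalableResource is a hypothetical, simplified view of a MachineSet or
// MachineDeployment; the field names are illustrative, not the real API types.
type scalableResource struct {
	Annotations map[string]string
	Replicas    int32
}

const (
	// Annotation keys as quoted in the commit message above.
	minSizeAnnotation = "machine.openshift.io/cluster-api-autoscaler-node-group-min-size"
	maxSizeAnnotation = "machine.openshift.io/cluster-api-autoscaler-node-group-max-size"
)

// suitableForScaling mirrors the criteria described above: the resource must
// carry positive scaling bounds and its replica count must be greater than zero.
func suitableForScaling(r scalableResource) bool {
	min, errMin := strconv.Atoi(r.Annotations[minSizeAnnotation])
	max, errMax := strconv.Atoi(r.Annotations[maxSizeAnnotation])
	if errMin != nil || errMax != nil {
		return false // missing or malformed scaling bounds
	}
	if max <= 0 || max < min {
		return false // no positive scaling bounds
	}
	return r.Replicas > 0 // underlying resource must have at least one replica
}

func main() {
	zeroed := scalableResource{
		Annotations: map[string]string{minSizeAnnotation: "1", maxSizeAnnotation: "6"},
		Replicas:    0,
	}
	fmt.Println(suitableForScaling(zeroed)) // false: replica count is 0
}
```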
When creating a test config I had previously allowed the number of nodes to be different from the number of replicas. We don't use this distinction, so I dropped it entirely from all the unit tests.
This runs the e2e tests and is mainly invoked by CI.
This reworks the logic in DeleteNodes([]nodes) to:
- annotate a node for deletion
- drop the replica count by 1 immediately
- fail fast on any error during those steps

This is different from the current behaviour, which first annotated all the nodes and then dropped the replica count by len(nodes). The goal here is to not leave machines and the respective machine set replica count in an indeterminate state should any error occur.
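A rough sketch of the reworked control flow under assumed types; node, annotateForDeletion and decrementReplicas are hypothetical placeholders for the provider's actual plumbing, and only the per-node, fail-fast ordering reflects the description above:

```go
package main

import "fmt"

// node and the helpers below are hypothetical stand-ins for the machine-API
// plumbing; only the control flow mirrors the commit description.
type node struct{ name string }

func annotateForDeletion(n node) error {
	fmt.Printf("annotating %s for deletion\n", n.name)
	return nil
}

func decrementReplicas() error {
	fmt.Println("replica count dropped by 1")
	return nil
}

// deleteNodes annotates one node, immediately drops the replica count by one,
// and bails out on the first error, instead of annotating every node up front
// and then dropping the replica count by len(nodes) in a single step.
func deleteNodes(nodes []node) error {
	for _, n := range nodes {
		if err := annotateForDeletion(n); err != nil {
			return err // fail fast: replica count untouched for this node
		}
		if err := decrementReplicas(); err != nil {
			return err // fail fast: later nodes are left alone
		}
	}
	return nil
}

func main() {
	_ = deleteNodes([]node{{name: "worker-0"}, {name: "worker-1"}})
}
```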
…1 alias This is largely to be consistent with other usages in the community, but really to be at parity with the upstream PR [1], which already uses this import alias. This also makes it easier to backport changes from openshift/autoscaler into upstream. [1] kubernetes#1866
The original intent of the event was to convey that the ceiling had been reached. This now logs the configured maximum.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1731011

In OpenShift clusters we see very small memory differences between instances, typically 8Ki. The default comparator, which is used when --balance-similar-node-groups is enabled, only considers node groups to be similar when their memory capacity is identical. This new comparator is identical to the default but tolerates a 128Ki difference in memory capacity. With this change, and with --balance-similar-node-groups enabled, I now see the following:

```console
$ oc get machinesets
NAME                               DESIRED   CURRENT   READY   AVAILABLE   AGE
amcdermo-w6m6z-worker-us-east-2a   3         3         3       3           24h
amcdermo-w6m6z-worker-us-east-2b   4         4         4       4           24h
amcdermo-w6m6z-worker-us-east-2c   4         4         4       4           24h
e2e-59443-w-0                      3         3         1       1           20m
e2e-59443-w-1                      3         3         1       1           20m
e2e-59443-w-2                      3         3         1       1           20m
e2e-59443-w-3                      3         3         1       1           20m
e2e-59443-w-4                      3         3         1       1           20m
e2e-59443-w-5                      3         3         1       1           20m
e2e-59443-w-6                      3         3         1       1           20m
e2e-59443-w-7                      3         3         1       1           20m
e2e-59443-w-8                      3         3         1       1           20m
```
…arget So the target can be run in CI and fail if files are not properly formatted with goimports.
…caler/cloudprovider/openshiftmachineapi
…erences in memory capacity The current comparator expects memory capacity values to be identical. However, across AWS, Azure and GCP I quite often see very small differences in capacity, typically 8-16Ki. When this occurs, the node groups are considered not equal when balancing is in effect, which is unfortunate because in reality they are identical. This change tolerates a 128Ki difference before memory capacity values are considered unequal.
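A hedged sketch of the tolerant comparison described in these two commits; the 128Ki figure comes from the text above, while memoryCapacityEqual and the use of raw byte counts (instead of the real quantity-comparison plumbing) are illustrative:

```go
package main

import "fmt"

// maxMemoryDifference is the tolerance described above: capacities within
// 128Ki of each other are treated as equal when balancing similar node groups.
const maxMemoryDifference = 128 * 1024 // bytes

// memoryCapacityEqual compares two memory capacities (in bytes) and tolerates
// the small 8-16Ki differences commonly seen across AWS, Azure and GCP nodes.
func memoryCapacityEqual(a, b int64) bool {
	diff := a - b
	if diff < 0 {
		diff = -diff
	}
	return diff <= maxMemoryDifference
}

func main() {
	// Two nodes reporting capacities 8Ki apart, as in the clusters mentioned above.
	fmt.Println(memoryCapacityEqual(16370000*1024, 16370008*1024)) // true
	// Genuinely different instance sizes are still distinguished.
	fmt.Println(memoryCapacityEqual(16*1024*1024*1024, 32*1024*1024*1024)) // false
}
```

An absolute-difference check like this keeps node groups with a few kibibytes of capacity jitter in the same balancing bucket while still separating genuinely different instance sizes.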
This reverts commit b8ec486.
Pull-request updated, HEAD is now 6ddbc80
The following users are mentioned in OWNERS file(s) but are not members of the openshift org. Once all users have been added as members of the org, you can trigger verification by writing
@danielmellado: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
thanks @danielmellado! closing in favour of #120
/close
@enxebre: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The PR was created by first taking upstream/cluster-autoscaler-release-1.16 as the base, then applying UPSTREAM: patches on top. The set of picks applied was derived from: