-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPCLOUD-2431: rebase on upstream 1.29.0 release #285
Conversation
Fix vpa-up.sh script by removing duplicate recommender
…port Adds support for startup taints and extends support for status taints
Signed-off-by: Eric Lin <[email protected]>
CA - Update k/k vendoring to 1.29.0-alpha.1
Signed-off-by: Eric Lin <[email protected]>
…belling CA - Update CAPI Cloudprovider PR labeling to existing label
Trim managedFields in shared informer factory
InitLogs unconditionally disables contextual logging, while ValidateAndApply checks the feature gate for that. InitLogs must come first, otherwise --feature-gates=ContextualLogging=true doesn't work.
fix log initialization
…le-dynamic Refactor scale-down for better integration with drainability rules
When PermissionToAddNode gets called without actually adding a new node, the node counter in the thresholdBasedEstimationLimiter gets out of sync with the actual number of new nodes. This can happen when the "is the last node empty" check triggers. The solution used here is to rearrange the checks so that PermissionToAddNode is followed by adding a new node. Alternatively, it might also be possible to pass the current number of nodes as parameter.
It's possible that an EC2 instance is in the Terminated state for multiple autoscaler loops, so to avoid trying to terminate it more than once and causing the bug seen for this initial fix, we include this check as well. Also, make the log more verbose and write out what state we found the instance in.
Update CA_with_AWS_IAM_OIDC.md
Fixing clutter-autoscaler docs to point to correct tutorial for auto-discovery
aws: check for all possible Terminated states
Bump default VPA version to 1.0.0 in master
Fix duplicate -addext when generating certificates with admission-controller/gencerts.sh
Extend BinpackingLimiter interface
@aleskandro i was looking at your pr as i was cherry-picking the commits, it seems like the code to support the environment variable is in here, do we need an extra patch? |
i read the CAO thread, i think we should just add the no-op commit as a drop here. |
Hey @elmiko, the operator is still setting the --scale-up-from-zero-default-arch parameter in the Args of the autoscaler's deployment, but with the rebase, this parameter is not defined anymore. I'm getting a doubt now as I don't remember anymore the tests that I did at the time: can the autoscaler accept unrecognised command line arguments? My commit linked above redefined the flag to let the autoscaler recognize it |
This change carries files and modifications that are used by OpenShift release infrastructure and related files. * spec file * dockerfiles * vertical-pod-autoscaler/Dockerfile.rhel * images/cluster-autoscaler/Dockerfile * images/cluster-autoscaler/Dockerfile.rhel * hack scripts (ci and build related) * Makefile * JUnit tools * update gitignore * update/remove OWNERS files * ci-operator config yaml * update vendor folders * add vendor folder for balancer * add Snyk file to exclude vendor directory on scan
…otation The delete annotation upstream has a different format, but is now inferred dynamically from the API group. If we update this in MAO to use the new format, we can drop this old key
this change removes the `go test -i ...` command from the hack/test-go.sh file. this option is deprecated in the go command and attempts to install dependencies of the test. when this is enabled with the build tags we are using, it appears to make the test builds unstable in CI.
this change adds all the cloudproviders except clusterapi to the list of ignored directories in for the cluster-autoscaler tests. this is being done so that the tests match our build tags and do not cause errors when passing the package list to the test script.
This change re-adds the machine api support for labels and taints on node groups. The code was removed upstream as it is openshift specific, see this pull request[0]. It also adds in the functionality of the upstream override annotation for labels and taints[1] to support https://issues.redhat.com/browse/MIXEDARCH-259 [0]: kubernetes#5249 [1]: kubernetes#5382
the upstream annotations for the scale from zero capacity resources is slighty different than the openshift implementation. the largest difference is the addition of a gpu type annotation. openshift does not yet utilize this annotation and thus this patch should be carried until the machineset controllers for the various providers on openshift have been modified to use the new annotations. another important change is the modification of the memory annotation. previously in openshift we expected this value to be a count of memory in Mebibytes. the conversion function and tests have been modified to allow continued openshift operation. this change can be dropped when the annotations in openshift have been updated, the progress for this effort can be followed at https://issues.redhat.com/browse/OCPCLOUD-944
This change adds the informer factory as an argument to the `buildCloudProvider` function for clusterapi so that building with tags will work properly.
adding a fix for the goimports failure |
looks like the failures are coming from the missing flag now, i will add that patch tomorrow. |
…ro-default-arch This commit reintroduces the flag --scale-up-from-zero-default-arch. It is a temporary workaround to allow a smoother transition from the downstream changes in openshift#261 to the upstream ones in kubernetes#6066. It can be removed after the cluster-autoscaler-operator is updated not to set this flag anymore. Note that the flag is a no-op because the upstream behavior is guaranteed by setting up the expected CAPI environment variable, as implemented in openshift/cluster-autoscaler-operator#297 See openshift/cluster-autoscaler-operator#311 for more details.
i added @aleskandro 's no-op patch as a drop |
@elmiko: This pull request references OCPCLOUD-2431 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@elmiko: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
tests looking good |
.snyk
Outdated
# https://docs.snyk.io/snyk-cli/commands/ignore | ||
exclude: | ||
global: | ||
- vendor/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs quoting, can we fix it up?
Fixup is minor |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
cb4f7f2
into
openshift:master
[ART PR BUILD NOTIFIER] This PR has been included in build atomic-openshift-cluster-autoscaler-container-v4.16.0-202402021641.p0.gcb4f7f2.assembly.stream for distgit atomic-openshift-cluster-autoscaler. |
This commit rebases the autoscaler on top of the Kubernetes/Autoscaler 1.29.0 release. There are several commits that we carry on top of the upstream autoscaler and the rebase process allows us to preserve those. Here is a description of the process I used to create this PR.
(inspired by the commit description for the 1.18 rebase. pr #139)
Process
First we need to identify the carry commits that we currently have, this is done against our previous rebase to catch new changes. Once identified we will drop commits which have merged upstream and only carry unique commits. (see below for the carried and dropped commits).
Identify carry commits (run from the openshift/master branch), these are the commits that begin with
UPSTREAM:
up until the merge commit for the previous rebase commit (merge upstream/cluster-autoscaler-1.28.0
)After identifying the carry commits, the next step is to create the new commit-tree that will be used for the rebase and then cherry pick the carry commits into the new branch. The following commands cover these steps:
With the
merge-1.29
branch in place, I cherry picked the carry commits which applied, resolved merge conflicts, and finally tested the resulting tree against the unit test and end-to-end suite.Carried Commits
These commits are for features which have not yet been accepted upstream, are integral to our CI platform, or are specific to the releases we create for OpenShift.
Dropped Commits
These commits were squashed into the
fcc7648db
commit.This commit was dropped
Of special note in this rebase is this commit
due to the scale from zero changes being accepted upstream we can now drop our carried patch. but, the upstream implementation has differed slightly from our's (mainly around annotation names). we will need to carry this patch until we can fix all the providers to properly use the new annotations. This patch should be able to be dropped in the 4.17 release of openshift, see https://issues.redhat.com/browse/OCPCLOUD-944.