Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPCLOUD-2195: rebase on upstream 1.28.0 release #265

Merged
merged 330 commits into from
Nov 1, 2023

Conversation

elmiko
Copy link

@elmiko elmiko commented Oct 26, 2023

This commit rebases the autoscaler on top of the Kubernetes/Autoscaler 1.28.0 release. There are several commits that we carry on top of the upstream autoscaler and the rebase process allows us to preserve those. Here is a description of the process I used to create this PR.

(inspired by the commit description for the 1.18 rebase. pr #139)

Process

First we need to identify the carry commits that we currently have, this is done against our previous rebase to catch new changes. Once identified we will drop commits which have merged upstream and only carry unique commits. (see below for the carried and dropped commits).

Identify carry commits (run from the openshift/master branch)

git log -n 20 --oneline --no-merges

this is slightly different from previous rebases in that the history has gotten slightly wonky and i needed to manually inspect the carry patches.

After identifying the carry commits, the next step is to create the new commit-tree that will be used for the rebase and then cherry pick the carry commits into the new branch. The following commands cover these steps:

$ git remote update # make sure we update our refs
$ git checkout cluster-autoscaler-1.28.0
$ git checkout -b merge-tmp # create a temporary branch for our merge commit
$ git checkout openshift/master # we want to be at the tip of the openshift master branch when we run the next command
$ echo 'merge upstream/cluster-autoscaler-1.28.0' | git commit-tree merge-tmp^{tree} -p HEAD -p merge-tmp -F - # create a new merge commit for our history
deadbeef12345678 # id of new merge commit
$ git branch merge-1.28.0 deadbeef12345678 # create a new branch for the cherry-pick work
$ git checkout merge-1.28.0
$ git cherry-pick <carry commits> # cherry pick the needed commits into the new branch

With the merge-1.28.0 branch in place, I cherry picked the carry commits which applied, resolved merge conflicts, and finally tested the resulting tree against the unit test and end-to-end suite.

Carried Commits

These commits are for features which have not yet been accepted upstream, are integral to our CI platform, or are specific to the releases we create for OpenShift.

65cfc6208 UPSTREAM: 6066: Allow overriding the kubernetes.io/arch label set by the scale from zero methods via a new cmdline arg
061558ed7 UPSTREAM: <carry>: revert capacity annotations
ef2f569b8 UPSTREAM: <carry>: add machine api label and taint functionality
a3f3bfeee UPSTREAM: <carry>: Add vendor for balancer
b68ce96cf UPSTREAM: <carry>: update os::util::list_test_packages_under
b0bb5aeb0 UPSTREAM: <carry>: remove deprecated go test command
461f25adc UPSTREAM: <carry>: Have VPA ignore phantom containers named "POD"
d9b6825ce UPSTREAM: <carry>: Handle old Machine API specific machine delete annotation
14235e4ef UPSTREAM: <carry>: Rename FailureMessage to ErrorMessage
530db8027 UPSTREAM: <carry>: configure repository for OpenShift releases

Dropped Commits

These commits were squashed into the 530db8027 commit.

9fe29aaf4 (HEAD -> merge-1.28.0, origin/merge-1.28.0) UPSTREAM: <carry>: Sync OWNERS file
103eecdbe UPSTREAM: <carry>: Updating vertical-pod-autoscaler images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/da6a5af1fa96762cc8410f09fc28c0457280f59a/images/vertical-pod-autoscaler.yml
1f7bf8e71 UPSTREAM: <carry>: Use mod vendor for go test script
af3ff6883 UPSTREAM: <carry>: Updating atomic-openshift-cluster-autoscaler images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/ed337d94faaaa3782fcccf3a54d9a2256ba2aa9b/images/atomic-openshift-cluster-autoscaler.yml
967e9185b UPSTREAM: <carry>: update cluster-autoscaler dockerfile

But, of special note in this rebase is this commit

c74af5632 UPSTREAM: <carry>: revert capacity annotations

due to the scale from zero changes being accepted upstream we can now drop our carried patch. but, the upstream implementation has differed slightly from our's (mainly around annotation names). we will need to carry this patch until we can fix all the providers to properly use the new annotations.

Also of special note

808e1c9ba UPSTREAM: 6066: Allow overriding the kubernetes.io/arch label set by the scale from zero methods via a new cmdline arg

this commit is bringing in functionality for setting a default architecture on scale from zero through the command line. unfortunately, the upstream did not like our proposal and we have changed the way this works in the upstream. we will take the updated version when 1.29 release is created for the autoscaler. until then, @aleskandro is monitoring this and has a patch ready for our update in 1.29.

k8s-ci-robot and others added 30 commits May 30, 2023 00:09
…loudProvider-Labelling

CA - Correct Cloudprovider PR labelling to area/provider/<provider name>
At the time of making this commit, the package `github.com/ghodss/yaml`
is no longer actively maintained.

`sigs.k8s.io/yaml` is a permanent fork of `ghodss/yaml` and is actively
maintained by Kubernetes SIG.

Signed-off-by: Eng Zer Jun <[email protected]>
…s_linode

Added the RBAC Permission to Linode.
[volcengine] fix(volcengine): don't build all providers when volcengine tag exists
…roposal

KEP for allowing Addon Resizer 1.8 to scale based on container count
Skip healthiness check for non-existing similar node groups
chore: replace `github.com/ghodss/yaml` with `sigs.k8s.io/yaml`
…formatting

Add whitespace in comments and lists
this change adds some logging at verbosity levels 2 and 3 to help
diagnose why the cluster-autoscaler does not consider 2 or more node
groups to be similar.
- this taint leads to unexpected behavior
- users expect CA to consider the taint when autoscaling

Signed-off-by: vadasambar <[email protected]>
Bumps golang from 1.20.4 to 1.20.5.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
cluster-autoscaler/cloudprovider/brightbox

Allow scaled workers to be built from an image name pattern as well as
an image id. This deals with long running clusters where the official
image is updated with security changes over time.
Only set the registry value in the local Makefile if it is missing
AEP for support of in-place updates for VPA
Fix: correcting RBAC permission in oci oke examples
Copy link

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine once we work out what's wrong with hypershift, though given this is a manual rebase, perhaps we should squash down some of the boilerplate setup commits into a single commit, I think there are 5 that could be squashed into the first setup commit

.ci-operator.yaml Outdated Show resolved Hide resolved
hack/test-go.sh Outdated Show resolved Hide resolved
vertical-pod-autoscaler/Dockerfile.rhel Outdated Show resolved Hide resolved
OWNERS Outdated Show resolved Hide resolved
@elmiko
Copy link
Author

elmiko commented Oct 30, 2023

i can go back and squash some of the process commits

elmiko and others added 10 commits October 30, 2023 15:08
This change carries files and modifications that are used by OpenShift
release infrastructure and related files.

* spec file
* dockerfiles
  * vertical-pod-autoscaler/Dockerfile.rhel
  * images/cluster-autoscaler/Dockerfile
  * images/cluster-autoscaler/Dockerfile.rhel
* hack scripts (ci and build related)
* Makefile
* JUnit tools
* update gitignore
* update/remove OWNERS files
* ci-operator config yaml
* update vendor folders
…otation

The delete annotation upstream has a different format, but is now
inferred dynamically from the API group. If we update this in MAO to use
the new format, we can drop this old key
this change removes the `go test -i ...` command from the
hack/test-go.sh file. this option is deprecated in the go command and
attempts to install dependencies of the test. when this is enabled with
the build tags we are using, it appears to make the test builds unstable
in CI.
this change adds all the cloudproviders except clusterapi to the list of
ignored directories in for the cluster-autoscaler tests. this is being
done so that the tests match our build tags and do not cause errors when
passing the package list to the test script.
This change re-adds the machine api support for labels and taints on
node groups. The code was removed upstream as it is openshift specific,
see this pull request[0].

It also adds in the functionality of the upstream override annotation
for labels and taints[1] to support
https://issues.redhat.com/browse/MIXEDARCH-259

[0]: kubernetes#5249
[1]: kubernetes#5382
the upstream annotations for the scale from zero capacity resources is
slighty different than the openshift implementation. the largest
difference is the addition of a gpu type annotation. openshift does not
yet utilize this annotation and thus this patch should be carried until
the machineset controllers for the various providers on openshift have
been modified to use the new annotations.

another important change is the modification of the memory annotation.
previously in openshift we expected this value to be a count of memory
in Mebibytes. the conversion function and tests have been modified to
allow continued openshift operation.

this change can be dropped when the annotations in openshift have been
updated, the progress for this effort can be followed at
https://issues.redhat.com/browse/OCPCLOUD-944
…the scale from zero methods via a new cmdline arg

The architecture label in the build generic labels method of the Cluster API provider is now populated using the cpu.GetDefaultScaleFromZeroArchitecture().Name() method.

The method allows users deploying the cluster-autoscaler to define the default architecture to be used by the cluster-autoscaler for scale up from zero via the --scale-up-from-zero-default-arch flag. Amd64 is kept as a fallback for historical reasons and to maintain compatibility with current providers.

The `SystemArchitecture` type and the functions for parsing architecture names are copied over from the GCE implementation, to be generic across all the defined cloud providers.

However, the GCE implementation and the other cloud providers are currently left untouched.

The introduced changes will not take into account the case of nodes heterogeneous in architecture. The labels generation to infer properties like the cpu architecture from the node groups' features should be considered cloud-provider/user specific.

Cloud providers willing to opt into this implementation are expected to change their manager code by replacing
the cloudprovider.DefaultArch constant with the GetDefaultScaleFromZeroArchitecture function exposed by this package.
Usually, this should be done in the function responsible for generating the generic labels of the given
cloud provider's manager code (for example, buildGenericLabels(.)) as follows:

func (...) buildGenericLabels(...) (map[string]string) {
	result := make(map[string]string)
	result[apiv1.LabelArchStable] = cpu.GetDefaultScaleFromZeroArchitecture().Name()
	// ...
}
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 30, 2023

@elmiko: This pull request references OCPCLOUD-2195 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.15.0" version, but no target version was set.

In response to this:

This commit rebases the autoscaler on top of the Kubernetes/Autoscaler 1.28.0 release. There are several commits that we carry on top of the upstream autoscaler and the rebase process allows us to preserve those. Here is a description of the process I used to create this PR.

(inspired by the commit description for the 1.18 rebase. pr #139)

Process

First we need to identify the carry commits that we currently have, this is done against our previous rebase to catch new changes. Once identified we will drop commits which have merged upstream and only carry unique commits. (see below for the carried and dropped commits).

Identify carry commits (run from the openshift/master branch)

git log -n 20 --oneline --no-merges

this is slightly different from previous rebases in that the history has gotten slightly wonky and i needed to manually inspect the carry patches.

After identifying the carry commits, the next step is to create the new commit-tree that will be used for the rebase and then cherry pick the carry commits into the new branch. The following commands cover these steps:

$ git remote update # make sure we update our refs
$ git checkout cluster-autoscaler-1.28.0
$ git checkout -b merge-tmp # create a temporary branch for our merge commit
$ git checkout openshift/master # we want to be at the tip of the openshift master branch when we run the next command
$ echo 'merge upstream/cluster-autoscaler-1.28.0' | git commit-tree merge-tmp^{tree} -p HEAD -p merge-tmp -F - # create a new merge commit for our history
deadbeef12345678 # id of new merge commit
$ git branch merge-1.28.0 deadbeef12345678 # create a new branch for the cherry-pick work
$ git checkout merge-1.28.0
$ git cherry-pick <carry commits> # cherry pick the needed commits into the new branch

With the merge-1.28.0 branch in place, I cherry picked the carry commits which applied, resolved merge conflicts, and finally tested the resulting tree against the unit test and end-to-end suite.

Carried Commits

These commits are for features which have not yet been accepted upstream, are integral to our CI platform, or are specific to the releases we create for OpenShift.

65cfc6208 UPSTREAM: 6066: Allow overriding the kubernetes.io/arch label set by the scale from zero methods via a new cmdline arg
061558ed7 UPSTREAM: <carry>: revert capacity annotations
ef2f569b8 UPSTREAM: <carry>: add machine api label and taint functionality
a3f3bfeee UPSTREAM: <carry>: Add vendor for balancer
b68ce96cf UPSTREAM: <carry>: update os::util::list_test_packages_under
b0bb5aeb0 UPSTREAM: <carry>: remove deprecated go test command
461f25adc UPSTREAM: <carry>: Have VPA ignore phantom containers named "POD"
d9b6825ce UPSTREAM: <carry>: Handle old Machine API specific machine delete annotation
14235e4ef UPSTREAM: <carry>: Rename FailureMessage to ErrorMessage
530db8027 UPSTREAM: <carry>: configure repository for OpenShift releases

Dropped Commits

These commits were squashed into the 530db8027 commit.

9fe29aaf4 (HEAD -> merge-1.28.0, origin/merge-1.28.0) UPSTREAM: <carry>: Sync OWNERS file
103eecdbe UPSTREAM: <carry>: Updating vertical-pod-autoscaler images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/da6a5af1fa96762cc8410f09fc28c0457280f59a/images/vertical-pod-autoscaler.yml
1f7bf8e71 UPSTREAM: <carry>: Use mod vendor for go test script
af3ff6883 UPSTREAM: <carry>: Updating atomic-openshift-cluster-autoscaler images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/ed337d94faaaa3782fcccf3a54d9a2256ba2aa9b/images/atomic-openshift-cluster-autoscaler.yml
967e9185b UPSTREAM: <carry>: update cluster-autoscaler dockerfile

But, of special note in this rebase is this commit

c74af5632 UPSTREAM: <carry>: revert capacity annotations

due to the scale from zero changes being accepted upstream we can now drop our carried patch. but, the upstream implementation has differed slightly from our's (mainly around annotation names). we will need to carry this patch until we can fix all the providers to properly use the new annotations.

Also of special note

808e1c9ba UPSTREAM: 6066: Allow overriding the kubernetes.io/arch label set by the scale from zero methods via a new cmdline arg

this commit is bringing in functionality for setting a default architecture on scale from zero through the command line. unfortunately, the upstream did not like our proposal and we have changed the way this works in the upstream. we will take the updated version when 1.29 release is created for the autoscaler. until then, @aleskandro is monitoring this and has a patch ready for our update in 1.29.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@elmiko
Copy link
Author

elmiko commented Oct 30, 2023

squashed some commits and updated the description

@JoelSpeed
Copy link

Changes look good to me, are we worried about the hypershift failure? Any confirmation from HyperShift if this is a known issue?
/test e2e-hypershift

@enxebre
Copy link
Member

enxebre commented Oct 31, 2023

/test e2e-hypershift

Copy link

openshift-ci bot commented Oct 31, 2023

@elmiko: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/git-history efea62a link false /test git-history

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@elmiko
Copy link
Author

elmiko commented Oct 31, 2023

not sure if the hypershift test flaked, @enxebre had mentioned making some fixes to CI. will run again just in case.

/test e2e-hypershift

@elmiko
Copy link
Author

elmiko commented Oct 31, 2023

looks like that did the trick, we just need labels and we are good to go.

@JoelSpeed ptal

@JoelSpeed
Copy link

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 1, 2023
Copy link

openshift-ci bot commented Nov 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 1, 2023
@openshift-ci openshift-ci bot merged commit f7e7200 into openshift:master Nov 1, 2023
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-vertical-pod-autoscaler-container-v4.15.0-202311140732.p0.gf7e7200.assembly.stream for distgit vertical-pod-autoscaler.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.