[release blocking] gcbmgr stages are failing to rev release versions due to incorrect CI version marker #1080
Adding test-infra on-call: …
Just adding some additional context here from some of the investigation I have done. Comparing the …

All of this occurs before …

Anyway, hope this can be helpful to anyone else investigating!
@hasheddan -- I'm back to investigating and I think I've found the issue; just trying to confirm. I'll be back with more details later.
That tag is not on a branch. Comparing to the release branches: …

That means the anago release on master is not pushing the release-specific commit to the master branch when releasing on master; it is pushing the changelog commit (…). And because the tagged commit does not exist on the master branch, `git describe` falls back to the nearest tag that is on the branch, which will be the previous alpha tag.
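The fallback described here is easy to reproduce in a throwaway repository (a sketch only; the tag names and commit messages are illustrative, not the real release tooling):

```shell
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
g() { git -c user.email=ci@example.invalid -c user.name=ci "$@"; }
g commit -q --allow-empty -m "first commit on master"
git tag v1.18.0-alpha.2
g commit -q --allow-empty -m "second commit on master"
# Create a commit that is NOT on master (same tree, parented on HEAD),
# and tag it -- mimicking a release tag detached from the branch:
detached=$(g commit-tree -m "release commit" -p HEAD "HEAD^{tree}")
git tag v1.18.0-alpha.3 "$detached"
# 'git describe' only considers tags reachable from the given commit,
# so it falls back to the previous alpha tag:
git describe --tags master   # v1.18.0-alpha.2-1-g<sha>
```

Even though `v1.18.0-alpha.3` exists in the repository, `describe` never sees it from `master`, because the tagged commit is not an ancestor of the branch.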
Hehe, what @liggitt said!
For release branches, we create an empty release commit to avoid potentially ambiguous 'git describe' logic between the official release, 'x.y.z', and the next beta of that release branch, 'x.y.(z+1)-beta.0'.

We avoid doing this empty release commit on 'master', as:

- there is a potential for branch conflicts as upstream/master moves ahead
- we're checking out a git ref, as opposed to a branch, which means the tag will be detached from 'upstream/master'

A side effect of the tag being detached from 'master' is that the primary build job (ci-kubernetes-build) will build as the previous alpha, instead of the assumed tag. This causes the next anago run against 'master' to fail due to an old build version.

Example: 'v1.18.0-alpha.2.663+df908c3aad70be' (should instead be 'v1.18.0-alpha.3.<commits-since-tag>+<commit-ish>')

ref:

- kubernetes/release#1020
- kubernetes/release#1030
- kubernetes/release#1080
- kubernetes/kubernetes#88074

Signed-off-by: Stephen Augustus <[email protected]>
To step back and fill in some blanks on debugging for posterity...

Where to start?

```shell
$ ./gcbmgr stage master --buildversion=$(curl -L https://dl.k8s.io/ci/latest.txt)
```
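For illustration, the failure mode amounts to the marker handed to `--buildversion` still being derived from the previous alpha tag. A hypothetical pre-stage sanity check (not gcbmgr's actual code; the variable names and expected tag below are illustrative) could look like:

```shell
# Hypothetical check: refuse to stage when the CI marker is still based
# on the previous alpha instead of the expected next tag.
marker="v1.18.0-alpha.2.663+df908c3aad70be"  # what latest.txt returned (stale)
expected="v1.18.0-alpha.3"                   # tag the marker should be based on
case "$marker" in
  "$expected".*) echo "marker OK: $marker" ;;
  *) echo "stale marker: $marker (expected ${expected}.*)" >&2 ;;
esac
```

With the values above, the check reports a stale marker, which is exactly the state the stage jobs kept tripping over.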
So, let's test the assumption that the build version is the culprit:

```shell
$ curl -L https://dl.k8s.io/ci/latest.txt
```

This is a version marker produced from the … So maybe something happened during one of the build jobs?

Red herring: Originally, I looked at a snippet of the job results cache:

```shell
$ gsutil cp gs://kubernetes-jenkins/logs/ci-kubernetes-build/jobResultsCache.json .
```

Here's a snippet of that:

```json
{
  "buildnumber": "1224580578993508352",
  "job-version": "v1.18.0-alpha.2.357+76c89645c5858a",
  "version": "v1.18.0-alpha.2.357+76c89645c5858a",
  "result": "SUCCESS",
  "passed": true
},
{
  "buildnumber": "1224595935300947968",
  "job-version": "v1.18.0-alpha.2.360+d52ecd5f70cdf5",
  "version": "v1.18.0-alpha.2.360+d52ecd5f70cdf5",
  "result": "FAILURE",
  "passed": false
},
{
  "buildnumber": "1224642742374633474",
  "job-version": "v1.18.0-alpha.2.360+d52ecd5f70cdf5",
  "version": "v1.18.0-alpha.2.360+d52ecd5f70cdf5",
  "result": "SUCCESS",
  "passed": true
}
```

Checking https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-build/1224595935300947968, we see that that job failed due to an unhealthy Prow node. But jobs have since run successfully, so maybe it's not that?

What does the job actually do? Here's a snippet of its Prow config:

…

So we're running a bootstrap …
Here's a snippet:

…

Somewhere in that process we're grabbing/interpreting a version tag, but where? If we look at any of the build jobs, we see something like this:

```
I0213 08:50:14.602] Call: bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
I0213 08:50:15.139] process 305 exited with code 0 after 0.0m
I0213 08:50:15.140] Start 1227877244496515073 at v1.18.0-alpha.5.27+c099585b10cb8c...
```

ref: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-build/1227877244496515073

(That's a more recent log; I'm just using it here to illustrate.)
The line we care about is: https://github.com/kubernetes/kubernetes/blob/c099585b10cb8cb66a4d9e19b7fdfe910be56c30/hack/lib/version.sh#L69

When I ran this portion of the build scenario locally, here's what I got:

```shell
$ git checkout master
Switched to branch 'master'
$ git reset --hard upstream/master
HEAD is now at df908c3aad7 Merge pull request #87975 from SataQiu/kubeadm-20200210
$ bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
v1.18.0-alpha.2.663+df908c3aad70be

$ git checkout release-1.17
Switched to branch 'release-1.17'
Your branch is up to date with 'upstream/release-1.17'.
$ git reset --hard upstream/release-1.17
HEAD is now at aa9a28326e8 Update CHANGELOG/CHANGELOG-1.17.md for v1.17.3
$ bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
v1.17.4-beta.0.1+aa9a28326e8075

$ git checkout release-1.16
Switched to branch 'release-1.16'
Your branch is up to date with 'upstream/release-1.16'.
$ git reset --hard upstream/release-1.16
HEAD is now at abdce0eac9e Update CHANGELOG/CHANGELOG-1.16.md for v1.16.7
$ bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
v1.16.8-beta.0.1+abdce0eac9e732

$ git checkout release-1.15
Switched to branch 'release-1.15'
Your branch is behind 'upstream/release-1.15' by 29 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ git reset --hard upstream/release-1.15
HEAD is now at 3b43c8064a3 Update CHANGELOG/CHANGELOG-1.15.md for v1.15.10
$ bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
v1.15.11-beta.0.1+3b43c8064a328d
```

The form of the resulting tag is:

`v<major>.<minor>.<patch>.<commits-since-tag>+<commit-ish>`

So we know that … I was thinking we might need a manual tag (…). This was about the time I started chatting with @liggitt.

The root cause of this was #1030, which introduced unconditionally creating an empty release commit and tagging that. The goal was to ensure that in instances where anago produces two releases (when run on release branches), we would have a unique release commit, to prevent the out-of-order releases we had at the end of last year. We discovered that releases cut from … #1096 fixes this to only create empty release commits on release branches.

cc: @kubernetes/release-engineering
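As a sketch of why the empty release commit keeps `git describe` unambiguous on release branches (the behavior #1096 retains there), here is a throwaway-repo reproduction; the branch, tags, and commit messages are illustrative:

```shell
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q -b release-1.17
g() { git -c user.email=ci@example.invalid -c user.name=ci "$@"; }
g commit -q --allow-empty -m "Update CHANGELOG/CHANGELOG-1.17.md for v1.17.3"
# Empty release commit for the official release, then its tag:
g commit -q --allow-empty -m "Release commit for Kubernetes v1.17.4"
git tag v1.17.4
# Empty release commit for the next beta, then its tag:
g commit -q --allow-empty -m "Release commit for Kubernetes v1.17.5-beta.0"
git tag v1.17.5-beta.0
# Each tag points at its own (empty) release commit, so describe
# resolves each ref to exactly one tag:
git describe --tags HEAD    # v1.17.5-beta.0
git describe --tags HEAD^   # v1.17.4
```

Because both tags sit on the branch itself, every build on the branch describes cleanly; it is only the detached tag on `master` that broke the version derivation.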
Both kubernetes/sig-release#985 and https://kubernetes.slack.com/archives/CJH2GBF7Y/p1581532987106700?thread_ts=1581532800.106100&cid=CJH2GBF7Y contain scrollback from the manual tag push of v1.18.0-alpha.4:

```shell
$ git remote add danger [email protected]:kubernetes/kubernetes.git
$ git remote update
Fetching origin
Fetching upstream
Fetching danger
From github.com:kubernetes/kubernetes
 * [new branch] feature-rate-limiting -> danger/feature-rate-limiting
 * [new branch] feature-serverside-apply -> danger/feature-serverside-apply
 * [new branch] feature-workload-ga -> danger/feature-workload-ga
 * [new branch] master -> danger/master
 * [new branch] release-0.10 -> danger/release-0.10
 * [new branch] release-0.12 -> danger/release-0.12
 * [new branch] release-0.13 -> danger/release-0.13
 * [new branch] release-0.14 -> danger/release-0.14
 * [new branch] release-0.15 -> danger/release-0.15
 * [new branch] release-0.16 -> danger/release-0.16
 * [new branch] release-0.17 -> danger/release-0.17
 * [new branch] release-0.18 -> danger/release-0.18
 * [new branch] release-0.19 -> danger/release-0.19
 * [new branch] release-0.20 -> danger/release-0.20
 * [new branch] release-0.21 -> danger/release-0.21
 * [new branch] release-0.4 -> danger/release-0.4
 * [new branch] release-0.5 -> danger/release-0.5
 * [new branch] release-0.6 -> danger/release-0.6
 * [new branch] release-0.7 -> danger/release-0.7
 * [new branch] release-0.8 -> danger/release-0.8
 * [new branch] release-0.9 -> danger/release-0.9
 * [new branch] release-1.0 -> danger/release-1.0
 * [new branch] release-1.1 -> danger/release-1.1
 * [new branch] release-1.10 -> danger/release-1.10
 * [new branch] release-1.11 -> danger/release-1.11
 * [new branch] release-1.12 -> danger/release-1.12
 * [new branch] release-1.13 -> danger/release-1.13
 * [new branch] release-1.14 -> danger/release-1.14
 * [new branch] release-1.15 -> danger/release-1.15
 * [new branch] release-1.16 -> danger/release-1.16
 * [new branch] release-1.17 -> danger/release-1.17
 * [new branch] release-1.2 -> danger/release-1.2
 * [new branch] release-1.3 -> danger/release-1.3
 * [new branch] release-1.4 -> danger/release-1.4
 * [new branch] release-1.5 -> danger/release-1.5
 * [new branch] release-1.6 -> danger/release-1.6
 * [new branch] release-1.6.3 -> danger/release-1.6.3
 * [new branch] release-1.7 -> danger/release-1.7
 * [new branch] release-1.8 -> danger/release-1.8
 * [new branch] release-1.9 -> danger/release-1.9
$ git checkout master
Already on 'master'
$ git reset --hard danger/master
HEAD is now at 50c8f73a4b2 Merge pull request #88017 from feiskyer/fix-409
$ git tag -s v1.18.0-alpha.4 HEAD^
$ git push --dry-run danger v1.18.0-alpha.4
To github.com:kubernetes/kubernetes.git
 * [new tag] v1.18.0-alpha.4 -> v1.18.0-alpha.4
$ git push danger v1.18.0-alpha.4
Enumerating objects: 21957, done.
Counting objects: 100% (9472/9472), done.
Delta compression using up to 12 threads
Compressing objects: 100% (2077/2077), done.
Writing objects: 100% (6165/6165), 1.85 MiB | 2.28 MiB/s, done.
Total 6165 (delta 4734), reused 5043 (delta 3969)
remote: Resolving deltas: 100% (4734/4734), completed with 1486 local objects.
To github.com:kubernetes/kubernetes.git
 * [new tag] v1.18.0-alpha.4 -> v1.18.0-alpha.4
```

Note the …

/close
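One guard that could precede a manual tag push like this is `git merge-base --is-ancestor`, which verifies that the tagged commit is actually reachable from master before anything is pushed. A sketch in a throwaway repository (unsigned annotated tag for simplicity, whereas the real push used `git tag -s`; all names are illustrative):

```shell
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
g() { git -c user.email=ci@example.invalid -c user.name=ci "$@"; }
g commit -q --allow-empty -m "Merge pull request"
g tag -a v1.18.0-alpha.4 -m "Kubernetes pre-release v1.18.0-alpha.4" HEAD
# Guard: only push when the tagged commit is reachable from master.
if git merge-base --is-ancestor "v1.18.0-alpha.4^{commit}" master; then
  echo "tag reachable from master; safe to push"
else
  echo "tag detached from master; aborting" >&2
  exit 1
fi
```

Here the tag sits on `master`, so the guard passes; a tag pointing at a detached release commit would have been caught before the push.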
@justaugustus: Closing this issue.

In response to this:

…

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Release blocker for v1.18.0-alpha.4

What happened:

(Discovered in #1077 (comment))

While testing a change to `lib/release`, I noticed that stage jobs were failing because the `buildversion` was stale.

/assign @justaugustus @hasheddan
/priority critical-urgent
/milestone v1.18

cc: @kubernetes/release-engineering @kubernetes/release-team

What you expected to happen:

Staging jobs pass.

How to reproduce it (as minimally and precisely as possible):

Example 1 (against PR branch):

```shell
$ RELEASE_TOOL_REPO=https://github.com/justaugustus/release.git \
  RELEASE_TOOL_BRANCH=anago-params \
  ./gcbmgr stage master --buildversion=$(curl -L https://dl.k8s.io/ci/latest.txt)
```

Example 2 (against `master`):

```shell
$ ./gcbmgr stage master --buildversion=$(curl -L https://dl.k8s.io/ci/latest.txt)
```

Anything else we need to know?:

I'm in the process of investigating...

I have a sneaking suspicion this has to do with a failed run of `ci-kubernetes-build` on `v1.18.0-alpha.3`'s release day.

Environment:
- OS (e.g.: `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):