
[release blocking] gcbmgr stages are failing to rev release versions due to incorrect CI version marker #1080

Closed
justaugustus opened this issue Feb 7, 2020 · 8 comments
Assignees: justaugustus, hasheddan, BenTheElder
Labels: area/release-eng (Issues or PRs related to the Release Engineering subproject), kind/bug (Categorizes issue or PR as related to a bug), priority/critical-urgent (Highest priority. Must be actively worked on as someone's top priority right now), sig/release (Categorizes an issue or PR as relevant to SIG Release)
Milestone: v1.18

Comments

@justaugustus (Member)

Release blocker for v1.18.0-alpha.4

What happened:
(Discovered in #1077 (comment))
While testing a change to lib/release, I noticed that stage jobs were failing because the buildversion was stale.

/assign @justaugustus @hasheddan
/priority critical-urgent
/milestone v1.18
cc: @kubernetes/release-engineering @kubernetes/release-team

What you expected to happen:
Staging jobs pass

How to reproduce it (as minimally and precisely as possible):

Example 1 (against PR branch)

$ RELEASE_TOOL_REPO=https://github.com/justaugustus/release.git \
RELEASE_TOOL_BRANCH=anago-params \
./gcbmgr stage master --buildversion=$(curl -L https://dl.k8s.io/ci/latest.txt)
Step #2: ================================================================================
Step #2: PREPARE AND TAG TREE alpha (5/6)
Step #2: ================================================================================
Step #2: 
Step #2: The v1.18.0-alpha.3 tag already exists!
Step #2: Possible reasons for this:
Step #2: * --buildversion is old.
Step #2: * /workspace/anago-v1.18.0-alpha.2.477+64ba0bf3d6630c is unclean
Step #2: [2020-Feb-07 17:32:57 UTC] prepare_tree in 0s
Step #2: FAILED in prepare_tree.
Step #2: 
Step #2: RELEASE INCOMPLETE! Exiting...

Example 2 (against master)

$ ./gcbmgr stage master --buildversion=$(curl -L https://dl.k8s.io/ci/latest.txt)
Step #2: ================================================================================
Step #2: PREPARE AND TAG TREE alpha (5/6)
Step #2: ================================================================================
Step #2: 
Step #2: The v1.18.0-alpha.3 tag already exists!
Step #2: Possible reasons for this:
Step #2: * --buildversion is old.
Step #2: * /workspace/anago-v1.18.0-alpha.2.477+64ba0bf3d6630c is unclean
Step #2: [2020-Feb-07 17:45:53 UTC] prepare_tree in 0s
Step #2: FAILED in prepare_tree.
Step #2: 
Step #2: RELEASE INCOMPLETE! Exiting...

Anything else we need to know?:
I'm in the process of investigating...
I have a sneaking suspicion this has to do with a failed run of ci-kubernetes-build on v1.18.0-alpha.3's release day:

Test started last Tuesday at 2:30 AM and failed after 3h0m22s.

The job may have executed on an unhealthy node. Contact your prow maintainers with a link to this page or check the detailed pod information.

Status: FAILURE
Started: 2020-02-04 07:30:52 +0000 UTC
Elapsed: 3h0m22s
infra-commit: a98df39a6
links.resultstore.url: https://source.cloud.google.com/results/invocations/a5dfb0f2-4c10-4a53-b6b3-34dab92dc0f4/targets/test
node: gke-prow-default-pool-cf4891d4-p7gn
pod: 18cdb4c4-4720-11ea-b8d7-32e01c04da64
repo: k8s.io/kubernetes
repo-commit: d52ecd5f70cdf5f13f919bab56cf08cd556a2e26
repos.k8s.io/kubernetes: master
repos.k8s.io/release: master
resultstore: https://source.cloud.google.com/results/invocations/a5dfb0f2-4c10-4a53-b6b3-34dab92dc0f4/targets/test
W0204 07:30:52.335] **************************************************************************
bootstrap.py is deprecated!
test-infra oncall does not support any job still using bootstrap.py.
Please migrate your job to podutils!
https://github.com/kubernetes/test-infra/blob/master/prow/pod-utilities.md
**************************************************************************
I0204 07:30:52.336] Args: --job=ci-kubernetes-build --service-account=/etc/service-account/service-account.json --upload=gs://kubernetes-jenkins/logs --repo=k8s.io/kubernetes --repo=k8s.io/release --root=/go/src --timeout=180 --scenario=kubernetes_build -- --allow-dup --extra-publish-file=k8s-master --hyperkube --registry=gcr.io/kubernetes-ci-images
I0204 07:30:52.337] Bootstrap ci-kubernetes-build...
I0204 07:30:52.359] Builder: gke-prow-default-pool-cf4891d4-p7gn
I0204 07:30:52.360] Image: gcr.io/k8s-testimages/bootstrap:v20200124-81ee414
I0204 07:30:52.360] Gubernator results at https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-build/1224595935300947968
I0204 07:30:52.360] Call:  gcloud auth activate-service-account --key-file=/etc/service-account/service-account.json
W0204 07:30:54.187] Activated service account credentials for: [[email protected]]
I0204 07:30:54.434] process 208 exited with code 0 after 0.0m
I0204 07:30:54.435] Call:  gcloud config get-value account
I0204 07:30:56.099] process 221 exited with code 0 after 0.0m
I0204 07:30:56.100] Will upload results to gs://kubernetes-jenkins/logs using [email protected]
I0204 07:30:56.100] Root: /go/src
I0204 07:30:56.100] cd to /go/src
I0204 07:30:56.100] Checkout: /go/src/k8s.io/kubernetes master to /go/src/k8s.io/kubernetes
I0204 07:30:56.101] Call:  git init k8s.io/kubernetes
I0204 07:30:56.140] Initialized empty Git repository in /go/src/k8s.io/kubernetes/.git/
I0204 07:30:56.141] process 234 exited with code 0 after 0.0m
I0204 07:30:56.141] Call:  git config --local user.name 'K8S Bootstrap'
I0204 07:30:56.177] process 235 exited with code 0 after 0.0m
I0204 07:30:56.178] Call:  git config --local user.email k8s_bootstrap@localhost
I0204 07:30:56.195] process 236 exited with code 0 after 0.0m
I0204 07:30:56.197] Call:  git fetch --quiet --tags https://github.com/kubernetes/kubernetes master
I0204 07:33:25.624] process 237 exited with code 0 after 2.5m
I0204 07:33:25.625] Call:  git checkout -B test FETCH_HEAD
W0204 07:33:39.572] Switched to a new branch 'test'
I0204 07:33:40.395] process 248 exited with code 0 after 0.2m
I0204 07:33:40.396] Call:  git show -s --format=format:%ct HEAD
I0204 07:33:40.450] process 249 exited with code 0 after 0.0m
I0204 07:33:40.451] Checkout: /go/src/k8s.io/release master to /go/src/k8s.io/release
I0204 07:33:40.451] Call:  git init k8s.io/release
I0204 07:33:40.481] Initialized empty Git repository in /go/src/k8s.io/release/.git/
I0204 07:33:40.490] process 250 exited with code 0 after 0.0m
I0204 07:33:40.491] Call:  git config --local user.name 'K8S Bootstrap'
I0204 07:33:40.530] process 251 exited with code 0 after 0.0m
I0204 07:33:40.531] Call:  git config --local user.email k8s_bootstrap@localhost
I0204 07:33:40.578] process 252 exited with code 0 after 0.0m
I0204 07:33:40.579] Call:  git fetch --quiet --tags https://github.com/kubernetes/release master
I0204 07:33:43.405] process 253 exited with code 0 after 0.0m
I0204 07:33:43.405] Call:  git checkout -B test FETCH_HEAD
W0204 07:33:43.541] Switched to a new branch 'test'
I0204 07:33:43.561] process 264 exited with code 0 after 0.0m
I0204 07:33:43.562] Call:  git show -s --format=format:%ct HEAD
I0204 07:33:43.604] process 265 exited with code 0 after 0.0m
I0204 07:33:43.604] Configure environment...
I0204 07:33:43.605] Call:  git show -s --format=format:%ct HEAD
I0204 07:33:43.673] process 266 exited with code 0 after 0.0m
I0204 07:33:43.673] Call:  gcloud auth activate-service-account --key-file=/etc/service-account/service-account.json
W0204 07:33:45.778] Activated service account credentials for: [[email protected]]
I0204 07:33:47.131] process 267 exited with code 0 after 0.1m
I0204 07:33:47.131] Call:  gcloud config get-value account
I0204 07:33:48.764] process 280 exited with code 0 after 0.0m
I0204 07:33:48.765] Will upload results to gs://kubernetes-jenkins/logs using [email protected]
I0204 07:33:48.765] Call:  bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
I0204 07:33:49.831] process 293 exited with code 0 after 0.0m
I0204 07:33:49.831] Start 1224595935300947968 at v1.18.0-alpha.2.360+d52ecd5f70cdf5...
I0204 07:33:49.834] Call:  gsutil -q -h Content-Type:application/json cp /tmp/gsutil_ovx2Uh gs://kubernetes-jenkins/logs/ci-kubernetes-build/1224595935300947968/started.json
I0204 07:33:57.590] process 326 exited with code 0 after 0.1m
I0204 07:33:57.591] Call:  /workspace/./test-infra/jenkins/../scenarios/kubernetes_build.py --allow-dup --extra-publish-file=k8s-master --hyperkube --registry=gcr.io/kubernetes-ci-images
W0204 07:33:57.752] Run: ('hack/print-workspace-status.sh',)
W0204 07:33:58.679] Run: ('gsutil', 'ls', 'gs://kubernetes-release-dev/ci/v1.18.0-alpha.2.360+d52ecd5f70cdf5')
W0204 07:34:04.053] gcs path gs://kubernetes-release-dev/ci/v1.18.0-alpha.2.360+d52ecd5f70cdf5 (or some files under it) does not exist yet, continue
W0204 07:34:04.054] Run: ('make', 'clean')
I0204 07:34:04.539] +++ [0204 07:34:04] Verifying Prerequisites....
W0204 07:34:11.954] Run: ('make', 'release')
I0204 07:34:12.459] +++ [0204 07:34:12] Verifying Prerequisites....
I0204 07:34:17.735] +++ [0204 07:34:17] Building Docker image kube-build:build-dfb435d4da-5-v1.13.6-1
I0204 07:51:53.462] +++ [0204 07:51:53] Creating data container kube-build-data-dfb435d4da-5-v1.13.6-1
I0204 07:53:50.202] +++ [0204 07:53:50] Syncing sources to container
I0204 07:55:15.867] +++ [0204 07:55:15] Running build command...
I0204 07:56:15.396] +++ [0204 07:56:15] Building go targets for linux/amd64:
I0204 07:56:15.398]     ./vendor/k8s.io/code-generator/cmd/deepcopy-gen
I0204 07:57:00.893] +++ [0204 07:57:00] Building go targets for linux/amd64:
I0204 07:57:00.893]     ./vendor/k8s.io/code-generator/cmd/defaulter-gen
I0204 07:57:34.824] +++ [0204 07:57:34] Building go targets for linux/amd64:
I0204 07:57:34.824]     ./vendor/k8s.io/code-generator/cmd/conversion-gen
I0204 07:58:23.367] +++ [0204 07:58:23] Building go targets for linux/amd64:
I0204 07:58:23.368]     ./vendor/k8s.io/kube-openapi/cmd/openapi-gen
I0204 07:59:20.384] +++ [0204 07:59:20] Building go targets for linux/amd64:
I0204 07:59:20.386]     ./vendor/github.com/go-bindata/go-bindata/go-bindata
I0204 07:59:31.088] +++ [0204 07:59:31] Multiple platforms requested and available 43G >= threshold 40G, building platforms in parallel
I0204 07:59:31.113] +++ [0204 07:59:31] Building go targets for {linux/amd64 linux/arm linux/arm64 linux/s390x linux/ppc64le} in parallel (output will appear in a burst when complete):
I0204 07:59:31.114]     cmd/kube-proxy
I0204 07:59:31.115]     cmd/kube-apiserver
I0204 07:59:31.115]     cmd/kube-controller-manager
I0204 07:59:31.115]     cmd/kubelet
I0204 07:59:31.115]     cmd/kubeadm
I0204 07:59:31.115]     cmd/kube-scheduler
I0204 07:59:31.115]     vendor/k8s.io/apiextensions-apiserver
I0204 07:59:31.115]     cluster/gce/gci/mounter
I0204 08:41:38.419] +++ [0204 07:59:31] linux/amd64: build started
I0204 08:41:38.429] +++ [0204 08:38:28] linux/amd64: build finished
I0204 08:41:38.431] +++ [0204 07:59:31] linux/arm: build started
I0204 08:41:38.431] +++ [0204 08:41:22] linux/arm: build finished
I0204 08:41:38.445] +++ [0204 07:59:31] linux/arm64: build started
I0204 08:41:38.445] +++ [0204 08:41:27] linux/arm64: build finished
I0204 08:41:38.459] +++ [0204 07:59:31] linux/s390x: build started
I0204 08:41:38.459] +++ [0204 08:40:36] linux/s390x: build finished
I0204 08:41:38.464] +++ [0204 07:59:31] linux/ppc64le: build started
I0204 08:41:38.464] +++ [0204 08:41:38] linux/ppc64le: build finished
I0204 08:47:24.508] +++ [0204 08:47:24] Multiple platforms requested and available 43G >= threshold 40G, building platforms in parallel
I0204 08:47:24.528] +++ [0204 08:47:24] Building go targets for {linux/amd64 linux/arm linux/arm64 linux/s390x linux/ppc64le windows/amd64} in parallel (output will appear in a burst when complete):
I0204 08:47:24.529]     cmd/kube-proxy
I0204 08:47:24.529]     cmd/kubeadm
I0204 08:47:24.529]     cmd/kubelet
I0204 08:53:27.172] +++ [0204 08:47:24] linux/amd64: build started
I0204 08:53:27.172] +++ [0204 08:47:41] linux/amd64: build finished
I0204 08:53:27.192] +++ [0204 08:47:24] linux/arm: build started
I0204 08:53:27.193] +++ [0204 08:47:41] linux/arm: build finished
I0204 08:53:27.203] +++ [0204 08:47:24] linux/arm64: build started
I0204 08:53:27.203] +++ [0204 08:47:41] linux/arm64: build finished
I0204 08:53:27.217] +++ [0204 08:47:24] linux/s390x: build started
I0204 08:53:27.220] +++ [0204 08:47:41] linux/s390x: build finished
I0204 08:53:27.228] +++ [0204 08:47:24] linux/ppc64le: build started
I0204 08:53:27.228] +++ [0204 08:47:41] linux/ppc64le: build finished
I0204 08:53:27.229] +++ [0204 08:47:24] windows/amd64: build started
I0204 08:53:27.229] +++ [0204 08:53:27] windows/amd64: build finished
I0204 08:54:21.531] +++ [0204 08:54:21] Multiple platforms requested and available 43G >= threshold 40G, building platforms in parallel
I0204 08:54:21.559] +++ [0204 08:54:21] Building go targets for {linux/amd64 linux/386 linux/arm linux/arm64 linux/s390x linux/ppc64le darwin/amd64 darwin/386 windows/amd64 windows/386} in parallel (output will appear in a burst when complete):
I0204 08:54:21.559]     cmd/kubectl
I0204 09:02:52.105] +++ [0204 08:54:21] linux/amd64: build started
I0204 09:02:52.107] +++ [0204 09:01:17] linux/amd64: build finished
I0204 09:02:52.128] +++ [0204 08:54:21] linux/386: build started
I0204 09:02:52.129] +++ [0204 09:02:48] linux/386: build finished
I0204 09:02:52.139] +++ [0204 08:54:21] linux/arm: build started
I0204 09:02:52.139] +++ [0204 09:01:13] linux/arm: build finished
I0204 09:02:52.154] +++ [0204 08:54:21] linux/arm64: build started
I0204 09:02:52.156] +++ [0204 09:01:10] linux/arm64: build finished
I0204 09:02:52.159] +++ [0204 08:54:21] linux/s390x: build started
I0204 09:02:52.160] +++ [0204 09:01:13] linux/s390x: build finished
I0204 09:02:52.173] +++ [0204 08:54:21] linux/ppc64le: build started
I0204 09:02:52.174] +++ [0204 09:01:12] linux/ppc64le: build finished
I0204 09:02:52.179] +++ [0204 08:54:21] darwin/amd64: build started
I0204 09:02:52.181] +++ [0204 09:02:49] darwin/amd64: build finished
I0204 09:02:52.183] +++ [0204 08:54:21] darwin/386: build started
I0204 09:02:52.184] +++ [0204 09:02:52] darwin/386: build finished
I0204 09:02:52.212] +++ [0204 08:54:21] windows/amd64: build started
I0204 09:02:52.212] +++ [0204 09:01:23] windows/amd64: build finished
I0204 09:02:52.216] +++ [0204 08:54:21] windows/386: build started
I0204 09:02:52.216] +++ [0204 09:02:49] windows/386: build finished
I0204 09:04:02.901] +++ [0204 09:04:02] Multiple platforms requested and available 43G >= threshold 40G, building platforms in parallel
I0204 09:04:02.916] +++ [0204 09:04:02] Building go targets for {linux/amd64 linux/arm linux/arm64 linux/s390x linux/ppc64le darwin/amd64 windows/amd64} in parallel (output will appear in a burst when complete):
I0204 09:04:02.919]     cmd/gendocs
I0204 09:04:02.919]     cmd/genkubedocs
I0204 09:04:02.919]     cmd/genman
I0204 09:04:02.919]     cmd/genyaml
I0204 09:04:02.919]     cmd/genswaggertypedocs
I0204 09:04:02.919]     cmd/linkcheck
I0204 09:04:02.919]     vendor/github.com/onsi/ginkgo/ginkgo
I0204 09:04:02.919]     test/e2e/e2e.test
I0204 09:04:02.920]     cluster/images/conformance/go-runner
I0204 10:03:56.863] +++ [0204 09:04:03] linux/amd64: build started
I0204 10:03:56.871] +++ [0204 10:03:13] linux/amd64: build finished
I0204 10:03:56.872] +++ [0204 09:04:03] linux/arm: build started
I0204 10:03:56.873] +++ [0204 10:03:48] linux/arm: build finished
I0204 10:03:56.891] +++ [0204 09:04:03] linux/arm64: build started
I0204 10:03:56.892] +++ [0204 10:00:10] linux/arm64: build finished
I0204 10:03:56.894] +++ [0204 09:04:03] linux/s390x: build started
I0204 10:03:56.894] +++ [0204 10:03:45] linux/s390x: build finished
I0204 10:03:56.910] +++ [0204 09:04:03] linux/ppc64le: build started
I0204 10:03:56.911] +++ [0204 10:03:56] linux/ppc64le: build finished
I0204 10:03:56.925] +++ [0204 09:04:03] darwin/amd64: build started
I0204 10:03:56.928] +++ [0204 10:03:52] darwin/amd64: build finished
I0204 10:03:56.939] +++ [0204 09:04:03] windows/amd64: build started
I0204 10:03:56.940] +++ [0204 10:03:20] windows/amd64: build finished
I0204 10:13:44.535] +++ [0204 10:13:44] Multiple platforms requested and available 41G >= threshold 40G, building platforms in parallel
I0204 10:13:44.568] +++ [0204 10:13:44] Building go targets for {linux/amd64 linux/arm linux/arm64 linux/s390x linux/ppc64le} in parallel (output will appear in a burst when complete):
I0204 10:13:44.569]     cmd/kubemark
I0204 10:13:44.569]     vendor/github.com/onsi/ginkgo/ginkgo
I0204 10:13:44.570]     test/e2e_node/e2e_node.test
I0204 10:25:36.682] +++ [0204 10:13:44] linux/amd64: build started
I0204 10:25:36.683] +++ [0204 10:24:28] linux/amd64: build finished
I0204 10:25:36.695] +++ [0204 10:13:44] linux/arm: build started
I0204 10:25:36.695] +++ [0204 10:23:26] linux/arm: build finished
I0204 10:25:36.712] +++ [0204 10:13:44] linux/arm64: build started
I0204 10:25:36.716] +++ [0204 10:25:36] linux/arm64: build finished
I0204 10:25:36.723] +++ [0204 10:13:44] linux/s390x: build started
I0204 10:25:36.723] +++ [0204 10:23:39] linux/s390x: build finished
I0204 10:25:36.739] +++ [0204 10:13:44] linux/ppc64le: build started
I0204 10:25:36.739] +++ [0204 10:22:30] linux/ppc64le: build finished
I0204 10:30:52.411] Terminate 507 on timeout
E0204 10:30:52.527] Build timed out
E0204 10:30:52.527] Command failed
I0204 10:30:52.527] process 507 exited with code -15 after 176.9m
E0204 10:30:52.528] FAIL: ci-kubernetes-build
I0204 10:30:52.536] Call:  gcloud auth activate-service-account --key-file=/etc/service-account/service-account.json
W0204 10:30:56.281] Activated service account credentials for: [[email protected]]
I0204 10:30:56.532] process 298579 exited with code 0 after 0.1m
I0204 10:30:56.532] Call:  gcloud config get-value account
I0204 10:30:58.124] process 298593 exited with code 0 after 0.0m
I0204 10:30:58.124] Will upload results to gs://kubernetes-jenkins/logs using [email protected]
I0204 10:30:58.125] Upload result and artifacts...
I0204 10:30:58.125] Gubernator results at https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-build/1224595935300947968
W0204 10:30:58.129] Missing local artifacts : /workspace/_artifacts
W0204 10:30:58.129] metadata path /workspace/_artifacts/metadata.json does not exist
W0204 10:30:58.130] metadata not found or invalid, init with empty metadata
I0204 10:30:58.131] Call:  git rev-parse HEAD
I0204 10:30:58.402] process 298606 exited with code 0 after 0.0m
I0204 10:30:58.406] Call:  git rev-parse HEAD
I0204 10:30:58.485] process 298607 exited with code 0 after 0.0m
I0204 10:30:58.488] Call:  gsutil stat gs://kubernetes-jenkins/logs/ci-kubernetes-build/jobResultsCache.json
I0204 10:31:04.771] process 298608 exited with code 0 after 0.1m
I0204 10:31:04.772] Call:  gsutil -q cat 'gs://kubernetes-jenkins/logs/ci-kubernetes-build/jobResultsCache.json#1580797871931588'
I0204 10:31:09.236] process 298759 exited with code 0 after 0.1m
I0204 10:31:09.255] Call:  gsutil -q -h Content-Type:application/json -h x-goog-if-generation-match:1580797871931588 cp /tmp/gsutil_Z9Ff7F gs://kubernetes-jenkins/logs/ci-kubernetes-build/jobResultsCache.json
I0204 10:31:14.992] process 298908 exited with code 0 after 0.1m
I0204 10:31:14.994] Call:  gsutil -q -h Content-Type:application/json cp /tmp/gsutil_Va1IX4 gs://kubernetes-jenkins/logs/ci-kubernetes-build/1224595935300947968/finished.json
I0204 10:31:20.328] process 299090 exited with code 0 after 0.1m
I0204 10:31:20.329] Call:  gsutil -q -h Content-Type:text/plain -h 'Cache-Control:private, max-age=0, no-transform' cp /tmp/gsutil_od_Bg9 gs://kubernetes-jenkins/logs/ci-kubernetes-build/latest-build.txt
I0204 10:31:26.471] process 299276 exited with code 0 after 0.1m

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Others:
justaugustus added the kind/bug, sig/release, and area/release-eng labels on Feb 7, 2020
k8s-ci-robot added the priority/critical-urgent label on Feb 7, 2020
k8s-ci-robot added this to the v1.18 milestone on Feb 7, 2020
@justaugustus (Member Author)

Adding test-infra on-call:
/assign @BenTheElder

@hasheddan (Contributor)

Just adding some additional context here from the investigation I've done. Comparing the ci-kubernetes-build run mentioned above to a successful one, these are the next steps we would want to see (from ci-kubernetes-build/1227491124717817856):

I0212 08:04:51.309] +++ [0212 08:04:51] Syncing out of container
I0212 08:06:32.567] +++ [0212 08:06:32] Building tarball: manifests
I0212 08:06:32.567] +++ [0212 08:06:32] Building tarball: src
I0212 08:06:32.568] +++ [0212 08:06:32] Starting tarball: client darwin-386
I0212 08:06:32.572] +++ [0212 08:06:32] Starting tarball: client darwin-amd64
I0212 08:06:32.575] +++ [0212 08:06:32] Starting tarball: client linux-386
I0212 08:06:32.579] +++ [0212 08:06:32] Starting tarball: client linux-amd64
I0212 08:06:32.584] +++ [0212 08:06:32] Starting tarball: client linux-arm
I0212 08:06:32.591] +++ [0212 08:06:32] Starting tarball: client linux-arm64
I0212 08:06:32.598] +++ [0212 08:06:32] Starting tarball: client linux-ppc64le
I0212 08:06:32.630] +++ [0212 08:06:32] Starting tarball: client linux-s390x
I0212 08:06:32.645] +++ [0212 08:06:32] Starting tarball: client windows-386
I0212 08:06:32.690] +++ [0212 08:06:32] Starting tarball: client windows-amd64
I0212 08:06:32.711] +++ [0212 08:06:32] Waiting on tarballs

All of this occurs before push-build actually runs, so could it be that we simply hit constrained resources on that run? I believe this may then be affecting subsequent runs of push-build, but I haven't figured out how yet. I was suspicious that --noupdatelatest was somehow being set, but that does not appear to be the case. It appears that the build itself is not using the most recent git tag, so we are not seeing the latest marker as v1.18.0-alpha.3-*.
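
If anyone else wants to inspect the markers push-build maintains, something like this works (a sketch; I'm assuming dl.k8s.io/ci/latest.txt redirects into the gs://kubernetes-release-dev/ci path that shows up in the logs above):

$ gsutil cat gs://kubernetes-release-dev/ci/latest.txt
$ gsutil cat gs://kubernetes-release-dev/ci/latest-1.18.txt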

Anyway, I hope this is helpful to anyone else investigating!

@justaugustus (Member Author)

@hasheddan -- I'm back to investigating and I think I've found the issue; just trying to confirm. Will be back with more details later.

@liggitt (Member) commented Feb 12, 2020

$ git show v1.18.0-alpha.3
tag v1.18.0-alpha.3
Tagger: Anago GCB <[email protected]>
Date:   Tue Feb 4 16:31:11 2020 +0000
Kubernetes alpha release v1.18.0-alpha.3
commit f9534a304b499a241de06a2bbe360b44dff5250a (tag: v1.18.0-alpha.3)
Author: Anago GCB <[email protected]>
Date:   Tue Feb 4 16:31:11 2020 +0000
    Release commit for Kubernetes v1.18.0-alpha.3

$ git branch --contains f9534a304b499a241de06a2bbe360b44dff5250a
<no output>

that tag is not on a branch

comparing to release branches:

$ git log --format=oneline upstream/release-1.17
aa9a28326e80753059d5a8f1694ae5206aac75b9 (upstream/release-1.17) Update CHANGELOG/CHANGELOG-1.17.md for v1.17.3
4c488955a745ba911da6918a3573cbf073325cab (tag: v1.17.4-beta.0) Release commit for Kubernetes v1.17.4-beta.0
06ad960bfd03b39c8310aaf92d1e7c12ce618213 (tag: v1.17.3) Release commit for Kubernetes v1.17.3
...
$ git log --format=oneline v1.18.0-alpha.3
f9534a304b499a241de06a2bbe360b44dff5250a (tag: v1.18.0-alpha.3) Release commit for Kubernetes v1.18.0-alpha.3
a71586fac6007382db1c1dbe1ea192d4b21a0dde Merge pull request #87598 from sureshpalemoni/master
...
$ git log --format=oneline upstream/master
...
4b294079458a7216866ac6d0f7c13c617353a343 Add CHANGELOG-1.18.md for v1.18.0-alpha.3 <-- changelog commit pushed by anago
0a476eb7d4a8760e5ec82dee39878417a3c6e7c2 reduce overhead of error message formatting and allocation for scheudler NodeResource filter
2d21f16c38e704eaaa42d7d583ddd103c16d0d6b Fixed code formatting issues discovered by verify-gofmt
97185e97529ef7c006f26bb3190805ad28f15ffe Fixed problem in unit test where error expected/actual comparison was not being performed
48ee18b516b267ba062a1dd00a21e7ae10ecd805 Removed unneeded newline (moved to end of directory not found message)
f60c0af97710a5dca6f55f3c4a1480412ad50ec8 Ignore empty or blank string in path when listing plugins
78248d0c2aaca258b4210446dc38c096e83a1eb6 Fixed code formatting issues discovered by verify-gofmt
1fc80c57eeaa61d915b0e8a3f3d7c88fc8f67fb0 Autogenerated
881dde8bee873793f0cfb5947828417d0750bf72 Remove unnecessary manual conversions
e70a630dac6c0158a5f9bb571223ed5759096dc1 Added 'No resources found' message to describe and top pod commands <-- commit just after v1.18.0-alpha.3 on master
--> missing the "f9534a304b499a241de06a2bbe360b44dff5250a (tag: v1.18.0-alpha.3) Release commit for Kubernetes v1.18.0-alpha.3" commit here
a71586fac6007382db1c1dbe1ea192d4b21a0dde Merge pull request #87598 from sureshpalemoni/master <-- commit just prior to v1.18.0-alpha.3
059429ce53ae494330b2d5e3dcf795cdfda2230e kube-aggregator: increase log level of AggregationController API group logging
439f93c91b81eee29f2aa5c4cf6fff911e26e684 kubectl: allow to preselect interesting container in logs
7368862c19ed9d3293ec0343f1e1c35bc4ba49fa makes unavailableGauge metric to always reflect the current state of a service
69df8a8230f06ea086f67a04f6c9101ae9d0a0ac Add a fast path for adding new node in node_autorizer.
...

You can see a71586fac6007382db1c1dbe1ea192d4b21a0dde is in the master branch, but the next commit (the one above it) is not f9534a304b499a241de06a2bbe360b44dff5250a (tag: v1.18.0-alpha.3) but is e70a630dac6c0158a5f9bb571223ed5759096dc1

That means anago is not pushing the release-specific commit back to the master branch when releasing from master

It is pushing the changelog commit (4b294079458a7216866ac6d0f7c13c617353a343), though many commits later

And the tagged commit not existing on the master branch means that git describe goes back to the nearest tag on the branch, which will be v1.18.0-alpha.2 until we fix this
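
A quick way to confirm that from any clone (a sketch using plain git; assumes an up-to-date upstream remote):

$ git merge-base --is-ancestor v1.18.0-alpha.3 upstream/master \
    && echo "tag is on master" || echo "tag is detached from master"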

@justaugustus (Member Author)

Hehe, what @liggitt said!

justaugustus added a commit to justaugustus/release that referenced this issue Feb 12, 2020
For release branches, we create an empty release commit to avoid
potential ambiguous 'git describe' logic between the official release,
'x.y.z' and the next beta of that release branch, 'x.y.(z+1)-beta.0'.

We avoid doing this empty release commit on 'master', as:
  - there is a potential for branch conflicts
    as upstream/master moves ahead
  - we're checking out a git ref, as opposed to a branch,
    which means the tag will be detached from 'upstream/master'

A side-effect of the tag being detached from 'master' is the primary
build job (ci-kubernetes-build) will build as the previous alpha,
instead of the assumed tag.

This causes the next anago run against 'master' to fail
due to an old build version.

Example:
'v1.18.0-alpha.2.663+df908c3aad70be'
(should instead be 'v1.18.0-alpha.3.<commits-since-tag>+<commit-ish>')

ref:
  - kubernetes/issues/1020
  - kubernetes/pull/1030
  - kubernetes/issues/1080
  - kubernetes/kubernetes/pull/88074

Signed-off-by: Stephen Augustus <[email protected]>
@justaugustus (Member Author)

To step back and fill in some blanks on debugging for posterity...

Where to start?
We kicked off a few staging jobs from master and saw them fail with:

$ ./gcbmgr stage master --buildversion=$(curl -L https://dl.k8s.io/ci/latest.txt)
Step #2: ================================================================================
Step #2: PREPARE AND TAG TREE alpha (5/6)
Step #2: ================================================================================
Step #2: 
Step #2: The v1.18.0-alpha.3 tag already exists!
Step #2: Possible reasons for this:
Step #2: * --buildversion is old.
Step #2: * /workspace/anago-v1.18.0-alpha.2.477+64ba0bf3d6630c is unclean
Step #2: [2020-Feb-07 17:45:53 UTC] prepare_tree in 0s
Step #2: FAILED in prepare_tree.
Step #2: 
Step #2: RELEASE INCOMPLETE! Exiting...

So, let's test the assumption that the --buildversion is old.
Where did we get the build version from?

$ curl -L https://dl.k8s.io/ci/latest.txt

This is a version marker produced from the ci-kubernetes-build-fast job. You can read more about quirky things with version markers here: kubernetes/sig-release#759, kubernetes/sig-release#850
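
A quick sanity check is to compare that marker against the newest tag actually reachable from master (a sketch; the output shown is illustrative, taken from the failing runs above):

$ curl -sL https://dl.k8s.io/ci/latest.txt
v1.18.0-alpha.2.477+64ba0bf3d6630c
$ git describe --tags --abbrev=0 upstream/master
v1.18.0-alpha.2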

So maybe something happened during one of the build jobs?

Red herring: Originally, I looked at a snippet of the jobResultsCache.json for a build failure:

$ gsutil cp gs://kubernetes-jenkins/logs/ci-kubernetes-build/jobResultsCache.json .

Here's a snippet of that:

  {
    "buildnumber": "1224580578993508352", 
    "job-version": "v1.18.0-alpha.2.357+76c89645c5858a", 
    "version": "v1.18.0-alpha.2.357+76c89645c5858a", 
    "result": "SUCCESS", 
    "passed": true
  }, 
  {
    "buildnumber": "1224595935300947968", 
    "job-version": "v1.18.0-alpha.2.360+d52ecd5f70cdf5", 
    "version": "v1.18.0-alpha.2.360+d52ecd5f70cdf5", 
    "result": "FAILURE", 
    "passed": false
  }, 
  {
    "buildnumber": "1224642742374633474", 
    "job-version": "v1.18.0-alpha.2.360+d52ecd5f70cdf5", 
    "version": "v1.18.0-alpha.2.360+d52ecd5f70cdf5", 
    "result": "SUCCESS", 
    "passed": true
  },

Checking https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-build/1224595935300947968, we see that the job failed due to an unhealthy Prow node.

But jobs have since run successfully, so maybe it's not that?

What do ci-kubernetes-build and its variants do?

Here's a snippet of its Prow config:
https://github.com/kubernetes/test-infra/blob/63b4f656e444b7cb136086af91125b8a3799812a/config/jobs/kubernetes/sig-release/kubernetes-builds.yaml#L38-L51

So we're running a bootstrap scenario called kubernetes_build.
If we check that out, we'll see that the scenario does the following (sketched in shell after the snippet below):

  • sets pre-flight args
  • runs k/k make clean
  • runs k/k make [quick-]release
  • runs k/release ./push-build.sh

Here's a snippet:
https://github.com/kubernetes/test-infra/blob/63b4f656e444b7cb136086af91125b8a3799812a/scenarios/kubernetes_build.py#L134-L139
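
Unrolled into shell, the scenario amounts to roughly the following (a sketch; the real logic lives in kubernetes_build.py, and the push-build flags are copied from the job args in the log above):

$ cd "${GOPATH}/src/k8s.io/kubernetes"
$ make clean
$ make release           # or quick-release, depending on the job variant
$ ../release/push-build.sh --allow-dup --registry=gcr.io/kubernetes-ci-images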

Somewhere in that process we're grabbing/interpreting a version tag, but where?

If we look at any of the build jobs, we see something like this:

I0213 08:50:14.602] Call:  bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
I0213 08:50:15.139] process 305 exited with code 0 after 0.0m
I0213 08:50:15.140] Start 1227877244496515073 at v1.18.0-alpha.5.27+c099585b10cb8c...

ref: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-build/1227877244496515073

(That's a more recent log; I'm just using it here to illustrate.)

kube::version::get_version_vars is here:
https://github.com/kubernetes/kubernetes/blob/c099585b10cb8cb66a4d9e19b7fdfe910be56c30/hack/lib/version.sh#L34-L117

The line we care about is: https://github.com/kubernetes/kubernetes/blob/c099585b10cb8cb66a4d9e19b7fdfe910be56c30/hack/lib/version.sh#L69
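
From memory, the describe-and-rewrite step looks approximately like this (a sketch, not the verbatim source; see the permalinks above for the real thing):

# Nearest reachable tag, plus commit count and abbreviated SHA:
KUBE_GIT_VERSION=$(git describe --tags --abbrev=14 "${KUBE_GIT_COMMIT}^{commit}")
# Rewrite "-<commits>-g<sha>" into the ".<commits>+<sha>" form seen below:
KUBE_GIT_VERSION=$(echo "${KUBE_GIT_VERSION}" | sed "s/-\([0-9]\{1,\}\)-g\([0-9a-f]\{14\}\)$/.\1+\2/")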

When I ran this portion of the build scenario locally, here's what I got:

$ git checkout master
Switched to branch 'master'
$ git reset --hard upstream/master
HEAD is now at df908c3aad7 Merge pull request #87975 from SataQiu/kubeadm-20200210
$ bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
v1.18.0-alpha.2.663+df908c3aad70be
$ git checkout release-1.17
Switched to branch 'release-1.17'
Your branch is up to date with 'upstream/release-1.17'.
$ git reset --hard upstream/release-1.17
HEAD is now at aa9a28326e8 Update CHANGELOG/CHANGELOG-1.17.md for v1.17.3
$ bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
v1.17.4-beta.0.1+aa9a28326e8075
$ git checkout release-1.16
Switched to branch 'release-1.16'
Your branch is up to date with 'upstream/release-1.16'.
$ git reset --hard upstream/release-1.16
HEAD is now at abdce0eac9e Update CHANGELOG/CHANGELOG-1.16.md for v1.16.7
$ bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
v1.16.8-beta.0.1+abdce0eac9e732
$ git checkout release-1.15
Switched to branch 'release-1.15'
Your branch is behind 'upstream/release-1.15' by 29 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ git reset --hard upstream/release-1.15
HEAD is now at 3b43c8064a3 Update CHANGELOG/CHANGELOG-1.15.md for v1.15.10
$ bash -c '
set -o errexit
set -o nounset
export KUBE_ROOT=.
source hack/lib/version.sh
kube::version::get_version_vars
echo $KUBE_GIT_VERSION
'
v1.15.11-beta.0.1+3b43c8064a328d

The form of the tag resulting from git describe is:

v<major>.<minor>.<patch>.<commits-since-tag>+<commit-ish>

So we know that v1.18.0-alpha.2.663+df908c3aad70be is 663 commits ahead of v1.18.0-alpha.2, but also more curiously that v1.18.0-alpha.3 was not the most recent tag on master.
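
In other words (illustrative, run against an up-to-date clone at the time):

$ git describe --tags --abbrev=0 upstream/master
v1.18.0-alpha.2                  <-- nearest tag reachable from master
$ git tag --list 'v1.18.0-alpha.3'
v1.18.0-alpha.3                  <-- the tag exists; it's just not on master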

I was thinking we might need a manual tag (v1.18.0-alpha.4) on master to trigger new ci-kubernetes-build runs, which would pave over the current version markers.

This was about the time I started chatting with @liggitt.
His comment describes how we know v1.18.0-alpha.3 was not tagged from master.

The root cause of this was #1030, which introduced unconditionally creating an empty release commit and tagging that. The goal was to ensure that in instances where anago produces two releases (when run on release branches), we would have a unique release commit, to prevent the out-of-order releases we had at the end of last year.

We discovered that releases cut from master check out a ref, instead of just the branch.
So by adding an empty release commit and not pushing it, we're in a tree that's disconnected from master.

#1096 fixes this to only create empty release commits on release branches.
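
The shape of that fix, approximately (a hedged sketch; the variable names here are illustrative, not anago's actual ones):

if [[ "${RELEASE_BRANCH}" != "master" ]]; then
  # Only release branches get the empty, unambiguous release commit.
  git commit --allow-empty -m "Release commit for Kubernetes ${NEW_TAG}"
fi
git tag -a -m "Kubernetes release ${NEW_TAG}" "${NEW_TAG}"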

cc: @kubernetes/release-engineering

@justaugustus (Member Author)

Both kubernetes/sig-release#985 and https://kubernetes.slack.com/archives/CJH2GBF7Y/p1581532987106700?thread_ts=1581532800.106100&cid=CJH2GBF7Y contain scrollback from the manual tag push of v1.18.0-alpha.4 I did, but I'm also pasting it here:

$ git remote add danger [email protected]:kubernetes/kubernetes.git
$ git remote update
Fetching origin
Fetching upstream
Fetching danger
From github.com:kubernetes/kubernetes
 * [new branch]              feature-rate-limiting    -> danger/feature-rate-limiting
 * [new branch]              feature-serverside-apply -> danger/feature-serverside-apply
 * [new branch]              feature-workload-ga      -> danger/feature-workload-ga
 * [new branch]              master                   -> danger/master
 * [new branch]              release-0.10             -> danger/release-0.10
 * [new branch]              release-0.12             -> danger/release-0.12
 * [new branch]              release-0.13             -> danger/release-0.13
 * [new branch]              release-0.14             -> danger/release-0.14
 * [new branch]              release-0.15             -> danger/release-0.15
 * [new branch]              release-0.16             -> danger/release-0.16
 * [new branch]              release-0.17             -> danger/release-0.17
 * [new branch]              release-0.18             -> danger/release-0.18
 * [new branch]              release-0.19             -> danger/release-0.19
 * [new branch]              release-0.20             -> danger/release-0.20
 * [new branch]              release-0.21             -> danger/release-0.21
 * [new branch]              release-0.4              -> danger/release-0.4
 * [new branch]              release-0.5              -> danger/release-0.5
 * [new branch]              release-0.6              -> danger/release-0.6
 * [new branch]              release-0.7              -> danger/release-0.7
 * [new branch]              release-0.8              -> danger/release-0.8
 * [new branch]              release-0.9              -> danger/release-0.9
 * [new branch]              release-1.0              -> danger/release-1.0
 * [new branch]              release-1.1              -> danger/release-1.1
 * [new branch]              release-1.10             -> danger/release-1.10
 * [new branch]              release-1.11             -> danger/release-1.11
 * [new branch]              release-1.12             -> danger/release-1.12
 * [new branch]              release-1.13             -> danger/release-1.13
 * [new branch]              release-1.14             -> danger/release-1.14
 * [new branch]              release-1.15             -> danger/release-1.15
 * [new branch]              release-1.16             -> danger/release-1.16
 * [new branch]              release-1.17             -> danger/release-1.17
 * [new branch]              release-1.2              -> danger/release-1.2
 * [new branch]              release-1.3              -> danger/release-1.3
 * [new branch]              release-1.4              -> danger/release-1.4
 * [new branch]              release-1.5              -> danger/release-1.5
 * [new branch]              release-1.6              -> danger/release-1.6
 * [new branch]              release-1.6.3            -> danger/release-1.6.3
 * [new branch]              release-1.7              -> danger/release-1.7
 * [new branch]              release-1.8              -> danger/release-1.8
 * [new branch]              release-1.9              -> danger/release-1.9
$ git checkout master
Already on 'master'
$ git reset --hard danger/master
HEAD is now at 50c8f73a4b2 Merge pull request #88017 from feiskyer/fix-409
$ git tag -s v1.18.0-alpha.4 HEAD^
$ git push --dry-run danger v1.18.0-alpha.4 
To github.com:kubernetes/kubernetes.git
 * [new tag]                 v1.18.0-alpha.4 -> v1.18.0-alpha.4
$ git push danger v1.18.0-alpha.4 
Enumerating objects: 21957, done.
Counting objects: 100% (9472/9472), done.
Delta compression using up to 12 threads
Compressing objects: 100% (2077/2077), done.
Writing objects: 100% (6165/6165), 1.85 MiB | 2.28 MiB/s, done.
Total 6165 (delta 4734), reused 5043 (delta 3969)
remote: Resolving deltas: 100% (4734/4734), completed with 1486 local objects.
To github.com:kubernetes/kubernetes.git
 * [new tag]                 v1.18.0-alpha.4 -> v1.18.0-alpha.4

Note the HEAD^ in git tag -s v1.18.0-alpha.4 HEAD^, which is a ref pointing at the first parent of HEAD (the commit just before the branch tip). We did this to ensure that v1.18.0-alpha.4 and v1.18.0-alpha.5 would not be tagged on the same commit.
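
For reference (a sketch):

$ git rev-parse HEAD^    # the first parent of HEAD -- the commit that received the tag
$ git rev-parse HEAD     # the branch tip, left free for a future tag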

/close

@k8s-ci-robot (Contributor)

@justaugustus: Closing this issue.

In response to this:

(quoting the full comment above, which ends with /close)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
