Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

releng: Publish fast builds to a separate subdirectory #18290

Merged
merged 4 commits into from
Jul 28, 2020

Conversation

justaugustus
Copy link
Member

(#18163 got thrashed w/ a bunch of old commits, so opening a new PR.)

  • Remove extraneous latest-{{.Version}}-cross markers

    latest-{{.Version}} are inherently cross builds, so having the
    latest-{{.Version}}-cross markers is redundant.

    (This commit matches another in [WIP] releng: Add stable4 and remove beta generated jobs #18169, so it'll get rebased out of
    whichever PR lands last.)

  • kubetest: Enable extract strategy for fast (linux/amd64-only) CI builds

  • scenarios: Allow kubernetes_build to recognize fast builds

Inverts the logic in #18158.
Explained in more detail here: kubernetes/release#1389

Signed-off-by: Stephen Augustus [email protected]

/assign @BenTheElder @spiffxp @cblecker @dims
cc: @kubernetes/release-engineering
/hold for review
ref: kubernetes/sig-release#850, kubernetes/sig-release#759

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 13, 2020
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 13, 2020
@k8s-ci-robot k8s-ci-robot requested review from cpanato and wojtek-t July 13, 2020 06:00
@k8s-ci-robot k8s-ci-robot added area/config Issues or PRs related to code in /config area/jobs area/kubetest area/release-eng Issues or PRs related to the Release Engineering subproject area/scenarios sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jul 13, 2020
@justaugustus
Copy link
Member Author

justaugustus commented Jul 13, 2020

Still some questions to answer from the previous PR:

From @BenTheElder:

I'll review this properly Monday.

I would like to request though that we settle on something and then stop moving these around, these markers are the API for our CI builds.
There's no good way to communicate moving these to all potential users and we want people to actually test these.

still not in favor of continuing to rename these markers. Doing so affects all consumers and we have no way to communicate this effectively.

where was this discussed?

From @spiffxp:

I've tried to follow the trail of issues and PR's and can't tell what the actual plan is, how conflicting version markers manifested, etc. There's too much followup/revert and no "this is the desired state we are reconciling to" that I can find?

I know we chatted on slack about moving toward version markers and job names that explicitly have the version in them. A plan of record would be useful

EDIT: I'll tie the histories together this week and put a plan up in kubernetes/sig-release#850.

@fejta-bot
Copy link

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/check-cla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 13, 2020
@hakman
Copy link
Member

hakman commented Jul 15, 2020

@justaugustus Thanks for trying to find a better solution for this.
At the moment the current cross-builds marker k8s-master is pointing to a 2 weeks old build. The race condition between fast and cross builds pretty much prevents any cross-build from running, which makes multi-arch testing with CI builds impossible.

> curl -s https://storage.googleapis.com/kubernetes-release-dev/ci/k8s-master.txt
v1.19.0-beta.2.607+4c853bb28f57f8

Is there any way to force cross-builds to run at least on daily basis until a decision is reached here?

@justaugustus
Copy link
Member Author

Is there any way to force cross-builds to run at least on daily basis until a decision is reached here?

@hakman -- I'd prefer to solve this instead of thinking of a workaround. I think we're close here.

@justaugustus
Copy link
Member Author

@BenTheElder @spiffxp -- More details, as requested.

Version markers are text files stored in the root of various GCS buckets:

They represent the results of different types of Kubernetes build jobs and act as sort of a public API for accessing builds. One can see them leveraged in extraction strategies for e2e tests, release engineering tooling, and user-created scripts.

Unfortunately, the way certain version markers are generated and utilized can at best be confusing, and at worst, disruptive.

There are a variety of problems, some of which are symptoms of the other ones...

FIXED - Cross builds are stored in a separate GCS bucket

(Fixed in #14030.)

This makes long-term usage of cross builds a little more difficult, since scripts utilizing version markers tend to consider only the version marker filename, while the GCS bucket name remains unparameterized.

FIXED - Generated jobs may not represent intention

(Fixed in #15564.)

As the generic version markers can shift throughout the release cycle, every time we regenerate jobs, they may not represent what we intend to test.

The best examples of this are pretty much every job using the k8s-beta version marker, and more specifically, skew and upgrade jobs.

FIXED - bazel version markers appear to be unused

(Fixed in #15612.)

Generic version markers are not explicit

We publish a set of additional generic version markers:

  • k8s-master
  • k8s-beta
  • k8s-stable1
  • k8s-stable2
  • k8s-stable3

Depending on the point in the release cycle, the meaning of these markers can change.

  • k8s-master always points to the version on master.
  • k8s-beta may represent:
    • masters build version (pre-branch cut)
    • a to-be-released build version (post-branch cut)
    • a recently released build version (post-release)

Knowing what these markers mean at any one time presumes knowledge of the build/release process or a correct interpretation of the Kubernetes versions docs, which is often out of date and in low-visibility location.

Manually created jobs using generic version markers can be inaccurate

Non-generated jobs using generic version markers do not get the same level of scrunity as ones that are generated via releng/test_config.yaml.

This leads to inaccuracies between the versions presumed to be used in test and the versions that may be displayed in testgrid.

ci-kubernetes-e2e-gce-beta-stable1-gci-kubectl-skew is a great example:

- interval: 2h
name: ci-kubernetes-e2e-gce-beta-stable1-gci-kubectl-skew
labels:
preset-service-account: "true"
preset-k8s-ssh: "true"
spec:
containers:
- args:
- --timeout=140
- --bare
- --scenario=kubernetes_e2e
- --
- --check-leaked-resources
- --check-version-skew=false
- --extract=ci/k8s-stable1
- --extract=ci/k8s-beta
- --gcp-node-image=gci
- --gcp-zone=us-west1-b
- --ginkgo-parallel
- --provider=gce
- --skew
- --test_args=--ginkgo.focus=Kubectl --ginkgo.skip=\[Serial\]|\[Deprecated\] --minStartupPods=8
- --timeout=120m
image: gcr.io/k8s-testimages/kubekins-e2e:v20200715-a03387b-master
annotations:
testgrid-dashboards: sig-release-job-config-errors
testgrid-tab-name: gce-1.14-1.13-gci-kubectl-skew

All variants of that prowjob have landed on the sig-release-job-config-errors dashboard for various misconfiguration issues that are the result of generic version markers

linux/amd64 version markers are colliding with cross builds

"Fast" (linux/amd64-only) builds run every 5 minutes, while cross builds run every hour.
They also write to the same version markers (latest.txt, latest-<major>.txt, latest-<major>.<minor>.txt).

The Kubernetes build jobs have a mechanism for checking if a build already exists and will exit early to save on test cycles.

What this means is if a "fast" build has already happened for a commit, then the corresponding cross build will exit without building.

This has been happening pretty consistently lately, so cross build consumers are using much older versions of Kubernetes than intended.

(Note that this condition only happens on master.)


I'd like to establish a rough plan of record to continue iteratively fixing some of these issues.

Plan of record

@justaugustus
Copy link
Member Author

still not in favor of continuing to rename these markers. Doing so affects all consumers and we have no way to communicate this effectively.

where was this discussed?

@BenTheElder -- it's kind of scattered across Slack across multiple release cycles 😭

The previous recent PRs were specifically meant to not impact existing consumers.

This one:

  • Moves the fast builds to a subdirectory (/fast)
  • Enables an extraction strategy for fast builds

For people using latest.txt, after merge they'd be pointing at a cross build, instead of a fast build.
This could cause a transient failure in jobs using that marker while the next cross build is in progress, though I doubt it (tests using the latest marker are linux/amd64-only already).

For people using generic version markers, there should be no impact as we haven't changed the generic markers.

After the cross builds have started building again, I'd update jobs using generic version markers to use latest or latest-x.y markers instead.

@justaugustus
Copy link
Member Author

@dims
Copy link
Member

dims commented Jul 16, 2020

Let's move the info / checklist to an issue (pretty please!) will get lost here. Also next time we should try to open an issue early so we have a record (i hate slack search!)

thanks,
Dims

@justaugustus
Copy link
Member Author

justaugustus commented Jul 16, 2020

@dims -- Already opened and ref'ed in the PR description: kubernetes/sig-release#850

I opened that last November and have been threading incremental fixes to it since.
(The plan of record and additional details are recent additions though.)

Copy link
Member

@spiffxp spiffxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My overall comment on this PR is that I'm fine undoing the version markers change that was recently introduced. But I'm hesitant to move forward on the new addition "fast" path without a more thorough review of where we're headed.

Is the fast build stuff strictly required to unblock cross builds or is there a more straightforward revert that can be done to unblock cross?

@spiffxp
Copy link
Member

spiffxp commented Jul 20, 2020

Trying to recap what was discussed over zoom Thursday:

  • I would like to see this PR update versions.md to reflect the current/desired state (in anticipation of this content moving somewhere more visible)
  • We need to decide whether latest.txt should correspond to the results of a fast build (amd64-only) or a cross-build for the in-development release of kubernetes
    • cross build means waiting for ~80-120min after a PR merges before release-blocking CI jobs will start using a build containing that PR's changes - this is what all other release branches do
    • fast build means waiting only ~20-30min, but is the opposite of what all other release branches do
  • We didn't have to make this choice before because a race condition was benefitting us? Fast builds would show up first and eventually get overwritten by cross builds? Couldn't quite remember this part...

@detiber
Copy link
Member

detiber commented Jul 22, 2020

It would be great to get more generic descriptions of when to use which marker documented.

For example, It's not quite clear to me when I would use latest.txt rather than master.txt

It's also not overly clear to me when the beta transitions from a released version back to the state of master, is that only after beta builds are started for the next release?

Generic descriptions for stablen could likely be something like current stable release, n-1, etc
Though, I'm wondering why we couldn't use something a bit more clear and descriptive like:

  • stable
  • stable-1
  • stable-2
  • stable-3 (since it looks like the plan is to add another stable marker)

Then the naming itself might make the use cases a bit clearer.

Last thought, would it be possible to have a marker that represents:

  • current state of master if all current release branches are "stable"
  • current state of latest release branch if release is "pending"

That would allow for jobs to be defined in a way that they represent the latest build for the "next minor" version of kubernetes without having to potentially swap it between the yet to be released release branch and master, and target the "next patch" versions of supported released kubernetes versions using the "stable" markers.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 28, 2020
@justaugustus
Copy link
Member Author

@BenTheElder @spiffxp -- Version marker doc has been updated. PTAL.

Comment on lines +300 to +302
- `k8s-stable1`
- `k8s-stable2`
- `k8s-stable3`
Copy link
Member

@hakman hakman Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @detiber 's suggestion is a bit more intuitive:

Suggested change
- `k8s-stable1`
- `k8s-stable2`
- `k8s-stable3`
- `k8s-stable`
- `k8s-stable-1`
- `k8s-stable-2`
...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documents the current state, where "current" is what exists today in addition to what would happen once this PR merges.

There is no k8s-stable marker.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks.

@justaugustus
Copy link
Member Author

@detiber @jsturtevant -- Can you read through the docs in this PR and let me know if they're sufficient for your needs?

Note: I'm only aiming to unblock cross builds here... any other topics around this should be discussed in kubernetes/sig-release#850.

Copy link
Member

@spiffxp spiffxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That versions rewrite is really comprehensive, looks great. I would have been satisfied with far less to unblock. My main concern is making sure release-blocking jobs get back to the low-latency merged-pr-to-new-build path this PR is going to kick them out of.

Comment on lines +51 to +53
### CI - cross build

Use `gsutil cat gs://kubernetes-release-dev/ci/latest.txt` (`master` branch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implication with this change is that any job depending on latest is going to have a 90m-2h lag time from PR merge to exercising the merged results, where before it was 20-30m. I suspect release-team will move quickly to update release-blocking jobs to use latest-fast.txt instead. A heads up to kuberentes-dev@ would be appreciated.

Comment on lines +266 to +268
**Directory:** `ci`

#### latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting nit: it's visually difficult to tell the difference between these. Maybe bullet the bolded items?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 28, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justaugustus, spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 28, 2020
@justaugustus
Copy link
Member Author

Thanks @spiffxp!
I'm going to lift this so we can get the builds started, but also so I can reference the rewritten doc in the email to k-dev.

PR to follow to swap the markers on blocking jobs.
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 28, 2020
@k8s-ci-robot k8s-ci-robot merged commit 3610e67 into kubernetes:master Jul 28, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jul 28, 2020
@k8s-ci-robot
Copy link
Contributor

@justaugustus: Updated the job-config configmap in namespace default at cluster default using the following files:

  • key kubernetes-builds.yaml using file config/jobs/kubernetes/sig-release/kubernetes-builds.yaml
  • key 1.16.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.16.yaml
  • key 1.17.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.17.yaml
  • key 1.18.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.18.yaml
  • key 1.19.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.19.yaml

In response to this:

(#18163 got thrashed w/ a bunch of old commits, so opening a new PR.)

  • Remove extraneous latest-{{.Version}}-cross markers

latest-{{.Version}} are inherently cross builds, so having the
latest-{{.Version}}-cross markers is redundant.

(This commit matches another in #18169, so it'll get rebased out of
whichever PR lands last.)

  • kubetest: Enable extract strategy for fast (linux/amd64-only) CI builds
  • scenarios: Allow kubernetes_build to recognize fast builds

Inverts the logic in #18158.
Explained in more detail here: kubernetes/release#1389

Signed-off-by: Stephen Augustus [email protected]

/assign @BenTheElder @spiffxp @cblecker @dims
cc: @kubernetes/release-engineering
/hold for review
ref: kubernetes/sig-release#850, kubernetes/sig-release#759

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@justaugustus
Copy link
Member Author

Opened #18517 to fix up jobs using the ci/k8s-master version marker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/config Issues or PRs related to code in /config area/jobs area/kubetest area/release-eng Issues or PRs related to the Release Engineering subproject area/scenarios cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.