Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-blocking jobs must run in dedicated cluster: ci-kubernetes-build #19483

Closed
spiffxp opened this issue Oct 7, 2020 · 24 comments
Closed
Assignees
Labels
area/jobs kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Milestone

Comments

@spiffxp
Copy link
Member

spiffxp commented Oct 7, 2020

What should be cleaned up or changed:

This is part of #18549

To properly monitor the outcome of this, you should be a member of [email protected]. PR yourself into https://github.com/kubernetes/k8s.io/blob/master/groups/groups.yaml#L603-L628 if you're not a member.

NOTE: I am not tagging this as "help wanted" because it is blocked on kubernetes/k8s.io#846. I would also recommend doing ci-kubernetes-build-fast first. Here is my guess at how we could do this:

  • create a duplicate job that pushes to the new bucket writable by k8s-infra-prow-build
  • ensure it's building and pushing appropriately
  • update a release-blocking job to pull from the new bucket
  • if no problems, roll out changes progressively
    • a few more jobs in release-blocking
    • all jobs in release-blocking that use this job's results
    • a job that still runs in the "default" cluster
    • all jobs that use this job's results
  • rename jobs / get rid of the job that runs on the "default" cluster
  • do the same for release-branch variants, can probably do a faster rollout

It will be helpful to note the date/time that PR's merge. This will allow you to compare before/after behavior.

Things to watch for the job

Things to watch for the build cluster

Keep this open for at least 24h of weekday PR traffic. If everything continues to look good, then this can be closed.

/wg k8s-infra
/sig testing
/area jobs

@spiffxp spiffxp added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Oct 7, 2020
@k8s-ci-robot k8s-ci-robot added wg/k8s-infra sig/testing Categorizes an issue or PR as relevant to SIG Testing. area/jobs labels Oct 7, 2020
@spiffxp
Copy link
Member Author

spiffxp commented Oct 7, 2020

/sig release
FYI @kubernetes/release-engineering

@justaugustus
Copy link
Member

Arnaud has PRs in flight for this.
/assign @ameukam

and Carlos has been working on the build-fast stuff.
/assign @cpanato

@spiffxp
Copy link
Member Author

spiffxp commented Nov 5, 2020

Current obstacle is deciding how to allow the ci-kubernetes-build-canary job to push to gcr.io/k8s-staging-ci-images.

Currently:

  • ci-kubernetes-build runs on default build cluster aka google.com-owned k8s-prow-builds
  • all jobs in this cluster use the default pr-kubekins@kubernetes-jenkins gcp service account
  • pr-kubekins has role roles/editor on the kubernetes-ci-images project
  • gs://artifacts.kubernetes-ci-images.appspot.com/ (the bucket that corresponds to gcr.io/kubernetes-ci-images) gives roles/storage.legacyBucketOwner to project editors

What this means:

  • any job that runs in default can push to kubernetes-ci-images, including presubmits for unrelated repos

What could we do for pushing to gcr.io/k8s-staging-ci-images?

  • follow the same approach, allow the default prow-build@k8s-infra-prow-build service account on k8s-infra-prow-build to push to gcr.io/k8s-staging-ci-images
    • do this via adding the service account e-mail address to the [email protected] group?
    • do this via direct iam bindings on the projects or buckets in question?
  • create a service account that is granted push access to gcr.io/k8s-staging-ci-images, restrict its use to well known jobs via static tests
    • could do this for each staging project, or on demand
    • could hardcode job names in tests, or try enforcing some kind of path-based convention
  • suggest that any project that needs to push images to staging repos should run on the "trusted" cluster
    • this would mean no presubmits could push images to staging repos

@ameukam
Copy link
Member

ameukam commented Nov 5, 2020

create a service account that is granted push access to gcr.io/k8s-staging-ci-images, restrict its use to well known jobs via static tests

+1 for doing this for each staging project.
Does this involve a k8s service account for each staging project?

@ameukam
Copy link
Member

ameukam commented Nov 6, 2020

do this via adding the service account e-mail address to the [email protected] group?

ci-kubernetes-build-canary still fails even after the service account is added (see kubernetes/k8s.io#1393) to [email protected] : https://testgrid.k8s.io/sig-testing-canaries#build-master-canary

prow-build service account inherits of the permissions of the role roles/cloudbuild.builds.editor as member of [email protected] :

https://github.com/kubernetes/k8s.io/blob/74bfdc5741bdde3b8f489bdd8327474101b3b5e4/infra/gcp/lib.sh#L209-L231

which is not enough to make the job successful.

@spiffxp
Copy link
Member Author

spiffxp commented Nov 6, 2020

My read of GCS permissions is that members of the group have write and admin privileges to the bucket

$ gsutil iam get gs://artifacts.k8s-staging-ci-images.appspot.com/
{
  "bindings": [
    {
      "members": [
        "group:[email protected]",
        "projectEditor:k8s-staging-ci-images",
        "projectOwner:k8s-staging-ci-images"
      ],
      "role": "roles/storage.legacyBucketOwner"
    },
    {
      "members": [
        "projectViewer:k8s-staging-ci-images"
      ],
      "role": "roles/storage.legacyBucketReader"
    },
    {
      "members": [
        "group:[email protected]" # here
      ],
      "role": "roles/storage.legacyBucketWriter"
    },
    {
      "members": [
        "group:[email protected]",
        "group:[email protected]" # here
      ],
      "role": "roles/storage.objectAdmin"
    },
    {
      "members": [
        "allUsers"
      ],
      "role": "roles/storage.objectViewer"
    }
  ],
  "etag": "CAg="
}

@spiffxp
Copy link
Member Author

spiffxp commented Nov 6, 2020

Per https://cloud.google.com/iam/docs/understanding-roles

roles/storage.legacyBucketOwner

storage.buckets.get
storage.buckets.getIamPolicy
storage.buckets.setIamPolicy
storage.buckets.update
storage.objects.create
storage.objects.delete
storage.objects.list

roles/storage.legacyBucketWriter

storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.list

roles/storage.objectAdmin

resourcemanager.projects.get
resourcemanager.projects.list
storage.objects.*

roles/storage.admin

firebase.projects.get
resourcemanager.projects.get
resourcemanager.projects.list
storage.buckets.*
storage.objects.*

@spiffxp
Copy link
Member Author

spiffxp commented Nov 6, 2020

Perhaps the service account isn't getting picked up as a group member

spiffxp@cloudshell:~ (k8s-infra-prow-build)$ gcloud auth activate-service-account [email protected] --key-file=prow-build.service-account.json
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ docker push gcr.io/k8s-staging-ci-images/conformance-amd64:v1.20.0-beta.1.209_bf5382a53e1546
The push refers to repository [gcr.io/k8s-staging-ci-images/conformance-amd64]
3836a827d4dc: Preparing
75b964e845dc: Preparing
50aa71a59909: Preparing
a4ffc6e6326f: Preparing
fb8aa2688f2b: Preparing
a2bc020e977c: Waiting
b0300c32a489: Waiting
denied: Token exchange failed for project 'k8s-staging-ci-images'. Caller does not have permission 'storage.buckets.get'. To configure permissions, follow instructions at: https://cloud.google.com/container-registry/docs/access-control

@spiffxp
Copy link
Member Author

spiffxp commented Nov 6, 2020

Opened kubernetes/k8s.io#1401 to do what I did manually

$ gsutil iam ch serviceAccount:[email protected]:roles/storage.objectAdmin gs://artifacts.k8s-staging-ci-images.appspot.com
$ gsutil iam ch serviceAccount:[email protected]:roles/storage.legacyBucketWriter gs://artifacts.k8s-staging-ci-images.appspot.com

Which I verified as working via a push which had failed before:

$ docker push gcr.io/k8s-staging-ci-images/conformance-amd64:v1.20.0-beta.1.209_bf5382a53e1546
The push refers to repository [gcr.io/k8s-staging-ci-images/conformance-amd64]
3836a827d4dc: Pushed
75b964e845dc: Pushed
50aa71a59909: Pushed
a4ffc6e6326f: Pushed
fb8aa2688f2b: Pushed
a2bc020e977c: Pushed
b0300c32a489: Pushed
v1.20.0-beta.1.209_bf5382a53e1546: digest: sha256:a97c22c59446a50398f42ebcd3af4f9999ec82adb008ce7dbed2f016536d3ef2 size: 1793

ameukam added a commit to ameukam/test-infra that referenced this issue Nov 10, 2020
ameukam added a commit to ameukam/test-infra that referenced this issue Nov 10, 2020
ameukam added a commit to ameukam/test-infra that referenced this issue Nov 10, 2020
ameukam added a commit to ameukam/test-infra that referenced this issue Nov 10, 2020
k8s-ci-robot added a commit that referenced this issue Nov 10, 2020
Add canary jobs for ci-kubernetes-build job variants.
@lasomethingsomething
Copy link
Contributor

@cpanato @ameukam Hello both: Wondering what your suggested next step might be from a Release Engineering group perspective?

@cpanato
Copy link
Member

cpanato commented Jan 19, 2021

I need to see what is the status of this item and maybe we can plan the next steps (@LappleApple)

@ameukam do you have any updates that you can provide to check where we are with this issue?

@ameukam
Copy link
Member

ameukam commented Jan 20, 2021

@LappleApple @cpanato The only update I'm aware is the switch to gcr.io/k8s-staging-ci-images : kubernetes/kubernetes#97087.
As suggested (#19483 (comment)) by spiffxp, the next step would be a KEP describing how we can migrate away from kubernetes-release-dev bucket.

@spiffxp
Copy link
Member Author

spiffxp commented Jan 25, 2021

IMO next steps are:

@spiffxp
Copy link
Member Author

spiffxp commented Feb 2, 2021

/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 2, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Feb 8, 2021

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 8, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Feb 9, 2021

/remove-priority important-longterm
/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Feb 9, 2021
ameukam added a commit to ameukam/test-infra that referenced this issue Feb 22, 2021
Ensure periodic-kubernetes-bazel-build can run k8s-infra-prow-build
cluster and push artifacts on k8s-release-dev bucket.

Related to : kubernetes#19483

Signed-off-by: Arnaud Meukam <[email protected]>
@spiffxp
Copy link
Member Author

spiffxp commented Feb 23, 2021

@spiffxp
Copy link
Member Author

spiffxp commented Feb 24, 2021

/close
Work related to fully deprecating the google-hosted artifacts from these jobs (#19483 (comment)) will be tracked under kubernetes/k8s.io#1571

@k8s-ci-robot
Copy link
Contributor

@spiffxp: Closing this issue.

In response to this:

/close
Work related to fully deprecating the google-hosted artifacts from these jobs (#19483 (comment)) will be tracked under kubernetes/k8s.io#1571

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/jobs kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Projects
None yet
Development

No branches or pull requests

7 participants