Run all PRs on GH Actions VMs #4033
Conversation
Signed-off-by: Kevin Leimkuhler <[email protected]>
Thanks @kleimkuhler I'll be testing this thoroughly 😉
The indentation changes make the diff a little hard to read. Where was it inconsistent before? I've usually seen list elements in k8s manifests and elsewhere being formatted without an extra indentation...
.github/workflows/workflow.yml (outdated)

```yaml
if: github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork
env:
  PROXY_INIT_IMAGE_NAME: gcr.io/linkerd-io/proxy-init:v1.3.1
  PROMETHEUS_IMAGE_NAME: prom/prometheus:v2.11.1
```
We've been on prom v2.15.2 for a while, not sure how has this been working without updating this tag...
Hm yea I'm not sure. I can update it, but maybe as a separate PR? I'd like to keep a version bump out of this if possible.
This image name is for seeding the kind cluster docker caches prior to Linkerd installation. It makes CI (and Linkerd's bootup) faster. It should be kept in-sync with Linkerd, but worst case it just slows things down.
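A seeding step of roughly this shape would use those env vars to warm the cache (a sketch only — the step name and the use of `kind load docker-image` are assumptions, not the repo's actual step):

```yaml
# Hypothetical sketch: pre-pull the pinned images and load them into the
# KinD cluster's image cache so Linkerd's bootup doesn't wait on pulls.
- name: Seed KinD image cache
  run: |
    docker pull "$PROXY_INIT_IMAGE_NAME"
    docker pull "$PROMETHEUS_IMAGE_NAME"
    kind load docker-image "$PROXY_INIT_IMAGE_NAME"
    kind load docker-image "$PROMETHEUS_IMAGE_NAME"
```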
@kleimkuhler it must be a Kubernetes thing then. Every yaml coming out of
@alpeb Okay interesting, yea, I can remove the indentation then
Signed-off-by: Kevin Leimkuhler <[email protected]>
```shell
. bin/_tag.sh
echo ::set-env name=TAG::$(CI_FORCE_CLEAN=1 bin/root-tag)
```
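For context on the mechanism: `::set-env` is a workflow command (since deprecated by GitHub) that a step prints to stdout; the runner parses the line and exports the variable to subsequent steps in the same job. A minimal runnable sketch, with a hypothetical tag value standing in for the output of `bin/root-tag`:

```shell
# The Actions runner scans step output for this pattern and exports TAG
# to all later steps in the same job (not to other jobs).
TAG="git-0123abcd"   # hypothetical; the real value comes from bin/root-tag
echo "::set-env name=TAG::${TAG}"
```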
It's possible to declare env vars at the top level that would get shared across all jobs. That might work to avoid repeating this in `docker_build`, `kind_integration_tests`, and `docker_push`.
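The top-level sharing being suggested might look like this (a sketch; the variable name is illustrative, and note it only works for values known before any job runs):

```yaml
# Workflow-level env block: every job inherits these variables, so they
# don't need to be repeated per job. Values must be static, which is why
# a TAG computed from the checked-out code can't be declared here.
env:
  DOCKER_BUILDKIT: "1"   # illustrative variable, not from the repo
jobs:
  docker_build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "$DOCKER_BUILDKIT"
```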
But it might be better to try that after the `workflow.yml` split-off, so we don't have to add this back.
Yep, in the `workflow.yml` split-off I'd like to make a similar step for other jobs that set the environment multiple times.
I'm not sure this can be set at the top level though, since it relies on having checked out the code already. Unless you're suggesting that a step (after `actions/checkout`) can set a workflow-level environment variable?
> Unless you're suggesting that a step (after `actions/checkout`) can set a workflow-level environment variable?
Yea, I know you can declare a global env var, but I'm not sure if it's modifiable from within a job.
Yea, that could be helpful to set early in a workflow. After the workflow files are split I can see if there is a more global `::set-env` option, but I don't see something like that right now.
Signed-off-by: Kevin Leimkuhler <[email protected]>
Signed-off-by: Kevin Leimkuhler <[email protected]>
Signed-off-by: Kevin Leimkuhler <[email protected]>
This renamed
Force-pushed from 4f4efd6 to 01d6889.
this is awesome. a couple nits then 🚢 👍
Signed-off-by: Kevin Leimkuhler <[email protected]>
Signed-off-by: Kevin Leimkuhler <[email protected]>
Signed-off-by: Kevin Leimkuhler <[email protected]>
I added back the When I changed the version of the proxy-init and prometheus images, the That job is still helpful to run!
@kleimkuhler It appears
... OTOH I'm not sure if pulling those images is really needed after all. The
Signed-off-by: Kevin Leimkuhler <[email protected]>
@alpeb Yea, I agree. I changed the
Thanks @kleimkuhler! This looks great to me and it's testing fine in both PRs and master 🏆
Depends on #4033

## Motivation

If any job fails in the current GH Actions workflow, a re-run on the same commit SHA requires re-running *all* jobs, regardless of whether a job already passed in the previous run. This can be problematic when dealing with flakiness in the integration tests. If a test fails due to flakiness in `cloud_integration_tests`, all the unit tests, static checks, and `kind_integration_tests` are re-run, which leads to lots of waiting and dealing with the possibility of flakiness in earlier jobs. With this change, individual workflows can now be re-run without triggering all other jobs to complete again first.

## Solution

`workflow.yml` is now split into:
- `static_checks.yml`
- `unit_tests.yml`
- `kind_integration.yml`
- `cloud_integration.yml`

### Workflows

`static_checks.yml` performs checks related to dependencies, linting, and formatting.

`unit_tests.yml` performs the Go and JS unit tests.

`kind_integration.yml` builds the images (on Packet or the GH Actions VM) and runs the integration tests on a KinD cluster. This workflow continues to run for **all** PRs and pushes to `master` and tags.

`cloud_integration.yml` builds the images only on Packet, because forked repositories do not need to trigger this workflow. It then creates a unique GKE cluster and runs the integration tests on that cluster.

### The actual flow of work

A forked or non-forked repository opening a PR triggers:
- `static_checks`
- `unit_tests`
- `kind_integration_tests`

These workflows all run in parallel and are individually re-runnable.

A push to `master` or tags triggers:
- `static_checks`
- `unit_tests`
- `kind_integration_tests`
- `cloud_integration_tests`

These workflows also all run in parallel, including the `docker_build` step of both integration test workflows. This has not conflicted in testing, as it takes place on the same Packet host and just utilizes docker layer caching well.

Signed-off-by: Kevin Leimkuhler <[email protected]>
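The trigger difference between the split files might be sketched like this (assumed shapes, not the exact contents of the repo's files):

```yaml
# kind_integration.yml (sketch): runs for every PR plus pushes to
# master and tags.
on:
  pull_request: {}
  push:
    branches: [master]
    tags: ["*"]
```

```yaml
# cloud_integration.yml (sketch): pushes only, since forked
# repositories never need to trigger it.
on:
  push:
    branches: [master]
    tags: ["*"]
```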
## Motivation
Currently all pushes to the master branch, tags, and Linkerd org member PRs run the `kind_integration_host` job on the same Packet host. This means that parallel jobs spin up KinD clusters with a unique name and sandbox the tests so that they do not clash.
This is problematic for a few reasons:
- resource constraints.
- its namespaces and the running one expects them to be present.
- resulting in inconsistent timeouts and docker errors.
## Solution
This change moves all KinD integration testing to GH Actions VMs. This is currently what forked repository workflows do.

There is no longer a `docker_pull` job, as its responsibilities have been moved into one of the `kind_integration_tests` steps.

The renamed `kind_integration_tests` job is responsible for all PR workflows and has steps specific to forked and non-forked repositories.
### Non-forked repository PRs
The Packet host is still used for building docker images as leveraging docker
layer caching is still valuable--a build can be as fast as 30 seconds compared
to ~12 minutes.
Loading the docker images into the KinD cluster on the GH Action VM is done by
saving the Packet host docker images as image archives, and loading those
directly into the local KinD cluster.
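The archive round-trip might look like this (a sketch; the archive path and image name are assumptions):

```yaml
# Illustrative sketch of the save/load round-trip between hosts.
- name: Save images on the Packet host
  run: docker save gcr.io/linkerd-io/proxy:$TAG -o archives/proxy.tar
- name: Load archives into the local KinD cluster
  run: kind load image-archive archives/proxy.tar
```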
Forked repository PRs
docker_build
has been sped up slightly by sendingdocker save
processes tothe background.
Docker layer caching cannot be leveraged since there are no SSH secrets available, so the `artifact-upload`/`artifact-download` actions introduced in #TODO are still used.
### Cleanup

This PR also includes some general cleanup, such as:
- the current naming pattern.
- currently exported multiple times.
- throughout the workflow file, so lists are now indented.
Signed-off-by: Kevin Leimkuhler [email protected]