
Run all PRs on GH Actions VMs #4033

Merged

merged 9 commits into master on Feb 12, 2020

Conversation

kleimkuhler
Contributor

Motivation

Currently all pushes to the master branch, tags, and Linkerd org member PRs run
the kind_integration_host job on the same Packet host.

This means that parallel jobs spin up KinD clusters with unique names and
sandbox the tests so that they do not clash.

This is problematic for a few reasons:

  • There is a limit on the number of jobs we can run in parallel due to
    resource constraints.
  • Workflow cancellation and re-runs conflict when the cancelled run deletes
    its namespaces while the running one expects them to be present.
  • Running multiple KinD clusters has proven flaky, resulting in
    inconsistent timeouts and docker errors.

Solution

This change moves all KinD integration testing to GH Actions VMs. This is
currently what forked repository workflows do.

There is no longer a docker_pull job, as its responsibilities have been moved
into one of the kind_integration_tests steps.

The renamed kind_integration_tests job is responsible for all PR
workflows and has steps specific to forked and non-forked repositories.

Non-forked repository PRs

The Packet host is still used for building docker images as leveraging docker
layer caching is still valuable--a build can be as fast as 30 seconds compared
to ~12 minutes.

Loading the docker images into the KinD cluster on the GH Action VM is done by
saving the Packet host docker images as image archives, and loading those
directly into the local KinD cluster.
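In practice this amounts to streaming a `docker save` from the Packet host to the VM, then loading the archive with `kind load image-archive`. A hedged sketch of such a step — the step name, image name, and `DOCKER_ADDRESS` variable are illustrative assumptions, not the PR's exact contents:

```yaml
# Illustrative sketch only -- the real step names and image list differ.
- name: Load images into the local KinD cluster
  if: github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork
  run: |
    # Stream an image archive from the Packet host's docker daemon...
    ssh "$DOCKER_ADDRESS" docker save "gcr.io/linkerd-io/controller:$TAG" > controller.tar
    # ...and load it directly into the KinD cluster running on this VM.
    kind load image-archive controller.tar
```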

Forked repository PRs

docker_build has been sped up slightly by sending docker save processes to
the background.
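The backgrounding trick is plain shell: launch each `docker save` with `&` and `wait` for all of them before proceeding. A minimal runnable sketch, with a cheap placeholder standing in for the real `docker save` invocations (image names here are illustrative):

```shell
#!/usr/bin/env bash
set -eu

# Placeholder image list; the real workflow saves the linkerd images
# built during docker_build.
images="cli-bin controller proxy"
outdir=$(mktemp -d)

for img in $images; do
  # The real workflow would run something like (an assumption, not a quote):
  #   docker save "gcr.io/linkerd-io/$img:$TAG" -o "$img.tar" &
  # A placeholder subshell stands in so the control flow runs anywhere.
  ( sleep 0.1; echo "saved" > "$outdir/$img.tar" ) &
done

# Block until every backgrounded save finishes; without `wait` the job
# could move on before the archives are fully written.
wait

ls "$outdir"
```

The speedup comes purely from overlapping the saves; `wait` keeps the step correct by not letting the job advance early.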

Docker layer caching cannot be leveraged since there are no SSH secrets
available, so the artifact-upload/artifact-download actions introduced in
#TODO are still used.

Cleanup

This PR also includes some general cleanup such as:

  • Some job names have been renamed to better reflect their purpose or match
    the current naming pattern.
  • Environment variables are now set early in a job as a separate step when
    they were previously exported multiple times.
  • Indentation was really bothering me because it switches back and forth
    throughout the workflow file, so lists are now indented.

Signed-off-by: Kevin Leimkuhler <[email protected]>
@kleimkuhler kleimkuhler self-assigned this Feb 10, 2020
@kleimkuhler kleimkuhler requested review from alpeb and siggy February 10, 2020 16:08
Member

@alpeb alpeb left a comment


Thanks @kleimkuhler I'll be testing this thoroughly 😉
The indentation changes make the diff a little hard to read. Where was it inconsistent before? I've usually seen list elements in k8s manifests and elsewhere formatted without an extra indentation...

.github/workflows/workflow.yml
if: github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork
env:
PROXY_INIT_IMAGE_NAME: gcr.io/linkerd-io/proxy-init:v1.3.1
PROMETHEUS_IMAGE_NAME: prom/prometheus:v2.11.1
Member


We've been on prom v2.15.2 for a while; not sure how this has been working without updating this tag...

Contributor Author


Hm yea I'm not sure. I can update it, but maybe as a separate PR? I'd like to keep a version bump out of this if possible.

Member


This image name is for seeding the kind cluster docker caches prior to Linkerd installation. It makes CI (and Linkerd's bootup) faster. It should be kept in-sync with Linkerd, but worst case it just slows things down.

@kleimkuhler
Contributor Author

@alpeb When lists are the value of a dictionary I've tended to see them indented just like any other value would be. Another reason I changed this is that I've had to un-indent new lists in order to match what we currently have. Lastly, pretty much any YAML documentation I look at indents lists as well.

@alpeb
Member

alpeb commented Feb 10, 2020

@kleimkuhler it must be a kubernetes thing then. Yaml coming out of kubectl get -oyaml doesn't use indentation for lists, nor have we in our test golden files or Helm templates. I've also put together this snippet that uses https://github.com/kubernetes-sigs/yaml, the same lib that k8s and we rely on.

@kleimkuhler
Contributor Author

@alpeb Okay, interesting; yea, I can remove the indentation then.

Signed-off-by: Kevin Leimkuhler <[email protected]>
@kleimkuhler kleimkuhler requested a review from alpeb February 10, 2020 20:45
Comment on lines +208 to +209
. bin/_tag.sh
echo ::set-env name=TAG::$(CI_FORCE_CLEAN=1 bin/root-tag)
Member


It's possible to declare env vars at the top level that would be shared across all jobs. That might avoid repeating this in docker_build, kind_integration_tests and docker_push.

Member


But it might be better to try that after the workflow.yml split, so we don't have to add this back

Contributor Author


Yep, in the workflow.yml split I'd like to add a similar step for other jobs that set environment variables multiple times.

I'm not sure this can be set at the top level though since it relies on having checked out the code already. Unless you're suggesting that a step (after actions/checkout) can set a workflow level environment variable?

Member


Unless you're suggesting that a step (after actions/checkout) can set a workflow level environment variable?

Yea, I know you can declare a global env var, but I'm not sure if it's modifiable from within a job.

Contributor Author


Yea, that could be helpful to set early in a workflow. After the workflow files are split I can see if there is a more global ::set-env option, but I don't see one right now.
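For context, the per-job step under discussion looks roughly like the following; the step name is an assumption, while the two `run:` lines are quoted from the diff. `::set-env` was the GitHub Actions mechanism at the time (it has since been deprecated in favor of writing to `$GITHUB_ENV`):

```yaml
# The repeated per-job pattern. A top-level `env:` block cannot execute
# `bin/root-tag`, which is why each job re-declares this step after checkout.
- name: Set environment variables
  run: |
    . bin/_tag.sh
    echo ::set-env name=TAG::$(CI_FORCE_CLEAN=1 bin/root-tag)
```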

Signed-off-by: Kevin Leimkuhler <[email protected]>
Signed-off-by: Kevin Leimkuhler <[email protected]>
Signed-off-by: Kevin Leimkuhler <[email protected]>
@kleimkuhler
Contributor Author

This renamed `Go Format` to `Go format` and `Validate go deps` to `Go dependencies`, so those two required checks will not actually run. The required checks will need to be updated again.

@kleimkuhler kleimkuhler force-pushed the kleimkuhler/load-from-host branch from 4f4efd6 to 01d6889 on February 11, 2020 22:03
Member

@siggy siggy left a comment


this is awesome. a couple nits then 🚢 👍

@kleimkuhler
Contributor Author

I added back the docker_pull job that runs in addition to docker_build.

When I changed the version of proxy-init and prometheus images, the docker save .. command failed in kind_integration_tests because the new prometheus version was not pulled down on the packet host.

That job is still helpful to run!

@alpeb
Member

alpeb commented Feb 12, 2020

@kleimkuhler It appears kind_integration_tests is not running in forks because it now requires docker_pull, which only runs on master. Perhaps it would make sense to forgo docker_pull and instead put the "Ensure Packet has the correct proxy-init and prometheus versions" step right before kind_integration_tests's "Load cli-bin image into local docker images" step?

@alpeb
Member

alpeb commented Feb 12, 2020

... OTOH I'm not sure if pulling those images is really needed after all. The docker save commands will fail if the images are not there, which only happens right after our weekly docker cache cleanup, and they'll be automatically pulled and available for the following builds. How about just logging the failure and not having the build step fail?

@kleimkuhler
Contributor Author

@alpeb Yea, I agree. I changed the docker save .. commands to output STDERR to STDOUT but not fail the script. It should be good to go again.
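The change described here can be sketched in plain shell: redirect the save's STDERR to STDOUT and swallow the non-zero exit so a `set -e` script logs the failure but keeps going. A runnable sketch with a stand-in for `docker save` (the function body simulates the post-cache-cleanup case where the image is missing):

```shell
#!/usr/bin/env bash
set -eu

# Stand-in for `docker save "$1" -o "$1.tar"`; simulates the missing-image
# failure that occurs right after the weekly docker cache cleanup.
save_image() {
  echo "Error: No such image: $1" >&2
  return 1
}

# Send STDERR to STDOUT and tolerate the non-zero exit, so the failure is
# visible in the logs but the `set -e` script does not abort.
save_image "prom/prometheus:v2.15.2" 2>&1 || true

status="continued"
echo "docker save failed; the image will be pulled on the next build"
```

The `|| true` guard is what keeps the step from failing; without it, `set -e` would terminate the job at the first missing image.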

Member

@alpeb alpeb left a comment


Thanks @kleimkuhler! This looks great to me and it's testing fine in both PRs and master 🏆

@olix0r olix0r merged commit a460ada into master Feb 12, 2020
@olix0r olix0r deleted the kleimkuhler/load-from-host branch February 12, 2020 22:38
kleimkuhler added a commit that referenced this pull request Feb 13, 2020
Depends on #4033

## Motivation

If any job fails in the current GH Actions workflow, a re-run on the same
commit SHA requires re-running *all* jobs--regardless of whether the job
already passed in the previous run.

This can be problematic when dealing with flakiness in the integration tests.

If a test fails due to flakiness in `cloud_integration_tests`, all the unit
tests, static checks, and `kind_integration_tests` are re-run which leads to
lots of waiting and dealing with the possibility of flakiness in earlier jobs.

With this change, individual workflows can now be re-run without triggering
all other jobs to complete again first.

## Solution

`workflow.yml` is now split into:
- `static_checks.yml`
- `unit_tests.yml`
- `kind_integration.yml`
- `cloud_integration.yml`

### Workflows

`static_checks.yml` performs checks related to dependencies, linting, and
formatting.

`unit_tests.yml` performs the Go and JS unit tests.

`kind_integration.yml` builds the images (on Packet or the GH Action VM) and
runs the integration tests on a KinD cluster. This workflow continues to run
for **all** PRs and pushes to `master` and tags.

`cloud_integration.yml` builds the images only on Packet. This is because
forked repositories do not need to trigger this workflow. It then creates a
unique GKE cluster and runs the integration tests on the cluster.

### The actual flow of work..

A forked repository or non-forked repository opening a PR triggers:
- `static_checks`
- `unit_tests`
- `kind_integration_tests`

These workflows all run in parallel and are individually re-runnable.

A push to `master` or tags triggers:
- `static_checks`
- `unit_tests`
- `kind_integration_tests`
- `cloud_integration_tests`

These workflows also all run in parallel, including the `docker_build` step of
both integration test workflows. This has not conflicted in testing, as it
takes place on the same Packet host and makes good use of docker layer
caching.
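A sketch of how the split files could scope their triggers so `kind_integration.yml` runs everywhere while `cloud_integration.yml` never fires for forks — illustrative, not the exact trigger blocks from the PR:

```yaml
# kind_integration.yml -- all PRs plus pushes to master and tags (illustrative)
on:
  pull_request: {}
  push:
    branches: [master]
    tags: ['*']
```

`cloud_integration.yml` would drop the `pull_request` trigger and keep only the `push` block, which is what keeps forked repositories from triggering it.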

Signed-off-by: Kevin Leimkuhler <[email protected]>