Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(cache): only hash top-level yarn.lock for cache resolution #10035

Merged
merged 1 commit into from
Apr 21, 2022

Conversation

mcous
Copy link
Contributor

@mcous mcous commented Apr 21, 2022

Overview

We've been seeing periodic flakiness in cache upload from CI with the following failure:

Error: .github/workflows/app-test-build-deploy.yaml (Line: 120, Col: 16):
Error: The template is not valid. .github/workflows/app-test-build-deploy.yaml (Line: 120, Col: 16): hashFiles('**/yarn.lock') couldn't finish within 120 seconds.

Upon inspection on my own machine, **/yarn.lock matches a bunch of files after make setup-js is run, most of which have nothing to do with our dev environment:

❯ ls **/yarn.lock
app-shell/node_modules/combined-stream/yarn.lock
app-shell/node_modules/react-input-autosize/yarn.lock
app-shell/node_modules/uri-js/yarn.lock
node_modules/@cypress/request/node_modules/form-data/yarn.lock
node_modules/better-opn/yarn.lock
node_modules/browserify-zlib/yarn.lock
node_modules/combined-stream/yarn.lock
node_modules/copy-to-clipboard/yarn.lock
node_modules/detective-postcss/yarn.lock
node_modules/file-exists-dazinatorfork/yarn.lock
node_modules/microevent.ts/yarn.lock
node_modules/minimalcss/yarn.lock
node_modules/postcss-discard-overridden/yarn.lock
node_modules/react-input-autosize/yarn.lock
node_modules/request/node_modules/form-data/yarn.lock
node_modules/socks-proxy-agent/yarn.lock
node_modules/sumchecker/yarn.lock
node_modules/tiny-emitter/yarn.lock
node_modules/uri-js/yarn.lock
node_modules/worker-rpc/yarn.lock
yarn.lock

Except for our own yarn.lock, all those other lock files seem to be stuff those npm packages have accidentally published. They're not used by yarn when resolving/installing our development environment.

Unless hashFiles is smart enough to skip node_modules, we could be wasting CI time including calculating the hash of all of these files. There's a good chance node_modules isn't in play when the cache key is calculated (though GitHub Actions seems to be re-hashing in the post-cache step). Even so, though, the globstar would cause some file tree traversal, which could be bad on macOS Ci runners that are known to occasionally have slow filesystem access.

This PR replaces hashFiles('**/yarn.lock') with hashFiles('yarn.lock'), which I hope could avoid hitting this 2 minute hash timeout.

Updates after publishing the PR

Upon investigation of previous CI runs and those triggered by this PR, this is my working theory of the timeline of the failure:

  1. Cache step runs
    1. Workflow YAML file is read, hashFiles is called
    2. node_modules is not yet present, so only top-level yarn.lock gets hashed, producing the correct hash key
    3. Cache restored
  2. CI runs
  3. Post-cache step runs
    1. Workflow YAML file is read again for some reason, hashFiles is called again
    2. Post-cache checks that the cache was hit in step (1), declines to upload new cache
    3. Newly computed cache key is thrown away because it wasn't actually needed

Evidence / example runs:

Changelog

  • ci(cache): only hash top-level yarn.lock for cache resolution

Review requests

  • CI is green
  • My "facts" above are actually true

Risk assessment

There are two failure modes that I can think of:

  1. We bust the cache too infrequently, leading to builds pulling in state caches (more likely, given the change)
  2. We bust the cache too much, resulting in longer CI times

(1) can result in some pretty pernicious stuff, so we should be extra sure that my assessment of the situation is correct.

@mcous mcous added the chore label Apr 21, 2022
@mcous mcous requested a review from a team as a code owner April 21, 2022 17:44
@codecov
Copy link

codecov bot commented Apr 21, 2022

Codecov Report

Merging #10035 (5bcdddf) into edge (1cdd8d3) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             edge   #10035   +/-   ##
=======================================
  Coverage   74.68%   74.68%           
=======================================
  Files        2108     2108           
  Lines       55336    55336           
  Branches     5620     5620           
=======================================
  Hits        41326    41326           
  Misses      12872    12872           
  Partials     1138     1138           
Flag Coverage Δ
app 71.42% <ø> (ø)
components 52.80% <ø> (ø)
labware-library 49.74% <ø> (ø)
protocol-designer 44.88% <ø> (ø)
react-api-client 80.76% <ø> (ø)
shared-data 85.02% <ø> (ø)
step-generation 86.49% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Copy link
Contributor

@b-cooper b-cooper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good and the logic makes sense to me 🦞 Thanks for looking into this!

@mcous mcous merged commit 76cab37 into edge Apr 21, 2022
@mcous mcous deleted the ci_hash-top-yarn-lock branch April 21, 2022 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants