ci(cache): only hash top-level yarn.lock for cache resolution #10035
Overview
We've been seeing periodic flakiness in cache upload from CI with the following failure:
Upon inspection on my own machine, `**/yarn.lock` matches a bunch of files after `make setup-js` is run, most of which have nothing to do with our dev environment:

Except for our own `yarn.lock`, all those other lock files seem to be stuff those `npm` packages have accidentally published. They're not used by yarn when resolving/installing our development environment.

Unless `hashFiles` is smart enough to skip `node_modules`, we could be wasting CI time calculating the hash of all of these files. There's a good chance `node_modules` isn't in play when the cache key is calculated (though GitHub Actions seems to be re-hashing in the post-cache step). Even so, the globstar would cause some file tree traversal, which could be bad on macOS CI runners that are known to occasionally have slow filesystem access.

This PR replaces `hashFiles('**/yarn.lock')` with `hashFiles('yarn.lock')`, which I hope could avoid hitting this 2-minute hash timeout.

Updates after publishing the PR
Upon investigation of previous CI runs and those triggered by this PR, this is my working theory of the timeline of the failure:
- `hashFiles` is called
- `node_modules` is not yet present, so only top-level `yarn.lock` gets hashed, producing the correct hash key
- `hashFiles` is called again

Evidence / example runs:
- `edge`, indicating that the initial file hashing was only grabbing the single `yarn.lock`
- `edge` post-cache on macOS takes ~1 minute simply to skip the upload due to cache hit
- `hashFiles` timeout, it must be running `hashFiles` in the post-cache step, even though it isn't necessary
- `ci_hash-top-yarn-lock` post-cache on macOS takes 0s to do the same thing (skip cache upload due to cache hit)

Changelog
Review requests
Risk assessment
There are two failure modes that I can think of:
(1) can result in some pretty pernicious stuff, so we should be extra sure that my assessment of the situation is correct.
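
One rough way to double-check the assessment locally is to list every `yarn.lock` that a recursive glob would pick up and see where each one lives. The sketch below is only an approximation: it uses Python's `pathlib` globbing rather than the glob implementation behind `hashFiles`, so symlink and hidden-file handling may differ, and `classify_yarn_locks` is a hypothetical helper name, not something in this repo. Anything that lands in the `other` group would be a lock file outside `node_modules` that the new top-level-only key no longer covers.

```python
from pathlib import Path


def classify_yarn_locks(repo_root: Path) -> dict[str, list[Path]]:
    """Group every yarn.lock under repo_root by where it lives.

    Hypothetical helper for local inspection only; it approximates what a
    '**/yarn.lock' glob would match, but is not the hashFiles implementation.
    """
    groups: dict[str, list[Path]] = {
        "top_level": [],     # the lock file the new cache key hashes
        "node_modules": [],  # lock files shipped inside installed packages
        "other": [],         # nested lock files outside node_modules
    }
    for lock in sorted(repo_root.rglob("yarn.lock")):
        if not lock.is_file():
            continue
        rel = lock.relative_to(repo_root)
        if len(rel.parts) == 1:
            groups["top_level"].append(rel)
        elif "node_modules" in rel.parts:
            groups["node_modules"].append(rel)
        else:
            groups["other"].append(rel)
    return groups
```

Calling `classify_yarn_locks(Path("."))` from the repo root after `make setup-js` and printing each group should show whether the extra matches are confined to `node_modules`.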