-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A modest ccache proposal #54558
A modest ccache proposal #54558
Conversation
b0c8966
to
aaffb31
Compare
I was unclear about what I meant by, "with the generation count thing" on discord. What I meant was: Cataclysm-DDA/.github/workflows/matrix.yml Line 268 in 7468747
In pr builds
Use that timestamp in the primary key, restore-keys is just missing the timestamp There's a race condition where master writes an artifact before uploading its cache archive, so we still need to wipe the cache to prevent evictions, but it should be super rare. |
I don't understand the race condition 'to an artifact' you are describing. Even so, I don't see any races here. It's intentionally structured to always fail the primary cache lookup, on purpose, and rely on restore-keys to hit the most recent master cache artifact available as per the documentation of how restore keys works with multiple matching entries. restore-keys always goes to master because it's hard coded to use master, even on PRs. |
see: CleverRaven/Cataclysm-DDA#54558 Co-authored-by: Andrew Krieger <[email protected]>
see: CleverRaven/Cataclysm-DDA#54558 Co-authored-by: Andrew Krieger <[email protected]>
see: CleverRaven/Cataclysm-DDA#54558 Co-authored-by: Andrew Krieger <[email protected]>
see: CleverRaven/Cataclysm-DDA#54558 Co-authored-by: Andrew Krieger <[email protected]>
* ci: remove EXTRA_TEST_OPTS not used ATM * ci: install SDL dependencies only for tiles build see: CleverRaven/Cataclysm-DDA#55088 Co-authored-by: Brett Dong <[email protected]> * ci: order title first for readability * ci: limit resource usages on repeated pushes see: - CleverRaven/Cataclysm-DDA#54306 - CleverRaven/Cataclysm-DDA#54617 Co-authored-by: Kevin Granade <[email protected]> * ci: add `ccache_limit` and `ccache_key` * ci: correctly cache ccache see: - CleverRaven/Cataclysm-DDA#54078 - CleverRaven/Cataclysm-DDA#64553 Co-authored-by: Kevin Granade <[email protected]> Co-authored-by: Brett Dong <[email protected]> * ci: ccache stats and cleanup see: CleverRaven/Cataclysm-DDA#64553 Co-authored-by: Brett Dong <[email protected]> * ci: write ccache from upload only see: CleverRaven/Cataclysm-DDA#54558 Co-authored-by: Andrew Krieger <[email protected]> * ci: better performance ccache options see: CleverRaven/Cataclysm-DDA@aaffb31 Co-authored-by: Andrew Krieger <[email protected]> * ci: build only and test only script see: CleverRaven/Cataclysm-DDA#54435 Co-authored-by: Brett Dong <[email protected]> Co-authored-by: Alexey <[email protected]> * ci: add some line breaks my eyes hurt * ci: skip via either code or data see: - CleverRaven/Cataclysm-DDA@5707742 - CleverRaven/Cataclysm-DDA@f6ec25f Co-authored-by: Kevin Granade <[email protected]> Co-authored-by: Andrew Krieger <[email protected]> * ci: separate build and test step see: CleverRaven/Cataclysm-DDA#54435 Co-authored-by: Brett Dong <[email protected]> * ci: bump actions/upload-artifact to v3 see: CleverRaven/Cataclysm-DDA#64962 Co-authored-by: casswedson <[email protected]> * ci: ignore upload fail after build fails see: CleverRaven/Cataclysm-DDA@b7dc997 Co-authored-by: John Bytheway <[email protected]> * ci: emit success artifact * fix: use num_jobs correctly * Update .github/workflows/matrix.yml Co-authored-by: Olanti <[email protected]> * Update .github/workflows/matrix.yml Co-authored-by: Olanti <[email protected]> * ci: remove object-creator * ci: add run-on-draft flag * ci: match languages in translation test * ci: use `upload-artifact` * Update build-scripts/gha_compile_only.sh Co-authored-by: Olanti <[email protected]> * test: fix catch2 tag format * test: mark overmap lab generation test as flaky * test: mark ranged aiming test as flaky * test: use anonymous namespace * test: adjust grenade lethality to match BN value * ci: remove gh push trigger it didn't work * ci: reduce cache limit github cache limit is 10GB * Revert "test: mark overmap lab generation test as flaky" This reverts commit 71f6840. * ci: does it work? * ci: remove `FRAMEWORK` option * ci: fix restore key --------- Co-authored-by: Brett Dong <[email protected]> Co-authored-by: Kevin Granade <[email protected]> Co-authored-by: Andrew Krieger <[email protected]> Co-authored-by: Alexey <[email protected]> Co-authored-by: casswedson <[email protected]> Co-authored-by: John Bytheway <[email protected]> Co-authored-by: Olanti <[email protected]>
Summary
None
Purpose of change
The current ccache configuration sorta mostly works, but there's been headaches around ccache internal eviction because of the artificially low limit we set to avoid gha cache eviction. ccache internally shards its cache into 16 buckets, and applies a limit of /16 on size and file counts. As a result, a 1GB ccache limit means a single bucket can only hold 64MB of objects. For the basic build, the PCH file by itself is 60MB, so it gets trivially evicted if anything else shards into its bucket. We see this manifesting as pessimal ccache performance today. Somehow we need to raise the ccache cache limits without also creating eviction issues for our useful gha cache artifacts.
Describe the solution
This is a fusion of earlier ideas when brainstorming how to fix the ccache issues. A key insight I didn't have before is all PR runs rebase onto master before running, which means it's desirable to have as fresh artifacts from master as possible.
CCACHE_SLOPPINESS
in the Makefile to includeinclude_file_ctime,include_file_mtime
which is categorically useful and could be it's own PR, but we're here now.${{ github.ref_name }}
into the primary cache key. This is so when a PR writes to cache, it writes toccache-PR-<config>
, and master writes toccache-master-<config>
. This is important, because lastly, we setrestore-keys
for all runs to lookupccache-master-<config>-
, which means that when a PR misses it'll only restore from master caches, and not its own internal caches. Why is that important? Because:eat their childrenclear ccache after building. That way when the PR run uploads to cache, it'll upload an empty bundle with a miniscule number of bytes in it, likely avoiding evicting any useful caches. Because of therestore-keys
hack from above, this will never get 'hit' and thus will be first on the chopping block for eviction down the line, for however few bytes it'll reclaim.CCACHE_NOCOMPRESS: false
globally. This only practically matters for the macOS build which would use zstd internally, but the zstd settings we use mean that compressing the entire ccache archive at once in the actions/cache upload step will use less total gha cache space, on the order of 33% less (150->100mb). In the future we could experiment with setting this to true globally, but older ccache's on the ubuntu runners use zlib instead of zstd and will perform far worse.Describe alternatives you've considered
Testing
Additional context
The gcc11 ASAN build is still cursed. It is still suffering from 100% cache misses. I'm confident this is not related to cache size issues because the builds which upload to cache are not evicting any artifacts in a clean build.
https://en.wikipedia.org/wiki/A_Modest_Proposal