Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore Maven cache from default branch #14882

Merged
merged 1 commit into from
Nov 10, 2022

Conversation

nineinchnick
Copy link
Member

Description

Allow using cache from the default branch even when there are changes in
any pom files.

Non-technical explanation

n/a

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@nineinchnick
Copy link
Member Author

nineinchnick commented Nov 3, 2022

Using actions/cache explicitly instead of through actions/setup-java has the benefit of a bit more verbose output:
image

Allow to use cache from the default branch even when there are changes in
any pom files.
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the perceived benefit?

The number of PRs where pom changes is very low. And with this change those PRs will end up contacting Maven since the cache won't be "complete".

Note that subsequent CI runs for such PRs still uses the cache created from first run as long as pom.xml didn't change - so cache reuse already happens without this change.

@nineinchnick
Copy link
Member Author

nineinchnick commented Nov 10, 2022

The benefit is using a cache in all workflow runs. Right now, if you have a PR that modifies any pom file, it won't restore any cache and will download all dependencies. The same will happen after merging such PR to master. With this change, a cache will always be restored. In PRs that actually change any dependencies, only very few of them will need to be downloaded, if they're missing in the cache.

The drawback is that new cache entries can be created with a few outdated dependencies, so they'll be growing, but we'll have to see how much.

Note that PRs that change any poms are not that rare. We currently have 3 cache entries for the master branch, when a new one gets created every time we merge a PR that changes any pom files. After running git log --all -- '*/pom.xml' I see a few commits that modify pom files in the last 10 days or so.

And with this change those PRs will end up contacting Maven since the cache won't be "complete".

This is the reverse of what you described.

Note that subsequent CI runs for such PRs still uses the cache created from first run as long as pom.xml didn't change - so cache reuse already happens without this change.

If there is a subsequent CI run at all. And such cache entry won't be reused after merging to master.

@hashhar
Copy link
Member

hashhar commented Nov 10, 2022

git log --all -- '*/pom.xml'

That goes across all branches and includes more than you'd actually want. Try git log --oneline --since='1 year ago' -- '*/pom.xml' | grep -v '\[maven-release-plugin\]' | wc -l which gives 260 commits out of 4638 in last year. that's 5%.

From what I understand of how the cache is working:

  • Caches created on PRs cannot be used by other branches
  • Caches from master can be used by everyone
  • Which cache gets used is based on hash of all pom files

Which leads to following observations:

Currently:

  • Non-pom PRs: 100% hit to cache from master branch
  • POM PRs: 1 cache miss until a cache is created
  • PR merge: Newly created PRs have 100% cache miss until cache is created. Then goes back to 100% hit rate.

Proposal:

  • Non-pom PRs: 100% hit to cache from master branch
  • POM PRs: 100% hit to cache from master + Maven central download + new cache upload
  • PR merge: Newly created PR use stale cache. Once new cache is available it uses fresh cache with 100% hit

Proposal improves cache hit rate at expense of ever-inflating cache + stale cache usage.
Current impl. sacrifices hit rate at expense of smaller consistent caches.

I'm open to merging this and seeing the effect in practice but we need to keep an eye on how cache is behaving and how much time we start spending downloading and creating larger caches.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good. Let's see if it helps improve things further. We can always revisit this.

@hashhar hashhar merged commit 6523e85 into trinodb:master Nov 10, 2022
@nineinchnick nineinchnick deleted the ci-restore-cache branch November 10, 2022 13:43
@nineinchnick
Copy link
Member Author

I hope this helps with failures like these: https://github.com/trinodb/trino/actions/runs/3435555610/jobs/5728050172

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants