[github-actions] Tools unavailable due to cache failure #9280

Closed
djaglowski opened this issue Apr 14, 2022 · 6 comments · Fixed by #10365
Labels
bug Something isn't working

Comments

@djaglowski
Member

Describe the bug
All PRs began failing at some point in the last 24 hours. The error messages indicate that the tools installed by make install-tools cannot be found. Typically, these tools are installed once per workflow by the setup-environment job and then restored, via the GitHub Actions cache, in each subsequent job that uses them. The root cause has not yet been determined.

#9276 is an immediate fix for the issue, which removes caching of tools and instead installs them every time they are needed. Unfortunately, non-trivial run time is added to workflows that benefitted from caching the tools.

Ideally, the root cause should be determined and addressed, and caching of tools should be restored.
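
One way to keep the speed of caching while tolerating a bad restore is to check for the tools after the cache step and reinstall only when something is missing. The following is a minimal sketch of that idea, not what the workflows in this repository do. It assumes the tools land in a single directory; the directory and the tool names passed in are hypothetical placeholders, and only make install-tools comes from this issue.

```python
import subprocess
from pathlib import Path


def missing_tools(tools_dir, expected):
    """Return the expected tool names that are not regular files in tools_dir."""
    root = Path(tools_dir)
    return [name for name in expected if not (root / name).is_file()]


def ensure_tools(tools_dir, expected, install_cmd=("make", "install-tools")):
    """Reinstall only when the restored cache is incomplete.

    Returns True if the install command was run, False if the cache was usable.
    tools_dir and expected are hypothetical inputs chosen by the caller.
    """
    if not missing_tools(tools_dir, expected):
        return False
    # check=True raises CalledProcessError if the install itself fails.
    subprocess.run(list(install_cmd), check=True)
    return True
```

With a check like this, a job that restores an empty or partial cache pays the install cost once instead of failing, while jobs with a healthy cache skip the install entirely.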

@djaglowski
Member Author

Closing this issue with the following understanding:

Tools are cached with a hash key generated from internal/tools/go.mod. Somehow, the contents of this cache entry appear to have been corrupted, though it is not entirely clear how this happened. Any job restoring from this particular cache entry was pulling an invalid tool set.

Altering the tools module results in generating a new hash key for the tools, and therefore bypasses the corrupt cache entry.
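
To illustrate why editing the tools module sidesteps the bad entry, here is a rough sketch of a content-derived cache key. It is a simplification: it hashes one file directly with SHA-256, whereas the hashFiles expression in GitHub Actions combines per-file hashes, so the digests will not match real keys. The function name and the default prefix and OS label are assumptions, modeled on the v1-tools-Linux-... key that appears in the job logs in this thread.

```python
import hashlib
from pathlib import Path


def tools_cache_key(go_mod_path, runner_os="Linux", prefix="v1-tools"):
    """Build a key shaped like <prefix>-<os>-<sha256 hex of the file contents>.

    Simplified stand-in for a hashFiles-based key; not the exact algorithm.
    """
    digest = hashlib.sha256(Path(go_mod_path).read_bytes()).hexdigest()
    return f"{prefix}-{runner_os}-{digest}"
```

Any change to the file, even a one-line dependency bump, yields a different digest and therefore a different key, so jobs stop looking up the corrupt entry and populate a fresh one.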

@dmitryax
Member

Looks like a race condition occurred and a cache entry got stuck in the "reserved" state with no content in it. Here is a comment on a similar issue that we had: actions/cache#485 (comment)
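
The failure mode described here can be pictured with a toy model of a two-phase cache, where a key is reserved first and committed only after the upload succeeds. This is purely illustrative: the class, method names, and return values are invented for the sketch and do not reflect the real cache service API or its exact behavior on restore.

```python
class ToyCache:
    """Toy two-phase cache: reserve a key, then commit it after the upload."""

    def __init__(self):
        # key -> {"state": "reserved" or "committed", "data": bytes or None}
        self._entries = {}

    def reserve(self, key):
        """Claim the key; fails if any entry (even an empty one) already exists."""
        if key in self._entries:
            return False
        self._entries[key] = {"state": "reserved", "data": None}
        return True

    def commit(self, key, data):
        self._entries[key] = {"state": "committed", "data": data}

    def restore(self, key):
        """Return committed data, or None if the key is absent or only reserved."""
        entry = self._entries.get(key)
        if entry is None or entry["state"] != "committed":
            return None
        return entry["data"]


def save(cache, key, data, upload_ok=True):
    """Return 'saved', 'upload-failed', or 'already-reserved'."""
    if not cache.reserve(key):
        return "already-reserved"
    if not upload_ok:
        # The upload died, and nothing releases the reservation.
        return "upload-failed"
    cache.commit(key, data)
    return "saved"


cache = ToyCache()
print(save(cache, "tools-key", b"tools", upload_ok=False))  # upload-failed
print(save(cache, "tools-key", b"tools"))                   # already-reserved
print(cache.restore("tools-key"))                           # None
```

In this model, one failed upload leaves the key permanently reserved: later jobs can neither restore usable content nor replace the entry, which is why only a new key gets things moving again.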

@dmitryax
Member

From https://github.com/open-telemetry/opentelemetry-collector-contrib/runs/6018985784?check_suite_focus=true:

Post job cleanup.
/usr/bin/tar --posix --use-compress-program zstd -T0 -cf cache.tzst -P -C /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib --files-from manifest.txt
Unable to reserve cache with key v1-tools-Linux-0630015e23e408f18211e4144b0e2ac66150cb355c478aff69c2ef78c9a698f1, another job may be creating this cache. More details: Cache already exists. Scope: refs/tags/v0.49.0, Key: v1-tools-Linux-0630015e23e408f18211e4144b0e2ac66150cb355c478aff69c2ef78c9a698f1, Version: 375cef3a094c3c6d613e96468e4969e9ab6e9964aeb75801a1567e729577160d
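
When digging through many job logs for this message, it can help to pull out the scope, key, and version fields mechanically. A small sketch follows; the regular expression is written against the single log line quoted above and is an assumption about the message format, which may vary between versions of the cache action.

```python
import re

# Matches the "Unable to reserve cache" message as quoted in this thread.
RESERVE_FAILURE = re.compile(
    r"Unable to reserve cache with key (?P<key>\S+),.*?"
    r"Scope: (?P<scope>[^,]+), Key: [^,]+, Version: (?P<version>\S+)"
)


def parse_reserve_failure(line):
    """Return a dict with key, scope, and version, or None if the line does not match."""
    match = RESERVE_FAILURE.search(line)
    if match is None:
        return None
    return {
        "key": match["key"],
        "scope": match["scope"],
        "version": match["version"],
    }
```

Grouping the parsed results by key and scope across runs would show which entry is stuck and which ref first tried to create it.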

@djaglowski
Member Author

This issue sheds more light on the problem. I found this after observing another instance of this problem, by digging up what appears to be the actual failure where the cache became corrupted.

Post job cleanup.
/usr/bin/tar --posix --use-compress-program zstd -T0 -cf cache.tzst -P -C /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib --files-from manifest.txt
Warning: Cache upload failed because file read failed with EBADF: bad file descriptor, read
Warning: Cache upload failed because file read failed with EBADF: bad file descriptor, read
Warning: Cache service responded with 429 during upload chunk.
Warning: Cache upload failed because file read failed with EBADF: bad file descriptor, read
Warning: Cache upload failed because file read failed with EBADF: bad file descriptor, read

@djaglowski
Member Author

Reopening this issue in hopes that the new hint may lead to an actual solution.

@djaglowski
Member Author

djaglowski changed the title from "[github-actions] Determine why caching of tools failed and decide whether it should be readded." to "[github-actions] Tools unavailable due to cache failure" on May 26, 2022
2 participants