Fix intermittent failures coming from buildkit parallel builds #37403

@potiuk potiuk commented Feb 13, 2024

The PROD cache build often fails (especially in v2-8-test) when it tries to rebuild the PROD cache in parallel on ARM. There appears to be a problem in how concurrent buildx builds interact, and other users have hit it intermittently, as documented in moby/buildkit#2367.

Instead of finding the root cause, this change makes this specific CI job run sequentially. That increases the time needed to update the cache, but by less than 4x, because the builds do not parallelise very well anyway. Instead of about 8 minutes, the job will likely take around 15 minutes in total. Since this is just a cache update that runs after everything else has completed, it does not matter much if it takes a little longer.
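
To illustrate the trade-off described above, here is a minimal sketch of a runner that can execute a set of builds either concurrently or one at a time. It is not the code changed in this PR: `run_builds`, `build_one`, and the target names are hypothetical, and the real job invokes docker buildx rather than a Python callable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def run_builds(
    targets: Sequence[str],
    build_one: Callable[[str], int],
    parallel: bool = False,
) -> Dict[str, int]:
    """Run build_one for every target and return its exit code per target.

    build_one is a hypothetical stand-in for whatever starts a single
    cache build; it is expected to return a process-style exit code.
    """
    if parallel:
        # All builds start at once. This is the mode that failed
        # intermittently according to the description above.
        with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
            return dict(zip(targets, pool.map(build_one, targets)))
    # One build at a time, in the order given. Slower in wall-clock time,
    # but no two builds ever run concurrently.
    return {target: build_one(target) for target in targets}


if __name__ == "__main__":
    def fake_build(target: str) -> int:
        # Placeholder for a real build invocation.
        print(f"building PROD cache for {target}")
        return 0

    results = run_builds(["image-a", "image-b"], fake_build)
    failed = [name for name, code in results.items() if code != 0]
    print("failed builds:", failed)
```

With `parallel=False` the total time is roughly the sum of the individual build times, which is why the estimate above moves from about 8 minutes to about 15 minutes rather than to four times as long.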



@potiuk potiuk merged commit 7a1c600 into apache:main Feb 13, 2024
84 checks passed
@potiuk potiuk deleted the fix-intermittent-failures-for-parallel-prod-cache-build branch February 13, 2024 22:34
potiuk added a commit that referenced this pull request Feb 13, 2024
(cherry picked from commit 7a1c600)
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024
(cherry picked from commit 7a1c600)