fix: add mystery comment #588
base: master
@p-shahi @MarcoPolo @galargh this is such a hack - do you have any idea why the wrong file ends up in the container without this? The error it fixes is this sort of thing: https://github.com/libp2p/test-plans/actions/runs/12243585872/job/34153471788?pr=586#step:3:46235 |
Thanks for debugging this. I'll try to take a look today. |
Trying a few things on a branch here: #589 |
I think the cache busting fixes this issue: https://github.com/libp2p/test-plans/actions/runs/12266838240/job/34225761983?pr=589 without needing to add the mystery comment |
Which cache are you busting? The testplans one or the docker one? |
Does this reproduce locally or only in CI? |
I'm busting the cache only for the v2.x image builds, in case the problem is caused by cached layers being shared between the v1.x and v2.x images |
Docker - the problem only occurs when there's an S3 cache miss and this branch is hit. I guess #589 works because setting the env var before copying the files means the cache for the subsequent steps is invalidated. FWIW I also tried the Pruning the layer cache before building also seems to fix the problem and is probably a better approach than this PR though I'm still not sure why it's necessary. It doesn't seem to affect build time much which is something at least.
Like all the best issues, I could only reproduce this in CI, locally everything works as expected 😭 |
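For readers following along, the "prune the layer cache before building" idea discussed above could look roughly like the sketch below. This is not the change made in #590 (the thread does not show its contents). The function name `build_without_stale_layers` and the `DOCKER` override are hypothetical. The only Docker commands used are `docker builder prune --all --force` and `docker build --no-cache`.

```shell
# Hypothetical helper, not taken from the test-plans repo.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

build_without_stale_layers() {
  context_dir="$1"   # directory passed to docker build as the build context
  image_tag="$2"     # tag for the resulting image

  # Remove all unused build cache so no layer from an earlier build can be reused.
  "$DOCKER" builder prune --all --force || return 1

  # --no-cache additionally makes this build ignore any cached layers that remain.
  "$DOCKER" build --no-cache -t "$image_tag" "$context_dir"
}
```

Calling `build_without_stale_layers ./v2.x js-v2.x` (both arguments are placeholder values) would prune and then build. Whether pruning, `--no-cache`, or both are needed for this particular bug is not established in the thread.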
@MarcoPolo @achingbrain since this is still causing issues in rust-libp2p and probably elsewhere, what should we do here? Merge #590 and create an issue with a reminder to investigate further? @galargh do you have any thoughts? |
Sorry I missed this earlier, I'll try to have a look later today. |
@galargh any progress? |
I'm honestly baffled by this 🤯 I wasn't able to reproduce it myself by loosely following the instructions outlined in the description. So I reverted to using exactly the same commits for The failing run I was trying to reproduce: https://github.com/libp2p/js-libp2p/actions/runs/12235431303/attempts/8 Unfortunately, now it just works. Technically, it failed, but for an unrelated reason after making it past the point that caused the problems before. The only thing that comes to mind is that somehow the context sent to the build of v1 gets mixed up with the context used during the build of v2. But how?! I don't think I've seen something like that happening before. I've merged the layer pruning fix - #590 - as it seems like a more sensible option. I'll definitely keep thinking about it, but for now I don't have any good ideas to follow to be honest. |
Not quite sure why this is necessary but it means the correct file ends up in the docker container when building from scratch.
Investigation from the Slack thread:
Ok, I’ve been digging into this and what I’ve found is that when there’s a cache miss, we build from scratch.
The v1.x and v2.x Makefiles both run `docker build` in the correct implementation directory:
I have a branch in the `test-plans` repo called `fix/debug-js-build`. In the branch it does the following:
- `test/fixtures/relay.js` file before running `docker build` - test-plans/transport-interop/impl/js/v2.x/Makefile, line 28 in a7ff620
- test-plans/transport-interop/impl/js/v2.x/Dockerfile, line 14 in a7ff620
If I use this branch to run the ping interop test, the `test/fixtures/relay.js` file in the container does not have the same content as the one in the context directory (see `connectionEncrypters` vs `connectionEncryption`):
- `docker build` - https://github.com/libp2p/js-libp2p/actions/runs/12235431303/job/34137698038#step:6:886
Very strange.
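A check like the one described above (comparing the file in the build context with the file that ends up in the image) can be sketched as follows. The helper names `same_content`, `fixture_key` and `check_image_file` are hypothetical and are not part of the debug branch. The image name and the in-image path are caller-supplied because the thread does not state them.

```shell
# Succeeds only if the two files are byte-for-byte identical.
same_content() {
  cmp -s "$1" "$2"
}

# Print which of the two config keys mentioned in the thread a fixture file contains.
fixture_key() {
  if grep -q 'connectionEncrypters' "$1"; then
    echo connectionEncrypters
  elif grep -q 'connectionEncryption' "$1"; then
    echo connectionEncryption
  else
    echo unknown
  fi
}

# Copy a file out of an image (without starting it) and compare it with the
# copy in the build context. Returns 0 only when the contents match.
check_image_file() {
  image="$1"          # image to inspect
  image_path="$2"     # absolute path of the file inside the image
  context_file="$3"   # path of the same file in the build context

  tmp="$(mktemp)" || return 1
  cid="$(docker create "$image")" || { rm -f "$tmp"; return 1; }
  docker cp "$cid:$image_path" "$tmp"
  copied=$?
  docker rm "$cid" >/dev/null

  if [ "$copied" -eq 0 ] && same_content "$tmp" "$context_file"; then
    status=0
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

For example, `check_image_file some-image /app/test/fixtures/relay.js test/fixtures/relay.js` would fail when the image holds the other version's file (`some-image` and `/app` are placeholders). Running `fixture_key` on both copies then shows which key each one uses.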
The only way I’ve been able to get the `test/fixtures/relay.js` file to have the correct content is to change its name so the v2.x file doesn’t have the same path as the v1.x file. I only get the renamed file in the docker image, not the one with the original name.
Also, if I remove the v1.x folder so that only the v2.x version is built, the file has the correct contents, so some cache must be getting reused between the `v1.x` and `v2.x` builds, though both build from the `node:lts` image without using any other base images.
I’m at a bit of a loss about how this is possible. Perhaps someone with a bit more docker knowledge has some ideas?