Thread 'tokio-runtime-worker' panicked #1566
Comments
@cryevecry thank you for opening the issue. Others have reported the same issue with version |
@MarcusSorealheis Good news! |
@cryevecry Just to make sure that this is the same issue as #1565, could you check whether this image (as the cas image) fixes it?
Initially this looks like a new issue to me. Could you also post the nativelink config that triggers this? |
@aaronmondal Okay, I'll try with this image and I'll write back to you. config:
|
@aaronmondal Alas, freezes are still present with this image. ghcr.io/tracemachina/nativelink:7zwvd8wsbsyfa4nnam8aq0953vbcgzqp The nativelink logs only show this
|
Ah so it's a precision issue, likely in the compression store. Will look into creating a reproducer and fix. |
`65535` is the largest value an unsigned 16-bit integer can hold. Very
curious to learn more here because we have not seen this issue with
anyone else to my knowledge.
…On Tue, Feb 4, 2025 at 08:42 Aaron Siddhartha Mondal < ***@***.***> wrote:
Ah so it's a precision issue, likely in the compression store. Will look
into a reproducer and fix.
|
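As a side note on the `65535` observation above, here is a minimal Rust sketch of how a value can run into the 16-bit ceiling. It is not NativeLink code: `narrow_to_u16` and `wrapping_cast` are hypothetical helpers made up for illustration, and whether a narrowing like this is actually involved in the panic has not been confirmed in this thread.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not from NativeLink): narrow a length to `u16`,
/// returning `None` when the value does not fit instead of panicking.
fn narrow_to_u16(len: usize) -> Option<u16> {
    // `try_from` fails for anything above u16::MAX (65535).
    u16::try_from(len).ok()
}

/// Hypothetical helper (not from NativeLink): a plain `as` cast never
/// fails, it silently keeps only the low 16 bits of the value.
fn wrapping_cast(len: usize) -> u16 {
    len as u16
}
```

With these helpers, `narrow_to_u16(65_536)` yields `None`, while `wrapping_cast(65_536)` silently wraps to `0`. Calling `.unwrap()` on the former inside an async task would be one way to end up with a panic on a worker thread, which is why a debug build with a full stack trace helps narrow things down.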
Hi @cryevecry, I'm looking into this issue and am having a really difficult time getting it to reproduce. For a while, I thought the issue was related to this line:
But I'm starting to question this now, as I'm pretty sure it is impossible for that line to ever fail. Given this, I'm starting to think maybe the issue is somewhere else, but the code was inlined, so the stack trace might be misleading. Would it be possible to run with I tested your exact config on a few bazel invocations that are very substantial, but it doesn't trigger it. I'm thinking maybe Thanks! |
@allada Hi!
If this is not enough, we can run the nativelink debugging image. |
@cryevecry We haven't been able to figure this out so far. Looks like we'll need to do this with debug symbols. We initially thought this was an issue in the compression store, but haven't been able to reproduce. We're now thinking that this might be caused by data corruption in the filesystem store which then causes unintended behavior in the bytestream server, but we're also not certain of that. Could you try this image? That should give us a better stack trace.
(was built here: https://github.com/aaronmondal/nativelink/actions/runs/13189047660/job/36817988723) |
|
When using the cache via reclient, we encounter hanging builds.
The logs contain the following:
Version v0.5.3
Nativelink is started in Docker on the VM