[BUG] Test simple pinned blocking alloc Failed nightly tests #10585
Comments
I can repro this and I am looking at the issue. It appears to be a race condition where a free/notify is not seen by the state machine, but I don't know why. It repros pretty consistently, and adding transition logging makes it go away.
Ok here's what I see so far, the state in In this case then
Closes #15350.

This PR changes the order of the callback `MemoryBuffer.onClosed` to happen after our `MemoryCleaner` finishes. This is done so that we can accurately, and safely, reflect the state of the memory resource (be it device or host).

This PR is needed to address a bug found in spark-rapids here: NVIDIA/spark-rapids#10585.

Authors:
- Alessandro Bellina (https://github.com/abellina)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Gera Shegalov (https://github.com/gerashegalov)

URL: #15351
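The ordering change described in that PR can be illustrated with a minimal sketch. This is not the cudf implementation: `CloseOrderSketch`, `Cleaner`, `Buffer`, and `CloseListener` are hypothetical stand-ins invented here, and the only point shown is that the close callback fires after the cleaner has released the memory, so a listener that tracks free memory never observes a "closed but not yet freed" state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are the cudf API.
class CloseOrderSketch {

    /** Hypothetical listener that is told when a buffer has been closed. */
    interface CloseListener {
        void onClosed(boolean memoryFreed);
    }

    /** Hypothetical stand-in for a cleaner that releases the underlying memory. */
    static final class Cleaner {
        private final List<String> events;
        private boolean freed = false;

        Cleaner(List<String> events) {
            this.events = events;
        }

        void clean() {
            freed = true;
            events.add("cleaned");
        }

        boolean isFreed() {
            return freed;
        }
    }

    /** Hypothetical buffer that cleans up first and notifies second. */
    static final class Buffer implements AutoCloseable {
        private final Cleaner cleaner;
        private final CloseListener listener;
        private boolean closed = false;

        Buffer(Cleaner cleaner, CloseListener listener) {
            this.cleaner = cleaner;
            this.listener = listener;
        }

        @Override
        public synchronized void close() {
            if (closed) {
                return; // closing twice is a no-op
            }
            closed = true;
            cleaner.clean();                      // 1. release the memory
            listener.onClosed(cleaner.isFreed()); // 2. only then notify
        }
    }

    /** Closes one buffer and returns the events in the order they happened. */
    static List<String> closeAndRecord() {
        List<String> events = new ArrayList<>();
        Cleaner cleaner = new Cleaner(events);
        Buffer buffer = new Buffer(cleaner,
                freed -> events.add("onClosed(freed=" + freed + ")"));
        buffer.close();
        buffer.close(); // second call does nothing
        return events;
    }

    public static void main(String[] args) {
        System.out.println(closeAndRecord());
    }
}
```

With the two steps in `close()` swapped, the listener in this sketch would be called with `memoryFreed == false`, which is the kind of stale view of the memory resource the PR description says it wants to avoid.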
Describe the bug
rapids_nightly-dev-github #1073 failed because the spark 334 unit test "simple pinned blocking alloc" failed. It looks like the test timed out.