gpu CI failing pretty consistently with segfault #8194
Comments
Thanks for raising this @jrbourbeau - I'll see if I'm able to reproduce the segfaults locally and follow up here with more info.
Was able to reproduce the segfaults locally - not sure if these are coming from multiple tests, but one in particular that seems to trigger them is:
Here are the associated failures/errors:
My first assumption is that we're running into some issues around UCX cleanup? I can try bisecting Distributed to see if we can isolate this to a specific commit.
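As a rough way to poke at the UCX cleanup hypothesis outside of pytest, a minimal start/teardown script along these lines might help. This is only a sketch assuming dask-cuda and ucx-py are installed; the cluster options and the trivial task are illustrative and not taken from the failing test:

```python
# Minimal sketch to check whether the crash shows up at cluster/comm teardown
# rather than while work is running. Assumes dask-cuda is available.
from dask_cuda import LocalCUDACluster
from distributed import Client


def main():
    cluster = LocalCUDACluster(protocol="ucx")
    client = Client(cluster)

    # Run something trivial so the UCX comms actually get exercised.
    assert client.submit(sum, [1, 2, 3]).result() == 6

    # Tear everything down explicitly; a segfault after this point would point
    # at a cleanup/shutdown problem rather than a genuine test failure.
    client.close()
    cluster.close()


if __name__ == "__main__":
    main()
```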
Able to reproduce things, but having some difficulty bisecting this to any specific cause 😕 Eyeballing this successful GPU CI run from a few weeks ago, I tried rolling back to older versions of UCX/UCX-Py with Distributed 2023.7.1 and still saw the segfaults.
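If the bisect is worth another shot, one option is to let `git bisect run` drive it with a small helper that treats "process killed by a signal" as the failure condition, so ordinary test failures don't confuse the bisect. A sketch, with a placeholder test path since the actual triggering test isn't pinned down here:

```python
# bisect_check.py (hypothetical helper, e.g. `git bisect run python bisect_check.py`).
# Exits 0 ("good") unless the pytest subprocess is killed by a signal such as
# SIGSEGV, in which case it exits 1 ("bad"). The test path below is a placeholder.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-m", "pytest", "-x", "distributed/tests/<placeholder_test>.py"],
)

# On POSIX, a negative return code means the child was terminated by that signal
# (e.g. -11 for SIGSEGV), which is the condition we're bisecting for.
sys.exit(1 if result.returncode < 0 else 0)
```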
I've noticed gpuCI has been failing pretty consistently (for example, this build and this build) with a segfault. Note that the tests themselves actually pass -- there must be some extra step where things are going awry.
cc @charlesbluca @quasiben @dask/gpu
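One low-effort thing that might help narrow down where that extra step goes wrong: make sure faulthandler is enabled in the test processes, so that if the segfault happens inside a Python process during shutdown we at least get the Python tracebacks of all threads on stderr. A sketch (placement in a conftest.py is just one option), assuming the crash is in-process rather than in the CI tooling itself:

```python
# Sketch: enable faulthandler so a SIGSEGV during teardown dumps the Python
# tracebacks of all threads to stderr. Equivalent to setting PYTHONFAULTHANDLER=1
# or running with `python -X faulthandler`.
import sys
import faulthandler

faulthandler.enable(file=sys.stderr, all_threads=True)
```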