[BUG] segfaults seen in cuDF after prefetch calls intermittently #11265
Comments
I can sometimes repro this locally. I have a hunch that it is related to this line: rapidsai/cudf@e6537de#diff-51063a0e9329a153db632035df960b0626c9a464a852250778f27bedd53b0972R39. I know this line is executed from different threads and that we are mutating the same key without a lock. I added some prints in this area and found this behavior:
The repeated lines mean that multiple threads got here. So one possibility is that we get unlucky in some cases: two threads race in the wrong place within the STL and corrupt some memory, since STL containers are not thread-safe.
Confirmed over several iterations that a patch with a lock fixes the segfault. I am going to open a PR against cuDF 24.08.
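For readers who want to see the shape of such a fix, here is a minimal sketch of guarding a shared STL map with a mutex. It is not the actual cuDF patch: the class `shared_flags`, the helper `run_concurrent_writers`, and the key `"example_key"` are hypothetical names made up for illustration. The only assumption taken from the thread is that several threads mutate the same key of a shared container.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a shared key/value store; not cuDF's real type.
class shared_flags {
 public:
  // Every mutation takes the lock, so concurrent writers cannot
  // interleave inside the container's internals.
  void set(std::string const& key, bool value) {
    std::lock_guard<std::mutex> lock(mutex_);
    values_[key] = value;
  }

  // Reads take the same lock: an unsynchronized read that overlaps
  // a write is also a data race.
  bool get(std::string const& key) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = values_.find(key);
    return it != values_.end() && it->second;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return values_.size();
  }

 private:
  mutable std::mutex mutex_;  // mutable so const readers can lock it
  std::unordered_map<std::string, bool> values_;
};

// Hypothetical helper: write the same key from several threads,
// mimicking the access pattern described above, and return the map size.
inline std::size_t run_concurrent_writers(shared_flags& flags, int num_threads, int iterations) {
  assert(num_threads > 0 && iterations > 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&flags, iterations] {
      for (int i = 0; i < iterations; ++i) { flags.set("example_key", true); }
    });
  }
  for (auto& w : workers) { w.join(); }
  return flags.size();
}
```

Without the `std::lock_guard` in `set`, the same loop is undefined behavior and may crash only occasionally, which would be consistent with the intermittent segfault reported here.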
JNI with the fix has been deployed.
Reopening for 24.10; the new JNI is still building.
Closing, as the new JNI 24.10.0-SNAPSHOT with rapidsai/cudf#16425 is available in Sonatype.
Describe the bug
First seen in a pre-merge run.
The JVM crashed; complete core dump file: hs_err_pid1671911.log
Steps/Code to reproduce bug
Not always reproducible (intermittent, unrelated to DATAGEN_SEED).
Expected behavior
The test should pass.
Environment details (please complete the following information)
Additional context