[BUG] udf-examples-native case failed core dump #11842
Comments
As it failed with 24.12, cc @sameerz
Thanks @pxLi for filing the issue. Looking into it.
So, I am able to reproduce an issue locally that looks quite similar to the one reported here. The stack trace and the error message are not exactly the same, but the same test fails within the same docker container that the Jenkins job uses. Here is the error from my local setup. Note that the
I managed to narrow down when this error started. The error seems to have been caused by NVIDIA/cccl#2266. With the exact same versions of CUDA, cudf, spark-rapids-jni, and the plugin, the same test passes with a cccl older than the commit NVIDIA/cccl@f53e72555, but it fails with a cccl at or after that commit. Now I'm trying to reproduce the issue within a cudf C++ unit test.
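For readers who want to repeat this kind of narrowing-down, it amounts to a binary search over an ordered commit history. The following is only a minimal sketch, not the procedure actually used in this thread: `first_bad_commit` and the `test_passes` callback are hypothetical names, and the sketch assumes that once the test starts failing it keeps failing for every later commit.

```python
from typing import Callable, Optional, Sequence


def first_bad_commit(
    commits: Sequence[str],
    test_passes: Callable[[str], bool],
) -> Optional[str]:
    """Return the earliest commit for which test_passes() is False.

    `commits` must be ordered oldest to newest. `test_passes` is a
    hypothetical callback that builds against the given commit and runs
    the failing test. Assumes every commit after the first bad one also
    fails. Returns None if no commit fails.
    """
    lo, hi = 0, len(commits)
    while lo < hi:
        mid = (lo + hi) // 2
        if test_passes(commits[mid]):
            # Failure was introduced after this commit.
            lo = mid + 1
        else:
            # This commit fails; the first bad one is here or earlier.
            hi = mid
    return commits[lo] if lo < len(commits) else None
```

In practice `test_passes` would check out the given cccl commit, rebuild with all other component versions held fixed, and run the UDF test; `git bisect` automates the same search.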
So, I tried to reproduce this issue within a cudf C++ unit test, a cudf Java unit test, a spark-rapids-jni C++ unit test, and a spark-rapids-jni Java unit test. I captured the input that a failed run used as a parquet file, copied over the exact source code from the examples repo, and added a unit test that reads the captured parquet file and calls the native UDF. However, I was not able to reproduce it in any unit test. Based on this, I think that this is likely some problem in the examples repo, rather than in cccl, cudf, or the plugin. While looking at the logs, I noticed one thing: cudf, spark-rapids-jni, and spark-rapids-examples all use cccl, but each uses a different version. Especially the spark-rapids-examples used to use
Yes, the example is built directly against the cudf code instead of relying on the JNI or the plugin. Thanks! We are good to close this ticket.
Describe the bug
first seen in examples-udf-examples-native run:179
https://github.com/NVIDIA/spark-rapids-examples/tree/branch-24.12/examples/UDF-Examples/RAPIDS-accelerated-UDFs
core dump: (complete file hs_err_pid177.log)
Steps/Code to reproduce bug
build and test case at: https://github.com/NVIDIA/spark-rapids-examples/blob/branch-24.12/examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md#building-and-run-the-tests-without-native-code-examples
https://github.com/NVIDIA/spark-rapids-examples/blob/branch-24.12/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/rapids_udf_test.py
Expected behavior
A clear and concise description of what you expected to happen.
Environment details (please complete the following information)
Additional context
Add any other context about the problem here.