Changes to libcudf's build system not infrequently break spark-rapids-jni. Not every libcudf change that breaks the Spark builds should be considered a showstopper (for example, the plugin still uses detail APIs, which libcudf should continue to feel free to change without warning if needed), but changes involving CMake deserve closer scrutiny because of the very specific dance the Spark plugin's build performs to support its layered builds: libcudf, the libcudfjni interface layer, and finally spark-rapids-jni itself. Currently, such breaks are reported to us after the fact, and we then have to go through rounds of changes in libcudf followed by manual testing in Spark.
To improve this situation, we should add builds of spark-rapids-jni to cudf CI. We do not need to run tests, and we should be able to build for just a single GPU architecture. spark-rapids-jni already has detailed instructions for containerized builds that support using a custom version of cudf, which we should be able to adapt into a GitHub Actions job running in the same container. That should be sufficient to catch the majority of breaking changes that need to be fixed in cudf itself. I would not suggest running the full Spark test suite, since that would be considerably more expensive; for now, a build job alone should suffice. We should also make the build failures non-blocking so that they do not hold up the rest of CI.
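For illustration, a minimal sketch of what such a job could look like; the job name, runner label, and container image are placeholders (the real image would come from spark-rapids-jni's containerized build instructions), the `thirdparty/cudf` overlay is an assumption about where the build consumes libcudf sources, and the build command anticipates the script discussed in the comment below. The key piece is `continue-on-error`, which reports failures without gating CI:

```yaml
# Hypothetical job in cudf's premerge workflow; image and runner are placeholders.
spark-rapids-jni-build:
  runs-on: linux-amd64-cpu16          # placeholder runner label
  continue-on-error: true             # non-blocking: report failures, don't gate CI
  container:
    image: spark-rapids-jni-build:latest   # placeholder for the project's build image
  steps:
    - name: Check out spark-rapids-jni
      uses: actions/checkout@v4
      with:
        repository: NVIDIA/spark-rapids-jni
        submodules: recursive
    - name: Overlay the cudf branch under test
      uses: actions/checkout@v4
      with:
        path: thirdparty/cudf          # assumes libcudf sources come from this submodule path
    - name: Build the C++ layers for a single architecture
      run: |
        # Single-arch build keeps the "does it build" check cheap; buildcpp.sh is
        # the script from NVIDIA/spark-rapids-jni#2677 (see the comment below)
        GPU_ARCHS=89-real LIBCUDF_DEPENDENCY_MODE=latest \
          scl enable gcc-toolset-11 build/buildcpp.sh
```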
NVIDIA/spark-rapids-jni#2677 should make this a bit easier, as it refactors the C++ build portion into a shell script that can be invoked separately. Here's how I see this running as part of cudf precommit:
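Roughly the following; the checkout layout and paths here are an illustrative sketch rather than exact commands, and it assumes the cudf branch under test replaces the contents of the `thirdparty/cudf` submodule before the script runs inside the build container:

```bash
# Sketch only: repository layout and submodule swap are assumptions.

# Check out spark-rapids-jni with its submodules
git clone --recurse-submodules https://github.com/NVIDIA/spark-rapids-jni.git
cd spark-rapids-jni

# Swap the pinned cudf submodule for the cudf branch under test
# (assumes libcudf sources are consumed from thirdparty/cudf)
rm -rf thirdparty/cudf
cp -r /path/to/cudf-under-test thirdparty/cudf

# Inside the project's build container (which provides gcc-toolset-11),
# run the C++ build script added by NVIDIA/spark-rapids-jni#2677
LIBCUDF_DEPENDENCY_MODE=latest scl enable gcc-toolset-11 build/buildcpp.sh
```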
Note that this will build for all GPU architectures supported by RAPIDS, which is overkill for a premerge "does it build" check on spark-rapids-jni. To build for a specific architecture, prepend GPU_ARCHS=arch to that last command line, e.g.:
GPU_ARCHS=89-real LIBCUDF_DEPENDENCY_MODE=latest scl enable gcc-toolset-11 build/buildcpp.sh