
[FEA] Add spark-rapids-jni builds to cudf CI #17337

Open
vyasr opened this issue Nov 15, 2024 · 1 comment
Labels
feature request New feature or request

Comments

@vyasr
Contributor

vyasr commented Nov 15, 2024

Changes to libcudf's build system not infrequently break spark-rapids-jni. While not every libcudf change that breaks the Spark builds should be considered a showstopper (for example, the plugin still uses detail APIs that libcudf should continue to feel free to change without warning if needed), changes involving CMake deserve closer scrutiny because of the very specific dance the Spark plugin's build performs to support the layers it needs (libcudf, the libcudfjni interface layer, and finally spark-rapids-jni). Currently, such breaks are reported to us after the fact, and we then have to go through rounds of changes in libcudf followed by manual testing in Spark.

To improve this situation, we should add builds of spark-rapids-jni to cudf CI. We do not need to run tests, and building for a single architecture should suffice. spark-rapids-jni already has detailed instructions for containerized builds that support using a custom version of cudf; we should be able to adapt those into a GitHub Actions job running in the same container. That should be sufficient to catch the majority of breaking changes that ought to be fixed in cudf itself. I would not suggest running the full Spark test suite, since that would be considerably more expensive; for now, a build job alone should suffice. We should also make the build failures non-blocking so they do not hold up CI.
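A rough sketch of what such a job could do on a runner, assuming the Dockerfile-based containerized build that spark-rapids-jni documents (the image name, mount paths, and inner command here are illustrative assumptions, not an existing cudf CI job):

```shell
# Illustrative only: build spark-rapids-jni's CI image, then run the build
# inside it with this PR's cudf checkout mounted in. All names and paths
# below are assumptions for the sketch.
build_spark_jni_check() {
  docker build -t spark-rapids-jni-ci ./spark-rapids-jni/ci
  docker run --rm -v "$PWD/cudf:/cudf" spark-rapids-jni-ci \
      bash -c 'echo "clone spark-rapids-jni, swap in /cudf, run the C++ build"'
}

# Defined but not invoked here; a CI runner with Docker would call it.
type build_spark_jni_check >/dev/null
```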

@vyasr vyasr added the feature request New feature or request label Nov 15, 2024
@jlowe
Member

jlowe commented Dec 11, 2024

NVIDIA/spark-rapids-jni#2677 should make this a bit easier, as it refactors the C++ build portion into a shell script that can be invoked separately. Here's how I see this running as part of cudf precommit:

  • Build the container image from https://github.com/NVIDIA/spark-rapids-jni/blob/branch-25.02/ci/Dockerfile. Replace branch-25.02 with the current development branch. Note that this image can be updated over time, so bonus points for detecting when a new image needs to be generated.
  • Use the container for the premerge spark-rapids-jni check. Inside the container, execute the following:
      • git clone --depth 1 --branch dev-branch-here https://github.com/NVIDIA/spark-rapids-jni.git
      • cd spark-rapids-jni
      • git submodule update --init
      • Update thirdparty/cudf here to reflect the state of the cudf repo with the PR applied
      • LIBCUDF_DEPENDENCY_MODE=latest scl enable gcc-toolset-11 build/buildcpp.sh
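The "bonus points" item above — rebuilding the image only when it actually needs to change — could be approached by tagging the image with a content hash of the Dockerfile, so CI rebuilds only when the tag it computes doesn't already exist. A minimal sketch (the helper name and tag scheme are assumptions):

```shell
# Derive a short image tag from the Dockerfile's content hash; if the tag
# changes, the Dockerfile changed and the CI image needs a rebuild.
image_tag_for_dockerfile() {
  sha256sum "$1" | cut -c1-12
}

# Demo against a stand-in file; in CI this would be spark-rapids-jni's
# ci/Dockerfile for the current development branch.
printf 'FROM rockylinux:8\n' > /tmp/ci-demo.Dockerfile
TAG="$(image_tag_for_dockerfile /tmp/ci-demo.Dockerfile)"
echo "spark-rapids-jni-ci:${TAG}"
```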

Note that this will build for all GPU architectures supported by RAPIDS, which is overkill for a premerge "does it build" check on spark-rapids-jni. To build for a specific architecture, prepend GPU_ARCHS=arch to that last command line, e.g.:
GPU_ARCHS=89-real LIBCUDF_DEPENDENCY_MODE=latest scl enable gcc-toolset-11 build/buildcpp.sh
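Put together, the in-container steps above might look like the following script. The environment-variable names, the rm/cp approach to swapping in the PR's cudf, and the RUN_PREMERGE_BUILD gate are illustrative assumptions, not an existing spark-rapids-jni script:

```shell
# Sketch of the in-container premerge build. DEV_BRANCH defaults to the
# branch named in the comment above; CUDF_CHECKOUT is where CI placed the
# PR's cudf. Both names are assumptions for this sketch.
DEV_BRANCH="${DEV_BRANCH:-branch-25.02}"
CUDF_CHECKOUT="${CUDF_CHECKOUT:-$PWD/cudf}"

premerge_build() {
  git clone --depth 1 --branch "$DEV_BRANCH" \
      https://github.com/NVIDIA/spark-rapids-jni.git
  cd spark-rapids-jni || return 1
  git submodule update --init
  # Replace the pinned cudf submodule with the PR's cudf checkout
  # (one possible way to "update thirdparty/cudf" as described above).
  rm -rf thirdparty/cudf
  cp -r "$CUDF_CHECKOUT" thirdparty/cudf
  # Single architecture keeps the "does it build" check cheap.
  GPU_ARCHS=89-real LIBCUDF_DEPENDENCY_MODE=latest \
      scl enable gcc-toolset-11 build/buildcpp.sh
}

# Only run inside the CI container; defining the function alone is a no-op.
if [ "${RUN_PREMERGE_BUILD:-0}" = "1" ]; then premerge_build; fi
```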
