Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build pyraft with scikit-build #644

Merged
merged 12 commits into from
May 17, 2022

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented May 11, 2022

This PR changes the pyraft build system to use scikit-build and CMake. It depends on the changes in #633 and should be merged after that PR.

This PR is largely complete, except that I am not yet sure exactly which libraries/headers are entirely necessary for the build. This version builds locally for me, but not all tests pass since I don't have nccl. However, I am a bit surprised to see the build succeeding without me even searching for UCX (perhaps this is because it's already on my system include path) so I need to figure out what do with UCX, NCCL, and the various CUDA math libraries (cusparse, cublas, etc).

@vyasr
Copy link
Contributor Author

vyasr commented May 11, 2022

I could use some tips on exactly what find_package, target_link_libraries, and target_include_directories should go here or in the subdirectory CMakeLists.txt files. I see previous logic for UCX as well as requests for math libs, but the package compiles without any extra logic to add those and when I import the only case that causes a linker error is NCCL AFAICT.

@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 11, 2022
@vyasr vyasr force-pushed the feature/skbuild_raft branch from f8b35fe to 5478bff Compare May 14, 2022 00:48
@github-actions github-actions bot removed the cpp label May 14, 2022
@vyasr
Copy link
Contributor Author

vyasr commented May 14, 2022

I still need some help resolving the last comment (the dependent libraries) but otherwise I think we're in OK shape.

@vyasr vyasr marked this pull request as ready for review May 14, 2022 01:27
@vyasr vyasr requested review from a team as code owners May 14, 2022 01:27
@robertmaynard
Copy link
Contributor

I could use some tips on exactly what find_package, target_link_libraries, and target_include_directories should go here or in the subdirectory CMakeLists.txt files. I see previous logic for UCX as well as requests for math libs, but the package compiles without any extra logic to add those and when I import the only case that causes a linker error is NCCL AFAICT.

I think everything is correct when RAFT_COMPILE_LIBRARIES is set to OFF as no linking is required. The include paths for the CUDA toolkit will be propagated via CMake usage requirements from the raft::raft target.

When RAFT_COMPILE_LIBRARIES is ON, we can also expect that the required libraries that raft::raft requires will be brought in via usage requirements.

@cjnolet
Copy link
Member

cjnolet commented May 16, 2022

In terms of dependencies, Pyraft is going to need to build against ucx (to which the conda package has relied upon ucx-py being installed in the past) and nccl (to which the conda package has relied upon the nccl conda package being installed in the past). Pyraft will need to link against nccl but ucx is loaded through dlopen at runtime and any users of that portion of pyraft are going to be building their own code against the raft-comms APIs, which may never end up linking against UCX (cugraph is an example of this).

This package also requires building the raft handle, which is in turn going to require cudart, rmm, cusolver, cublas, and cusparse. This shouldn't require RAFT_COMPILE_LIBRARIES. I'm not sure if it will require thrust and it would be nice to remove the dependency if it's not needed.

@vyasr
Copy link
Contributor Author

vyasr commented May 16, 2022

I think everything is correct when RAFT_COMPILE_LIBRARIES is set to OFF as no linking is required. The include paths for the CUDA toolkit will be propagated via CMake usage requirements from the raft::raft target.

When RAFT_COMPILE_LIBRARIES is ON, we can also expect that the required libraries that raft::raft requires will be brought in via usage requirements.

I agree that these transitive dependencies from the libraft build should be fine, I am mostly concerned about the additional dependencies that are only required by the pyraft build, NCCL and UCX. The C++ CMakeLists.txt has options for these dependencies, but they look like no-ops to me so I'm pretty sure that they have to be handled in the pyraft build unless I'm missing a reference to these somewhere else.

In terms of dependencies, Pyraft is going to need to build against ucx (to which the conda package has relied upon ucx-py being installed in the past) and nccl (to which the conda package has relied upon the nccl conda package being installed in the past). Pyraft will need to link against nccl but ucx is loaded through dlopen at runtime and any users of that portion of pyraft are going to be building their own code against the raft-comms APIs, which may never end up linking against UCX (cugraph is an example of this).

So I currently link against NCCL using this somewhat hacky approach, which works fine for me locally. I am not sure if there is a better way to get CMake to find NCCL, and it looks like the old Python build was just relying on the libraries being available somewhere on the system path because setup.py didn't contain any NCCL-specific information. AFAICT NCCL does not provide any CMake config files for discovery, so should we just be implementing our own FindNCCL.cmake file?

Since UCX is dlopened the build contains no UCX-specific information. Locally the comms tests are all skipped for me because the import fails, but I suspect that's an issue with my local UCX installation because the comms tests are all passing on CI. Do you see anything that you would improve here? I note that the old setup.py does add UCX to include/lib paths, but I suspect that's outdated because as you say UCX is dlopened and so shouldn't be required at compile or link time.

This package also requires building the raft handle, which is in turn going to require cudart, rmm, cusolver, cublas, and cusparse. This shouldn't require RAFT_COMPILE_LIBRARIES. I'm not sure if it will require thrust and it would be nice to remove the dependency if it's not needed.

I originally missed that cusolver/cublas/cusparse were listed as interface targets in the C++ CMakeLists.txt. I had assumed that they were Python-only dependencies because they are mentioned explicitly in the setup.py script. Now that we are switching the Python build to CMake we should be able to inherit these dependencies like @robertmaynard suggested, so I don't think there's much else to do here. I'm open to trying to remove the thrust dependency, but I'd say we can do that in a follow-up PR once we get the core issues in this PR sorted out.

@cjnolet
Copy link
Member

cjnolet commented May 16, 2022

@vyasr,

I am not sure if there is a better way to get CMake to find NCCL, and it looks like the old Python build was just relying on the libraries being available somewhere on the system path because setup.py didn't contain any NCCL-specific information. AFAICT NCCL does not provide any CMake config files for discovery, so should we just be implementing our own FindNCCL.cmake file?

Will the get_nccl.cmake and get_ucx.cmake files here work for the pyraft build? They don't get explicitly linked into main C++ build because all the comms tests are in pyraft and so there's no C++ libs linking against them. Because they are required dependencies in the cmake build, they will fail, however, when they are enabled in the C++ only build but can't be found on the system. They do get enabled in the build.sh when building pyraft here, and I do admit that they are not actually doing anything in the pyraft build.

The intention here was to have them throw an error when ./build.sh is invoked without any other arguments (thus pyraft and libraft being built) and I agree that's not going to catch cases where just pyraft is being built. Thus far, I think this works in CI because it's explicitly installing the conda packages for nccl and ucx-py and those happen to be on the lib and header paths.

Since UCX is dlopened the build contains no UCX-specific information.

Thinking through this a little more deeply- ucx-py should be providing the python-based interface to ucx, so that should allow the python pieces in pylibraft to build. However, the comms tests are being wrapped through cython, which ultimately require this header to be compiled. I think since the standard include path ($CONDA_PREFIX/include?) is being added to the include path, it might just be working when ucx has been installed. The dlopen for ucx is important to be able to load the ucx symbols at runtime (which are required when running the tests as well) and I suspect that's probably working for the same reason that the lib is already available on the system linking path.

I think the get_ucx and get_nccl should be integrated into your pyraft scikit-build cmakelists, and explicitly linked. I think this is another positive benefit to your converting these builds to scikit-build!

@github-actions github-actions bot added the cpp label May 17, 2022
@vyasr
Copy link
Contributor Author

vyasr commented May 17, 2022

@cjnolet awesome, thanks for the context! Yes I think switching to scikit-build makes this much cleaner. Now the dependencies are correctly expressed: libraft does not depend on either NCCL or UCX, but pyraft does. AFAICT pyraft only requires the protocol component of UCX (UCP) to build and function correctly based on the fact that tests are passing for me locally, but let me know if I missed anything. Aside from that, I think we should be set.

@vyasr
Copy link
Contributor Author

vyasr commented May 17, 2022

I realized that since we dlopen UCX we shouldn't need it at all at link time, so I've removed that altogether. This PR is good to go from my end.

@jakirkham
Copy link
Member

What happens if UCX is not present? Does it use TCP or something?

@cjnolet
Copy link
Member

cjnolet commented May 17, 2022

@vyasr we still want to explicitly add ucx to the header path and make sure it's available. Unfortunately, right now it just happens to be available as a byproduct of ucx-py being installed in CI but we still need to compile against it to build the tests.

Copy link
Contributor

@sevagh sevagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vyasr
Copy link
Contributor Author

vyasr commented May 17, 2022

@vyasr we still want to explicitly add ucx to the header path and make sure it's available. Unfortunately, right now it just happens to be available as a byproduct of ucx-py being installed in CI but we still need to compile against it to build the tests.

Ack sorry I got excited to be able to delete code 😄 I forgot that the built Cython does in fact require these headers during the build stage. I put that back.

@cjnolet
Copy link
Member

cjnolet commented May 17, 2022

Thanks again @vyasr. This all looks great!

@cjnolet
Copy link
Member

cjnolet commented May 17, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit fd0b447 into rapidsai:branch-22.06 May 17, 2022
rapids-bot bot pushed a commit that referenced this pull request May 20, 2022
This PR fixes a number of small errors that slipped through in #644 and #633. None of them materially affect the build (whether a preexisting libraft exists or not) since 1) benchmarks are currently turned off by default anyway in the C++ build, and 2) pyraft only depends on libraft for headers, not for compiled libraries.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #662
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp gpuCI improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants