-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Migrating cuml comms to raft::comms #7
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
…-comms Conflicts: cpp/include/raft/handle.hpp
…-comms Conflicts: cpp/include/raft/handle.hpp
cpp/include/raft/comms/comms.hpp
Outdated
public: | ||
virtual ~comms_iface(); | ||
|
||
virtual int getSize() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to enable clang-tidy in CI, now that we have it ready in cuML too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some very minor nitpicks. Otherwise overall the changes LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been integrating this PR with cugraph. Just had a few import issues
@seunghwak @afender, making the nccl type and data type size functions static fixes the GCC issue for now. I can create a new issue for us to knock out in the future so that we can get this merged soon. What do you think? |
Sounds good for me, thanks for the effort! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in my response. Just a couple of minor things.
cpp/CMakeLists.txt
Outdated
@@ -67,6 +67,8 @@ find_package(CUDA 10.0 REQUIRED) | |||
|
|||
set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake) | |||
|
|||
message("HELLO!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably a remnant of a debug session?
cpp/cmake/comms.cmake
Outdated
cmake_minimum_required(VERSION 3.14 FATAL_ERROR) | ||
project(comms LANGUAGES CXX CUDA) | ||
|
||
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this module-path set command here, in a module?
@teju85 Thank you for the re-review. I've addressed the latest comments. Is this ready to be merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cjnolet . Changes LGTM.
@dantegd, I understood you and Ishika both needed the relative Python imports. Do you have any further review feedback? |
Synced up with Dante on Slack 👍 Ishika is going to add a PR on top of this in the next few hours/days so we can keep adding enhancements there. Merging 🚀 |
add cross products for yaml->json configs
Demangle the error stack trace provided by GCC. Example output: ```bash RAFT failure at file=/workspace/raft/cpp/bench/ann/src/raft/raft_ann_bench_utils.h line=127: Ooops! Obtained 16 stack frames #1 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::logic_error::logic_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5e [0x7fb20acce45e] #2 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::bench::ann::configured_raft_resources::stream_wait(CUstream_st*) const +0x2e3 [0x7fb20acd0ac3] #3 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::bench::ann::RaftIvfPQ<float, long>::search(float const*, int, int, unsigned long*, float*, CUstream_st*) const +0x63e [0x7fb20acd44fe] #4 in ./cpp/build/ANN_BENCH: void raft::bench::ann::bench_search<float>(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective) +0xf76 [0x55853859f586] #5 in ./cpp/build/ANN_BENCH: benchmark::internal::LambdaBenchmark<benchmark::RegisterBenchmark<void (&)(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective), raft::bench::ann::Configuration::Index&, unsigned long&, std::shared_ptr<raft::bench::ann::Dataset<float> const>&, raft::bench::ann::Objective&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void (&)(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective), raft::bench::ann::Configuration::Index&, unsigned long&, std::shared_ptr<raft::bench::ann::Dataset<float> const>&, raft::bench::ann::Objective&)::{lambda(benchmark::State&)#1}>::Run(benchmark::State&) +0x84 [0x558538548f14] #6 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkInstance::Run(long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const +0x168 [0x5585385d6498] #7 in ./cpp/build/ANN_BENCH(+0x149108) [0x5585385b7108] #8 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkRunner::DoNIterations() +0x34f [0x5585385b8c7f] #9 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkRunner::DoOneRepetition() +0x119 [0x5585385b99b9] #10 in ./cpp/build/ANN_BENCH(+0x13afdd) [0x5585385a8fdd] #11 in ./cpp/build/ANN_BENCH: benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) +0x58e [0x5585385aa8fe] #12 in ./cpp/build/ANN_BENCH: benchmark::RunSpecifiedBenchmarks() +0x6a [0x5585385aaada] #13 in ./cpp/build/ANN_BENCH: raft::bench::ann::run_main(int, char**) +0x11ed [0x5585385980cd] #14 in /lib/x86_64-linux-gnu/libc.so.6(+0x28150) [0x7fb213e28150] #15 in /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0x89 [0x7fb213e28209] #16 in ./cpp/build/ANN_BENCH(+0xbfcef) [0x55853852dcef] ``` Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #2188
This is based on top of initial work by @teju85 in #3. This is simply a copy (and reduction) of the current cuml comms to header-only form.