Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Migrating cuml comms to raft::comms #7

Merged
merged 51 commits into from
Jun 3, 2020

Conversation

cjnolet
Copy link
Member

@cjnolet cjnolet commented May 7, 2020

This is based on top of initial work by @teju85 in #3. This is simply a copy (and reduction) of the current cuml comms to header-only form.

@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@cjnolet cjnolet changed the title [WIP] Migrating cuml comms to raft::comms [REVIEW] Migrating cuml comms to raft::comms May 13, 2020
cpp/include/raft/comms/comms.hpp Outdated Show resolved Hide resolved
cpp/include/raft/comms/std_comms.hpp Outdated Show resolved Hide resolved
cpp/include/raft/comms/std_comms.hpp Outdated Show resolved Hide resolved
cpp/include/raft/comms/std_comms.hpp Outdated Show resolved Hide resolved
public:
virtual ~comms_iface();

virtual int getSize() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to enable clang-tidy in CI, now that we have it ready in cuML too?

Copy link
Member

@teju85 teju85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some very minor nitpicks. Otherwise overall the changes LGTM!

cpp/include/raft/comms/comms_helper.hpp Outdated Show resolved Hide resolved
cpp/include/raft/comms/comms_helper.hpp Outdated Show resolved Hide resolved
cpp/include/raft/comms/nccl_helper.hpp Outdated Show resolved Hide resolved
cpp/include/raft/comms/std_comms.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@Iroy30 Iroy30 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been integrating this PR with cugraph. Just had a few import issues

python/raft/common/__init__.py Outdated Show resolved Hide resolved
python/raft/dask/__init__.py Outdated Show resolved Hide resolved
python/raft/dask/common/__init__.py Outdated Show resolved Hide resolved
python/raft/dask/common/comms.py Outdated Show resolved Hide resolved
@cjnolet
Copy link
Member Author

cjnolet commented May 26, 2020

@teju85 @Iroy30 I have implemented your requested changes. Mind giving this PR a re-review?

@cjnolet
Copy link
Member Author

cjnolet commented May 27, 2020

@seunghwak @afender, making the nccl type and data type size functions static fixes the GCC issue for now. I can create a new issue for us to knock out in the future so that we can get this merged soon. What do you think?

@seunghwak
Copy link
Contributor

@seunghwak @afender, making the nccl type and data type size functions static fixes the GCC issue for now. I can create a new issue for us to knock out in the future so that we can get this merged soon. What do you think?

Sounds good for me, thanks for the effort!

Copy link
Member

@teju85 teju85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in my response. Just a couple of minor things.

@@ -67,6 +67,8 @@ find_package(CUDA 10.0 REQUIRED)

set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)

message("HELLO!")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably a remnant of a debug session?

cmake_minimum_required(VERSION 3.14 FATAL_ERROR)
project(comms LANGUAGES CXX CUDA)

set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this module-path set command here, in a module?

@cjnolet
Copy link
Member Author

cjnolet commented Jun 3, 2020

@teju85 Thank you for the re-review. I've addressed the latest comments. Is this ready to be merged?

Copy link
Member

@teju85 teju85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cjnolet . Changes LGTM.

@afender afender changed the base branch from branch-0.14 to branch-0.15 June 3, 2020 16:43
@cjnolet
Copy link
Member Author

cjnolet commented Jun 3, 2020

@dantegd, I understood you and Ishika both needed the relative Python imports. Do you have any further review feedback?

@afender
Copy link
Member

afender commented Jun 3, 2020

Synced up with Dante on Slack 👍 Ishika is going to add a PR on top of this in the next few hours/days so we can keep adding enhancements there.

Merging 🚀

@afender afender merged commit 314eb6b into rapidsai:branch-0.15 Jun 3, 2020
copy-pr-bot bot pushed a commit that referenced this pull request Oct 24, 2023
add cross products for yaml->json configs
rapids-bot bot pushed a commit that referenced this pull request Feb 15, 2024
Demangle the error stack trace provided by GCC.
Example output:
```bash
RAFT failure at file=/workspace/raft/cpp/bench/ann/src/raft/raft_ann_bench_utils.h line=127: Ooops!
Obtained 16 stack frames
#1 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::logic_error::logic_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5e [0x7fb20acce45e]
#2 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::bench::ann::configured_raft_resources::stream_wait(CUstream_st*) const +0x2e3 [0x7fb20acd0ac3]
#3 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::bench::ann::RaftIvfPQ<float, long>::search(float const*, int, int, unsigned long*, float*, CUstream_st*) const +0x63e [0x7fb20acd44fe]
#4 in ./cpp/build/ANN_BENCH: void raft::bench::ann::bench_search<float>(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective) +0xf76 [0x55853859f586]
#5 in ./cpp/build/ANN_BENCH: benchmark::internal::LambdaBenchmark<benchmark::RegisterBenchmark<void (&)(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective), raft::bench::ann::Configuration::Index&, unsigned long&, std::shared_ptr<raft::bench::ann::Dataset<float> const>&, raft::bench::ann::Objective&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void (&)(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective), raft::bench::ann::Configuration::Index&, unsigned long&, std::shared_ptr<raft::bench::ann::Dataset<float> const>&, raft::bench::ann::Objective&)::{lambda(benchmark::State&)#1}>::Run(benchmark::State&) +0x84 [0x558538548f14]
#6 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkInstance::Run(long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const +0x168 [0x5585385d6498]
#7 in ./cpp/build/ANN_BENCH(+0x149108) [0x5585385b7108]
#8 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkRunner::DoNIterations() +0x34f [0x5585385b8c7f]
#9 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkRunner::DoOneRepetition() +0x119 [0x5585385b99b9]
#10 in ./cpp/build/ANN_BENCH(+0x13afdd) [0x5585385a8fdd]
#11 in ./cpp/build/ANN_BENCH: benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) +0x58e [0x5585385aa8fe]
#12 in ./cpp/build/ANN_BENCH: benchmark::RunSpecifiedBenchmarks() +0x6a [0x5585385aaada]
#13 in ./cpp/build/ANN_BENCH: raft::bench::ann::run_main(int, char**) +0x11ed [0x5585385980cd]
#14 in /lib/x86_64-linux-gnu/libc.so.6(+0x28150) [0x7fb213e28150]
#15 in /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0x89 [0x7fb213e28209]
#16 in ./cpp/build/ANN_BENCH(+0xbfcef) [0x55853852dcef]


```

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #2188
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants