[REVIEW] add logger in cuML C++ #1867

Merged on Apr 10, 2020 (138 commits)
3e2d0ea
initial commit to include spdlog in cuML
teju85 Mar 13, 2020
2761bd0
removed unnecessary/unused variables from main cmakelists
teju85 Mar 13, 2020
7d18768
initial version of cuml logger class based on spdlog
teju85 Mar 13, 2020
4c31d0e
added logger source to cmake build. Fixed compilation issues
teju85 Mar 13, 2020
bb40a80
call Logger::setPattern instead
teju85 Mar 13, 2020
0e9cf45
doxygen updates
teju85 Mar 13, 2020
a64a7c8
more doxygen updates
teju85 Mar 13, 2020
0bbc1ed
fixed a missing close comment markers
teju85 Mar 13, 2020
ea169a3
added unit-tests for logger class
teju85 Mar 13, 2020
50dab4e
clang format fixes
teju85 Mar 13, 2020
b8ba79a
updated dbscan.h to use logger instead of std::cout
teju85 Mar 13, 2020
4dc0ab6
updated copyright year
teju85 Mar 13, 2020
9345730
updated dbscan/runner.h to use logger instead of std::cout
teju85 Mar 13, 2020
b7af3b0
fixed compiler warnings in dbscan/runner.h too
teju85 Mar 13, 2020
1339a92
updated dbscan unit-tests to use logger
teju85 Mar 14, 2020
4f8de4f
copyright year update
teju85 Mar 14, 2020
ed33a93
logging support in umap unit-tests
teju85 Mar 14, 2020
dda195a
copyright year update
teju85 Mar 14, 2020
2d929bf
logging support in tsne unit-tests
teju85 Mar 14, 2020
3bcba24
copyright year update
teju85 Mar 14, 2020
dc6b3be
logging support in holtwinter unit-tests
teju85 Mar 14, 2020
562f874
copyright year update
teju85 Mar 14, 2020
e1c2f12
updated SVM code to use the logger API
teju85 Mar 14, 2020
4dfcbd0
clang-format fixes
teju85 Mar 14, 2020
03f9697
updated tsne to use logger
teju85 Mar 15, 2020
de4b6c4
copyright year update
teju85 Mar 15, 2020
5934240
updated tsne/utils.h to also use cuml logger
teju85 Mar 15, 2020
903e09e
copyright year update
teju85 Mar 15, 2020
271a48c
logic to get the current pattern in order to provide an ability to te…
teju85 Mar 15, 2020
05a13a5
updated memory.cuh to use logger
teju85 Mar 15, 2020
0c922a7
fixed a typo in the logging statement
teju85 Mar 15, 2020
6cf771a
copyright year update
teju85 Mar 15, 2020
08a5f8c
clang format fixes
teju85 Mar 15, 2020
0a1f19d
updated one of decisiontree headers to use logger
teju85 Mar 15, 2020
a9adc4b
updated decisiontree.cu to use logger
teju85 Mar 15, 2020
f32eac6
clang format fixes
teju85 Mar 15, 2020
9a30921
updated all of decisiontree source files to use logger
teju85 Mar 15, 2020
4b74a9d
fixed a typo in previous commit
teju85 Mar 15, 2020
0779b4d
clang format fixes
teju85 Mar 15, 2020
7745437
used a RAII-based pattern setter class to automatically set patterns …
teju85 Mar 15, 2020
5477a9f
made a commented cout statement to a debug print
teju85 Mar 15, 2020
2cd8345
updated knn to use logger
teju85 Mar 15, 2020
39a4884
updated doxygen for knn header
teju85 Mar 15, 2020
ea10127
updated RF impl to use logger
teju85 Mar 15, 2020
52427df
clang format fixes
teju85 Mar 15, 2020
a53e8b3
updated RF code to use logger
teju85 Mar 15, 2020
3fe407a
fixed compiler warnings on the THROW statement in svm (also fixes #1070)
teju85 Mar 15, 2020
ca817da
updated QN to use logger
teju85 Mar 15, 2020
1882ae6
set logger verbosity level for OWL-QN as well
teju85 Mar 15, 2020
0a11c6b
used stringstream to collect matrix info to be logged
teju85 Mar 15, 2020
b83a3e3
updated simplicial set implementation to use logger
teju85 Mar 15, 2020
ac75b0e
clang format updates
teju85 Mar 15, 2020
997b573
updated umap optimize.h to use logger
teju85 Mar 16, 2020
4bebbca
updated umap.cu to use logger
teju85 Mar 16, 2020
d4d7174
removed unnecessary #includes
teju85 Mar 16, 2020
2bcefae
updated umap supervised.h to use logger
teju85 Mar 16, 2020
51fe91c
final update to umap algo to use logger
teju85 Mar 16, 2020
8ab5da4
updated kmeans LOG method to use cuml logger internally
teju85 Mar 16, 2020
0d50f98
updated simple_mat.h to print the matrix into an ostream
teju85 Mar 16, 2020
1ae9cac
updated all log messages to remove trailing newline as that will be a…
teju85 Mar 16, 2020
d9e307e
clang format fixes
teju85 Mar 16, 2020
0287107
updated tsne test to not be verbose
teju85 Mar 16, 2020
835c6c2
increase verbosity only for positive verbose levels
teju85 Mar 16, 2020
f641136
clang format fixes
teju85 Mar 16, 2020
abd2122
Merge branch 'branch-0.13' of https://github.com/rapidsai/cuml into f…
teju85 Mar 16, 2020
a2de675
clang format fixes
teju85 Mar 16, 2020
79bf811
updated kmeans_test to use logger
teju85 Mar 16, 2020
adb17a0
using cuml logger for CUDA_CHECK_NO_THROW macro
teju85 Mar 16, 2020
d7287f2
removed todo comment on CUDA_CHECK_NO_THROW macro
teju85 Mar 16, 2020
dfe1aae
use snprintf instead of sprintf
teju85 Mar 16, 2020
1a5cf71
added CUBLAS_CHECK_NO_THROW issue #229
teju85 Mar 16, 2020
0ec9a02
added CUSOLVER_CHECK_NO_THROW issue #229
teju85 Mar 16, 2020
7819236
fixed a typo in the cusolver check macro
teju85 Mar 16, 2020
b628323
clang format fixes
teju85 Mar 16, 2020
bbf3200
added CUSPARSE_CHECK_NO_THROW issue #229
teju85 Mar 16, 2020
0a7cc9d
updated doxygen docs for cusparse_wrappers
teju85 Mar 16, 2020
3713b2b
removed 2 copies of cusparse_wrappers.h and updated the one inside sr…
teju85 Mar 16, 2020
34129cb
updated rproj.hxx use the updated cusparse_wrappers.h (issue #1691)
teju85 Mar 16, 2020
a385d60
added inline declaration to avoid linker multiple definition errors
teju85 Mar 16, 2020
0f903d4
clang format fixes
teju85 Mar 16, 2020
d06a063
updated benchmark event timer class dtor to use NO_THROW macros
teju85 Mar 16, 2020
faa6237
fixed linker error with prims benchmark
teju85 Mar 16, 2020
6fc08ef
clang format fixes
teju85 Mar 16, 2020
9e7bc57
updated cublas_wrappers.h to be included from a .cpp file (issue #239)
teju85 Mar 16, 2020
2f2c8cc
updated cusolver_wrappers.h to be included from a .cpp file (issue #239)
teju85 Mar 16, 2020
0ff001e
updated cusparse_wrappers.h to be included from a .cpp file (issue #239)
teju85 Mar 16, 2020
8fa97f2
clang format fixes
teju85 Mar 16, 2020
bfd8c85
use of NO_CHECK macros in the dtor of cuml_allocator
teju85 Mar 16, 2020
d5e8991
updated cuml_api to use NO_CHECK macros
teju85 Mar 16, 2020
ff65fa8
updated rmm allocator adapter to use _NO_CHECK macros
teju85 Mar 16, 2020
bcdef36
clang format fixes
teju85 Mar 16, 2020
5b45e69
added logic to use logger for NO_THROW macros in cuml-comms std and m…
teju85 Mar 16, 2020
77fcd2b
clang format fixes
teju85 Mar 16, 2020
e22b29d
removed outdated todo comments
teju85 Mar 18, 2020
0fdd1d1
Merge branch 'branch-0.13' of https://github.com/rapidsai/cuml into f…
teju85 Mar 18, 2020
1828ce5
Merge pull request #1941 from dantegd/014-fix-ci-cpp11
teju85 Mar 28, 2020
81ee2bb
Merge branch 'branch-0.14' of https://github.com/rapidsai/cuml into f…
teju85 Mar 28, 2020
832445c
update changelog
teju85 Mar 28, 2020
7e04b0e
clang format fixes
teju85 Mar 28, 2020
911c926
made prims bench to wait for spdlog repo setup
teju85 Mar 28, 2020
1c0a063
made prims unit-tests to wait for spdlog repo setup
teju85 Mar 28, 2020
b5c8c25
[skip ci] updated c++ dev guide with logging info
teju85 Mar 29, 2020
0647b1e
[skip ci] updated c++ dev guide with info on how to change log pattern
teju85 Mar 29, 2020
927c230
[skip ci] updated c++ dev guide with info on how to temporarily set l…
teju85 Mar 29, 2020
c691199
[skip ci] updated c++ dev guide with a section on logging tips
teju85 Mar 29, 2020
7565979
Merge branch 'branch-0.14' of https://github.com/rapidsai/cuml into f…
teju85 Apr 7, 2020
f96f979
ENH moved spdlog dependency declaration inside Dependencies.cmake
teju85 Apr 7, 2020
62001c8
BUG removed accidental duplication of cuml comms test cpp file
teju85 Apr 7, 2020
a4c42f1
BUG fixed prims unit-test linker issue
teju85 Apr 7, 2020
4bf7709
Merge branch 'branch-0.14' of https://github.com/rapidsai/cuml into f…
teju85 Apr 9, 2020
e2b481b
DOC updated the todo comment in Logger with the issue id
teju85 Apr 9, 2020
0e526df
DOC fixed a typo in qn_util.h doxygen comment
teju85 Apr 9, 2020
940e6c6
ENH converted unit-test debug messages to DEBUG level
teju85 Apr 9, 2020
822206f
BUG fixed a typo with logger macro
teju85 Apr 9, 2020
6cbc6ce
BUG fixed a typo with logger macro
teju85 Apr 9, 2020
2ac658a
ENH reduced CUML_ACTIVE_LEVEL to DEBUG
teju85 Apr 9, 2020
0c36895
DOC updated dev-guide with CUML_ACTIVE_LEVEL change
teju85 Apr 9, 2020
ca332b1
BUG fixed compilation errors with DEBUG and TRACE log macros
teju85 Apr 9, 2020
d5a89e6
BUG split the file:line printing in TRACE/DEBUG messages with the act…
teju85 Apr 9, 2020
a3e11f5
ENH updated kmeans unit-tests to use DEBUG level messages
teju85 Apr 9, 2020
98c6cc4
ENH updated tsne unit-tests to use DEBUG level messages
teju85 Apr 9, 2020
f6c1d8b
ENH updated umap unit-tests to use DEBUG level messages
teju85 Apr 9, 2020
f7074fa
ENH updated ucp_helper.h to use cuml logger
teju85 Apr 9, 2020
8a66267
ENH replaced all printf/cout in std-comms with cuml logger
teju85 Apr 9, 2020
486e7ad
BUG clang format fixes
teju85 Apr 9, 2020
ec0d319
Merge branch 'branch-0.14' of https://github.com/rapidsai/cuml into f…
teju85 Apr 10, 2020
193d439
FEA added a new method to logger class for checking logging status at…
teju85 Apr 10, 2020
a2fcd48
ENH upgraded boolean verbose to integer levels for a more fine graine…
teju85 Apr 10, 2020
4c8d31c
DOC fixed doxygen errors
teju85 Apr 10, 2020
db0a0e5
BUG clang format fixes
teju85 Apr 10, 2020
92ed1ee
ENH updated python wrapper dbscan to pass an integer log-level
teju85 Apr 10, 2020
f935a17
ENH made log messages in dbscan to be DEBUG level
teju85 Apr 10, 2020
1d4203f
ENH updated tsne to use integer verbosity
teju85 Apr 10, 2020
fe48bdf
DOC removed duplicate doxygen comments
teju85 Apr 10, 2020
9d28f71
ENH removed verbose flag in internals of tsne
teju85 Apr 10, 2020
2dd9478
ENH default log messages in tsne updated to DEBUG types
teju85 Apr 10, 2020
7685fc0
ENH updated tsne cython wrapper to use integer verbosity
teju85 Apr 10, 2020
e47724a
DOC clang format fixes
teju85 Apr 10, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
# cuML 0.14.0 (Date TBD)

## New Features
+- PR #1867: C++: add logging interface support in cuML based spdlog
- PR #1906: UMAP MNMG

## Improvements
2 changes: 2 additions & 0 deletions cpp/CMakeLists.txt
@@ -241,6 +241,7 @@ set(CUML_INCLUDE_DIRECTORIES
${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}
${CUTLASS_DIR}/src/cutlass
${CUB_DIR}/src/cub
+${SPDLOG_DIR}/src/spdlog/include
${TREELITE_DIR}/include
${TREELITE_DIR}/include/fmt)

@@ -264,6 +265,7 @@ if(BUILD_CUML_CPP_LIBRARY)
src/common/cumlHandle.cpp
src/common/cuml_api.cpp
src/common/cuML_comms_impl.cpp
+src/common/logger.cpp
src/common/nvtx.cu
src/comms/cuML_comms_test.cpp
src/datasets/make_blobs.cu
6 changes: 4 additions & 2 deletions cpp/bench/CMakeLists.txt
@@ -52,14 +52,16 @@ if(BUILD_CUML_PRIMS_BENCH)
prims/distance_l1.cu
prims/distance_unexp_l2.cu
prims/fused_l2_nn.cu
+prims/main.cpp
prims/map_then_reduce.cu
prims/matrix_vector_op.cu
prims/permute.cu
prims/reduce.cu
prims/rng.cu
-prims/main.cpp)
+../src/common/logger.cpp # because prims is header only!
+)

-add_dependencies(prims_benchmark cutlass)
+add_dependencies(prims_benchmark spdlog)

target_link_libraries(prims_benchmark benchmarklib)

13 changes: 7 additions & 6 deletions cpp/bench/prims/benchmark.cuh
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2019, NVIDIA CORPORATION.
+* Copyright (c) 2019-2020, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -20,6 +20,7 @@
#include <cuda_runtime.h>
#include <cuda_utils.h>
#include <utils.h>
+#include <cuml/common/logger.hpp>
#include <sstream>
#include <string>
#include <vector>
@@ -123,13 +124,13 @@ struct CudaEventTimer {
* value given by `cudaEventElapsedTime()`.
*/
~CudaEventTimer() {
-CUDA_CHECK(cudaEventRecord(stop, stream));
-CUDA_CHECK(cudaEventSynchronize(stop));
+CUDA_CHECK_NO_THROW(cudaEventRecord(stop, stream));
+CUDA_CHECK_NO_THROW(cudaEventSynchronize(stop));
float milliseconds = 0.0f;
-CUDA_CHECK(cudaEventElapsedTime(&milliseconds, start, stop));
+CUDA_CHECK_NO_THROW(cudaEventElapsedTime(&milliseconds, start, stop));
state->SetIterationTime(milliseconds / 1000.f);
-CUDA_CHECK(cudaEventDestroy(start));
-CUDA_CHECK(cudaEventDestroy(stop));
+CUDA_CHECK_NO_THROW(cudaEventDestroy(start));
+CUDA_CHECK_NO_THROW(cudaEventDestroy(stop));
}

private:
13 changes: 7 additions & 6 deletions cpp/bench/sg/benchmark.cuh
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2019, NVIDIA CORPORATION.
+* Copyright (c) 2019-2020, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -19,6 +19,7 @@
#include <benchmark/benchmark.h>
#include <cuda_runtime.h>
#include <utils.h>
+#include <cuml/common/logger.hpp>
#include <cuml/cuml.hpp>
#include <sstream>
#include <vector>
@@ -207,13 +208,13 @@ struct CudaEventTimer {
* value given by `cudaEventElapsedTime()`.
*/
~CudaEventTimer() {
-CUDA_CHECK(cudaEventRecord(stop, stream));
-CUDA_CHECK(cudaEventSynchronize(stop));
+CUDA_CHECK_NO_THROW(cudaEventRecord(stop, stream));
+CUDA_CHECK_NO_THROW(cudaEventSynchronize(stop));
float milliseconds = 0.0f;
-CUDA_CHECK(cudaEventElapsedTime(&milliseconds, start, stop));
+CUDA_CHECK_NO_THROW(cudaEventElapsedTime(&milliseconds, start, stop));
state->SetIterationTime(milliseconds / 1000.f);
-CUDA_CHECK(cudaEventDestroy(start));
-CUDA_CHECK(cudaEventDestroy(stop));
+CUDA_CHECK_NO_THROW(cudaEventDestroy(start));
+CUDA_CHECK_NO_THROW(cudaEventDestroy(stop));
}

private:
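
Note on the `~CudaEventTimer()` hunks: C++ destructors are implicitly `noexcept`, so an exception escaping a destructor ends in `std::terminate`. That is presumably why the throwing checks were swapped for `_NO_THROW` variants here. The sketch below only illustrates the general shape of the two kinds of check. `Status`, `CHECK`, `CHECK_NO_THROW`, `release_resource`, `Timer` and `g_logged_errors` are hypothetical stand-ins so that it compiles without CUDA; they are not cuML's actual macros.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical status type standing in for cudaError_t.
enum class Status { Ok, Failed };

// Counts logged failures so the behaviour can be observed in the demo.
static int g_logged_errors = 0;

// Throwing variant: fine on normal code paths.
#define CHECK(call)                                                   \
  do {                                                                \
    Status status_ = (call);                                          \
    if (status_ != Status::Ok) {                                      \
      throw std::runtime_error(std::string("call failed: ") + #call); \
    }                                                                 \
  } while (0)

// Non-throwing variant: report the failure and keep going, so it is
// safe to use inside destructors.
#define CHECK_NO_THROW(call)                                      \
  do {                                                            \
    Status status_ = (call);                                      \
    if (status_ != Status::Ok) {                                  \
      ++g_logged_errors;                                          \
      std::fprintf(stderr, "call='%s' at %s:%d failed\n", #call,  \
                   __FILE__, __LINE__);                           \
    }                                                             \
  } while (0)

// Hypothetical cleanup call that can fail.
inline Status release_resource(bool ok) {
  return ok ? Status::Ok : Status::Failed;
}

// RAII type whose destructor must not throw.
struct Timer {
  bool healthy;
  explicit Timer(bool h) : healthy(h) {}
  ~Timer() { CHECK_NO_THROW(release_resource(healthy)); }
};

// Small demonstration; returns the number of failures that were logged.
inline int demo_no_throw() {
  g_logged_errors = 0;
  { Timer bad(false); }  // destructor logs the failure, does not throw
  { Timer good(true); }  // nothing to log
  assert(g_logged_errors == 1);
  return g_logged_errors;
}
```

The trade-off is that a failed cleanup is only reported, not propagated, so callers that need to react to it still have to use the throwing form outside the destructor.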
16 changes: 15 additions & 1 deletion cpp/cmake/Dependencies.cmake
@@ -41,6 +41,19 @@ ExternalProject_Add(cutlass
BUILD_COMMAND ""
INSTALL_COMMAND "")

+##############################################################################
+# - spdlog -------------------------------------------------------------------
+
+set(SPDLOG_DIR ${CMAKE_CURRENT_BINARY_DIR}/spdlog CACHE STRING
+"Path to spdlog install directory")
+ExternalProject_Add(spdlog
+GIT_REPOSITORY https://github.com/gabime/spdlog.git
+GIT_TAG v1.x
+PREFIX ${SPDLOG_DIR}
+CONFIGURE_COMMAND ""
+BUILD_COMMAND ""
+INSTALL_COMMAND "")
+
##############################################################################
# - faiss --------------------------------------------------------------------

@@ -155,7 +168,8 @@ set_property(TARGET benchmarklib PROPERTY
# This allows the cloning to happen sequentially, enhancing the printing at
# compile time, helping significantly to troubleshoot build issues.
add_dependencies(cutlass cub)
-add_dependencies(faiss cutlass)
+add_dependencies(spdlog cutlass)
+add_dependencies(faiss spdlog)
add_dependencies(faisslib faiss)
add_dependencies(treelite faiss)
add_dependencies(googletest treelite)
41 changes: 18 additions & 23 deletions cpp/comms/mpi/src/cuML_comms_mpi_impl.cpp
@@ -21,6 +21,7 @@

#include <common/cumlHandle.hpp>
#include <cuML_comms.hpp>
+#include <cuml/common/logger.hpp>

#include <utils.h>

@@ -36,19 +37,16 @@
} \
} while (0)

-//@todo adapt logging infrastructure for MPI_CHECK_NO_THROW once available:
-//https://github.com/rapidsai/cuml/issues/100
-#define MPI_CHECK_NO_THROW(call) \
-do { \
-int status = call; \
-if (MPI_SUCCESS != status) { \
-int mpi_error_string_lenght = 0; \
-char mpi_error_string[MPI_MAX_ERROR_STRING]; \
-MPI_Error_string(status, mpi_error_string, &mpi_error_string_lenght); \
-std::fprintf(stderr, \
-"ERROR: MPI call='%s' at file=%s line=%d failed with %s ", \
-#call, __FILE__, __LINE__, mpi_error_string); \
-} \
+#define MPI_CHECK_NO_THROW(call) \
+do { \
+int status = call; \
+if (MPI_SUCCESS != status) { \
+int mpi_error_string_lenght = 0; \
+char mpi_error_string[MPI_MAX_ERROR_STRING]; \
+MPI_Error_string(status, mpi_error_string, &mpi_error_string_lenght); \
+CUML_LOG_ERROR("MPI call='%s' at file=%s line=%d failed with %s ", \
+#call, __FILE__, __LINE__, mpi_error_string); \
+} \
} while (0)

#define NCCL_CHECK(call) \
@@ -58,16 +56,13 @@
ncclGetErrorString(status)); \
} while (0)

-//@todo adapt logging infrastructure for NCCL_CHECK_NO_THROW once available:
-//https://github.com/rapidsai/cuml/issues/100
-#define NCCL_CHECK_NO_THROW(call) \
-do { \
-ncclResult_t status = call; \
-if (ncclSuccess != status) { \
-std::fprintf(stderr, \
-"ERROR: NCCL call='%s' at file=%s line=%d failed with %s ", \
-#call, __FILE__, __LINE__, ncclGetErrorString(status)); \
-} \
+#define NCCL_CHECK_NO_THROW(call) \
+do { \
+ncclResult_t status = call; \
+if (status != ncclSuccess) { \
+CUML_LOG_ERROR("NCCL call='%s' failed. Reason:%s\n", #call, \
+ncclGetErrorString(status)); \
+} \
} while (0)

namespace ML {
53 changes: 24 additions & 29 deletions cpp/comms/std/src/cuML_std_comms_impl.cpp
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2019, NVIDIA CORPORATION.
+* Copyright (c) 2019-2020, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -39,6 +39,9 @@ constexpr bool UCX_ENABLED = false;
#include <cuML_comms.hpp>
#include <exception>
#include <memory>
+
+#include <cuml/common/logger.hpp>
+
#include <thread>

#include <cuda_runtime.h>
@@ -52,16 +55,13 @@ constexpr bool UCX_ENABLED = false;
ncclGetErrorString(status)); \
} while (0)

-//@todo adapt logging infrastructure for NCCL_CHECK_NO_THROW once available:
-//https://github.com/rapidsai/cuml/issues/100
-#define NCCL_CHECK_NO_THROW(call) \
-do { \
-ncclResult_t status = call; \
-if (ncclSuccess != status) { \
-std::fprintf(stderr, \
-"ERROR: NCCL call='%s' at file=%s line=%d failed with %s ", \
-#call, __FILE__, __LINE__, ncclGetErrorString(status)); \
-} \
+#define NCCL_CHECK_NO_THROW(call) \
+do { \
+ncclResult_t status = call; \
+if (status != ncclSuccess) { \
+CUML_LOG_ERROR("NCCL call='%s' failed. Reason:%s\n", #call, \
+ncclGetErrorString(status)); \
+} \
} while (0)

namespace ML {
@@ -296,11 +296,10 @@ void cumlStdCommunicator_impl::isend(const void *buf, int size, int dest,
ucp_isend((struct comms_ucp_handle *)_ucp_handle, ep_ptr, buf, size, tag,
default_tag_mask, getRank(), _verbose);

-if (_verbose) {
-std::cout << getRank() << ": Created send request [id=" << *request
-<< ", ptr= " << ucp_req->req << ", to=" << dest
-<< ", ep=" << ep_ptr << "]" << std::endl;
-}
+CUML_LOG_DEBUG(
+"%d: Created send request [id=%llu], ptr=%llu, to=%llu, ep=%llu", getRank(),
+(unsigned long long)*request, (unsigned long long)ucp_req->req,
+(unsigned long long)dest, (unsigned long long)ep_ptr);

_requests_in_flight.insert(std::make_pair(*request, ucp_req));
#endif
@@ -328,11 +327,10 @@ void cumlStdCommunicator_impl::irecv(void *buf, int size, int source, int tag,
ucp_irecv((struct comms_ucp_handle *)_ucp_handle, _ucp_worker, ep_ptr, buf,
size, tag, tag_mask, source, _verbose);

-if (_verbose) {
-std::cout << getRank() << ": Created receive request [id=" << *request
-<< ", ptr=" << ucp_req->req << ", from=" << source
-<< "ep=" << ep_ptr << "]" << std::endl;
-}
+CUML_LOG_DEBUG(
+"%d: Created receive request [id=%llu], ptr=%llu, from=%llu, ep=%llu",
+getRank(), (unsigned long long)*request, (unsigned long long)ucp_req->req,
+(unsigned long long)source, (unsigned long long)ep_ptr);

_requests_in_flight.insert(std::make_pair(*request, ucp_req));
#endif
@@ -396,14 +394,11 @@ void cumlStdCommunicator_impl::waitall(int count,
// is complete, we can go ahead and clean it up.
if (!req->needs_release || req->req->completed == 1) {
restart = true;
-if (_verbose) {
-std::cout << getRank() << ": request completed. [ptr=" << req->req
-<< ", num_left= " << requests.size() - 1
-<< ", other_rank=" << req->other_rank
-<< ", is_send=" << req->is_send_request
-<< ", completed_immediately=" << !req->needs_release << "]"
-<< std::endl;
-}
+CUML_LOG_DEBUG(
+"%d: request completed. [ptr=%llu, num_left=%lu,"
+" other_rank=%d, is_send=%d, completed_immediately=%d]",
+getRank(), (unsigned long long)req->req, requests.size() - 1,
+req->other_rank, req->is_send_request, !req->needs_release);

// perform cleanup
free_ucp_request((struct comms_ucp_handle *)_ucp_handle, req);
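
Note on the `if (_verbose) std::cout << ...` to `CUML_LOG_DEBUG(...)` conversions in this file: the verbosity test moves out of each call site and into the logger, which compares the message level against a configured level before formatting. The sketch below shows that idea with a printf-style macro. `SketchLogger`, `isEnabled`, `SKETCH_LOG_DEBUG` and the level constants are hypothetical names made up for this illustration; cuML's real logger wraps spdlog and its interface may differ.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical levels; a larger value means more verbose output.
enum SketchLevel { SKETCH_ERROR = 1, SKETCH_INFO = 2, SKETCH_DEBUG = 3 };

// Minimal singleton logger that filters by level before formatting.
class SketchLogger {
 public:
  static SketchLogger& get() {
    static SketchLogger instance;
    return instance;
  }

  void setLevel(int level) { level_ = level; }

  // True when a message of the given level would be emitted.
  bool isEnabled(int level) const { return level <= level_; }

  // printf-style logging; messages above the active level are dropped.
  void log(int level, const char* fmt, ...) {
    if (!isEnabled(level)) return;
    char buf[512];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    messages_.push_back(buf);           // kept so the demo can inspect output
    std::fprintf(stderr, "%s\n", buf);  // the logger appends the newline
  }

  const std::vector<std::string>& messages() const { return messages_; }
  void clear() { messages_.clear(); }

 private:
  int level_ = SKETCH_INFO;
  std::vector<std::string> messages_;
};

// Call sites no longer need their own `if (verbose)` guard.
#define SKETCH_LOG_DEBUG(...) \
  SketchLogger::get().log(SKETCH_DEBUG, __VA_ARGS__)

// Small demonstration; returns how many messages were actually emitted.
inline int demo_level_filter() {
  SketchLogger& logger = SketchLogger::get();
  logger.clear();
  logger.setLevel(SKETCH_INFO);
  SKETCH_LOG_DEBUG("%d: request completed", 3);  // filtered out
  logger.setLevel(SKETCH_DEBUG);
  SKETCH_LOG_DEBUG("%d: request completed", 3);  // emitted
  assert(logger.messages().size() == 1);
  return static_cast<int>(logger.messages().size());
}
```

Because the sink appends the line ending, format strings in this style do not carry a trailing `\n`, which matches the commit above that removes trailing newlines from log messages.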
17 changes: 7 additions & 10 deletions cpp/comms/std/src/ucp_helper.h
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2019, NVIDIA CORPORATION.
+* Copyright (c) 2019-2020, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -15,12 +15,12 @@
*/

#include <dlfcn.h>
+#include <stdio.h>
#include <ucp/api/ucp.h>
#include <ucp/api/ucp_def.h>

-#include <stdio.h>
-
-#include <utils.h>
+#include <cuml/common/logger.hpp>
+#include <cuml/common/utils.hpp>

/**
* An opaque handle for managing `dlopen` state within
@@ -72,10 +72,7 @@ void load_ucp_handle(struct comms_ucp_handle *ucp_handle) {
dlopen("libucp.so", RTLD_LAZY | RTLD_NOLOAD | RTLD_NODELETE);
if (!ucp_handle->ucp_handle) {
ucp_handle->ucp_handle = dlopen("libucp.so", RTLD_LAZY | RTLD_NODELETE);
-if (!ucp_handle->ucp_handle) {
-printf("Cannot open UCX library: %s\n", dlerror());
-exit(1);
-}
+ASSERT(ucp_handle->ucp_handle, "Cannot open UCX library: %s\n", dlerror());
}
dlerror();
}
@@ -161,7 +158,7 @@ struct ucp_request *ucp_isend(struct comms_ucp_handle *ucp_handle,
bool verbose) {
ucp_tag_t ucp_tag = build_message_tag(rank, tag);

-if (verbose) printf("Sending tag: %ld\n", ucp_tag);
+CUML_LOG_DEBUG("Sending tag: %ld", ucp_tag);

ucs_status_ptr_t send_result = (*(ucp_handle->send_func))(
ep_ptr, buf, size, ucp_dt_make_contig(1), ucp_tag, send_handle);
@@ -201,7 +198,7 @@ struct ucp_request *ucp_irecv(struct comms_ucp_handle *ucp_handle,
int sender_rank, bool verbose) {
ucp_tag_t ucp_tag = build_message_tag(sender_rank, tag);

-if (verbose) printf("%d: Receiving tag: %ld\n", ucp_tag);
+CUML_LOG_DEBUG("Receiving tag: %ld", ucp_tag);

ucs_status_ptr_t recv_result = (*(ucp_handle->recv_func))(
worker, buf, size, ucp_dt_make_contig(1), ucp_tag, tag_mask, recv_handle);