Add stacktrace into cudf exception types (rapidsai#13298)
This implements stacktrace retrieval and attaches a stacktrace string to every exception thrown by cudf. As a result, each exception carries information about where it originated, allowing downstream applications to trace errors back to their source with much less effort.

Closes rapidsai#12422.

### Example:
```
#0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446
#1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113
#2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e
#3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88
#4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9
#5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027
#6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f
#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6
#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195
#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109
#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f
#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87
#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95
#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c
#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5
```
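Each frame line in the example is built from one entry returned by glibc's `backtrace_symbols()`, which has the form `module(mangled_name+offset) [address]`; the mangled name is then demangled with `abi::__cxa_demangle`. The standalone sketch below reproduces that step for a single entry. `format_frame` is a hypothetical helper written for illustration, not a libcudf function, and the sketch assumes a GCC- or Clang-compatible toolchain that provides `<cxxabi.h>`. Unlike the libcudf implementation, it parses with `std::string::find` instead of splitting the buffer in place.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper (not a libcudf API): turns one backtrace_symbols() entry such as
//   "libcudf.so(_ZN4cudf6detail12sorted_orderEv+0x446) [0x7f0000001000]"
// into "#<idx>: <module> : <function>+<offset>", demangling the function name if possible.
std::string format_frame(int idx, std::string const& entry)
{
  std::string const prefix = "#" + std::to_string(idx) + ": ";

  // Locate "(", then search for "+" and ")" only after it.
  auto const open = entry.find('(');
  if (open == std::string::npos) { return prefix + entry; }
  auto const plus  = entry.find('+', open);
  auto const close = entry.find(')', open);
  if (plus == std::string::npos || close == std::string::npos || plus > close) {
    return prefix + entry;  // unexpected layout: keep the raw entry
  }

  std::string const module  = entry.substr(0, open);
  std::string const mangled = entry.substr(open + 1, plus - open - 1);
  std::string const offset  = entry.substr(plus + 1, close - plus - 1);

  // __cxa_demangle returns a malloc'ed buffer on success (status == 0), otherwise nullptr.
  std::string name = mangled;
  if (!mangled.empty()) {
    int status      = 0;
    char* demangled = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status == 0 && demangled != nullptr) { name = demangled; }
    std::free(demangled);  // free(nullptr) is a no-op
  }

  return prefix + module + " : " + name + "+" + offset;
}
```

Searching for `+` only after the opening parenthesis means a module path such as `libstdc++.so.6` does not confuse the split; entries that do not match the expected layout are passed through unchanged.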

### Usage

Retrieving a stacktrace with fully human-readable symbols requires adjusting some compile options. To make this convenient, a new CMake option (`CUDF_BUILD_STACKTRACE_DEBUG`) has been added. Set this option to `ON` when building libcudf, and stacktraces will be captured automatically. The option is currently supported only with the GCC compiler.

Whenever a cudf exception is thrown, a downstream application can retrieve the stored stacktrace and handle it however it needs. For example:
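```
#include <cudf/utilities/error.hpp>

#include <iostream>

try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;
  std::cout << e.stacktrace() << std::endl;
  throw;  // rethrow the original exception object instead of a sliced copy of `e`
}
// Other cudf exception types (cudf::cuda_error, cudf::data_type_error) are handled the same way.
```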
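The `catch` block above handles only `cudf::logic_error`. Because this commit adds `stacktrace_recorder` as an extra public base of `cudf::logic_error`, `cudf::cuda_error`, and `cudf::data_type_error`, a single `std::exception` handler can recover the trace with a `dynamic_cast` cross-cast. The following is a minimal, self-contained sketch of that idea: the `cudf` types in it are stand-ins that mirror the diff (with a placeholder string instead of a real capture), and `describe` is a hypothetical helper, not part of libcudf.

```cpp
#include <stdexcept>
#include <string>

// Stand-ins mirroring the types this commit adds to <cudf/utilities/error.hpp>, so that the
// sketch compiles without libcudf. In real code, include that header and delete these.
namespace cudf {
struct stacktrace_recorder {
  // The real constructor calls cudf::detail::get_stacktrace(); a fixed string is used here.
  stacktrace_recorder() : _stacktrace{"#0: (placeholder frame)"} {}
  char const* stacktrace() const { return _stacktrace.c_str(); }

 protected:
  std::string const _stacktrace;
};

struct logic_error : public std::logic_error, public stacktrace_recorder {
  using std::logic_error::logic_error;
};

struct data_type_error : public std::invalid_argument, public stacktrace_recorder {
  using std::invalid_argument::invalid_argument;
};
}  // namespace cudf

// Hypothetical helper: returns what() and, if the thrown object also derives from
// cudf::stacktrace_recorder, appends the stored stacktrace on the following line.
std::string describe(std::exception const& e)
{
  std::string out = e.what();
  // std::exception is polymorphic, so dynamic_cast can cross-cast to the sibling base;
  // it yields nullptr for exceptions that do not carry a stacktrace.
  if (auto const* rec = dynamic_cast<cudf::stacktrace_recorder const*>(&e)) {
    out += '\n';
    out += rec->stacktrace();
  }
  return out;
}
```

In this sketch, a handler written as `catch (std::exception const& e) { std::cerr << describe(e); }` would also cover `cudf::cuda_error` and its subclass `cudf::fatal_cuda_error`, which gain the same base in this commit, while non-cudf exceptions just report their message.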

### Follow-up work

The next step is to patch `rmm` to attach stacktraces to `rmm::` exceptions. This will allow debugging memory-related exceptions thrown from libcudf using their stacktraces.


### Note:
 * This feature doesn't require libcudf to be built in Debug mode.
 * The flag `CUDF_BUILD_STACKTRACE_DEBUG` should not be turned on in production because it lowers the optimization level. Use a libcudf built with this flag only when needed, for example when debugging exceptions thrown by cudf.
 * This flag removes the current optimization flag (such as `-O2` or `-O3` in Release mode) and replaces it with `-Og` (optimize for debugging). It also links with `-rdynamic` so that all symbols are added to the dynamic symbol table.
 * If this option is not set to `ON`, no stacktrace is captured and `stacktrace()` returns a short message stating that libcudf was not built with stacktrace support. This avoids the cost of stacktrace retrieval when exceptions are thrown and caught as expected.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Robert Maynard (https://github.com/robertmaynard)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Jason Lowe (https://github.com/jlowe)

URL: rapidsai#13298
ttnghia authored Jun 9, 2023
1 parent c270986 commit 69206d1
Showing 13 changed files with 350 additions and 123 deletions.
1 change: 1 addition & 0 deletions conda/recipes/libcudf/meta.yaml
@@ -150,6 +150,7 @@ outputs:
- test -f $PREFIX/include/cudf/detail/utilities/linked_column.hpp
- test -f $PREFIX/include/cudf/detail/utilities/logger.hpp
- test -f $PREFIX/include/cudf/detail/utilities/pinned_host_vector.hpp
- test -f $PREFIX/include/cudf/detail/utilities/stacktrace.hpp
- test -f $PREFIX/include/cudf/detail/utilities/vector_factories.hpp
- test -f $PREFIX/include/cudf/detail/utilities/visitor_overload.hpp
- test -f $PREFIX/include/cudf/dictionary/detail/concatenate.hpp
56 changes: 53 additions & 3 deletions cpp/CMakeLists.txt
@@ -62,11 +62,18 @@ option(
stream to external libraries."
OFF
)
# Option to add all symbols to the dynamic symbol table in the library file, allowing to retrieve
# human-readable stacktrace for debugging.
option(
CUDF_BUILD_STACKTRACE_DEBUG
"Replace the current optimization flags by the options '-rdynamic -Og -NDEBUG', useful for debugging with stacktrace retrieval"
OFF
)
option(DISABLE_DEPRECATION_WARNINGS "Disable warnings generated from deprecated declarations." OFF)
# Option to enable line info in CUDA device compilation to allow introspection when profiling /
# memchecking
option(CUDA_ENABLE_LINEINFO
"Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler" OFF
"Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler)" OFF
)
option(CUDA_WARNINGS_AS_ERRORS "Enable -Werror=all-warnings for all CUDA compilation" ON)
# cudart can be statically linked or dynamically linked. The python ecosystem wants dynamic linking
@@ -94,13 +101,17 @@ message(VERBOSE "CUDF: Use a file cache for JIT compiled kernels: ${JITIFY_USE_C
message(VERBOSE "CUDF: Build and statically link Arrow libraries: ${CUDF_USE_ARROW_STATIC}")
message(VERBOSE "CUDF: Build and enable S3 filesystem support for Arrow: ${CUDF_ENABLE_ARROW_S3}")
message(VERBOSE "CUDF: Build with per-thread default stream: ${CUDF_USE_PER_THREAD_DEFAULT_STREAM}")
message(
VERBOSE
"CUDF: Replace the current optimization flags by the options '-rdynamic -Og' (useful for debugging with stacktrace retrieval): ${CUDF_BUILD_STACKTRACE_DEBUG}"
)
message(
VERBOSE
"CUDF: Disable warnings generated from deprecated declarations: ${DISABLE_DEPRECATION_WARNINGS}"
)
message(
VERBOSE
"CUDF: Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler: ${CUDA_ENABLE_LINEINFO}"
"CUDF: Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler): ${CUDA_ENABLE_LINEINFO}"
)
message(VERBOSE "CUDF: Statically link the CUDA runtime: ${CUDA_STATIC_RUNTIME}")

@@ -115,6 +126,10 @@ if(BUILD_TESTS AND NOT CUDF_BUILD_TESTUTIL)
)
endif()

if(CUDF_BUILD_STACKTRACE_DEBUG AND NOT CMAKE_COMPILER_IS_GNUCXX)
message(FATAL_ERROR "CUDF_BUILD_STACKTRACE_DEBUG is only supported with GCC compiler")
endif()

set(CUDF_CXX_FLAGS "")
set(CUDF_CUDA_FLAGS "")
set(CUDF_CXX_DEFINITIONS "")
@@ -608,6 +623,7 @@ add_library(
src/utilities/default_stream.cpp
src/utilities/linked_column.cpp
src/utilities/logger.cpp
src/utilities/stacktrace.cpp
src/utilities/traits.cpp
src/utilities/type_checks.cpp
src/utilities/type_dispatcher.cpp
@@ -646,6 +662,31 @@ target_compile_options(
"$<$<COMPILE_LANGUAGE:CUDA>:${CUDF_CUDA_FLAGS}>"
)

if(CUDF_BUILD_STACKTRACE_DEBUG)
# Remove any optimization level to avoid nvcc warning "incompatible redefinition for option
# 'optimize'".
string(REGEX REPLACE "(\-O[0123])" "" CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS}")
string(REGEX REPLACE "(\-O[0123])" "" CMAKE_CUDA_FLAGS_RELEASE "${CMAKE_CUDA_FLAGS_RELEASE}")
string(REGEX REPLACE "(\-O[0123])" "" CMAKE_CUDA_FLAGS_MINSIZEREL
"${CMAKE_CUDA_FLAGS_MINSIZEREL}"
)
string(REGEX REPLACE "(\-O[0123])" "" CMAKE_CUDA_FLAGS_RELWITHDEBINFO
"${CMAKE_CUDA_FLAGS_RELWITHDEBINFO}"
)

add_library(cudf_backtrace INTERFACE)
target_compile_definitions(cudf_backtrace INTERFACE CUDF_BUILD_STACKTRACE_DEBUG)
target_compile_options(
cudf_backtrace INTERFACE "$<$<COMPILE_LANGUAGE:CXX>:-Og>"
"$<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=-Og>"
)
target_link_options(
cudf_backtrace INTERFACE "$<$<LINK_LANGUAGE:CXX>:-rdynamic>"
"$<$<LINK_LANGUAGE:CUDA>:-Xlinker=-rdynamic>"
)
target_link_libraries(cudf PRIVATE cudf_backtrace)
endif()

# Specify include paths for the current target and dependents
target_include_directories(
cudf
@@ -829,7 +870,9 @@ if(CUDF_BUILD_STREAMS_TEST_UTIL)
# depending via ctest and whether it has been updated to expose public stream APIs.
foreach(_mode cudf testing)
set(_tgt "cudf_identify_stream_usage_mode_${_mode}")
add_library(${_tgt} SHARED tests/utilities/identify_stream_usage.cpp)
add_library(
${_tgt} SHARED src/utilities/stacktrace.cpp tests/utilities/identify_stream_usage.cpp
)

set_target_properties(
${_tgt}
@@ -838,7 +881,14 @@ if(CUDF_BUILD_STREAMS_TEST_UTIL)
CXX_STANDARD_REQUIRED ON
POSITION_INDEPENDENT_CODE ON
)
target_compile_options(
${_tgt} PRIVATE "$<BUILD_INTERFACE:$<$<COMPILE_LANGUAGE:CXX>:${CUDF_CXX_FLAGS}>>"
)
target_include_directories(${_tgt} PRIVATE "$<BUILD_INTERFACE:${CUDF_SOURCE_DIR}/include>")
target_link_libraries(${_tgt} PUBLIC CUDA::cudart rmm::rmm)
if(CUDF_BUILD_STACKTRACE_DEBUG)
target_link_libraries(${_tgt} PRIVATE cudf_backtrace)
endif()
add_library(cudf::${_tgt} ALIAS ${_tgt})

if("${_mode}" STREQUAL "testing")
47 changes: 47 additions & 0 deletions cpp/include/cudf/detail/utilities/stacktrace.hpp
@@ -0,0 +1,47 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <string>

namespace cudf::detail {
/**
* @addtogroup utility_stacktrace
* @{
* @file
*/

/**
* @brief Specify whether the last stackframe is included in the stacktrace.
*/
enum class capture_last_stackframe : bool { YES, NO };

/**
* @brief Query the current stacktrace and return the whole stacktrace as one string.
*
* Depending on the value of the flag `capture_last_frame`, the caller that executes stacktrace
* retrieval can be included in the output result.
*
* @param capture_last_frame Flag to specify if the current stackframe will be included into
* the output
* @return A string storing the whole current stacktrace
*/
std::string get_stacktrace(capture_last_stackframe capture_last_frame);

/** @} */ // end of group

} // namespace cudf::detail
30 changes: 27 additions & 3 deletions cpp/include/cudf/utilities/error.hpp
@@ -16,6 +16,8 @@

#pragma once

#include <cudf/detail/utilities/stacktrace.hpp>

#include <cuda.h>
#include <cuda_runtime_api.h>
#include <stdexcept>
@@ -29,13 +31,35 @@ namespace cudf {
* @file
*/

/**
* @brief The struct to store the current stacktrace upon its construction.
*/
struct stacktrace_recorder {
stacktrace_recorder()
// Exclude the current stackframe, as it is this constructor.
: _stacktrace{cudf::detail::get_stacktrace(cudf::detail::capture_last_stackframe::NO)}
{
}

public:
/**
* @brief Get the stored stacktrace captured during object construction.
*
* @return The pointer to a null-terminated string storing the output stacktrace
*/
char const* stacktrace() const { return _stacktrace.c_str(); }

protected:
std::string const _stacktrace; //!< The whole stacktrace stored as one string.
};

/**
* @brief Exception thrown when logical precondition is violated.
*
* This exception should not be thrown directly and is instead thrown by the
* CUDF_EXPECTS macro.
*/
struct logic_error : public std::logic_error {
struct logic_error : public std::logic_error, public stacktrace_recorder {
/**
* @brief Constructs a logic_error with the error message.
*
@@ -57,7 +81,7 @@ struct logic_error : public std::logic_error {
* @brief Exception thrown when a CUDA error is encountered.
*
*/
struct cuda_error : public std::runtime_error {
struct cuda_error : public std::runtime_error, public stacktrace_recorder {
/**
* @brief Construct a new cuda error object with error message and code.
*
@@ -92,7 +116,7 @@ struct fatal_cuda_error : public cuda_error {
* unsupported data_type. This exception should not be thrown directly and is
* instead thrown by the CUDF_EXPECTS or CUDF_FAIL macros.
*/
struct data_type_error : public std::invalid_argument {
struct data_type_error : public std::invalid_argument, public stacktrace_recorder {
/**
* @brief Constructs a data_type_error with the error message.
*
8 changes: 8 additions & 0 deletions cpp/include/cudf_test/stream_checking_resource_adaptor.hpp
@@ -17,8 +17,12 @@

#include <cudf_test/default_stream.hpp>

#include <cudf/detail/utilities/stacktrace.hpp>

#include <rmm/mr/device/device_memory_resource.hpp>

#include <iostream>

/**
* @brief Resource that verifies that the default stream is not used in any allocation.
*
@@ -162,6 +166,10 @@ class stream_checking_resource_adaptor final : public rmm::mr::device_memory_res
: (cstream != cudf::test::get_default_stream().value());

if (invalid_stream) {
// Exclude the current function from stacktrace.
std::cout << cudf::detail::get_stacktrace(cudf::detail::capture_last_stackframe::NO)
<< std::endl;

if (error_on_invalid_stream_) {
throw std::runtime_error("Attempted to perform an operation on an unexpected stream!");
} else {
88 changes: 88 additions & 0 deletions cpp/src/utilities/stacktrace.cpp
@@ -0,0 +1,88 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <cudf/detail/utilities/stacktrace.hpp>

#if defined(__GNUC__) && defined(CUDF_BUILD_STACKTRACE_DEBUG)
#include <cxxabi.h>
#include <execinfo.h>

#include <cstdlib>
#include <cstring>
#include <sstream>
#endif // defined(__GNUC__) && defined(CUDF_BUILD_STACKTRACE_DEBUG)

namespace cudf::detail {

std::string get_stacktrace(capture_last_stackframe capture_last_frame)
{
#if defined(__GNUC__) && defined(CUDF_BUILD_STACKTRACE_DEBUG)
constexpr int max_stack_depth = 64;
void* stack[max_stack_depth];

auto const depth = backtrace(stack, max_stack_depth);
auto const modules = backtrace_symbols(stack, depth);

if (modules == nullptr) { return "No stacktrace could be captured!"; }

std::stringstream ss;

// Skip one more depth to avoid including the stackframe of this function.
auto const skip_depth = 1 + (capture_last_frame == capture_last_stackframe::YES ? 0 : 1);
for (auto i = skip_depth; i < depth; ++i) {
// Each modules[i] string contains a mangled name in the format like following:
// `module_name(function_name+0x012) [0x01234567890a]`
// We need to extract function name and function offset.
char* begin_func_name = std::strstr(modules[i], "(");
char* begin_func_offset = std::strstr(modules[i], "+");
char* end_func_offset = std::strstr(modules[i], ")");

auto const frame_idx = i - skip_depth;
if (begin_func_name && begin_func_offset && end_func_offset &&
begin_func_name < begin_func_offset) {
// Split `modules[i]` into separate null-terminated strings.
// After this, mangled function name will then be [begin_func_name, begin_func_offset), and
// function offset is in [begin_func_offset, end_func_offset).
*(begin_func_name++) = '\0';
*(begin_func_offset++) = '\0';
*end_func_offset = '\0';

// We need to demangle function name.
int status{0};
char* func_name = abi::__cxa_demangle(begin_func_name, nullptr, nullptr, &status);

ss << "#" << frame_idx << ": " << modules[i] << " : "
<< (status == 0 /*demangle success*/ ? func_name : begin_func_name) << "+"
<< begin_func_offset << "\n";
free(func_name);
} else {
ss << "#" << frame_idx << ": " << modules[i] << "\n";
}
}

free(modules);

return ss.str();
#else
#ifdef CUDF_BUILD_STACKTRACE_DEBUG
return "Stacktrace is only supported when built with a GNU compiler.";
#else
return "libcudf was not built with stacktrace support.";
#endif // CUDF_BUILD_STACKTRACE_DEBUG
#endif // __GNUC__
}

} // namespace cudf::detail