Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Statistics Resource Adaptor and cython bindings to tracking_resource_adaptor and statistics_resource_adaptor #626

Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
c5e91d9
Adding additional tracking info and cython bindings to tracking_resou…
mdemoret-nv Nov 10, 2020
b31bb2f
Adding PR to CHANGELOG
mdemoret-nv Nov 10, 2020
533fe2a
Style cleanup
mdemoret-nv Nov 10, 2020
cffe9bf
Fixing incorrect Cython class name
mdemoret-nv Nov 10, 2020
e1218fa
Apply suggestions from code review
mdemoret-nv Nov 10, 2020
25d5da3
Merge remote-tracking branch 'upstream/branch-0.17' into enh-extend-t…
mdemoret-nv Nov 19, 2020
976dabb
Removed the reset() method, added ability to push/pull
mdemoret-nv Nov 25, 2020
39d5e22
Merge remote-tracking branch 'upstream/branch-0.17' into enh-extend-t…
mdemoret-nv Nov 25, 2020
ddd296a
Style cleanup
mdemoret-nv Nov 25, 2020
df0054e
Adding a reference to the MR used for allocation in device_buffer
mdemoret-nv Dec 4, 2020
cbf2772
Merge remote-tracking branch 'upstream/branch-0.18' into enh-extend-t…
mdemoret-nv Dec 15, 2020
e3e586a
Removing reset() and push/pop from the tracking manager
mdemoret-nv Dec 15, 2020
da25869
Apply suggestions from code review
mdemoret-nv Dec 15, 2020
312ca50
Changing ssize_t to int64_t per review from harrism
mdemoret-nv Dec 15, 2020
4c677f5
Merge branch 'branch-0.20' into enh-extend-tracking-resource-adaptor
mdemoret-nv Apr 5, 2021
1d64c6d
Incorporating feedback from code review. Simplifying counter struct.
mdemoret-nv Apr 9, 2021
97753ee
Style cleanup.
mdemoret-nv Apr 9, 2021
b812bc0
Style cleanup for `black` which was missed in the logs
mdemoret-nv Apr 9, 2021
9d74e5d
Cleaning up code to reduce number of changes with 0.20
mdemoret-nv Apr 9, 2021
9e69c67
Update python/rmm/tests/test_rmm.py
mdemoret-nv May 13, 2021
8bfd27b
Merge branch 'branch-0.20' into enh-extend-tracking-resource-adaptor
mdemoret-nv May 13, 2021
7cf3123
Moving the dl library link to the `rmm::rmm` main interface.
mdemoret-nv May 13, 2021
b5bbab4
Merge branch 'branch-21.08' into enh-extend-tracking-resource-adaptor
mdemoret-nv Jun 2, 2021
c7395e4
Separated the statistics portion from the tracking_resource_adaptor i…
mdemoret-nv Jun 2, 2021
1872ee2
Getting tests to pass
mdemoret-nv Jun 2, 2021
6b20fb8
Style cleanup
mdemoret-nv Jun 2, 2021
b281f9e
Improving method comment.
mdemoret-nv Jun 3, 2021
cab217d
Style cleanup.
mdemoret-nv Jun 3, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
- PR #611 Add a note to the contribution guide about requiring 2 C++ reviewers
- PR #615 Improve gpuCI Scripts
- PR #635 Add Python docs build to gpuCI
- PR #626 Adding additional tracking info and cython bindings to `tracking_resource_adaptor`

## Bug Fixes

Expand Down
34 changes: 31 additions & 3 deletions include/rmm/detail/stack_trace.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@
#include <sstream>

#if defined(RMM_ENABLE_STACK_TRACES)
#include <cxxabi.h>
#include <dlfcn.h>
#include <execinfo.h>
#include <memory>
#include <vector>
Expand Down Expand Up @@ -63,9 +65,35 @@ class stack_trace {
if (strings.get() == nullptr) {
os << "But no stack trace could be found!" << std::endl;
} else {
///@todo: support for demangling of C++ symbol names
for (int i = 0; i < st.stack_ptrs.size(); ++i) {
os << "#" << i << " in " << strings.get()[i] << std::endl;
// Iterate over the stack pointers converting to a string
for (int i = 0; i < st.stack_ptrs.size(); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this code more beautiful. Discussed with @codereport . Suggestion is to replace the for with a std::transform over st.stack_ptrs().

And use an IILE to clean up the internal if, something like

auto const str = [&] {
    if (dladdr(st.stack_ptrs[i], &info)) {
        int status = -1;
        // Demangle the name. This can occasionally fail
        std::unique_ptr<char, decltype(&::free)> demangled(
            abi::__cxa_demangle(info.dli_sname, nullptr, 0, &status), &::free);
        // If it fails, fallback to the dli_name.        
        if (status == 0 or info.dli_sname) {
            auto const name = status == 0 ? demangled.get() : info.dli_sname;
            return name + " from " << info.dli_fname;
        }
    }
    return "";
} ();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the IILE to clean up the string construction but I couldnt come up with a good way to use std::transform here that was cleaner than the current implementation. Since we need both the value and the index we would need to use the binary_op overload of std::transform with one iterator over the values and something similar to boost::counting_iterator for the index.

Maybe you are better with these functions in <algorithm> than I am, but I couldn't find an elegant way to create the indexing iterator that didn't require creating a second object like a vector. It's a shame because I was hoping to use std::ostream_iterator for the output which would have been nice. If I'm missing something here, let me know.

// Leading index
os << "#" << i << " in ";

bool written = false;
Dl_info info;

if (dladdr(st.stack_ptrs[i], &info)) {
int status = -1;

// Demangle the name. This can occasionally fail
std::unique_ptr<char, decltype(&::free)> demangled(
abi::__cxa_demangle(info.dli_sname, NULL, 0, &status), &::free);

// If it fails, fallback to the dli_name.
if (status == 0) {
os << demangled.get() << " from " << info.dli_fname;
written = true;
} else if (info.dli_sname != NULL) {
os << info.dli_sname << " from " << info.dli_fname;
written = true;
}
}

// If the demangling and fallback failed, just print the mangled string
if (!written) { os << strings.get()[i]; }

os << std::endl;
}
}
#else
Expand Down
235 changes: 219 additions & 16 deletions include/rmm/mr/device/tracking_resource_adaptor.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,47 @@ class tracking_resource_adaptor final : public device_memory_resource {
allocation_size{size} {};
};

/**
harrism marked this conversation as resolved.
Show resolved Hide resolved
* @brief Stores the current, peak, and total number of allocation and
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved
* allocated bytes
*/
struct allocation_counts {
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved
ssize_t current_bytes{0}; // Current outstanding bytes
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved
ssize_t current_count{0}; // Current outstanding count
ssize_t peak_bytes{0}; // Max value of current_bytes
ssize_t peak_count{0}; // Max value of current_count
std::size_t total_bytes{0}; // Total allocated bytes
std::size_t total_count{0}; // Total allocated count

/**
* @brief Increments the current, peak and total by bytes
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved
*
* @param bytes Number of bytes allocated
*/
void increment_count(size_t bytes)
{
current_bytes += bytes;
current_count += 1;

peak_bytes = std::max(current_bytes, peak_bytes);
peak_count = std::max(current_count, peak_count);

total_bytes += bytes;
total_count += 1;
}

/**
* @brief Decrement the current bytes and count. Peak and total remain unchanged
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved
*
* @param bytes Number of bytes deallocated
*/
void decrement_count(size_t bytes)
{
current_bytes -= bytes;
current_count -= 1;
}
};

/**
* @brief Construct a new tracking resource adaptor using `upstream` to satisfy
* allocation requests.
Expand All @@ -75,13 +116,16 @@ class tracking_resource_adaptor final : public device_memory_resource {
* @param capture_stacks If true, capture stacks for allocation calls
*/
tracking_resource_adaptor(Upstream* upstream, bool capture_stacks = false)
: upstream_{upstream}, capture_stacks_{capture_stacks}, allocated_bytes_{0}
: capture_stacks_{capture_stacks}, upstream_{upstream}
{
RMM_EXPECTS(nullptr != upstream, "Unexpected null upstream resource pointer.");

// Need to maintain at least one on the stack at all times
allocation_count_stack_.push_back(allocation_counts());
}

tracking_resource_adaptor() = delete;
~tracking_resource_adaptor() = default;
virtual ~tracking_resource_adaptor() = default;
tracking_resource_adaptor(tracking_resource_adaptor const&) = delete;
tracking_resource_adaptor(tracking_resource_adaptor&&) = default;
tracking_resource_adaptor& operator=(tracking_resource_adaptor const&) = delete;
Expand Down Expand Up @@ -133,27 +177,153 @@ class tracking_resource_adaptor final : public device_memory_resource {
* @return std::size_t number of bytes that have been allocated through this
* allocator.
*/
std::size_t get_allocated_bytes() const noexcept { return allocated_bytes_; }
std::size_t get_allocated_bytes() const noexcept
{
return get_total_allocation_counts().current_bytes;
}

/**
* @brief Log any outstanding allocations via RMM_LOG_DEBUG
* @brief Returns an allocation_counts struct for this adaptor containing the
* total current, peak, and total number of bytes and allocation count for
* this adaptor regardless of any push/popped allocation_counts. Note: Because
* its possible to change memory resources at any time while maintaining the
* same upstream memory resource, its possible to have a negative allocation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the note, or the reason for a stack. You can't change the upstream for a resource_adaptor, you would have to create a new resource adaptor. You can't have more deallocations than allocations without an exception. This seems way overcomplicated.

* bytes or count if the number of deallocate() calls is greater than the
* number of allocate().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not brief. :) @brief should be one line. Separate detailed docs from @brief with a single blank line.

*
* @return allocation_counts struct containing the allocation number of bytes
* and count info
*/
void log_outstanding_allocations() const
allocation_counts get_total_allocation_counts() const noexcept
{
#if SPDLOG_ACTIVE_LEVEL <= SPDLOG_LEVEL_DEBUG
read_lock_t lock(mtx);
if (not allocations.empty()) {
std::ostringstream oss;
for (auto const& al : allocations) {
read_lock_t lock(mtx_);

// Start by copying the head
allocation_counts sum_counts = allocation_count_stack_.front();

auto prev_curr_bytes = sum_counts.current_bytes;
auto prev_curr_count = sum_counts.current_count;

for (size_t i = 1; i < allocation_count_stack_.size(); i++) {
// Sum the total and current
sum_counts.current_bytes += allocation_count_stack_[i].current_bytes;
sum_counts.current_count += allocation_count_stack_[i].current_count;

sum_counts.total_bytes += allocation_count_stack_[i].total_bytes;
sum_counts.total_count += allocation_count_stack_[i].total_count;

// Peak works differently. For each, the max value is the previous stack's
// `current_bytes` + the current stack's `peak_bytes`
sum_counts.peak_bytes =
std::max(sum_counts.peak_bytes, prev_curr_bytes + allocation_count_stack_[i].peak_bytes);
sum_counts.peak_count =
std::max(sum_counts.peak_count, prev_curr_count + allocation_count_stack_[i].peak_count);

prev_curr_bytes = allocation_count_stack_[i].current_bytes;
prev_curr_count = allocation_count_stack_[i].current_count;
}

return sum_counts;
}

/**
* @brief Pushes a new allocation_counts struct onto the stack to allow
* tracking memory allocations for a particular section without creating a new
* memory resource. Call `pop_allocation_counts` to retrieve the
* allocation_counts since this method was called.
*
* @return allocation_counts Returns the previous allocation_counts at the top
* of the stack
*/
allocation_counts push_allocation_counts()
{
write_lock_t lock(mtx_);

// Copy the top of the stack
allocation_counts counts = allocation_count_stack_.back();

// Push a new one on the stack
allocation_count_stack_.emplace_back(allocation_counts{});

return counts;
}

/**
* @brief Pops the top allocation_counts struct off the stack and returns the
* current, peak and total allocations since `push_allocation_counts` was
* called.
*
* @throws rmm::out_of_range exception if `allocation_count_stack_.size() <=
* 1`
*
* @return allocation_counts Returns the allocation_counts struct since
* `push_allocation_counts` was called
*/
allocation_counts pop_allocation_counts()
{
write_lock_t lock(mtx_);

RMM_EXPECTS(allocation_count_stack_.size() > 1,
rmm::out_of_range,
"Attempted to pop only allocation_counts on stack.");

// Copy the top of the stack
allocation_counts counts = allocation_count_stack_.back();

// Pop the stack
allocation_count_stack_.pop_back();

// When popping the stack, the top needs to be rolled into the new top to keep the total values
// correct
auto& new_top = allocation_count_stack_.back();

// Make sure to do peak first here
new_top.peak_bytes = std::max(new_top.peak_bytes, new_top.current_bytes + counts.peak_bytes);
new_top.peak_count = std::max(new_top.peak_count, new_top.current_count + counts.peak_count);

// Sum the total and current
new_top.current_bytes += counts.current_bytes;
new_top.current_count += counts.current_count;

new_top.total_bytes += counts.total_bytes;
new_top.total_count += counts.total_count;

return counts;
}

/**
* @brief Gets a string containing the outstanding allocation pointers, their
* size, and optionally the stack trace for when each pointer was allocated.
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved
*
* @return std::string Containing the outstanding allocation pointers.
*/
std::string get_outstanding_allocations_str() const
{
read_lock_t lock(mtx_);

std::ostringstream oss;

if (!allocations_.empty()) {
for (auto const& al : allocations_) {
oss << al.first << ": " << al.second.allocation_size << " B";
if (al.second.strace != nullptr) {
oss << " : callstack:" << std::endl << *al.second.strace;
}
oss << std::endl;
}
RMM_LOG_DEBUG("Outstanding Allocations: {}", oss.str());
}

return oss.str();
}

/**
* @brief Log any outstanding allocations via RMM_LOG_DEBUG
*
*/
void log_outstanding_allocations() const
{
#if SPDLOG_ACTIVE_LEVEL <= SPDLOG_LEVEL_DEBUG
RMM_LOG_DEBUG("Outstanding Allocations: {}", get_outstanding_allocations_str());
#endif // SPDLOG_ACTIVE_LEVEL <= SPDLOG_LEVEL_DEBUG
}

Expand All @@ -179,8 +349,10 @@ class tracking_resource_adaptor final : public device_memory_resource {
{
write_lock_t lock(mtx_);
allocations_.emplace(p, allocation_info{bytes, capture_stacks_});

// Increment the allocation_count_ while we have the lock
allocation_count_stack_.back().increment_count(bytes);
}
allocated_bytes_ += bytes;

return p;
}
Expand All @@ -197,11 +369,40 @@ class tracking_resource_adaptor final : public device_memory_resource {
void do_deallocate(void* p, std::size_t bytes, cuda_stream_view stream) override
{
upstream_->deallocate(p, bytes, stream);

{
write_lock_t lock(mtx_);
allocations_.erase(p);

const auto found = allocations_.find(p);

// Ensure the allocation is found and the number of bytes match
if (found == allocations_.end()) {
// Don't throw but log an error. Throwing in a descructor (or any noexcept) will call
// std::terminate
RMM_LOG_ERROR(
"Deallocating a pointer that was not tracked. Ptr: {:p} [{}B], Current Num. Allocations: "
"{}",
fmt::ptr(p),
bytes,
this->allocations_.size());
} else {
allocations_.erase(found);

auto allocated_bytes = found->second.allocation_size;

if (allocated_bytes != bytes) {
// Don't throw but log an error. Throwing in a descructor (or any noexcept) will call
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Don't throw but log an error. Throwing in a descructor (or any noexcept) will call
// Don't throw but log an error. Throwing in a destructor (or any noexcept) will call

// std::terminate
RMM_LOG_ERROR(
"Alloc bytes ({}) and Dealloc bytes ({}) do not match", allocated_bytes, bytes);

bytes = allocated_bytes;
}
}

// Decrement the current allocated counts.
allocation_count_stack_.back().decrement_count(bytes);
}
allocated_bytes_ -= bytes;
}

/**
Expand Down Expand Up @@ -239,8 +440,10 @@ class tracking_resource_adaptor final : public device_memory_resource {

bool capture_stacks_; // whether or not to capture call stacks
std::map<void*, allocation_info> allocations_; // map of active allocations
std::atomic<std::size_t> allocated_bytes_; // number of bytes currently allocated
std::shared_timed_mutex mutable mtx_; // mutex for thread safe access to allocations_
std::vector<allocation_counts>
allocation_count_stack_; // Stack of allocation_counts structs to track memory allocation
// between push and pop
std::shared_timed_mutex mutable mtx_; // mutex for thread safe access to allocations_
Upstream* upstream_; // the upstream resource used for satisfying allocation requests
};

Expand Down
27 changes: 27 additions & 0 deletions python/rmm/_lib/memory_resource.pxd
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# Copyright (c) 2020, NVIDIA CORPORATION.

from libcpp cimport bool
from libc.stdint cimport int8_t
from libcpp.vector cimport vector
from libcpp.string cimport string
Expand Down Expand Up @@ -79,6 +80,29 @@ cdef extern from "memory_resource_wrappers.hpp" nogil:
shared_ptr[device_memory_resource_wrapper] new_resource
) except +

cdef cppclass tracking_resource_adaptor_wrapper(
device_memory_resource_wrapper
):
struct allocation_counts:
allocation_counts()

ssize_t current_bytes
ssize_t current_count
ssize_t peak_bytes
ssize_t peak_count
size_t total_bytes
size_t total_count

tracking_resource_adaptor_wrapper(
shared_ptr[device_memory_resource_wrapper] upstream_mr,
bool capture_stacks
) except +

allocation_counts get_total_allocation_counts() except +
allocation_counts push_allocation_counts() except +
allocation_counts pop_allocation_counts() except +
string get_outstanding_allocations_str() except +


cdef class MemoryResource:
cdef shared_ptr[device_memory_resource_wrapper] c_obj
Expand All @@ -102,3 +126,6 @@ cdef class LoggingResourceAdaptor(MemoryResource):
cdef object _log_file_name
cpdef get_file_name(self)
cpdef flush(self)

cdef class TrackingMemoryResource(MemoryResource):
pass
Loading