-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Statistics Resource Adaptor and cython bindings to tracking_resource_adaptor
and statistics_resource_adaptor
#626
Merged
rapids-bot
merged 28 commits into
rapidsai:branch-21.08
from
mdemoret-nv:enh-extend-tracking-resource-adaptor
Jun 8, 2021
Merged
Changes from 22 commits
Commits
Show all changes
28 commits
Select commit
Hold shift + click to select a range
c5e91d9
Adding additional tracking info and cython bindings to tracking_resou…
mdemoret-nv b31bb2f
Adding PR to CHANGELOG
mdemoret-nv 533fe2a
Style cleanup
mdemoret-nv cffe9bf
Fixing incorrect Cython class name
mdemoret-nv e1218fa
Apply suggestions from code review
mdemoret-nv 25d5da3
Merge remote-tracking branch 'upstream/branch-0.17' into enh-extend-t…
mdemoret-nv 976dabb
Removed the reset() method, added ability to push/pull
mdemoret-nv 39d5e22
Merge remote-tracking branch 'upstream/branch-0.17' into enh-extend-t…
mdemoret-nv ddd296a
Style cleanup
mdemoret-nv df0054e
Adding a reference to the MR used for allocation in device_buffer
mdemoret-nv cbf2772
Merge remote-tracking branch 'upstream/branch-0.18' into enh-extend-t…
mdemoret-nv e3e586a
Removing reset() and push/pop from the tracking manager
mdemoret-nv da25869
Apply suggestions from code review
mdemoret-nv 312ca50
Changing ssize_t to int64_t per review from harrism
mdemoret-nv 4c677f5
Merge branch 'branch-0.20' into enh-extend-tracking-resource-adaptor
mdemoret-nv 1d64c6d
Incorporating feedback from code review. Simplifying counter struct.
mdemoret-nv 97753ee
Style cleanup.
mdemoret-nv b812bc0
Style cleanup for `black` which was missed in the logs
mdemoret-nv 9d74e5d
Cleaning up code to reduce number of changes with 0.20
mdemoret-nv 9e69c67
Update python/rmm/tests/test_rmm.py
mdemoret-nv 8bfd27b
Merge branch 'branch-0.20' into enh-extend-tracking-resource-adaptor
mdemoret-nv 7cf3123
Moving the dl library link to the `rmm::rmm` main interface.
mdemoret-nv b5bbab4
Merge branch 'branch-21.08' into enh-extend-tracking-resource-adaptor
mdemoret-nv c7395e4
Separated the statistics portion from the tracking_resource_adaptor i…
mdemoret-nv 1872ee2
Getting tests to pass
mdemoret-nv 6b20fb8
Style cleanup
mdemoret-nv b281f9e
Improving method comment.
mdemoret-nv cab217d
Style cleanup.
mdemoret-nv File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -65,6 +65,29 @@ class tracking_resource_adaptor final : public device_memory_resource { | |||||
allocation_size{size} {}; | ||||||
}; | ||||||
|
||||||
/** | ||||||
harrism marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
* @brief Utility struct for counting the current, peak, and total value of a number | ||||||
*/ | ||||||
struct counter { | ||||||
int64_t value{0}; // Current value | ||||||
int64_t peak{0}; // Max value of `value` | ||||||
int64_t total{0}; // Sum of all added values | ||||||
|
||||||
counter& operator+=(int64_t x) | ||||||
{ | ||||||
value += x; | ||||||
total += x; | ||||||
peak = std::max(value, peak); | ||||||
return *this; | ||||||
} | ||||||
|
||||||
counter& operator-=(int64_t x) | ||||||
{ | ||||||
value -= x; | ||||||
return *this; | ||||||
} | ||||||
}; | ||||||
|
||||||
/** | ||||||
* @brief Construct a new tracking resource adaptor using `upstream` to satisfy | ||||||
* allocation requests. | ||||||
|
@@ -75,13 +98,13 @@ class tracking_resource_adaptor final : public device_memory_resource { | |||||
* @param capture_stacks If true, capture stacks for allocation calls | ||||||
*/ | ||||||
tracking_resource_adaptor(Upstream* upstream, bool capture_stacks = false) | ||||||
: capture_stacks_{capture_stacks}, allocated_bytes_{0}, upstream_{upstream} | ||||||
: capture_stacks_{capture_stacks}, upstream_{upstream} | ||||||
{ | ||||||
RMM_EXPECTS(nullptr != upstream, "Unexpected null upstream resource pointer."); | ||||||
} | ||||||
|
||||||
tracking_resource_adaptor() = delete; | ||||||
~tracking_resource_adaptor() = default; | ||||||
virtual ~tracking_resource_adaptor() = default; | ||||||
tracking_resource_adaptor(tracking_resource_adaptor const&) = delete; | ||||||
tracking_resource_adaptor(tracking_resource_adaptor&&) = default; | ||||||
tracking_resource_adaptor& operator=(tracking_resource_adaptor const&) = delete; | ||||||
|
@@ -133,27 +156,56 @@ class tracking_resource_adaptor final : public device_memory_resource { | |||||
* @return std::size_t number of bytes that have been allocated through this | ||||||
* allocator. | ||||||
*/ | ||||||
std::size_t get_allocated_bytes() const noexcept { return allocated_bytes_; } | ||||||
std::size_t get_allocated_bytes() const noexcept { return allocation_bytes_.value; } | ||||||
|
||||||
/** | ||||||
* @brief Log any outstanding allocations via RMM_LOG_DEBUG | ||||||
* @brief Returns a `counter` struct for this adaptor containing the current, | ||||||
* peak, and total number of allocated bytes or allocation counts for this | ||||||
* adaptor since it was created. | ||||||
* | ||||||
* @param return_bytes true to return bytes counter, false to re | ||||||
* @return counter | ||||||
*/ | ||||||
void log_outstanding_allocations() const | ||||||
counter get_counter(bool return_bytes = true) const noexcept | ||||||
{ | ||||||
#if SPDLOG_ACTIVE_LEVEL <= SPDLOG_LEVEL_DEBUG | ||||||
read_lock_t lock(mtx_); | ||||||
if (not allocations_.empty()) { | ||||||
std::ostringstream oss; | ||||||
|
||||||
return return_bytes ? allocation_bytes_ : allocation_count_; | ||||||
} | ||||||
|
||||||
/** | ||||||
* @brief Gets a string containing the outstanding allocation pointers, their | ||||||
* size, and optionally the stack trace for when each pointer was allocated. | ||||||
mdemoret-nv marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
* | ||||||
* @return std::string Containing the outstanding allocation pointers. | ||||||
*/ | ||||||
std::string get_outstanding_allocations_str() const | ||||||
{ | ||||||
read_lock_t lock(mtx_); | ||||||
|
||||||
std::ostringstream oss; | ||||||
|
||||||
if (!allocations_.empty()) { | ||||||
for (auto const& al : allocations_) { | ||||||
oss << al.first << ": " << al.second.allocation_size << " B"; | ||||||
if (al.second.strace != nullptr) { | ||||||
oss << " : callstack:" << std::endl << *al.second.strace; | ||||||
} | ||||||
oss << std::endl; | ||||||
} | ||||||
RMM_LOG_DEBUG("Outstanding Allocations: {}", oss.str()); | ||||||
} | ||||||
|
||||||
return oss.str(); | ||||||
} | ||||||
|
||||||
/** | ||||||
* @brief Log any outstanding allocations via RMM_LOG_DEBUG | ||||||
* | ||||||
*/ | ||||||
void log_outstanding_allocations() const | ||||||
{ | ||||||
#if SPDLOG_ACTIVE_LEVEL <= SPDLOG_LEVEL_DEBUG | ||||||
RMM_LOG_DEBUG("Outstanding Allocations: {}", get_outstanding_allocations_str()); | ||||||
#endif // SPDLOG_ACTIVE_LEVEL <= SPDLOG_LEVEL_DEBUG | ||||||
} | ||||||
|
||||||
|
@@ -179,8 +231,11 @@ class tracking_resource_adaptor final : public device_memory_resource { | |||||
{ | ||||||
write_lock_t lock(mtx_); | ||||||
allocations_.emplace(p, allocation_info{bytes, capture_stacks_}); | ||||||
|
||||||
// Increment the allocation_count_ while we have the lock | ||||||
allocation_bytes_ += bytes; | ||||||
allocation_count_ += 1; | ||||||
} | ||||||
allocated_bytes_ += bytes; | ||||||
|
||||||
return p; | ||||||
} | ||||||
|
@@ -197,11 +252,41 @@ class tracking_resource_adaptor final : public device_memory_resource { | |||||
void do_deallocate(void* p, std::size_t bytes, cuda_stream_view stream) override | ||||||
{ | ||||||
upstream_->deallocate(p, bytes, stream); | ||||||
|
||||||
{ | ||||||
write_lock_t lock(mtx_); | ||||||
allocations_.erase(p); | ||||||
|
||||||
const auto found = allocations_.find(p); | ||||||
|
||||||
// Ensure the allocation is found and the number of bytes match | ||||||
if (found == allocations_.end()) { | ||||||
// Don't throw but log an error. Throwing in a descructor (or any noexcept) will call | ||||||
// std::terminate | ||||||
RMM_LOG_ERROR( | ||||||
"Deallocating a pointer that was not tracked. Ptr: {:p} [{}B], Current Num. Allocations: " | ||||||
"{}", | ||||||
fmt::ptr(p), | ||||||
bytes, | ||||||
this->allocations_.size()); | ||||||
} else { | ||||||
allocations_.erase(found); | ||||||
|
||||||
auto allocated_bytes = found->second.allocation_size; | ||||||
|
||||||
if (allocated_bytes != bytes) { | ||||||
// Don't throw but log an error. Throwing in a descructor (or any noexcept) will call | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
// std::terminate | ||||||
RMM_LOG_ERROR( | ||||||
"Alloc bytes ({}) and Dealloc bytes ({}) do not match", allocated_bytes, bytes); | ||||||
|
||||||
bytes = allocated_bytes; | ||||||
} | ||||||
} | ||||||
|
||||||
// Decrement the current allocated counts. | ||||||
allocation_bytes_ -= bytes; | ||||||
allocation_count_ -= 1; | ||||||
} | ||||||
allocated_bytes_ -= bytes; | ||||||
} | ||||||
|
||||||
/** | ||||||
|
@@ -239,7 +324,8 @@ class tracking_resource_adaptor final : public device_memory_resource { | |||||
|
||||||
bool capture_stacks_; // whether or not to capture call stacks | ||||||
std::map<void*, allocation_info> allocations_; // map of active allocations | ||||||
std::atomic<std::size_t> allocated_bytes_; // number of bytes currently allocated | ||||||
counter allocation_bytes_; // peak, current and total allocated bytes | ||||||
counter allocation_count_; // peak, current and total allocation count | ||||||
std::shared_timed_mutex mutable mtx_; // mutex for thread safe access to allocations_ | ||||||
Upstream* upstream_; // the upstream resource used for satisfying allocation requests | ||||||
}; | ||||||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? Quick google shows that dladdr now lives in libc rather than libdl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With stack traces enabled, this was needed to compile the tests (original comment). Keith and I briefly discussed this here: #626 (comment).
Can you send me the link where you saw that
dladdr
has moved? All I am seeing from this link is:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, the docs I found were not for linux -- Solaris and something called illumos. As I said, it was a quick google.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested that all is well in libcudf, cuML, etc. when this library is linked here? Note that the other
target_link_libraries
for RMM are all header-only, which is why this one has me worried (RMM is a header-only library).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdemoret reports cuML builds and tests fine against this PR.