Limiting workspace memory resource #1356

Merged

Changes from all commits (42 commits)
a939784
Wrap the workspace resource into a limiting_resource_adaptor
achirkin Mar 20, 2023
ac5762b
Set the pool memory resource by default and start the ivf-pq use case
achirkin Mar 20, 2023
c33d519
Refactor the resource to more rely on shared_ptr to manage lifetime
achirkin Mar 21, 2023
de5dd84
Preserve the semantics of not transfering the ownership of raw pointe…
achirkin Mar 21, 2023
36365b5
Merge branch 'branch-23.04' into fea-limited-workspace-resource
achirkin Mar 21, 2023
2eeb9e9
Merge branch 'branch-23.04' into fea-limited-workspace-resource
achirkin Mar 28, 2023
b8c5bc3
Merge branch 'branch-23.04' into fea-limited-workspace-resource
achirkin Mar 29, 2023
48f90fe
Merge branch 'branch-23.06' into fea-limited-workspace-resource
achirkin May 9, 2023
79c954e
Fix a missing merge change
achirkin May 9, 2023
6cf1103
Make the resource change not permanent
achirkin May 9, 2023
370b9ed
Don't force use the temp local workspace for all raft allocations
achirkin May 9, 2023
197106b
Merge remote-tracking branch 'rapidsai/branch-23.08' into fea-limited…
achirkin Jun 28, 2023
06cf4ff
Don't use device_resources
achirkin Jun 28, 2023
f27ba86
Using more of workspace memory resource
achirkin Jun 28, 2023
d435855
Let device_uvector_policy keep the memory resource when needed
achirkin Jun 28, 2023
1b62e3a
Make helper to query workspace size
achirkin Jun 28, 2023
5fed631
Tiny unrelated test fix: copy data in a stream.
achirkin Jun 28, 2023
a2e749d
Update the API to always use shared pointers to the resources
achirkin Jun 28, 2023
7736d76
Fix a typo
achirkin Jun 28, 2023
be63f73
Rename limited->limiting resource for consistency
achirkin Jun 29, 2023
c70728a
Add comments
achirkin Jun 29, 2023
be047d4
Remove repeated word in the comment
achirkin Jun 29, 2023
3e151d4
Fix a missing word in the comment
achirkin Jun 29, 2023
2649391
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 4, 2023
db27247
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 6, 2023
0900904
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 7, 2023
4cf6455
Add a deprecation comment to the mr argument
achirkin Jul 7, 2023
127907c
Add function deprecations
achirkin Jul 7, 2023
d6a27c5
Remove ANN reference
achirkin Jul 7, 2023
4530423
Use the plain workspace resource by default and print a warning if ne…
achirkin Jul 7, 2023
f87cbf2
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 14, 2023
d4f0c78
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 14, 2023
775f718
Add a note about no deleter
achirkin Jul 18, 2023
d7fcde9
Use the workspace resource size to determine the batch sizes for ivf-pq
achirkin Jul 18, 2023
eaafd3f
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 19, 2023
8eb9b80
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 19, 2023
463f409
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 20, 2023
e85033e
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 21, 2023
044b6ca
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 22, 2023
f3edcbc
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 25, 2023
ae0f469
Use get_workspace_free_bytes and debug-log the usage of the default p…
achirkin Jul 25, 2023
e082d09
Merge branch 'branch-23.08' into fea-limited-workspace-resource
achirkin Jul 26, 2023
16 changes: 14 additions & 2 deletions cpp/include/raft/core/device_container_policy.hpp
@@ -164,10 +164,19 @@ class device_uvector_policy {
public:
auto create(raft::resources const& res, size_t n) -> container_type
{
return container_type(n, resource::get_cuda_stream(res), resource::get_workspace_resource(res));
if (mr_ == nullptr) {
// NB: not using the workspace resource by default!
// The workspace resource is for short-lived temporary allocations.
return container_type(n, resource::get_cuda_stream(res));
} else {
return container_type(n, resource::get_cuda_stream(res), mr_);
}
}

device_uvector_policy() = default;
constexpr device_uvector_policy() = default;
constexpr explicit device_uvector_policy(rmm::mr::device_memory_resource* mr) noexcept : mr_(mr)
{
}

[[nodiscard]] constexpr auto access(container_type& c, size_t n) const noexcept -> reference
{
@@ -181,6 +190,9 @@ class device_uvector_policy {

[[nodiscard]] auto make_accessor_policy() noexcept { return accessor_policy{}; }
[[nodiscard]] auto make_accessor_policy() const noexcept { return const_accessor_policy{}; }

private:
rmm::mr::device_memory_resource* mr_{nullptr};
};

} // namespace raft
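
For reference, a minimal usage sketch of the new policy constructor follows; the pool setup, the element type float, and the wrapper function are illustrative assumptions, not part of this change.

// Sketch: binding an explicit memory resource to the container policy (assumed setup).
#include <raft/core/device_container_policy.hpp>
#include <raft/core/resources.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

void policy_example(raft::resources const& res)
{
  // Default-constructed policy: allocations come from the current device memory resource,
  // deliberately not from the (short-lived) workspace resource.
  raft::device_uvector_policy<float> default_policy{};
  auto a = default_policy.create(res, 1024);  // rmm::device_uvector<float> of 1024 elements

  // Policy bound to an explicit, long-lived pool resource.
  static rmm::mr::cuda_memory_resource upstream{};
  static rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool{&upstream};
  raft::device_uvector_policy<float> pooled_policy{&pool};
  auto b = pooled_policy.create(res, 1024);   // allocated from `pool`
}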
2 changes: 1 addition & 1 deletion cpp/include/raft/core/device_mdarray.hpp
@@ -112,7 +112,7 @@ auto make_device_mdarray(raft::resources const& handle,
using mdarray_t = device_mdarray<ElementType, decltype(exts), LayoutPolicy>;

typename mdarray_t::mapping_type layout{exts};
typename mdarray_t::container_policy_type policy{};
typename mdarray_t::container_policy_type policy{mr};

return mdarray_t{handle, layout, policy};
}
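
With this change, an mdarray created through this overload allocates from the caller-provided mr instead of the workspace resource. A hedged call-site sketch follows, assuming the overload takes (handle, mr, extents) in that order as the visible fragment suggests; the pool and the 1000 x 64 shape are illustrative.

// Sketch: passing an explicit memory resource through make_device_mdarray (assumed call site).
#include <raft/core/device_mdarray.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

void mdarray_example(raft::resources const& handle)
{
  static rmm::mr::cuda_memory_resource upstream{};
  static rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool{&upstream};

  // The container policy is now constructed as policy{mr}, so this 1000 x 64 array
  // is allocated from `pool` rather than from the workspace resource.
  auto arr = raft::make_device_mdarray<float>(handle, &pool, raft::make_extents<int>(1000, 64));
}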
19 changes: 12 additions & 7 deletions cpp/include/raft/core/device_resources.hpp
@@ -21,6 +21,7 @@

#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
@@ -60,12 +61,12 @@ namespace raft {
class device_resources : public resources {
public:
device_resources(const device_resources& handle,
rmm::mr::device_memory_resource* workspace_resource)
std::shared_ptr<rmm::mr::device_memory_resource> workspace_resource,
std::optional<std::size_t> allocation_limit = std::nullopt)
: resources{handle}
{
// replace the resource factory for the workspace_resources
resources::add_resource_factory(
std::make_shared<resource::workspace_resource_factory>(workspace_resource));
resource::set_workspace_resource(*this, workspace_resource, allocation_limit);
}

device_resources(const device_resources& handle) : resources{handle} {}
@@ -80,19 +81,23 @@ class device_resources : public resources {
* @param[in] stream_pool the stream pool used (which has default of nullptr if unspecified)
* @param[in] workspace_resource an optional resource used by some functions for allocating
* temporary workspaces.
* @param[in] allocation_limit the total amount of memory in bytes available to the temporary
* workspace resources.
*/
device_resources(rmm::cuda_stream_view stream_view = rmm::cuda_stream_per_thread,
std::shared_ptr<rmm::cuda_stream_pool> stream_pool = {nullptr},
rmm::mr::device_memory_resource* workspace_resource = nullptr)
std::shared_ptr<rmm::mr::device_memory_resource> workspace_resource = {nullptr},
std::optional<std::size_t> allocation_limit = std::nullopt)
: resources{}
{
resources::add_resource_factory(std::make_shared<resource::device_id_resource_factory>());
resources::add_resource_factory(
std::make_shared<resource::cuda_stream_resource_factory>(stream_view));
resources::add_resource_factory(
std::make_shared<resource::cuda_stream_pool_resource_factory>(stream_pool));
resources::add_resource_factory(
std::make_shared<resource::workspace_resource_factory>(workspace_resource));
if (workspace_resource) {
resource::set_workspace_resource(*this, workspace_resource, allocation_limit);
}
}

/** Destroys all held-up resources */
@@ -255,4 +260,4 @@ class stream_syncer {

} // namespace raft

#endif
#endif
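
A sketch of the updated construction paths follows; the pooled resource and the 1 GiB / 512 MiB limits are illustrative assumptions, not values taken from this PR.

// Sketch: building device_resources with a shared workspace resource and an allocation limit.
#include <raft/core/device_resources.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

#include <cstddef>
#include <memory>
#include <optional>

int main()
{
  // A shared, pooled resource to back the workspace allocations.
  auto upstream = std::make_shared<rmm::mr::cuda_memory_resource>();
  auto pool = std::make_shared<rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>>(
    upstream.get());

  // Workspace capped at 1 GiB via the new allocation_limit argument.
  raft::device_resources res{
    rmm::cuda_stream_per_thread, nullptr, pool, std::size_t{1} << 30};

  // The copy-constructor overload replaces the workspace resource on an existing handle.
  raft::device_resources res_limited{res, pool, std::size_t{512} << 20};
  return 0;
}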
9 changes: 5 additions & 4 deletions cpp/include/raft/core/handle.hpp
@@ -32,7 +32,8 @@ namespace raft {
*/
class handle_t : public raft::device_resources {
public:
handle_t(const handle_t& handle, rmm::mr::device_memory_resource* workspace_resource)
handle_t(const handle_t& handle,
std::shared_ptr<rmm::mr::device_memory_resource> workspace_resource)
: device_resources(handle, workspace_resource)
{
}
@@ -51,9 +52,9 @@ class handle_t : public raft::device_resources {
* @param[in] workspace_resource an optional resource used by some functions for allocating
* temporary workspaces.
*/
handle_t(rmm::cuda_stream_view stream_view = rmm::cuda_stream_per_thread,
std::shared_ptr<rmm::cuda_stream_pool> stream_pool = {nullptr},
rmm::mr::device_memory_resource* workspace_resource = nullptr)
handle_t(rmm::cuda_stream_view stream_view = rmm::cuda_stream_per_thread,
std::shared_ptr<rmm::cuda_stream_pool> stream_pool = {nullptr},
std::shared_ptr<rmm::mr::device_memory_resource> workspace_resource = {nullptr})
: device_resources{stream_view, stream_pool, workspace_resource}
{
}
58 changes: 58 additions & 0 deletions (new file)
@@ -0,0 +1,58 @@
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <raft/core/logger.hpp>
#include <raft/core/resource/device_memory_resource.hpp>

#include <rmm/mr/device/device_memory_resource.hpp>

#include <mutex>
#include <set>
#include <string>

namespace raft::resource::detail {

/**
* Warn a user of the calling algorithm if they use the default non-pooled memory allocator,
* as it may hurt the performance.
*
* This helper function is designed to produce the warning once for a given `user_name`.
*
* @param[in] res
* @param[in] user_name the name of the algorithm or any other identification.
*
*/
inline void warn_non_pool_workspace(resources const& res, std::string user_name)
{
// Detect if the plain cuda memory resource is used for the workspace
if (rmm::mr::cuda_memory_resource{}.is_equal(*get_workspace_resource(res)->get_upstream())) {
static std::set<std::string> notified_names{};
static std::mutex mutex{};
std::lock_guard<std::mutex> guard(mutex);
auto [it, inserted] = notified_names.insert(std::move(user_name));
if (inserted) {
RAFT_LOG_WARN(
"[%s] the default cuda resource is used for the raft workspace allocations. This may lead "
"to a significant slowdown for this algorithm. Consider using the default pool resource "
"(`raft::resource::set_workspace_to_pool_resource`) or set your own resource explicitly "
"(`raft::resource::set_workspace_resource`).",
it->c_str());
}
}
}

} // namespace raft::resource::detail
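
A hypothetical caller sketch follows; the namespace and function name are invented for illustration, and since this page does not show the new header's path, the include is left as a placeholder comment.

// Illustrative only: an algorithm entry point emitting the one-time warning.
// #include <raft/core/resource/detail/...>   // placeholder: header defining warn_non_pool_workspace
#include <raft/core/resources.hpp>

namespace raft::example {

void search(raft::resources const& res)
{
  // Logs RAFT_LOG_WARN at most once per user_name string across the process,
  // and only when the workspace upstream is the plain cuda_memory_resource.
  raft::resource::detail::warn_non_pool_workspace(res, "raft::example::search");

  // ... temporary buffers would then come from resource::get_workspace_resource(res) ...
}

}  // namespace raft::example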