ANN bench options to specify CAGRA graph and dataset locations #1896

Merged: 91 commits, Nov 16, 2023

Commits
d1b53b1
Separate cagra index type from internal idx type
tfeher Jul 21, 2023
a617988
Merge branch 'branch-23.08' into cagra_matrix_idx
tfeher Jul 23, 2023
69effcd
Merge remote-tracking branch 'origin/branch-23.08' into cagra_matrix_idx
tfeher Jul 25, 2023
2a67251
wip
tfeher Jul 26, 2023
01a12aa
Merge remote-tracking branch 'origin/branch-23.08' into cagra_orig_bench
tfeher Jul 30, 2023
c3ce61e
Orig CAGRA benchmark works
tfeher Jul 31, 2023
49ca646
cagra pin dataset
tfeher Jul 31, 2023
bded674
Adding FAISS cpu to raft-ann-bench
cjnolet Sep 11, 2023
f0e3c8f
Adding faiss cpu indexes and build
cjnolet Sep 12, 2023
f66fd21
Docs updates
cjnolet Sep 12, 2023
20b793a
Merge branch 'branch-23.10' into enh-ann-bench-faiss-cpu
cjnolet Sep 12, 2023
ad255fd
Merge branch 'branch-23.10' into enh-ann-bench-faiss-cpu
cjnolet Sep 12, 2023
6d7f390
Resetting build all gpu arch to 0
cjnolet Sep 12, 2023
491d090
Merge branch 'enh-ann-bench-faiss-cpu' of github.com:cjnolet/raft int…
cjnolet Sep 12, 2023
c9569a5
Doc updates
cjnolet Sep 12, 2023
28bee2b
More updates
cjnolet Sep 12, 2023
1e7ba4f
Cleaning up includes
cjnolet Sep 12, 2023
563b386
Explicitly adding spdlog and fmt
cjnolet Sep 13, 2023
9585f20
Using selectors for faiss
cjnolet Sep 13, 2023
87e3be0
Adding ability to link against faiss avx lib (only if arch supports it)
cjnolet Sep 13, 2023
74e6a5d
Removing some legacy get_faiss cmake bits
cjnolet Sep 13, 2023
fcd029f
Updating faiss cpu to override search params
cjnolet Sep 13, 2023
a56227e
Trying again.
cjnolet Sep 14, 2023
3fcd1e9
Making libfaiss installs either or
cjnolet Sep 14, 2023
208fe0e
Merge remote-tracking branch 'origin/branch-23.10' into cagra_pin_dat…
tfeher Sep 24, 2023
929005b
Enable orig CAGRA benchmark
tfeher Sep 24, 2023
c63cbcd
update readme
tfeher Sep 24, 2023
1ec75ba
Using consistent naming for faiss algos
cjnolet Sep 25, 2023
a5585fa
Merge remote-tracking branch 'origin/branch-23.10' into enh-ann-bench…
cjnolet Sep 25, 2023
7d21375
Updating faiss version
cjnolet Sep 25, 2023
001c224
Pringing raft_faiss_targets
cjnolet Sep 25, 2023
c430bb8
Using faiss from pytorch
cjnolet Sep 25, 2023
02cb915
Use huge_page_resource, memmap input graph
tfeher Sep 28, 2023
30428fd
Building faiss statically each time. Will slow down CI but alleviate …
cjnolet Sep 28, 2023
b5606c1
Merge branch 'branch-23.10' into enh-ann-bench-faiss-cpu
cjnolet Sep 28, 2023
db2d210
Updates
cjnolet Sep 28, 2023
cb2eef8
Reverting
cjnolet Sep 28, 2023
375c38e
Using https for faiss github repo
cjnolet Sep 28, 2023
c4fb53c
Trying again
cjnolet Oct 2, 2023
f38031a
Merge branch 'branch-23.10' into enh-ann-bench-faiss-cpu
cjnolet Oct 2, 2023
8bb273c
Using corey's fork for now
cjnolet Oct 2, 2023
d539316
More updates
cjnolet Oct 2, 2023
fce179b
CHecking cudatoolkit library dir
cjnolet Oct 3, 2023
f54a757
iTerminating string
cjnolet Oct 3, 2023
385b4f4
Teach faiss about conda [hacky]
robertmaynard Oct 4, 2023
95c12db
Adding thread pool to overlap faiss queries
cjnolet Oct 4, 2023
7b67e89
Merge branch 'branch-23.12' into enh-ann-bench-faiss-cpu
cjnolet Oct 5, 2023
419d994
Merge branch 'branch-23.12' into enh-ann-bench-faiss-cpu
cjnolet Oct 5, 2023
1e7b5c8
Seeing if this fixes the devcontainers
cjnolet Oct 6, 2023
36d4dd3
Merge branch 'branch-23.12' into enh-ann-bench-faiss-cpu
cjnolet Oct 6, 2023
667b95c
Fixing dependencies.yml
cjnolet Oct 6, 2023
daffaf4
Adding openblas to nn_bench deps
cjnolet Oct 7, 2023
eb167e3
Merge branch 'raft_cagra_pin_huge_page' into huge_page
cjnolet Oct 7, 2023
284897c
Merge branch 'enh-ann-bench-faiss-cpu' into huge_page
cjnolet Oct 7, 2023
072d43d
improve benchmark settings
tfeher Oct 9, 2023
6d9082d
Merge remote-tracking branch 'tfeher_gitlab/raft_cagra_pin_huge_page'…
cjnolet Oct 9, 2023
c4eee56
Merge branch 'branch-23.12' into huge_page
cjnolet Oct 12, 2023
3e9079d
Fixing style
cjnolet Oct 12, 2023
15539ea
Merge branch 'branch-23.12' into huge_page
cjnolet Oct 18, 2023
f9aad90
Merge branch 'branch-23.12' into huge_page
tfeher Oct 18, 2023
b33363e
remove unnecessary changes
tfeher Oct 18, 2023
32e7b31
Do specific cagra graph/dataset memory allocation in the benchmark
tfeher Oct 20, 2023
07c9b55
Merge remote-tracking branch 'origin/branch-23.12' into huge_page
tfeher Oct 20, 2023
dec8c53
Merge branch 'branch-23.12' into huge_page
tfeher Nov 6, 2023
b223dae
Remove debug printouts and improve docstrings
tfeher Nov 6, 2023
be139f5
Merge branch 'branch-23.12' into huge_page
tfeher Nov 6, 2023
68bd922
Merge branch 'branch-23.12' into huge_page
tfeher Nov 7, 2023
53c6ded
Improve comments, errors, naming
tfeher Nov 8, 2023
4ec2576
Merge remote-tracking branch 'origin/branch-23.12' into huge_page
tfeher Nov 8, 2023
e355ff0
update tuning guide
tfeher Nov 8, 2023
64705d3
Merge remote-tracking branch 'origin/branch-23.12' into huge_page
tfeher Nov 8, 2023
0cb632a
corret tuning guide
tfeher Nov 8, 2023
7aded1b
Merge branch 'branch-23.12' into huge_page
tfeher Nov 9, 2023
8f0f78b
Merge branch 'branch-23.12' into huge_page
tfeher Nov 9, 2023
8ad652c
Using _memory_type for consistencyy
cjnolet Nov 9, 2023
313876b
Adding correct link for hnsw params
cjnolet Nov 9, 2023
abc4343
Merge branch 'branch-23.12' into huge_page
cjnolet Nov 9, 2023
69f8d88
Merge branch 'branch-23.12' into huge_page
tfeher Nov 9, 2023
ef6d8aa
Merge branch 'branch-23.12' into huge_page
tfeher Nov 9, 2023
e3b2726
Merge branch 'branch-23.12' into huge_page
tfeher Nov 14, 2023
ca5ab3a
Merge branch 'branch-23.12' into huge_page
tfeher Nov 14, 2023
bb0f9e8
Merge branch 'branch-23.12' into huge_page
tfeher Nov 14, 2023
ca2f17a
Fix merge error
tfeher Nov 14, 2023
6f9a39c
Resolve CAGAR bench parameter name conflict
tfeher Nov 14, 2023
6ca2a3d
Update dataset memory allocation according to changed order of set_se…
tfeher Nov 15, 2023
0c44d84
Add benchmark arg to control log level
tfeher Nov 15, 2023
fb7847c
Add raft_log_level arg to python wrapper
tfeher Nov 15, 2023
a1f1e97
Merge branch 'branch-23.12' into huge_page
cjnolet Nov 16, 2023
36cef0d
Fixing log level docs and option
cjnolet Nov 16, 2023
918cd35
More robust index file
cjnolet Nov 16, 2023
61339e0
Use "force" instead of "overwrite"
cjnolet Nov 16, 2023
132 changes: 132 additions & 0 deletions cpp/bench/ann/src/common/cuda_huge_page_resource.hpp
@@ -0,0 +1,132 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <rmm/cuda_stream_view.hpp>
#include <rmm/detail/error.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>
#include <sys/mman.h>

#include <cstddef>

namespace raft::mr {
/**
* @brief `device_memory_resource` derived class that uses mmap to allocate memory.
* This class enables memory allocation using huge pages.
* It is assumed that the allocated memory is directly accessible on device. This currently only
* works on Grace Hopper (GH) systems.
*
* TODO(tfeher): consider improving or removing this helper once we made progress with
* https://github.com/rapidsai/raft/issues/1819
*/
class cuda_huge_page_resource final : public rmm::mr::device_memory_resource {
public:
cuda_huge_page_resource() = default;
~cuda_huge_page_resource() override = default;
cuda_huge_page_resource(cuda_huge_page_resource const&) = default;
cuda_huge_page_resource(cuda_huge_page_resource&&) = default;
cuda_huge_page_resource& operator=(cuda_huge_page_resource const&) = default;
cuda_huge_page_resource& operator=(cuda_huge_page_resource&&) = default;

/**
* @brief Query whether the resource supports use of non-null CUDA streams for
* allocation/deallocation. `cuda_huge_page_resource` does not support streams.
*
* @returns bool false
*/
[[nodiscard]] bool supports_streams() const noexcept override { return false; }

/**
* @brief Query whether the resource supports the get_mem_info API.
*
* @return true
*/
[[nodiscard]] bool supports_get_mem_info() const noexcept override { return true; }

private:
/**
* @brief Allocates memory of size at least `bytes` using an anonymous private mmap mapping
* advised to use transparent huge pages (MADV_HUGEPAGE).
*
* The returned pointer is page-aligned.
*
* @note Stream argument is ignored
*
* @throws `rmm::bad_alloc` if the requested allocation could not be fulfilled
*
* @param bytes The size, in bytes, of the allocation
* @return void* Pointer to the newly allocated memory
*/
void* do_allocate(std::size_t bytes, rmm::cuda_stream_view) override
{
void* _addr{nullptr};
_addr = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
Contributor (tfeher):

There is a request to provide alternative methods for allocation (without mmap) so that we can use Transparent Huge Pages. I will submit a follow-up PR with these changes. I am thinking about the following design:

enum class AllocatorType { HostPinned, HostMmap, HostMalloc };

class host_device_allocator final : public rmm::mr::device_memory_resource {
  host_device_allocator(AllocatorType type) : type_(type) {}
  //...
  void* do_allocate(std::size_t bytes, rmm::cuda_stream_view) override {
    void* ptr = nullptr;
    switch (type_) {
      case AllocatorType::HostPinned:
        cudaMallocHost(&ptr, bytes);
        return ptr;
      case AllocatorType::HostMmap:
        ptr = mmap(...);
        //...
        return ptr;
      case AllocatorType::HostMalloc:
        //...
    }
  }
 private:
  AllocatorType type_;
  //...
};

In this case we would have a single custom mr class that we would configure according to the benchmark input parameter. Let me know what you think.

Member Author (cjnolet):

Hmm, I'll have to think about this. I don't think I generally mind the approach of having a composite_memory_resource or delegating_memory_resource, but I think we should continue using RMM's abstraction layers everywhere possible instead of, for example, calling cudaMallocHost directly. My concern has a pretty easy fix, though: just store the corresponding memory_resource instances inside the delegating_memory_resource and delegate as necessary. That would maintain compatibility with RMM (this could, for example, still be set as the current_device_resource if we so wanted).
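
For illustration (not part of the PR), a minimal sketch of the delegation approach described in this comment. The class name, the pared-down AllocatorType, and the two-resource constructor are hypothetical; the sketch only assumes the rmm::mr::device_memory_resource interface already used by the file above.

#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>

#include <cstddef>
#include <utility>

enum class AllocatorType { HostPinned, HostMmap };

// Stores pointers to concrete RMM resources and forwards every call, so it
// stays usable anywhere a device_memory_resource* is expected (e.g. as the
// current_device_resource).
class delegating_memory_resource final : public rmm::mr::device_memory_resource {
 public:
  delegating_memory_resource(AllocatorType type,
                             rmm::mr::device_memory_resource* pinned_mr,
                             rmm::mr::device_memory_resource* mmap_mr)
    : type_(type), pinned_mr_(pinned_mr), mmap_mr_(mmap_mr)
  {
  }

  [[nodiscard]] bool supports_streams() const noexcept override { return false; }
  [[nodiscard]] bool supports_get_mem_info() const noexcept override { return false; }

 private:
  rmm::mr::device_memory_resource* selected() const
  {
    return type_ == AllocatorType::HostPinned ? pinned_mr_ : mmap_mr_;
  }

  void* do_allocate(std::size_t bytes, rmm::cuda_stream_view stream) override
  {
    return selected()->allocate(bytes, stream);
  }

  void do_deallocate(void* ptr, std::size_t bytes, rmm::cuda_stream_view stream) override
  {
    // The selection is fixed at construction, so the resource that allocated
    // `ptr` is also the one that frees it.
    selected()->deallocate(ptr, bytes, stream);
  }

  [[nodiscard]] std::pair<std::size_t, std::size_t> do_get_mem_info(
    rmm::cuda_stream_view) const override
  {
    return std::make_pair(0, 0);
  }

  AllocatorType type_;
  rmm::mr::device_memory_resource* pinned_mr_;
  rmm::mr::device_memory_resource* mmap_mr_;
};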

Member Author (cjnolet):

Also cc'ing @harrism for his thoughts.

Contributor (tfeher):

The class above is a subclass of rmm::mr::device_memory_resource. We are using it to allocate mdarrays that are accessible from both host and device. It is a workaround to unblock some tests.

I feel that RAFT's distinction between host_mdarray and device_mdarray stands in the way. What our algorithm needs is an mdarray which we can configure with any allocator (host or device). In a few places our algorithm does not care whether the input is a host_mdarray or a device_mdarray; it just passes the pointer to cudaMemcpy to fetch a chunk of the array we are working on.

How we allocate an mdarray is described by a ContainerPolicy class. So probably the only thing we need is to implement a container policy that can be initialized with either a device_memory_resource or a host_memory_resource. Afterwards we can simply utilize the existing RMM memory resources.

The only thing we are missing from RMM side is a memory resource that uses mmap and madvise.

Member (harrism):

Look, it's fine to provide your own base-level memory_resource classes that derive from RMM resource base classes. The problem today is that there is no common base for host and device MRs. We are starting to refactor on top of cuda::memory_resource which abstracts memory kind as a template parameter, and adds resource_ref and async_resource_ref which have template parameters to specify memory kind. This means you can have a function that accepts any memory resource that is device accessible, host accessible, or both, or whatever kind you can define. And we can start to reuse all the MR and adaptor types in RMM for different kinds of memory.

The problem with @tfeher 's suggestion at the top is that it appears to mix host- and device-accessible memory kinds under device_memory_resource, which is supposed to allocate only device-accessible memory.

As to @cjnolet 's comment, I agree that you should have separate resources for different kinds of allocation, and then have a higher level MR that takes pointers/refs to MRs for each kind you want to dispatch to, rather than hard-coding different allocation calls within a single MR.

if (_addr == MAP_FAILED) { RAFT_FAIL("huge_page_resource::MAP FAILED"); }
if (madvise(_addr, bytes, MADV_HUGEPAGE) == -1) {
munmap(_addr, bytes);
RAFT_FAIL("huge_page_resource::madvise MADV_HUGEPAGE");
}
memset(_addr, 0, bytes);
return _addr;
}

/**
* @brief Deallocate memory pointed to by \p ptr using munmap.
*
* @note Stream argument is ignored.
*
* @throws Nothing.
*
* @param ptr Pointer to be deallocated
* @param size Size of the allocation in bytes
*/
void do_deallocate(void* ptr, std::size_t size, rmm::cuda_stream_view) override
{
if (munmap(ptr, size) == -1) { RAFT_FAIL("huge_page_resource::munmap"); }
}

/**
* @brief Compare this resource to another.
*
* Two cuda_huge_page_resources always compare equal, because they can each
* deallocate memory allocated by the other.
*
* @throws Nothing.
*
* @param other The other resource to compare to
* @return true If the two resources are equivalent
* @return false If the two resources are not equal
*/
[[nodiscard]] bool do_is_equal(device_memory_resource const& other) const noexcept override
{
return dynamic_cast<cuda_huge_page_resource const*>(&other) != nullptr;
}

/**
* @brief Get free and total device memory for this memory resource
*
* @throws `rmm::cuda_error` if unable to retrieve memory info.
*
* @return std::pair containing free_size and total_size of memory
*/
[[nodiscard]] std::pair<std::size_t, std::size_t> do_get_mem_info(
rmm::cuda_stream_view) const override
{
std::size_t free_size{};
std::size_t total_size{};
RMM_CUDA_TRY(cudaMemGetInfo(&free_size, &total_size));
return std::make_pair(free_size, total_size);
}
};
} // namespace raft::mr
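
For context (not part of the diff): a minimal usage sketch of the resource above through the standard RMM allocate/deallocate interface. It assumes a Grace Hopper style system where the mapped host memory is device-accessible, as the class comment notes.

#include "cuda_huge_page_resource.hpp"

#include <cstddef>

int main()
{
  raft::mr::cuda_huge_page_resource huge_page_mr;

  // 1 GiB backed by an anonymous mapping advised to use transparent huge pages.
  std::size_t bytes = std::size_t{1} << 30;
  void* ptr = huge_page_mr.allocate(bytes);  // stream argument defaults and is ignored

  // ... use the buffer, e.g. as storage for the CAGRA dataset or graph ...

  huge_page_mr.deallocate(ptr, bytes);
  return 0;
}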
130 changes: 130 additions & 0 deletions cpp/bench/ann/src/common/cuda_pinned_resource.hpp
@@ -0,0 +1,130 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <rmm/mr/device/device_memory_resource.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/detail/error.hpp>

#include <cstddef>

namespace raft::mr {
/**
* @brief `device_memory_resource` derived class that uses cudaMallocHost/cudaFreeHost for
* allocation/deallocation.
*
* This is almost the same as rmm::mr::host::pinned_memory_resource, but it has
* device_memory_resource as base class. Pinned memory can be accessed from device,
* and using this allocator we can create a device_mdarray backed by pinned memory.
*
* TODO(tfeher): it would be preferred to just rely on the existing allocator from rmm
* (pinned_memory_resource), but that is incompatible with the container_policy class
* for device matrix, because the latter expects a device_memory_resource. We shall
* revise this once we progress with Issue https://github.com/rapidsai/raft/issues/1819
*/
class cuda_pinned_resource final : public rmm::mr::device_memory_resource {
Member Author (cjnolet):

It would be nice to see these new memory resources worked back into RMM eventually.

Contributor (tfeher), Nov 9, 2023:

RMM has a pinned memory resource. But it is a subclass of host_memory_resource. The problem is on our side: our API for managing allocators for mdspan requires a device_memory_resource. We would need to enable host allocator there, and then we could use rmm::mr::host_memory_resource.

Member (harrism), Nov 13, 2023:

I have a plan to add a device-accessible pinned memory resource. Ideally using the new cuda::memory_resource refactoring.

public:
cuda_pinned_resource() = default;
~cuda_pinned_resource() override = default;
cuda_pinned_resource(cuda_pinned_resource const&) = default;
cuda_pinned_resource(cuda_pinned_resource&&) = default;
cuda_pinned_resource& operator=(cuda_pinned_resource const&) = default;
cuda_pinned_resource& operator=(cuda_pinned_resource&&) = default;

/**
* @brief Query whether the resource supports use of non-null CUDA streams for
* allocation/deallocation. `cuda_pinned_resource` does not support streams.
*
* @returns bool false
*/
[[nodiscard]] bool supports_streams() const noexcept override { return false; }

/**
* @brief Query whether the resource supports the get_mem_info API.
*
* @return true
*/
[[nodiscard]] bool supports_get_mem_info() const noexcept override { return true; }

private:
/**
* @brief Allocates memory of size at least `bytes` using cudaMallocHost.
*
* The returned pointer has at least 256B alignment.
*
* @note Stream argument is ignored
*
* @throws `rmm::bad_alloc` if the requested allocation could not be fulfilled
*
* @param bytes The size, in bytes, of the allocation
* @return void* Pointer to the newly allocated memory
*/
void* do_allocate(std::size_t bytes, rmm::cuda_stream_view) override
{
void* ptr{nullptr};
RMM_CUDA_TRY_ALLOC(cudaMallocHost(&ptr, bytes));
return ptr;
}

/**
* @brief Deallocate memory pointed to by \p ptr using cudaFreeHost.
*
* @note Stream argument is ignored.
*
* @throws Nothing.
*
* @param ptr Pointer to be deallocated
*/
void do_deallocate(void* ptr, std::size_t, rmm::cuda_stream_view) override
{
RMM_ASSERT_CUDA_SUCCESS(cudaFreeHost(ptr));
}

/**
* @brief Compare this resource to another.
*
* Two cuda_pinned_resources always compare equal, because they can each
* deallocate memory allocated by the other.
*
* @throws Nothing.
*
* @param other The other resource to compare to
* @return true If the two resources are equivalent
* @return false If the two resources are not equal
*/
[[nodiscard]] bool do_is_equal(device_memory_resource const& other) const noexcept override
{
return dynamic_cast<cuda_pinned_resource const*>(&other) != nullptr;
}

/**
* @brief Get free and total device memory for this memory resource
*
* @throws `rmm::cuda_error` if unable to retrieve memory info.
*
* @return std::pair containing free_size and total_size of memory
*/
[[nodiscard]] std::pair<std::size_t, std::size_t> do_get_mem_info(
rmm::cuda_stream_view) const override
{
std::size_t free_size{};
std::size_t total_size{};
RMM_CUDA_TRY(cudaMemGetInfo(&free_size, &total_size));
return std::make_pair(free_size, total_size);
}
};
} // namespace raft::mr
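
For context (not part of the diff): because cuda_pinned_resource derives from rmm::mr::device_memory_resource, it can be dropped in wherever RMM expects a device resource. A minimal sketch using RMM's per-device resource API; the benchmark itself wires the resource into the CAGRA index differently.

#include "cuda_pinned_resource.hpp"

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

int main()
{
  raft::mr::cuda_pinned_resource pinned_mr;

  // Route "device" allocations through the pinned resource; consumers that
  // honor the current device resource (device_buffer, device_uvector,
  // device_mdarray container policies, ...) now receive cudaMallocHost memory.
  auto* prev = rmm::mr::set_current_device_resource(&pinned_mr);

  rmm::device_buffer buf(1 << 20, rmm::cuda_stream_default);  // 1 MiB pinned buffer

  rmm::mr::set_current_device_resource(prev);  // restore the previous resource
  return 0;
}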
17 changes: 17 additions & 0 deletions cpp/bench/ann/src/raft/raft_benchmark.cu
@@ -215,6 +215,21 @@ void parse_build_param(const nlohmann::json& conf,
}
}

AllocatorType parse_allocator(std::string mem_type)
{
if (mem_type == "device") {
return AllocatorType::Device;
} else if (mem_type == "host_pinned") {
return AllocatorType::HostPinned;
} else if (mem_type == "host_huge_page") {
return AllocatorType::HostHugePage;
}
THROW(
"Invalid value for memory type %s, must be one of [\"device\", \"host_pinned\", "
"\"host_huge_page\"",
mem_type.c_str());
}

template <typename T, typename IdxT>
void parse_search_param(const nlohmann::json& conf,
typename raft::bench::ann::RaftCagra<T, IdxT>::SearchParam& param)
@@ -236,6 +251,8 @@ void parse_search_param(const nlohmann::json& conf,
THROW("Invalid value for algo: %s", tmp.c_str());
}
}
if (conf.contains("graph_mem")) { param.graph_mem = parse_allocator(conf.at("graph_mem")); }
Member Author (cjnolet):

Can we spell out *_memory_type to match the other properties that we parse for other algorithms? Will get confusing to set these if we use different spellings.

if (conf.contains("dataset_mem")) { param.dataset_mem = parse_allocator(conf.at("dataset_mem")); }
}
#endif

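For reference (not part of the diff), a small sketch of how the new keys are consumed. The JSON fragment is illustrative: "graph_mem" and "dataset_mem" and their accepted values come from the parsing code above, while the other key and the exact file layout are assumptions about the usual raft-ann-bench configuration format.

#include <nlohmann/json.hpp>

#include <iostream>
#include <string>

int main()
{
  // Illustrative search-param entry for a raft_cagra benchmark config.
  auto conf = nlohmann::json::parse(R"({
    "itopk": 64,
    "graph_mem": "device",
    "dataset_mem": "host_huge_page"
  })");

  // These are the strings that parse_allocator() above maps to
  // AllocatorType::Device and AllocatorType::HostHugePage respectively.
  std::string graph_mem   = conf.at("graph_mem");
  std::string dataset_mem = conf.at("dataset_mem");
  std::cout << graph_mem << " / " << dataset_mem << "\n";
  return 0;
}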