Adding throughput and latency modes to raft-ann-bench (#1920)

Separating the way the benchmarks are measured into `throughput` and `latency` modes.

- `latency` mode accumulates the time taken to process each batch, then estimates QPS and reports the average time spent processing on the GPU. For a batch size of 1, this becomes a fairly accurate estimate of average latency per query. For larger batches, it becomes a fairly accurate estimate of time spent per batch.
- `throughput` mode pipelines the individual batches using a thread pool (and a stream pool for the GPU algos). For both smaller and larger batches, this gives a good estimate of the amount of data we can push through the hardware in a period of time.

A good comprehensive comparison will include both of these numbers.

Authors:
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Ben Frederickson (https://github.com/benfred)

URL: #1920
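
The difference between the two modes described in the commit message can be illustrated with a minimal CPU-only sketch. This is not the raft-ann-bench implementation: `BenchResult`, `run_latency`, `run_throughput`, and the `search_batch` callback are hypothetical names, and plain `std::thread` workers stand in for the thread and stream pools.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical result type; not part of raft-ann-bench.
struct BenchResult {
  double qps;                // queries per second
  double avg_batch_seconds;  // mean time per batch (set in latency mode, 0 otherwise)
};

using Clock = std::chrono::steady_clock;

// Latency mode (sketch): run batches one at a time and accumulate per-batch time.
BenchResult run_latency(const std::function<void(std::size_t)>& search_batch,
                        std::size_t n_batches,
                        std::size_t batch_size)
{
  assert(n_batches > 0 && batch_size > 0);
  double total = 0.0;
  for (std::size_t b = 0; b < n_batches; ++b) {
    auto start = Clock::now();
    search_batch(b);
    total += std::chrono::duration<double>(Clock::now() - start).count();
  }
  return {static_cast<double>(n_batches * batch_size) / total, total / n_batches};
}

// Throughput mode (sketch): worker threads pull batches from a shared counter,
// and QPS is computed from the overall wall-clock time.
BenchResult run_throughput(const std::function<void(std::size_t)>& search_batch,
                           std::size_t n_batches,
                           std::size_t batch_size,
                           unsigned n_threads)
{
  assert(n_batches > 0 && batch_size > 0 && n_threads > 0);
  std::atomic<std::size_t> next{0};
  auto start = Clock::now();
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    workers.emplace_back([&] {
      for (std::size_t b = next.fetch_add(1); b < n_batches; b = next.fetch_add(1)) {
        search_batch(b);
      }
    });
  }
  for (auto& w : workers) { w.join(); }
  double wall = std::chrono::duration<double>(Clock::now() - start).count();
  return {static_cast<double>(n_batches * batch_size) / wall, 0.0};
}
```

With a batch size of 1, `avg_batch_seconds` from `run_latency` approximates per-query latency, while `run_throughput` reports only how many queries completed per second of wall-clock time.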
rapidsai · Oct 28, 2023 · 9ad76fa
1 parent 0d199f9
commit 9ad76fa
Showing 11 changed files with 299 additions and 150 deletions.
diff --git a/cpp/bench/ann/CMakeLists.txt b/cpp/bench/ann/CMakeLists.txt
@@ -106,10 +106,8 @@ if(RAFT_ANN_BENCH_USE_GGNN)
 endif()
 
 if(RAFT_ANN_BENCH_USE_FAISS)
-  # We need to ensure that faiss has all the conda
-  # information. So we currently use the very ugly
-  # hammer of `link_libraries` to ensure that all
-  # targets in this directory and the faiss directory
+  # We need to ensure that faiss has all the conda information. So we currently use the very ugly
+  # hammer of `link_libraries` to ensure that all targets in this directory and the faiss directory
   # will have the conda includes/link dirs
   link_libraries($<TARGET_NAME_IF_EXISTS:conda_env>)
   include(cmake/thirdparty/get_faiss.cmake)

diff --git a/cpp/bench/ann/src/common/ann_types.hpp b/cpp/bench/ann/src/common/ann_types.hpp
@@ -24,6 +24,11 @@
 
 namespace raft::bench::ann {
 
+enum Objective {
+  THROUGHPUT,  // See how many vectors we can push through
+  LATENCY      // See how fast we can push a vector through
+};
+
 enum class MemoryType {
   Host,
   HostMmap,
@@ -59,10 +64,17 @@ inline auto parse_memory_type(const std::string& memory_type) -> MemoryType
   }
 }
 
-struct AlgoProperty {
+class AlgoProperty {
+ public:
+  inline AlgoProperty() {}
+  inline AlgoProperty(MemoryType dataset_memory_type_, MemoryType query_memory_type_)
+    : dataset_memory_type(dataset_memory_type_), query_memory_type(query_memory_type_)
+  {
+  }
   MemoryType dataset_memory_type;
   // neighbors/distances should have same memory type as queries
   MemoryType query_memory_type;
+  virtual ~AlgoProperty() = default;
 };
 
 class AnnBase {
@@ -79,7 +91,8 @@ template <typename T>
 class ANN : public AnnBase {
  public:
   struct AnnSearchParam {
-    virtual ~AnnSearchParam() = default;
+    Objective metric_objective = Objective::LATENCY;
+    virtual ~AnnSearchParam()  = default;
     [[nodiscard]] virtual auto needs_dataset() const -> bool { return false; };
   };
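
As a rough illustration of how the new `metric_objective` field might be consumed, the sketch below mirrors the `Objective` enum from the diff and branches on it. `SearchParam` is a simplified stand-in for `ANN<T>::AnnSearchParam`, and `batches_in_flight` is a hypothetical helper; neither is taken from the benchmark driver itself.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the enum added to ann_types.hpp in this commit.
enum Objective {
  THROUGHPUT,  // See how many vectors we can push through
  LATENCY      // See how fast we can push a vector through
};

// Simplified stand-in for ANN<T>::AnnSearchParam (assumption for illustration).
struct SearchParam {
  Objective metric_objective = Objective::LATENCY;
  virtual ~SearchParam()     = default;
};

// Hypothetical helper: how many batches a driver might keep in flight.
// Latency mode processes one batch at a time; throughput mode fills the pool.
inline std::size_t batches_in_flight(const SearchParam& p, std::size_t pool_size)
{
  assert(pool_size > 0);
  return p.metric_objective == Objective::THROUGHPUT ? pool_size : 1;
}
```

Because the default is `Objective::LATENCY`, search parameters that never set the field would keep the one-batch-at-a-time behaviour in this sketch.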