Commit
Separates the way the benchmarks are measured into `throughput` and `latency` modes.

- `latency` mode accumulates the time taken to process each batch, then estimates QPS and reports the average time spent processing on the GPU. For a batch size of 1, this becomes a fairly accurate estimate of average latency per query. For larger batches, it becomes a fairly accurate estimate of time spent per batch.
- `throughput` mode pipelines the individual batches using a thread pool (and a stream pool for the GPU algorithms). For both smaller and larger batches, this gives a good estimate of the amount of data we can push through the hardware in a given period of time.

A comprehensive comparison should include both of these numbers.

Authors:
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Ben Frederickson (https://github.com/benfred)

URL: #1920
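The difference between the two modes can be illustrated with a minimal Python sketch. This is not the benchmark harness from this commit: `run_latency`, `run_throughput`, and `fake_search` are hypothetical names, the search call is simulated with a sleep, and GPU streams are not modeled. It only shows how timing each batch in sequence differs from timing a pipelined run as a whole.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_latency(search, batches):
    """Process batches one at a time and accumulate the per-batch time."""
    total = 0.0
    n_queries = 0
    for batch in batches:
        start = time.perf_counter()
        search(batch)
        total += time.perf_counter() - start
        n_queries += len(batch)
    return {
        "avg_batch_time_s": total / len(batches),  # per-query latency when batch size is 1
        "qps": n_queries / total,
    }


def run_throughput(search, batches, n_threads=4):
    """Pipeline batches through a thread pool and time the whole run."""
    n_queries = sum(len(batch) for batch in batches)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(search, batches))  # drain the iterator so every batch finishes
    wall = time.perf_counter() - start
    return {"wall_time_s": wall, "qps": n_queries / wall}


def fake_search(batch):
    """Hypothetical stand-in for a search call; sleeps to simulate work."""
    time.sleep(0.01)
    return [0] * len(batch)
```

For example, `run_latency(fake_search, batches)` and `run_throughput(fake_search, batches, n_threads=4)` can be called on the same list of batches and their `qps` values compared. The latency figure reflects how long one batch takes in isolation, while the throughput figure reflects how much work completes when batches overlap.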