Fix floating point data generation in benchmarks (#10372)

`numeric_limits::lowest` and `numeric_limits::max` are used as bounds for numeric type generation. However, for normal generators, the bounds are shifted to `[0, upper_bound - lower_bound]`, and the random value is then shifted back by `lower_bound`. With `lowest` and `max`, `upper_bound - lower_bound` is out of range for floating point types, so the generated values are `nan` and `inf`.

This PR halves the ranges so that `upper_bound - lower_bound` stays within the type's range.

Expected to affect benchmarks that use floating point columns (e.g. Parquet reader benchmarks).

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - https://github.com/nvdbaranec

URL: #10372
rapidsai · Mar 7, 2022 · 7d67093
1 parent 4f8c60a
Showing 1 changed file with 2 additions and 1 deletion.
diff --git a/cpp/benchmarks/common/generate_input.hpp b/cpp/benchmarks/common/generate_input.hpp
@@ -114,7 +114,8 @@ std::pair<int64_t, int64_t> default_range()
 template <typename T, std::enable_if_t<cudf::is_numeric<T>()>* = nullptr>
 std::pair<T, T> default_range()
 {
-  return {std::numeric_limits<T>::lowest(), std::numeric_limits<T>::max()};
+  // Limits need to be such that `upper - lower` does not overflow
+  return {std::numeric_limits<T>::lowest() / 2, std::numeric_limits<T>::max() / 2};
 }
 }  // namespace
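
The overflow described in the commit message can be reproduced in isolation. The following is a minimal sketch, not code from cudf: the helper names `full_range_width`, `halved_range`, and `halved_range_width` are hypothetical and are meant for floating point types only. It computes `upper_bound - lower_bound` for the old bounds and for the halved bounds.

```cpp
#include <cmath>
#include <limits>
#include <utility>

// Hypothetical helper (not part of cudf): width of the old default range,
// i.e. `upper_bound - lower_bound` with `max` and `lowest` as bounds.
// For float and double this exceeds the largest finite value and yields inf.
template <typename T>
T full_range_width()
{
  return std::numeric_limits<T>::max() - std::numeric_limits<T>::lowest();
}

// Hypothetical helper mirroring the halved bounds from the diff.
template <typename T>
std::pair<T, T> halved_range()
{
  return {std::numeric_limits<T>::lowest() / 2, std::numeric_limits<T>::max() / 2};
}

// Width of the halved range. Dividing by two is exact for these values,
// so the result is the largest finite value of T, which is still in range.
template <typename T>
T halved_range_width()
{
  std::pair<T, T> const range = halved_range<T>();
  return range.second - range.first;
}
```

With these helpers, `std::isinf(full_range_width<float>())` holds, while `halved_range_width<float>()` is finite, which is why values generated from the halved bounds no longer come out as `nan` or `inf`.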