[BUG] FLOAT32 rounding more inaccurate than necessary #14528

Open
jlowe opened this issue Nov 29, 2023 · 10 comments
Labels: 0 - Backlog, bug, libcudf, Spark, wontfix

Comments

@jlowe
Member

jlowe commented Nov 29, 2023

Describe the bug
When rounding single-precision floats to a specified number of decimal places, the result can be slightly inaccurate because the intermediate computations are forced into FLOAT32 as well. round.cu has rounding functors for non-fixed-point types, but all of the intermediate results use the same type as the input rather than the highest-precision type, double. This introduces more error into the rounding computation than is necessary.

Steps/Code to reproduce bug
The following code demonstrates the problem:

#include <cudf/round.hpp>
#include <cudf_test/column_utilities.hpp>
#include <cudf_test/column_wrapper.hpp>
#include <cudf_test/debug_utilities.hpp>

int main(int argc, char** argv) {
  auto const input =
    cudf::test::fixed_width_column_wrapper<float>{6.121944898040965e-05f};
  cudf::test::print(input);
  auto const result = cudf::round(input, 10, cudf::rounding_method::HALF_UP);
  cudf::test::print(*result);
  return 0;
}

Rounding the value to ten decimal places should round down to approximately 6.12194e-05, but instead the value is rounded up to approximately 6.12195e-05, as shown in the output when running the program:

6.1219449e-05
6.12194999e-05

Expected behavior
FLOAT32 rounding should use FLOAT64 for intermediate results during the computation, to avoid injecting more error than floating-point arithmetic already makes unavoidable. When I manually performed the computations on this example input using round.cu's half_up_positive logic, but with double instead of float for the intermediate values, the answer came out rounded down as expected.

It seems the floating-point rounding functors in round.cu should compute their intermediates explicitly in double, rather than in whatever the input type is, to avoid unnecessary additional error during the computation.
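For reference, a minimal host-side sketch of that manual computation. This paraphrases the modf/scale/round/divide structure of half_up_positive rather than quoting round.cu, and the half_up helper name is made up for this illustration:

#include <cmath>
#include <cstdio>

// Hypothetical paraphrase of round.cu's half_up_positive structure:
// split off the integer part, scale the fraction, round, divide back.
// T selects the precision used for the intermediate values.
template <typename T>
float half_up(float value, int decimal_places) {
  T const n = static_cast<T>(std::pow(10.0, decimal_places));
  T integer_part;
  T const fraction = std::modf(static_cast<T>(value), &integer_part);
  return static_cast<float>(integer_part + std::round(fraction * n) / n);
}

int main() {
  float const value = 6.121944898040965e-05f;
  printf("float  intermediates: %.9e\n", half_up<float>(value, 10));   // rounds up
  printf("double intermediates: %.9e\n", half_up<double>(value, 10));  // rounds down
  return 0;
}

On this input the float path reproduces the rounded-up value while the double path rounds down, consistent with the report above.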

@jlowe added the bug, Needs Triage, libcudf, and Spark labels Nov 29, 2023
@bdice
Contributor

bdice commented Nov 29, 2023

Looking at the implementation of our rounding code (specifically half_up_positive), I am concerned that it's not just a matter of float becoming double. I suspect double values could see the same type of problem, if the number of decimal places is large enough. I think we may need to use an algorithm that is more numerically stable and has less precision loss due to truncation. See also #14169, which is a separate problem with rounding algorithms in libcudf affecting float-to-decimal conversions.
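To put concrete numbers on that concern (a hypothetical host-side example, not code from round.cu): once the scaled value passes $2^{53}$, adjacent doubles are more than 1 apart, so a scale-round-unscale approach cannot even represent the candidate results.

#include <cmath>
#include <cstdio>

int main() {
  // Rounding 1/3 to 17 decimal places by scaling: the scaled value
  // (~3.3e16) lies above 2^53, where adjacent doubles are 4 apart,
  // so the round step has nothing meaningful left to decide.
  double const x = 1.0 / 3.0;
  double const n = 1e17;
  double const scaled = x * n;
  printf("scaled: %.1f\n", scaled);
  printf("ulp:    %.1f\n", std::nextafter(scaled, INFINITY) - scaled);
  printf("result: %.17f\n", std::round(scaled) / n);
  return 0;
}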

@jlowe
Member Author

jlowe commented Nov 29, 2023

Yes, totally agree that double will have similar issues when rounding to larger numbers of decimal places. As such, I see this less as "solving" all the rounding issues and more as significantly improving the results for FLOAT32 with minimal effort.

@wence-
Contributor

wence- commented Nov 30, 2023

I think this might be possible to handle by using appropriately selected CUDA intrinsics for the division step. If we're rounding half-up, we should use division that rounds toward positive infinity rather than round-to-nearest-ties-to-even (the default).

Edit: I misunderstood the definition of half_up rounding, which only breaks ties. One might still be able to get away with something like this, though:

#include <cstdio>
#include <cuda_runtime_api.h>

__global__ void test(float val, float *out) {
  const float decimals = 1e10f;  // 10^10, for rounding to 10 decimal places
  float ipart;
  float remainder = modff(val, &ipart);
  float scaled = roundf(remainder * decimals);
  // __fdiv_ru divides rounding toward +infinity; __fdiv_rn (the default
  // behaviour of '/') divides rounding to nearest, ties to even.
  out[0] = ipart + __fdiv_ru(scaled, decimals);
  out[1] = ipart + __fdiv_rn(scaled, decimals);
}

int main(void) {
  float input = 6.121944898040965e-05f;
  float *val;
  cudaMalloc(&val, 2 * sizeof(float));
  test<<<1, 1>>>(input, val);
  float h_val[2];
  cudaMemcpy(h_val, val, 2 * sizeof(float), cudaMemcpyDeviceToHost);
  printf("in:      %.20f\n", input);
  printf("out(ru): %.20f\n", h_val[0]);
  printf("out(rn): %.20f\n", h_val[1]);
  cudaFree(val);
  return 0;
}

Produces:

in:      0.00006121944898040965
out(ru): 0.00006121950718807057
out(rn): 0.00006121949991211295

But I haven't thought through all the consequences of this change.

@wence-
Contributor

wence- commented Nov 30, 2023

That said, I claim that half_up rounding doesn't make much sense as a tie-breaking scheme for binary floating-point values, since the only fractional floating-point values whose decimal expansion ends exactly in 5 are odd multiples of $2^{-n}$ for $n \ge 1$.
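For example (a quick host-side check; the printed digits are the exact values of the nearest floats):

#include <cstdio>

int main() {
  // A decimal tie like 0.05 is not representable in binary...
  printf("%.20f\n", 0.05f);   // prints 0.05000000074505805969
  // ...whereas an odd multiple of 2^-n ends in 5 exactly.
  printf("%.20f\n", 0.375f);  // prints 0.37500000000000000000
  return 0;
}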

@GregoryKimball added the 0 - Backlog label and removed the Needs Triage label Dec 14, 2023
@pmattione-nvidia
Contributor

I started looking into this and noticed a problem with our current implementation: HALF_UP rounding is bugged for negative numbers.

If you round, e.g., -0.5f to zero decimal places, it should round up to zero, but it instead results in -1 because the code calls roundf(), which rounds halfway cases away from zero.

Do we want to change the rounding code to round up, or change the name/comment to ROUND_AWAY?

@jlowe
Member Author

jlowe commented Jan 31, 2024

I think the "UP" here refers to higher magnitude rather than higher value. FWIW Java's HALF_UP rounding has same round-away-from-zero semantics (see Javadocs here), so the HALF_UP semantics match what we want from the Spark perspective.

@pmattione-nvidia
Contributor

That's fine with me; I'll fix the code comment in round.hpp, which should instead point to Wikipedia's rounding half away from zero.

@harrism
Member

harrism commented Feb 5, 2024

When rounding single-precision floats to a specified number of decimal places

The above is a nonsense statement, in my book. By now you should know my opinion on this: libcudf should not support rounding to decimal places on binary floating-point types.

#406
#1270
#1340
#7195

I'm sure there are more.

@pmattione-nvidia
Contributor

The most accurate way to round would be to convert the floating-point values to fixed_point (with more decimal places than float/double precision provides), round as desired, then convert back to floating point.

But as @harrism brings up, this will still produce "unexpected" results (e.g. 1.151f to 2 decimal places -> fixed point 115 w/ scale -2 -> 1.1499999f -> string "1.14").

So I think the question is: do we want to do these conversions inside this kernel, or should we block floating-point types from the round kernel and instead require users to explicitly choose the chain of operations above (sketched below)?
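A rough sketch of what that explicit chain could look like from the caller's side. The scale of -9 and the helper name are arbitrary choices for this illustration, and the float-to-decimal cast itself is subject to the accuracy issues tracked in #14169:

#include <cudf/column/column.hpp>
#include <cudf/column/column_view.hpp>
#include <cudf/round.hpp>
#include <cudf/unary.hpp>

#include <cstdint>
#include <memory>

// Hypothetical helper: round a FLOAT32 column by casting to DECIMAL64
// (scale -9, i.e. 9 fractional decimal digits), rounding in fixed point,
// and casting back to FLOAT32.
std::unique_ptr<cudf::column> round_via_decimal(cudf::column_view const& input,
                                                int32_t decimal_places)
{
  auto const decimal =
    cudf::cast(input, cudf::data_type{cudf::type_id::DECIMAL64, -9});
  auto const rounded =
    cudf::round(decimal->view(), decimal_places, cudf::rounding_method::HALF_UP);
  return cudf::cast(rounded->view(), cudf::data_type{cudf::type_id::FLOAT32});
}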

@harrism
Member

harrism commented Jul 22, 2024

I think libcudf shouldn't support rounding binary float types to a specified number of decimal places, just as the C++ standard provides no function for this.

https://en.cppreference.com/w/cpp/numeric/math/round

Computes the nearest integer value to num (in floating-point format)

@davidwendt added the wontfix label Dec 13, 2024