[C++] Test failure in arrow-compute-aggregate-test #12681
Comments
After swapping from gcc to clang, the tests then pass. I verified that the same four tests fail on my system using gcc 11.2.0. |
We've recently had some issues with SIMD instructions seeming incorrect in certain compilers, and this sounds like it might be related (see #12422 (comment)). That has stumped us so far, but I think we are still looking into it. cc @rok @jonkeane |
Adding any of |
I confirm the problem on my machine. As I understand it, my CPU does not support AVX2 instructions, and that was the main problem.
arrow-flight-test
Running arrow-flight-test, redirecting output into /home/rstanislav/Desktop/arrow/src/build/build/test-logs/arrow-flight-test.txt (attempt 1/1)
Traceback (most recent call last):
  File "/home/rstanislav/Desktop/arrow/src/apache-arrow-8.0.1/cpp/build-support/asan_symbolize.py", line 368, in
    loop.process_stdin()
  File "/home/rstanislav/Desktop/arrow/src/apache-arrow-8.0.1/cpp/build-support/asan_symbolize.py", line 340, in process_stdin
    line = sys.stdin.readline()
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 3112: invalid continuation byte
~/Desktop/arrow/src/build/src/arrow/flight
Test time = 1.36 sec
My system:
OS: Manjaro 21.3.6 Ruah
lscpu: Architecture: x86_64 |
Hmm, it's a pity that we're not displaying the actual values being compared on failure. |
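As a rough illustration of what "displaying the actual values" could look like, here is a minimal sketch that formats a double so that a last-bit difference stays visible. The helper names (`ExactDecimal`, `HexFloat`, `DescribeMismatch`) are hypothetical and are not part of Arrow's test utilities; the sketch only relies on `std::snprintf` with the standard `%.17g` and `%a` conversions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: 17 significant digits are enough to round-trip any
// finite double through decimal text.
std::string ExactDecimal(double value) {
  char buffer[64];
  int written = std::snprintf(buffer, sizeof(buffer), "%.17g", value);
  assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
  return std::string(buffer, written);
}

// Hypothetical helper: hexadecimal floating-point text (C99 "%a"), which
// shows the mantissa bits directly.
std::string HexFloat(double value) {
  char buffer[64];
  int written = std::snprintf(buffer, sizeof(buffer), "%a", value);
  assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
  return std::string(buffer, written);
}

// Hypothetical failure message that shows both operands in both notations.
std::string DescribeMismatch(double expected, double actual) {
  return "expected " + ExactDecimal(expected) + " (" + HexFloat(expected) +
         "), got " + ExactDecimal(actual) + " (" + HexFloat(actual) + ")";
}
```

A test assertion could include `DescribeMismatch(expected, actual)` in its failure output, so that two values which print identically at default precision can still be told apart.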
@erydit The GDAL issue is unrelated, please let's not conflate these. Also, the fact that Arrow doesn't work on a non-AVX2 CPU if compiled with AVX2 enabled is entirely expected. It is not a bug. |
These are failing on Fedora aarch64 with 16.1.0. It appears that these now contain more information, and seem like they're small floating-point discrepancies: `TestRandomInt64QuantileKernel` failures
|
I tested arrow 16.1.0 tag in fedora42 container on an arm neoverse n2 server, but didn't reproduce this issue. All tests passed. |
Judging from the hardware info, the failure happens on neoverse-n1, probably in a virtual machine given the low CPU count. |
Also fails on s390x in a similar way: s390x failures
|
Perhaps it may depend on FPU settings, or on compiler options? |
By the way, looking at one of the failing values, the two values differ only in the last bit of precision:
>>> import math
>>> x = 124.6864762710842
>>> y = 124.68647627108422
>>> x.hex()
'0x1.f2bef3a2b7259p+6'
>>> y.hex()
'0x1.f2bef3a2b725ap+6'
>>> math.nextafter(x, math.inf)
124.68647627108422
>>> math.nextafter(x, math.inf) == y
True |
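The same check can be sketched in C++ with `std::nextafter`. This is only an illustration of the observation above; the function names `UlpAbove` and `AreAdjacentDoubles` are made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from `value` to the next representable double above it.
double UlpAbove(double value) {
  assert(std::isfinite(value));
  return std::nextafter(value, std::numeric_limits<double>::infinity()) - value;
}

// True when `upper` is the very next representable double after `lower`,
// i.e. the two values differ only in the last bit of the mantissa.
bool AreAdjacentDoubles(double lower, double upper) {
  return std::nextafter(lower, std::numeric_limits<double>::infinity()) == upper;
}
```

For the pair quoted above, `AreAdjacentDoubles(124.6864762710842, 124.68647627108422)` is true, which matches the Python session.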
I managed to reproduce this error by adding the compiler options below. Trying to narrow down the culprit.
|
Per my test, |
I took a look at the kind of things that I think what happens is that enabling What happens if you keep |
I only toggled the CMAKE_CXX_FLAGS option
|
It looks to me like a compiler bug. The discrepancy comes from the code line below in aggregate_test.cc. The same equation is used on the quantile kernel side.
with
without
I tried deliberately swapping the two addends on the quantile kernel side, and the test passed. |
That sounds unexpected to me? At least with "normal" (non-NaN) operands. Can you give examples of operands whose addition is non-commutative? |
I cannot find an example. Built with gcc -O3.
On an x86 server, no error:
test code
|
Okay, so that might be because the compiled code uses FMADD, which potentially removes one rounding step, yielding slightly different results. We can probably simply relax the test a tiny little bit. Given that this is gonna come up in other situations, we may add some helper functions:
bool WithinUlp(float left, float right, int n_ulps);
bool WithinUlp(double left, double right, int n_ulps);
void AssertWithinUlp(float left, float right, int n_ulps);
void AssertWithinUlp(double left, double right, int n_ulps);
I can write those helpers if that sounds ok. |
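For readers who want a feel for what such a helper might do, here is a minimal sketch of `WithinUlp` using the signatures proposed above. It is not Arrow's implementation. The internal `WithinUlpImpl` template, the bit-reinterpretation approach, and the treatment of NaN and infinities (NaN never matches; infinities match only when equal) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical implementation detail: reinterpret the float as a signed
// integer and remap negatives so that adjacent representable values map to
// adjacent integers.
template <typename Float, typename Int, typename UInt>
bool WithinUlpImpl(Float left, Float right, int n_ulps) {
  static_assert(sizeof(Float) == sizeof(Int), "size mismatch");
  assert(n_ulps >= 0);
  if (std::isnan(left) || std::isnan(right)) return false;
  if (left == right) return true;  // also covers +0.0 vs -0.0 and equal infinities
  if (std::isinf(left) || std::isinf(right)) return false;
  auto ordered = [](Float value) -> Int {
    Int bits;
    std::memcpy(&bits, &value, sizeof(bits));
    // Sign-magnitude to a monotonic scale: negative values become -magnitude.
    return bits < 0 ? static_cast<Int>(std::numeric_limits<Int>::min() - bits) : bits;
  };
  const Int a = ordered(left);
  const Int b = ordered(right);
  // Unsigned subtraction avoids signed overflow for operands of opposite sign.
  const UInt distance = a > b ? static_cast<UInt>(a) - static_cast<UInt>(b)
                              : static_cast<UInt>(b) - static_cast<UInt>(a);
  return distance <= static_cast<UInt>(n_ulps);
}

bool WithinUlp(float left, float right, int n_ulps) {
  return WithinUlpImpl<float, int32_t, uint32_t>(left, right, n_ulps);
}

bool WithinUlp(double left, double right, int n_ulps) {
  return WithinUlpImpl<double, int64_t, uint64_t>(left, right, n_ulps);
}
```

With this sketch, the failing pair from earlier in the thread, 124.6864762710842 and 124.68647627108422, is accepted with `n_ulps = 1` and rejected with `n_ulps = 0`. An `AssertWithinUlp` wrapper would call `WithinUlp` and report both operands on failure.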
Yes, thanks @pitrou ! |
You are right. x86 also fails the test if built with |
(JFTR, addition and multiplication should be commutative for floating point numbers, associative is what they might not be) |
Indeed. But for the FMA operation, IMO the compiler should always stick to one ordering to prevent surprising results. |
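To make the ordering point concrete, here is a small sketch of how a sum of two products can come out differently depending on which product gets fused. The expression `a * b + c * d` and the function names are assumptions chosen for illustration; the actual kernel expression is not reproduced here. `std::fma` stands in for what a compiler may emit when it contracts a multiply and an add.

```cpp
#include <cmath>

// Both products are rounded before the addition (three roundings in total).
// The volatile temporaries keep the compiler from fusing anything here.
double SumOfProductsRounded(double a, double b, double c, double d) {
  volatile double first = a * b;
  volatile double second = c * d;
  return first + second;
}

// The first product is kept exact inside the fused operation; only c * d is
// rounded beforehand.
double FusedOnFirstProduct(double a, double b, double c, double d) {
  return std::fma(a, b, c * d);
}

// Same sum with the addends swapped: now a * b is rounded beforehand and
// c * d is kept exact.
double FusedOnSecondProduct(double a, double b, double c, double d) {
  return std::fma(c, d, a * b);
}
```

With `a = b = d = 1 + 2^-30` and `c = -a`, the exact sum is zero. The fully rounded version returns 0, while the two fused versions return `2^-60` and `-2^-60` respectively, because each one preserves the rounding error of a different product. Addition itself stays commutative; what changes with the order of the addends is which multiplication is folded into the fused operation. With GCC, contraction of this kind is controlled by `-ffp-contract`.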
While building for Arch Linux, I’m observing 4 test failures in the aforementioned suite:
They also happen with 6.0.1, but did not some time ago, so I suspect an update to one of Arrow’s dependencies is responsible. I’m happy to provide any information that could be useful, but I don’t want to create an account on JIRA; I try to limit the number of accounts I have everywhere.