You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
The distance values returned with IVF flat search method might be incorrect in certain cases
IVF flat search is used by ANN refinement (if the dataset is in device memory) #1052. The Python tests of refinement revealed this problem. This only happens with int8 data (and not with uint8). IVF flat search has specific kernels for different data types, this might point towards a bug in the int8 specialization.
Note: the returned neighbor indices are correct, only the distance values show this problem.
Steps/Code to reproduce bug
To reproduce the issue:
autooperator==(const idx_dist_pair<IdxT, DistT, CompareDist>& a) const -> bool
{
if (idx == a.idx) returntrue;
if (eq_compare(dist, a.dist)) returntrue;
returnfalse;
The unit test shall be extended with a test of the actual distance values. The actual distance between the neighbor and the query vector shall be inspected, and compared to a naive distance calculation. Such test is done in the IVF-PQ python tests and in the refinment test. But such test is not yet applied for the IVF-flat results.
Note, even IVF PQ had an issue with the distance values, but that was fixed in #892
The text was updated successfully, but these errors were encountered:
Solves #1058
This is a tricky bug so the fix deserves some explanation. The previous implementation of `euclidean_dist` was the following in vectorized cases, where `x` and `y` are `int32` vectors of 4 `int8` each and `acc` is a single `int32` number to accumulate the distance in:
```c++
// Compute vectorized absolute differences independently.
const auto diff = static_cast<int32_t>(__vabsdiffs4(x, y));
// Square, reduce, and add to the accumulator.
acc = dp4a(diff, diff, acc);
```
Now consider the following case:
```c++
x = 0x80; // -128, 0, 0, 0
y = 0x7f; // 127, 0, 0, 0
```
The difference between -128 and 127 is 255, represented as `FF` (`__vabsdiffs4` is smart enough not to compute `abs(a-b)` which would result in `01`). However, if we call the signed version of `dp4a`, `FF` is cast from `int8` to `int32` as `FFFFFFFF` (or -1). The square of -1 is 1, which is added to `acc` (instead of 65025).
As the output of `__vabsdiffs4` is correct when considered as an unsigned number, and as addition is the same for signed and unsigned in 2's complement (and `acc` is positive anyway), the easiest fix is to use the unsigned version of `dp4a`, which will cast overflowed differences properly to 32 bits. The previous code simply becomes:
```c++
const auto diff = __vabsdiffs4(x, y);
acc = dp4a(diff, diff, static_cast<uint32_t>(acc));
```
-----
Additionally, to avoid underflows in the non-vectorized unsigned case, I replaced the subtraction with `__usad` (absolute difference of unsigned numbers). Note that using the subtraction was correct anyway, because the addition/subtraction is the same for unsigned and signed integers, as well as the least significant half of the multiplication (which is the part that is stored), and the square of a number is also the square of its opposite. Consider:
```c++
uint32_t a = 10;
uint32_t b = 20;
uint32_t c = a - b; // fffffff6, i.e -10 or 4294967286
uint32_t d = c * c; // (ffffffec)00000064, i.e 100
```
Authors:
- Louis Sugy (https://github.com/Nyrio)
Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)
- Artem M. Chirkin (https://github.com/achirkin)
URL: #1122
Describe the bug
The distance values returned with IVF flat search method might be incorrect in certain cases
IVF flat search is used by ANN refinement (if the dataset is in device memory) #1052. The Python tests of refinement revealed this problem. This only happens with int8 data (and not with uint8). IVF flat search has specific kernels for different data types, this might point towards a bug in the int8 specialization.
Note: the returned neighbor indices are correct, only the distance values show this problem.
Steps/Code to reproduce bug
To reproduce the issue:
Uncomment the following lines
https://github.com/tfeher/raft/blob/c34952ba42ca313e280f89fe4a0bc6a495a03852/python/pylibraft/pylibraft/test/test_refine.py#L127-L128
Run the refinement test
Output:
Expected behavior
Passing tests with int8 data.
Additional context
The current C++ unit test only inspect the distance values if the neighbor indices are different.
raft/cpp/test/neighbors/ann_utils.cuh
Lines 195 to 199 in 11c5105
The unit test shall be extended with a test of the actual distance values. The actual distance between the neighbor and the query vector shall be inspected, and compared to a naive distance calculation. Such test is done in the IVF-PQ python tests and in the refinment test. But such test is not yet applied for the IVF-flat results.
Note, even IVF PQ had an issue with the distance values, but that was fixed in #892
The text was updated successfully, but these errors were encountered: