[BUG] IVF flat incorrect distance values with int8 data #1058

tfeher · 2022-11-30T18:25:30Z

Describe the bug
The distance values returned with IVF flat search method might be incorrect in certain cases

IVF flat search is used by ANN refinement (if the dataset is in device memory) #1052. The Python tests of refinement revealed this problem. This only happens with int8 data (and not with uint8). IVF flat search has specific kernels for different data types, this might point towards a bug in the int8 specialization.

Note: the returned neighbor indices are correct, only the distance values show this problem.

Steps/Code to reproduce bug
To reproduce the issue:

Uncomment the following lines
https://github.com/tfeher/raft/blob/c34952ba42ca313e280f89fe4a0bc6a495a03852/python/pylibraft/pylibraft/test/test_refine.py#L127-L128
Run the refinement test

pytest pylibraft/test/test_refine.py -k test_refine_dtypes

Output:

>       assert np.mean(diff) < eps
E       assert 0.41265658 < 0.001
E        +  where 0.41265658 = <function mean at 0x14cd30085e50>(array([[0.46209675, 0.3915514 , 0.41272217, 0.41572857, 0.40346152,\n        0.37558016, 0.39662588, 0.36034888, 0.3658...373, 0.33194792, 0.20105657,\n        0.38698354, 0.3945595 , 0.40543446, 0.3377944 , 0.3658476 ]],\n      dtype=float32))
E        +    where <function mean at 0x14cd30085e50> = np.mean

pylibraft/test/test_ivf_pq.py:79: AssertionError
========================================================================================================== short test summary info ==========================================================================================================
FAILED pylibraft/test/test_refine.py::test_refine_dtypes[device-int8-l2_expanded-True-100] - assert 0.41855982 < 0.001
FAILED pylibraft/test/test_refine.py::test_refine_dtypes[device-int8-l2_expanded-True-1024] - assert 0.41032004 < 0.001
FAILED pylibraft/test/test_refine.py::test_refine_dtypes[device-int8-l2_expanded-True-37] - assert 0.40580684 < 0.001
FAILED pylibraft/test/test_refine.py::test_refine_dtypes[device-int8-l2_expanded-False-100] - assert 0.41458178 < 0.001
FAILED pylibraft/test/test_refine.py::test_refine_dtypes[device-int8-l2_expanded-False-1024] - assert 0.41299787 < 0.001
FAILED pylibraft/test/test_refine.py::test_refine_dtypes[device-int8-l2_expanded-False-37] - assert 0.41265658 < 0.001
========================================================================================== 6 failed, 42 passed, 24 skipped, 20 deselected in 7.37s ==========================================================================================

Expected behavior
Passing tests with int8 data.

Additional context
The current C++ unit test only inspect the distance values if the neighbor indices are different.

raft/cpp/test/neighbors/ann_utils.cuh

Lines 195 to 199 in 11c5105

    
           auto operator==(const idx_dist_pair<IdxT, DistT, CompareDist>& a) const -> bool 
        
           { 
        
             if (idx == a.idx) return true; 
        
             if (eq_compare(dist, a.dist)) return true; 
        
             return false;

The unit test shall be extended with a test of the actual distance values. The actual distance between the neighbor and the query vector shall be inspected, and compared to a naive distance calculation. Such test is done in the IVF-PQ python tests and in the refinment test. But such test is not yet applied for the IVF-flat results.

Note, even IVF PQ had an issue with the distance values, but that was fixed in #892

The text was updated successfully, but these errors were encountered:

Solves #1058 This is a tricky bug so the fix deserves some explanation. The previous implementation of `euclidean_dist` was the following in vectorized cases, where `x` and `y` are `int32` vectors of 4 `int8` each and `acc` is a single `int32` number to accumulate the distance in: ```c++ // Compute vectorized absolute differences independently. const auto diff = static_cast<int32_t>(__vabsdiffs4(x, y)); // Square, reduce, and add to the accumulator. acc = dp4a(diff, diff, acc); ``` Now consider the following case: ```c++ x = 0x80; // -128, 0, 0, 0 y = 0x7f; // 127, 0, 0, 0 ``` The difference between -128 and 127 is 255, represented as `FF` (`__vabsdiffs4` is smart enough not to compute `abs(a-b)` which would result in `01`). However, if we call the signed version of `dp4a`, `FF` is cast from `int8` to `int32` as `FFFFFFFF` (or -1). The square of -1 is 1, which is added to `acc` (instead of 65025). As the output of `__vabsdiffs4` is correct when considered as an unsigned number, and as addition is the same for signed and unsigned in 2's complement (and `acc` is positive anyway), the easiest fix is to use the unsigned version of `dp4a`, which will cast overflowed differences properly to 32 bits. The previous code simply becomes: ```c++ const auto diff = __vabsdiffs4(x, y); acc = dp4a(diff, diff, static_cast<uint32_t>(acc)); ``` ----- Additionally, to avoid underflows in the non-vectorized unsigned case, I replaced the subtraction with `__usad` (absolute difference of unsigned numbers). Note that using the subtraction was correct anyway, because the addition/subtraction is the same for unsigned and signed integers, as well as the least significant half of the multiplication (which is the part that is stored), and the square of a number is also the square of its opposite. Consider: ```c++ uint32_t a = 10; uint32_t b = 20; uint32_t c = a - b; // fffffff6, i.e -10 or 4294967286 uint32_t d = c * c; // (ffffffec)00000064, i.e 100 ``` Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) - Artem M. Chirkin (https://github.com/achirkin) URL: #1122

Nyrio · 2023-01-18T13:57:13Z

The fix has been merged.

tfeher added the bug Something isn't working label Nov 30, 2022

tfeher assigned Nyrio Dec 1, 2022

Nyrio mentioned this issue Jan 5, 2023

Fix euclidean_dist in IVF-Flat search #1122

Merged

Nyrio closed this as completed Jan 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] IVF flat incorrect distance values with int8 data #1058

[BUG] IVF flat incorrect distance values with int8 data #1058

tfeher commented Nov 30, 2022

Nyrio commented Jan 18, 2023

[BUG] IVF flat incorrect distance values with int8 data #1058

[BUG] IVF flat incorrect distance values with int8 data #1058

Comments

tfeher commented Nov 30, 2022

Nyrio commented Jan 18, 2023