[BUG] Cuml nearest neighbors returns wrong distances #4624

siegrikw opened this issue Mar 9, 2022 · 4 comments

Labels: bug (Something isn't working)


siegrikw commented Mar 9, 2022

Describe the bug
When the input has exactly 256 rows (no fewer, no more) and n_neighbors is greater than 1 and at most 64, the returned distances and indices are invalid:

n_rows = 256
1 < n_neighbors <= 64

Steps/Code to reproduce bug

import cupy as cp
from cuml.neighbors import NearestNeighbors
from sklearn.neighbors import NearestNeighbors as NearestNeighborsCPU

X = cp.random.rand(256,3)
n_neighbors = 32

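# Fit and query the GPU model on the same data.
cuml_model = NearestNeighbors(n_neighbors=n_neighbors, algorithm="brute")
cuml_model.fit(X)
cuml_distances, cuml_indices = cuml_model.kneighbors(X, two_pass_precision=True)

# Fit and query the CPU reference on a host copy of the data.
sklearn_model = NearestNeighborsCPU(n_neighbors=n_neighbors, algorithm="brute")
sklearn_model.fit(X.get())
sklearn_distances, sklearn_indices = sklearn_model.kneighbors(X.get())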

#Distance Results
print(f"CUML Distances :: \n{cuml_distances}\n")
print(f"Sklearn Distances :: \n{sklearn_distances}\n")

#Indices Results
print(f"CUML Indices :: \n{cuml_indices}\n")
print(f"Sklearn Indices :: \n{sklearn_indices}\n")

Expected behavior
The first distance for every point should be 0 (i.e. each point should be its own nearest neighbor), as returned by sklearn.neighbors.NearestNeighbors, and the indices should not all be 0.
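
The expectation above can be written as a small check. This is only a sketch in plain Python: `self_neighbor_violations` is a hypothetical helper name, and it assumes the distance and index results have been converted to nested Python lists and that the query set is the fitted set with no duplicate points.

```python
def self_neighbor_violations(distances, indices, tol=1e-6):
    """Return the query rows whose first neighbor is not the row itself.

    Assumes the query set equals the fitted set and contains no duplicate
    points, so row i must report index i at distance ~0 in column 0.
    """
    bad_rows = []
    for row, (dist_row, idx_row) in enumerate(zip(distances, indices)):
        if idx_row[0] != row or abs(dist_row[0]) > tol:
            bad_rows.append(row)
    return bad_rows


# Hand-made results for three query points with n_neighbors=2.
good_dist = [[0.0, 0.5], [0.0, 0.5], [0.0, 0.7]]
good_idx = [[0, 1], [1, 0], [2, 1]]

# The failure described in this issue: every index is 0.
bad_idx = [[0, 0], [0, 0], [0, 0]]

print(self_neighbor_violations(good_dist, good_idx))  # []
print(self_neighbor_violations(good_dist, bad_idx))   # [1, 2]
```

An empty list means every point is its own nearest neighbor. Row 0 passes even in the failing case, because index 0 happens to be correct for it.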

Environment details (please complete the following information):

  • Environment location: [Bare-metal]
  • Linux Distro/Architecture: [Ubuntu 20.04 amd64]
  • GPU Model/Driver: [3090 and driver 495.46]
  • CUDA: [11.5]
  • Method of cuDF & cuML install: [conda]
    • If method of install is [conda], run conda list and include results here
siegrikw added the "? - Needs Triage" and "bug" labels Mar 9, 2022
The bug does not appear in RAPIDS 21.06. Reverting to 21.06 and running the code above produced the expected behavior.

@teju85 @mdoijade could I ask one of you to check this out?

This might be due to rapidsai/raft#568. The fused kernel is enabled when k <= 64.
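
One way to narrow this down might be to sweep row counts and k values around the boundaries mentioned in this thread. The sketch below is an assumption-laden illustration, not cuML or RAFT code: `reference_knn` is a pure-Python brute-force stand-in and `failing_configs` is a hypothetical helper. To test a real model, one would pass a wrapper with the same `(points, k) -> (distances, indices)` shape in place of `reference_knn`.

```python
import math
import random


def reference_knn(points, k):
    """Brute-force kNN in pure Python; returns (distances, indices)."""
    distances, indices = [], []
    for query in points:
        ranked = sorted(
            (math.dist(query, other), j) for j, other in enumerate(points)
        )[:k]
        distances.append([d for d, _ in ranked])
        indices.append([j for _, j in ranked])
    return distances, indices


def failing_configs(knn, row_counts, ks, n_cols=3, seed=0, tol=1e-6):
    """Return (n_rows, k) pairs where some point is not its own nearest neighbor."""
    rng = random.Random(seed)
    failures = []
    for n_rows in row_counts:
        points = [[rng.random() for _ in range(n_cols)] for _ in range(n_rows)]
        for k in ks:
            dists, idxs = knn(points, k)
            ok = all(
                idxs[i][0] == i and abs(dists[i][0]) <= tol
                for i in range(n_rows)
            )
            if not ok:
                failures.append((n_rows, k))
    return failures


# Probe just below, at, and above the sizes mentioned in this thread.
result = failing_configs(reference_knn, row_counts=[255, 256, 257], ks=[2, 32, 64, 65])
print(result)  # [] for the reference implementation
```

With an implementation affected by the bug, the reported pairs would be expected to cluster at n_rows = 256 and k <= 64.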

divyegala removed the "? - Needs Triage" label Mar 28, 2022
rapids-bot (bot) pushed a commit to rapidsai/raft that referenced this issue Mar 31, 2022

This PR fixes #568 and rapidsai/cuml#4624.
-- Fix an issue in fusedL2knn that occurs when the number of rows is a multiple of 256.
-- Make the index value a size_t to avoid int overflow. Overflow does not cause these issues, but it could for larger input sizes.
-- Add test cases to the fusedL2knn test.
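
The size_t point can be illustrated outside the kernel. The following is a hedged sketch, not the RAFT code: the helper names are hypothetical, and row-major flattening is assumed purely for illustration. It uses `ctypes` to mimic how a 32-bit signed int and a size_t would store the same flattened index.

```python
import ctypes


def flat_index_int32(row, col, n_cols):
    """Flattened row-major index as a 32-bit signed int would store it."""
    return ctypes.c_int32(row * n_cols + col).value


def flat_index_size_t(row, col, n_cols):
    """The same index stored in an unsigned, pointer-sized integer."""
    return ctypes.c_size_t(row * n_cols + col).value


# Small inputs such as the 256 x 3 case in this issue fit in either type.
print(flat_index_int32(255, 2, 3))   # 767
print(flat_index_size_t(255, 2, 3))  # 767

# A large input exceeds 2**31 - 1, so the 32-bit value wraps negative.
print(flat_index_int32(50_000, 0, 50_000))   # -1794967296
print(flat_index_size_t(50_000, 0, 50_000))  # 2500000000
```

This matches the note in the commit message: the sizes reported here are far too small to overflow, so the wider index is a safeguard for larger inputs rather than the fix for this bug.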

  - Mahesh Doijade (

  - Corey J. Nolet (

URL: #604
@siegrikw we have fixed this on our side. Closing this issue for now, but feel free to reopen it if the bug persists.

