
[BUG] matrixVectorOp in Dbscan::Vertexdeg::Algo should set bcastAlongRows as false #5360

Open
georgeliu95 opened this issue Apr 13, 2023 · 0 comments
Labels
? - Needs Triage Need team to review and classify bug Something isn't working

Comments

georgeliu95 commented Apr 13, 2023

Describe the bug
IIUC, in Dbscan::Vertexdeg::Algo we want to normalize the input matrix when the cosine metric is used. However, in the current implementation (v23.06), bcastAlongRows = true is passed to raft::linalg::matrixVectorOp, so it actually computes:

for i in [0..nLines) and j in [0..lineLen):
    out[i, j] = op(in[i, j], row_norm[j])
where matrix indexing is row-major ([i, j] = [i * lineLen + j])

So each column is mistakenly divided by the l2 norm of the row with the same index; the broadcast should go across rows, not along them.
Moreover, if the number of rows is smaller than the number of columns, the vector read goes out of bounds, which is undefined behavior.
BTW, we also cannot restore the input matrix afterwards if division by zero has turned some elements into inf.

Steps/Code to reproduce bug
A simple demonstration:

/* 
    if data.x contains a 4 x 6 matrix like
                      [[-1.03713, 1.07055, 1.20798, 0.148802, -0.00810787, -0.735827],
                       [-0.476402, 0.641141, -1.28886, 0.244033, -0.790808, -0.434079], 
                       [ 1.20568, 1.38347, 0.983189, -0.000811279, -0.803834, -1.28266],
                       [ 0.553955, -1.41257, 1.32544, 0.670778, -1.36426, 0.0659274]]
    then rowNorms.data() would be [2.06023, 1.78116, 2.57404, 2.52478]
*/

raft::linalg::matrixVectorOp(
      const_cast<value_t*>(data.x),  // out
      data.x,                        // in
      rowNorms.data(),               // vec: one l2 norm per row (4 entries)
      k,                             // number of columns (6)
      m,                             // number of rows (4)
      true,                          // rowMajor
      true,                          // bcastAlongRows -- the bug: should be false
      [] __device__(value_t mat_in, value_t vec_in) { return mat_in / vec_in; },
      stream);

/*
    now the data stored in data.x becomes 
                      [[-0.503402, 0.601041, 0.469292, 0.0589366, -inf, -inf], 
                       [-0.231237, 0.359957, -0.500713, 0.096655, -inf, -inf], 
                       [ 0.585215, 0.776726, 0.381963, -0.000321326, -inf, -inf], 
                       [ 0.26888, -0.793064, 0.514927, 0.265678, -inf, inf]]
*/


// Restoring input matrix after normalization
raft::linalg::matrixVectorOp(
      const_cast<value_t*>(data.x),  // out
      data.x,                        // in
      rowNorms.data(),               // vec: one l2 norm per row
      k,                             // number of columns
      m,                             // number of rows
      true,                          // rowMajor
      true,                          // bcastAlongRows -- same bug as above
      [] __device__(value_t mat_in, value_t vec_in) { return mat_in * vec_in; },
      stream);
/*
    as a result, we cannot restore the original input matrix here
                      [[-1.03713, 1.07055, 1.20798, 0.148802, nan, nan], 
                       [-0.476402, 0.641141, -1.28886, 0.244033, nan, nan], 
                       [ 1.20568, 1.38347, 0.983189, -0.000811279, nan, nan], 
                       [ 0.553955, -1.41257, 1.32544, 0.670778, nan, nan]]
*/

Expected behavior
I think two changes are needed:

  1. Divide each row by its own l2 norm: passing false for bcastAlongRows (line 1 & line 2) would be enough.
  2. Check whether the divisor is zero here, or simply clamp the minimum of rowNorms to a value greater than zero here.

Environment details:

Additional context
