Optimize DFS while marking connected components #14022

viswanathk · 2024-11-27T17:15:42Z

The stack depth was growing more than it should, causing excessive allocations. This change should reduce them and may also speed up the process.
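To make the stack-growth effect concrete, here is a minimal sketch on a toy graph. It is not the HnswUtil code: the `int[][]` adjacency array, the class name, and both method names are made up for illustration, and it assumes the growth came from pushing neighbors that were already visited or already waiting on the stack. It compares marking a node when it is popped against marking it when it is pushed.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

/** Illustrative only: a toy adjacency-array graph, not Lucene's HnswGraph. */
final class DfsStackDepthSketch {

  /** Marks a node when it is popped, so the same node can sit on the stack many times. */
  static int maxDepthMarkOnPop(int[][] neighbors, int entryPoint) {
    BitSet visited = new BitSet(neighbors.length);
    Deque<Integer> stack = new ArrayDeque<>();
    stack.push(entryPoint);
    int maxDepth = 1;
    while (!stack.isEmpty()) {
      int node = stack.pop();
      if (visited.get(node)) {
        continue; // duplicate entry, already handled
      }
      visited.set(node);
      for (int friend : neighbors[node]) {
        stack.push(friend); // pushed even if visited or already on the stack
      }
      maxDepth = Math.max(maxDepth, stack.size());
    }
    return maxDepth;
  }

  /** Marks a node when it is pushed, so each node enters the stack at most once. */
  static int maxDepthMarkOnPush(int[][] neighbors, int entryPoint) {
    BitSet visited = new BitSet(neighbors.length);
    Deque<Integer> stack = new ArrayDeque<>();
    visited.set(entryPoint);
    stack.push(entryPoint);
    int maxDepth = 1;
    while (!stack.isEmpty()) {
      int node = stack.pop();
      for (int friend : neighbors[node]) {
        if (!visited.get(friend)) {
          visited.set(friend);
          stack.push(friend);
        }
      }
      maxDepth = Math.max(maxDepth, stack.size());
    }
    return maxDepth;
  }

  /** Builds a graph in which every node links to every other node. */
  static int[][] completeGraph(int n) {
    int[][] neighbors = new int[n][n - 1];
    for (int i = 0; i < n; i++) {
      int k = 0;
      for (int j = 0; j < n; j++) {
        if (j != i) {
          neighbors[i][k++] = j;
        }
      }
    }
    return neighbors;
  }

  public static void main(String[] args) {
    int[][] graph = completeGraph(50);
    System.out.println("mark on pop:  " + maxDepthMarkOnPop(graph, 0));
    System.out.println("mark on push: " + maxDepthMarkOnPush(graph, 0));
  }
}
```

With mark-on-push the stack can never hold more entries than there are nodes, which is the kind of bound the lower maxStackDepth values in the benchmark would be consistent with.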

Benchmark while indexing 100k docs:

cat max_stack_depth_optimized.txt | grep maxStackDepth | sort -t= -k2 -n | tail
maxStackDepth=8592
maxStackDepth=8605
maxStackDepth=8666
maxStackDepth=8738
maxStackDepth=8779
maxStackDepth=8825
maxStackDepth=9084
maxStackDepth=39925
maxStackDepth=67764
maxStackDepth=68239

cat max_stack_depth_optimized.txt | tail


Results:
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.532         0.224  100000   100      50       16        250     4 bits     3.56      28129.40           6.71             1            47.82         42.915         4.768
 0.661         0.206  100000   100      50       16        250     7 bits     3.42      29231.22           6.35             1            50.85         47.684         9.537
 0.830         0.263  100000   100      50       16        250         no     3.16      31665.61           5.40             1            42.36         38.147        38.147

BUILD SUCCESSFUL in 34s
2 actionable tasks: 1 executed, 1 up-to-date

cat max_stack_depth_non_optimized.txt | grep maxStackDepth | sort -t= -k2 -n | tail
maxStackDepth=138439
maxStackDepth=139713
maxStackDepth=140014
maxStackDepth=140365
maxStackDepth=140955
maxStackDepth=147255
maxStackDepth=152292
maxStackDepth=618303
maxStackDepth=1128533
maxStackDepth=1505067

cat max_stack_depth_non_optimized.txt | tail


Results:
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.532         0.244  100000   100      50       16        250     4 bits     3.52      28376.84           8.04             1            47.85         42.915         4.768
 0.662         0.309  100000   100      50       16        250     7 bits     3.54      28288.54           7.64             1            50.86         47.684         9.537
 0.810         0.271  100000   100      50       16        250         no     3.20      31230.48           5.65             1            41.99         38.147        38.147

BUILD SUCCESSFUL in 37s
2 actionable tasks: 1 executed, 1 up-to-date

cc: @msokolov @vigyasharma

viswanathk · 2024-11-27T18:28:40Z

The force merge time shows some improvement.
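For reference, the size of that improvement can be worked out from the two result tables above. The sketch below only redoes the arithmetic on the posted numbers (force merge seconds per quantization setting, and the largest maxStackDepth in each log); the class and method names are made up for illustration. It gives a reduction of roughly 16.5% (4 bits), 16.9% (7 bits), and 4.4% (no quantization) in force merge time, and a roughly 22x lower peak stack depth. Each configuration was run once, so these figures are indicative rather than conclusive.

```java
/** Illustrative arithmetic on the numbers posted in this thread. */
final class BenchmarkDeltaSketch {

  /** Percentage by which {@code after} is lower than {@code before}. */
  static double percentReduction(double before, double after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) {
    String[] labels = {"4 bits", "7 bits", "no quantization"};
    // {non-optimized, optimized} force merge seconds, copied from the tables above
    double[][] forceMergeSeconds = {{8.04, 6.71}, {7.64, 6.35}, {5.65, 5.40}};
    for (int i = 0; i < labels.length; i++) {
      double reduction = percentReduction(forceMergeSeconds[i][0], forceMergeSeconds[i][1]);
      System.out.printf("%s: %.1f%% less force merge time%n", labels[i], reduction);
    }
    // Largest maxStackDepth in each log
    System.out.printf("peak stack depth ratio: %.1fx%n", 1505067.0 / 68239.0);
  }
}
```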

viswanathk · 2024-12-04T16:44:35Z

Please let me know if we need to run the full benchmark suite on this.

vigyasharma

With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitset and it'll get updated with all the values.

I don't think we have a need to support the partial bitset case (do we @msokolov ?), and the optimization seems worth it. But let's document this change (that the bitset should be empty) in the function docstring.

lucene/core/src/java/org/apache/lucene/util/hnsw/HnswUtil.java

msokolov · 2024-12-16T17:43:06Z

> With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitset and it'll get updated with all the values.

I'm not sure -- doesn't it still expect that connectedNodes is preserved between calls? The overall flow is like: find the next not-connected node, and traverse all of its connections -- it might run into an already-connected node (marked as connected in the bitset) because the relation is asymmetric. We used to continue traversing anyway although it's kind of pointless. Maybe it would tell you the size of the "rooted" component of the graph, but we don't really use this size information, so I think it's OK to early-terminate once you find something that is already rooted in an earlier component. And we still expect to remember the visited set across calls.
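The flow described above (find the next not-connected node, traverse from it, and stop at anything already marked by an earlier call) can be sketched as follows. This is a simplified illustration, not the HnswUtil code: the graph is a plain `int[][]` adjacency array, `java.util.BitSet` stands in for Lucene's bitset, and the method bodies are assumptions. Only the names `markRooted`, `components`, `Component`, and `connectedNodes` are taken from the thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

/** Illustrative only: shows one bitset shared across all traversal calls. */
final class SharedVisitedSketch {

  record Component(int start, int size) {}

  /** Marks every not-yet-connected node reachable from entryPoint and counts them. */
  static Component markRooted(int[][] neighbors, BitSet connectedNodes, int entryPoint) {
    if (connectedNodes.get(entryPoint)) {
      return new Component(entryPoint, 0); // already rooted in an earlier component
    }
    Deque<Integer> stack = new ArrayDeque<>();
    connectedNodes.set(entryPoint);
    stack.push(entryPoint);
    int count = 0;
    while (!stack.isEmpty()) {
      int node = stack.pop();
      count++;
      for (int friend : neighbors[node]) {
        // Links are directed, so a friend may already belong to an earlier
        // component; in that case we do not traverse it again.
        if (!connectedNodes.get(friend)) {
          connectedNodes.set(friend);
          stack.push(friend);
        }
      }
    }
    return new Component(entryPoint, count);
  }

  /** Repeatedly starts a traversal at the next node that is still unmarked. */
  static List<Component> components(int[][] neighbors) {
    BitSet connectedNodes = new BitSet(neighbors.length); // preserved between calls
    List<Component> result = new ArrayList<>();
    int next = connectedNodes.nextClearBit(0);
    while (next < neighbors.length) {
      result.add(markRooted(neighbors, connectedNodes, next));
      next = connectedNodes.nextClearBit(next + 1);
    }
    return result;
  }

  public static void main(String[] args) {
    // Directed links: 0 -> 1, 2 -> 1, 2 -> 3
    int[][] graph = {{1}, {}, {1, 3}, {}};
    System.out.println(components(graph));
  }
}
```

In the example graph, the traversal from node 2 runs into node 1, which the first call already marked, and skips it. The component sizes still add up to the number of nodes because every node is counted exactly once.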

msokolov · 2024-12-16T17:43:25Z

> Benchmark while indexing 100k docs:

could you say what data set you used here -- is this random vectors? If so, it would be great to use some non-random vectors so we can have realistic expectations for impact

msokolov

Looks good overall - are you able to address the comments? It's probably OK as is, but it would be great if we could remove the empties and address the testing question

msokolov · 2024-12-16T17:27:20Z

lucene/core/src/java/org/apache/lucene/util/hnsw/HnswUtil.java

@@ -163,6 +164,10 @@ private static Component markRooted(
      throws IOException {
    // Start at entry point and search all nodes on this level
    // System.out.println("markRooted level=" + level + " entryPoint=" + entryPoint);
+    if (connectedNodes.get(entryPoint)) {
+      return new Component(entryPoint, 0);


This should never happen, right? We enter with the next non-connected node. Can we add an assert false here before the return statement so we catch it during testing?

oh wait, this can happen because we iterate over all the entryPoints. Q: do we need this zero-size component for anything? Can we recall what happens with these components when we're done - the only purpose is to use them for reconnecting the graph. Yeah it looks like we will try to connect them again, which we could skip. Let's not add these empty components to the list.

> oh wait, this can happen because we iterate over all the entryPoints. Q: do we need this zero-size component for anything? Can we recall what happens with these components when we're done - the only purpose is to use them for reconnecting the graph. Yeah it looks like we will try to connect them again, which we could skip. Let's not add these empty components to the list.

I don't think we are adding the empty components to the list though. We are adding to the list with the total of the entryPoints for that level (which seems unlikely).

In the other places we add, we start the markRooted process with the nextClearBit, so it won't return 0.

> I don't think we are adding the empty components to the list though. We are adding to the list with the total of the entryPoints for that level (which seems unlikely).

But seems like a good check. Updated the PR.
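The check agreed on here (keep a component only if the traversal marked at least one new node) could look roughly like the sketch below. It is an assumption about the shape of the change, not the code from the PR: the `Traversal` interface and the `collect` method are hypothetical stand-ins for the real markRooted call and its caller.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Illustrative only: drops zero-size components while iterating entry points. */
final class NonEmptyComponentsSketch {

  record Component(int start, int size) {}

  /** Hypothetical stand-in for markRooted: marks nodes and reports how many were new. */
  interface Traversal {
    Component markRooted(BitSet connectedNodes, int entryPoint);
  }

  static List<Component> collect(int[] entryPoints, BitSet connectedNodes, Traversal traversal) {
    List<Component> components = new ArrayList<>();
    for (int entryPoint : entryPoints) {
      Component component = traversal.markRooted(connectedNodes, entryPoint);
      if (component.size() > 0) {
        components.add(component); // an entry point that was already connected adds nothing
      }
    }
    return components;
  }

  public static void main(String[] args) {
    // Toy traversal that marks only the entry point itself.
    Traversal single = (bits, entryPoint) -> {
      if (bits.get(entryPoint)) {
        return new Component(entryPoint, 0);
      }
      bits.set(entryPoint);
      return new Component(entryPoint, 1);
    };
    System.out.println(collect(new int[] {3, 3, 5}, new BitSet(), single));
  }
}
```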

lucene/core/src/java/org/apache/lucene/util/hnsw/HnswUtil.java

viswanathk · 2024-12-18T17:13:30Z

> > With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitset and it'll get updated with all the values.
>
> I'm not sure -- doesn't it still expect that connectedNodes is preserved between calls? The overall flow is like: find the next not-connected node, and traverse all of its connections -- it might run into an already-connected node (marked as connected in the bitset) because the relation is asymmetric. We used to continue traversing anyway although it's kind of pointless. Maybe it would tell you the size of the "rooted" component of the graph, but we don't really use this size information, so I think it's OK to early-terminate once you find something that is already rooted in an earlier component. And we still expect to remember the visited set across calls.

Yes, this was my understanding too.

viswanathk · 2024-12-18T17:54:55Z

> > Benchmark while indexing 100k docs:
>
> could you say what data set you used here -- is this random vectors? If so, it would be great to use some non-random vectors so we can have realistic expectations for impact

I used the knnPerfTest to run the benchmark. It uses enwiki-20120502-lines-1k for doc vectors, and glove-6B-100 for query vectors.

vigyasharma · 2024-12-18T19:23:57Z

> > With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitset and it'll get updated with all the values.
>
> I'm not sure -- doesn't it still expect that connectedNodes is preserved between calls? The overall flow is like: find the next not-connected node, and traverse all of its connections -- it might run into an already-connected node (marked as connected in the bitset) because the relation is asymmetric. We used to continue traversing anyway although it's kind of pointless. Maybe it would tell you the size of the "rooted" component of the graph, but we don't really use this size information, so I think it's OK to early-terminate once you find something that is already rooted in an earlier component. And we still expect to remember the visited set across calls.

Okay, I hadn't looked at the calling function HnswUtil.components() and was thrown off by the early return if entry point is already visited. We do need to pass the same bitset for each entry point.

Since we skip visited nodes now, can this function be impacted if new nodes got added to the graph in between the markRooted() calls? I'm not sure if we allow adding nodes once finish() has been invoked (but not completed). Does it even matter if we don't traverse some new nodes (looks like we assert on the total here?).

Optimize DFS while marking connected components

1d0539d

Use IntHashSet instead of FixedBitSet

ea82561

viswanathk mentioned this pull request Dec 1, 2024

[Discuss] Reducing allocations in HnswUtil::markRooted #14002

Open

vigyasharma reviewed Dec 16, 2024


msokolov requested changes Dec 16, 2024


Add only components with non zero nodes

d6ed90d
