Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diversity check bugfix #11781

Merged
merged 2 commits into from
Sep 19, 2022
Merged

Diversity check bugfix #11781

merged 2 commits into from
Sep 19, 2022

Conversation

msokolov
Copy link
Contributor

I was looking into the changes in recall we have been observing in various test cases. Thanks @jtibshirani for pointing out we should not see any change at all! I found a couple of related bugs in the diversity-checking code that I had introduced when refactoring as part of splitting into float[]- and byte[]- oriented versions. I found some gaps in our test coverage and added tests. One of these demonstrates the bug; the other is just filling in another case.

With this fix I was able to measure the identical recall comparing against test runs with 9.3

@msokolov msokolov force-pushed the diversity-check-bugfix branch from 3ba19fd to 5df91b2 Compare September 18, 2022 20:01
{0, 1, 0},
{0, 0, 2},
{1, 0, 0},
{0, 0.4f, 0}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I guess this works for bytes too, but we should probably multiply everything here to make it non-fractional

@mayya-sharipova
Copy link
Contributor

@msokolov Thanks for tacking this.

I ran ann benchmarks with this change, and happy to confirm that in my test recall with this PR is the same as in 9.3 branch, although QPS is lower, but we can investigate QPSs later.

glove-100-angular M:16 efConstruction:100

9.3 recall 9.3 QPS this PR recall this PR QPS
n_cands=10 0.620 2745.933 0.620 1675.500
n_cands=20 0.680 2288.665 0.680 1512.744
n_cands=40 0.746 1770.243 0.746 1040.240
n_cands=80 0.809 1226.738 0.809 695.236
n_cands=120 0.843 948.908 0.843 525.914
n_cands=200 0.878 671.781 0.878 351.529
n_cands=400 0.918 392.265 0.918 207.854
n_cands=600 0.937 282.403 0.937 144.311
n_cands=800 0.949 214.620 0.949 116.875

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msokolov Thanks.

int candidateIndex, BytesRef candidate, NeighborArray neighbors, float minAcceptedSimilarity)
throws IOException {
for (int i = candidateIndex - 1; i > -0; i--) {
int candidateIndex, BytesRef candidateVector, NeighborArray neighbors) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that with this big change, we had only a small reduction in recall. I guess the reason could be that in our tests diversity check was really relevant only for small number of nodes; in majority of cases the algorithm just eliminated the most distant node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know - how did this garbage even work at all! ☹️ It's kind of astonishing how insensitive this whole process is to the diversity checking. Initially we didn't have it at all though (just always pick the closest neighbors), and things still kind of work. Then I had the wonky implementation that did not sort the neighbors while indexing, but did some best effort kind of thing, and still it mostly worked. So we need good tests here to ensure we are doing the right thing! Because bugs here can lead to small degradation.

assertLevel0Neighbors(builder.hnsw, 2, 0);

builder.addGraphNode(3, vectorsCopy);
// this is one case we are testing; 2 has been displaced by 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test!

@msokolov msokolov merged commit 07af358 into apache:main Sep 19, 2022
@msokolov msokolov deleted the diversity-check-bugfix branch September 19, 2022 15:49
msokolov added a commit that referenced this pull request Sep 19, 2022
* Fixes bug in HNSW diversity checks introduced in LUCENE-10577
msokolov added a commit that referenced this pull request Sep 19, 2022
* Fixes bug in HNSW diversity checks introduced in LUCENE-10577
@msokolov msokolov added this to the 9.4.0 milestone Sep 19, 2022
@jpountz
Copy link
Contributor

jpountz commented Sep 20, 2022

It looks like some recent test failures affecting the 9.4 branch are caused by this change, e.g. https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.4/18/

@jpountz
Copy link
Contributor

jpountz commented Sep 20, 2022

Apologies, I just noticed that you opened #11787.

wjp719 added a commit to wjp719/lucene that referenced this pull request Sep 21, 2022
* main: (324 commits)
  Mute TestKnnVectorQuery#testFilterWithSameScore while we work on a fix
  Guard FieldExistsQuery against null pointers (apache#11794)
  Fix handling of ghost fields in string sorts. (apache#11792)
  LUCENE-10365 Wizard changes contributed from Solr (apache#591)
  GitHub Workflows security hardening (apache#11789)
  Improve tessellator performance by delaying calls to the method #isIntersectingPolygon (apache#11786)
  update DOAP and releaseWizard to reflect migration to github (apache#11747)
  Diversity check bugfix (apache#11781)
  Fix rare bug in TestKnnVectorQuery when we have multiple segments
  GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori (apache#11779)
  LUCENE-10674: Move changes entry to 9.4.
  apacheGH-11172: remove WindowsDirectory and native subproject. (apache#11774)
  LUCENE-10674: Ensure BitSetConjDISI returns NO_MORE_DOCS when sub-iterator exhausts. (apache#1068)
  Removed duplicate check in SpanGradientFormatter (apache#11762)
  Fix integer overflow in tests.
  GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children (apache#11764)
  Retry gradle wrapper download on http 500 and 503. (apache#11766)
  Fix a typo affecting Luke (apache#11763)
  Fix IntervalBuilder.NO_INTERVALS docId when unpositioned (apache#11760)
  LUCENE-10592 Better estimate memory for HNSW graph (apache#11743)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants