LUCENE-10428: Avoid infinite loop under error conditions. #711

jpountz · 2022-02-24T17:33:55Z

No description provided.

dblock · 2022-03-01T17:48:33Z

Definitely needs tests.

What can we do to get this change into the next version of Lucene? I am worried that we still don't have a consistent way to reproduce this issue.

jpountz · 2022-03-01T18:31:14Z

I was hoping for a 👍 from Ankit before merging.

Regarding tests, I was not able to identify the case that would break the existing logic, so I'm not sure what new test to add. There are existing tests for this class in TestMaxScoreSumPropagator.

dblock · 2022-03-01T18:39:21Z

@jpountz I'd write a test that uses invalid input to do more than 2 iterations and expect IllegalStateException with the right values/description. Otherwise no test confirms that it can't loop infinitely.

dblock · 2022-03-01T18:41:27Z

Link: https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-10428 for future reference.

cc: @jainankitk

jainankitk · 2022-03-01T19:14:43Z

@jpountz I'd write a test that uses invalid input to do more than 2 iterations and expect IllegalStateException with the right values/description. Otherwise no test confirms that it can't loop infinitely.

@dblock I guess @jpountz's is trying to say that we don't have an existing case (set of inputs / repro) which runs for more than 2 iterations. But, I agree with @dblock's point as well to test this code somehow.

Should we parameterize the number of iterations parameter and use it as test hook for getting the method to throw IllegalArgumentException? Also, does this exception reflect as 4xx or 5xx on the client side?

vigyasharma · 2022-03-01T20:19:30Z

@jpountz Out of curiosity, what is the implication of returning whatever minScore get's computed after 2 iterations? Like we do a best case effort to converge, but return with whatever we get after n attempts?
I guess it could be bad if the function doesn't keep converging with every iteration, and stopping abruptly at n iters can land in some random, divergent values.. I'm not sure how this works.

Also, do you have any pointers on what user should look at once she gets this exception? I understand it's too early as we're not able to repro the issue, so is the plan to keep a jira open to collate more data around how frequently people run into this exception?

vigyasharma · 2022-03-01T20:31:03Z

lucene/core/src/java/org/apache/lucene/search/MaxScoreSumPropagator.java

      // Important: use ulp of minScoreSum and not minScore to make sure that we
      // converge quickly.
      minScore -= Math.ulp(minScoreSum);
      // this should converge in at most two iterations:
      //  - one because of the subtraction rounding error
      //  - one because of the error introduced by sumUpperBound
-      assert ++iters <= 2 : iters;
+      if (iter > 2) {
+        throw new IllegalStateException(


Also publish the last computed value of minScore here?

I included this part of jpountz#2.

vigyasharma · 2022-03-01T20:32:29Z

Should we parameterize the number of iterations parameter and use it as test hook for getting the method to throw IllegalArgumentException?

I'm not sure how useful that is as a test. Would also avoid parameterizing a value (maxIterations) that is not supposed to change across different callers. But I like the idea of collating data from users who run into this exception (via the params printed in stacktrace). Maybe add a ToDo to test/repro/fix the edge case later?

dblock · 2022-03-01T21:02:06Z

Should we parameterize the number of iterations parameter and use it as test hook for getting the method to throw IllegalArgumentException?

I'm not sure how useful that is as a test.

I won't argue on this specific approach, but I have hard time accepting that we can fix a real bug, and write no new tests.

msokolov · 2022-03-02T14:03:59Z

I won't argue on this specific approach, but I have hard time accepting that we can fix a real bug, and write no new tests.

Please feel free to contribute one, this is open source after all. Nobody is here to do your bidding.

jpountz · 2022-03-02T15:28:12Z

Out of curiosity, what is the implication of returning whatever minScore get's computed after 2 iterations? Like we do a best case effort to converge, but return with whatever we get after n attempts?

@vigyasharma This might cause Lucene to miss some top hits. The point of this code is to propagate minimum competitive scores to child clauses. For instance imagine that you have a disjunction with 3 clauses, whose max scores are 1, 2 and 3. If the collector sets a min competitive score of 5.5 on the disjunction at some point, then we can deduce that the first clause must have a score that it at least 5.5-2-3=0.5, the second clause must have a score that is at least 5.5-1-3=1.5 and the 3rd clause must have a score that is at least 5.5-1-2=2.5. This is very simple in principle but the fact that operations on floating-point numbers produce rounded values makes it a bit tricky, hence the current logic. The risk with your proposal is that we would return a score that is too high, e.g. 0.6 on the first clause instead of 0.5, which would in-turn make Lucene skip hits whose score is between 0.5 and 0.6 on the first clause even though some of them might be top hits. The only "safe" return value would be 0, which would in-turn disable skipping on the child clause.

jpountz · 2022-03-02T15:30:56Z

@dblock Yeah I'm unhappy about this change too. FWIW this isn't really a bug fix, this mostly turns a potential bug that we haven't managed to understand into a different bug that has less severe implications and would make the root cause easier to identify.

dblock · 2022-03-02T15:47:34Z

I think we have a case of floating point here.

If you change getMinCompetitiveScore to take a double or minScoreSum, it becomes easy to reproduce. It also almost always starts tripping testNClausesRandomScore and often test2ClausesRandomScore.

private float getMinCompetitiveScore(double minScoreSum, double sumOfOtherMaxScores)

  public void testGetMinCompetitiveScore() throws Exception {   

    List<FakeScorer> scorers = new ArrayList<>();
    scorers.add(new FakeScorer(0.39404958f));
    scorers.add(new FakeScorer(0.0028006434f));
    scorers.add(new FakeScorer(0.24155897f));

    float minScoreSum = 2.7810444831848145f;
    MaxScoreSumPropagator p = new MaxScoreSumPropagator(scorers);
    p.setMinCompetitiveScore(minScoreSum);
  }

org.apache.lucene.search.TestMaxScoreSumPropagator > testGetMinCompetitiveScore STANDARD_ERROR
    minScoreSum=2.7810442447662354, sumOfOtherMaxScores=0.24435961246490479
    minScoreSum=2.7810442447662354, sumOfOtherMaxScores=0.39685022830963135
    minScoreSum=2.7810442447662354, sumOfOtherMaxScores=0.6356085538864136

org.apache.lucene.search.TestMaxScoreSumPropagator > testGetMinCompetitiveScore FAILED
    java.lang.AssertionError: 3
        at __randomizedtesting.SeedInfo.seed([4705BBC182B572C6:C8E41A539F61A77B]:0)
        at org.apache.lucene.search.MaxScoreSumPropagator.getMinCompetitiveScore(MaxScoreSumPropagator.java:160)
        at org.apache.lucene.search.MaxScoreSumPropagator.setMinCompetitiveScore(MaxScoreSumPropagator.java:121)
        at org.apache.lucene.search.TestMaxScoreSumPropagator.testGetMinCompetitiveScore(TestMaxScoreSumPropagator.java:244)

I wasn't able to trip it with a float minScoreSum though with the values in the issue, but it has to exist, given that it happened in production. Or there's something different about that setup than my linux test machine? JVM 11.0.12+7-LTS vs. 17? I can't find anything conclusive. This works:

org.apache.lucene.search.TestMaxScoreSumPropagator > testGetMinCompetitiveScore STANDARD_ERROR
    minScoreSum=2.7810442, sumOfOtherMaxScores=0.24435961246490479
    minScoreSum=2.7810442, sumOfOtherMaxScores=0.39685022830963135
    minScoreSum=2.7810442, sumOfOtherMaxScores=0.6356085538864136

It looks like converging in getMinCompetitiveScore relies on truncating (rounding?) doubles to floats, so this looks suspicious to me.

Any ideas of what we can do next? I will gladly write a unit test on top of the proposed changes in this PR if I can get to reproduce it.

jpountz · 2022-03-02T17:22:09Z

It looks like converging in getMinCompetitiveScore relies on truncating (rounding?) doubles to floats, so this looks suspicious to me.

I certainly wish it was simpler, but it's done this way on purpose. The way that boolean queries combine scores of their sub clauses together is by summing up the float scores into a double and finally casting back that double back to a float. So this method undoes this operation by taking the float score sum and subtracting the sum of other scores as a double as a first approximation of the minimum score that this clause must have.

If you change getMinCompetitiveScore to take a double or minScoreSum, it becomes easy to reproduce.

That's expected. If minScoreSum is a double then we end up subtracting ulps of a double from a float, which doesn't make much sense.

it has to exist, given that it happened in production.

Right, it's bad. My current thinking is that there are 3 main options:

A bug in this code,
A bug in other code that breaks assumptions that this code is making, e.g. a query that produces negative or NaN scores.
A problem in something external like a JVM bug or bad RAM.

Signed-off-by: dblock <[email protected]>

dblock · 2022-03-02T20:30:56Z

Thank you for the detailed explanation, this makes a lot of sense.

A bug in other code that breaks assumptions that this code is making, e.g. a query that produces negative or NaN scores.

This gave me the idea that we can use negative values for a test at least. I PRed jpountz#2 into your branch: it reproduces this issue with negative scores, and can now validate the new expectation of an IllegalStateException being thrown.

dblock · 2022-03-02T20:31:33Z

I won't argue on this specific approach, but I have hard time accepting that we can fix a real bug, and write no new tests.

Please feel free to contribute one, this is open source after all. Nobody is here to do your bidding.

jpountz#2

Signed-off-by: dblock <[email protected]>

jainankitk · 2022-03-02T21:12:32Z

This gave me the idea that we can use negative values for a test at least. I PRed jpountz#2 into your branch: it reproduces this issue with negative scores, and can now validate the new expectation of an IllegalStateException being thrown.

Thats great! and much better than parameterizing for testing IllegalStateException. I have reviewed and approved the changes for jpountz#2

jainankitk

This combined with changes from @dblock in jpountz#2 is good to go!

Unit test for apache#711

jpountz · 2022-03-03T08:42:25Z

Thanks all for the feedback and contributions!

Co-authored-by: dblock <[email protected]>

dblock · 2022-03-03T14:16:58Z

Thanks for fixing the issue @jpountz and everyone for your support and ideas!

Co-authored-by: dblock <[email protected]>

LUCENE-10428: Avoid infinite loop under error conditions.

9493a35

vigyasharma reviewed Mar 1, 2022

View reviewed changes

Publish the last computed value of minScore.

6024dc5

Signed-off-by: dblock <[email protected]>

dblock mentioned this pull request Mar 2, 2022

Unit test for https://github.com/apache/lucene/pull/711 jpountz/lucene#2

Merged

Unit test for https://issues.apache.org/jira/browse/LUCENE-10428.

5a89e1c

Signed-off-by: dblock <[email protected]>

jainankitk approved these changes Mar 2, 2022

View reviewed changes

jpountz added 3 commits March 3, 2022 08:40

Merge pull request #2 from dblock/lucene10428

5dd40bd

Unit test for apache#711

Tidy.

8e26c82

CHANGES entry

1d45d0e

jpountz merged commit 44a2a82 into apache:main Mar 3, 2022

jpountz deleted the lucene10428 branch March 3, 2022 08:42

jpountz added a commit that referenced this pull request Mar 3, 2022

LUCENE-10428: Avoid infinite loop under error conditions. (#711)

0d35e38

Co-authored-by: dblock <[email protected]>

dblock mentioned this pull request Mar 7, 2022

Fix: typo + +minScore. #735

Merged

3 tasks

dantuzi pushed a commit to SeaseLtd/lucene that referenced this pull request Mar 10, 2022

LUCENE-10428: Avoid infinite loop under error conditions. (apache#711)

d6faa69

Co-authored-by: dblock <[email protected]>

dblock mentioned this pull request Oct 2, 2024

Execute item failure callback on bulk request error opensearch-project/opensearch-go#627

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LUCENE-10428: Avoid infinite loop under error conditions. #711

LUCENE-10428: Avoid infinite loop under error conditions. #711

jpountz commented Feb 24, 2022

dblock commented Mar 1, 2022 •

edited

Loading

jpountz commented Mar 1, 2022

dblock commented Mar 1, 2022 •

edited

Loading

dblock commented Mar 1, 2022

jainankitk commented Mar 1, 2022

vigyasharma commented Mar 1, 2022

vigyasharma Mar 1, 2022

dblock Mar 2, 2022

vigyasharma commented Mar 1, 2022

dblock commented Mar 1, 2022 •

edited

Loading

msokolov commented Mar 2, 2022

jpountz commented Mar 2, 2022

jpountz commented Mar 2, 2022

dblock commented Mar 2, 2022 •

edited

Loading

jpountz commented Mar 2, 2022

dblock commented Mar 2, 2022 •

edited

Loading

dblock commented Mar 2, 2022

jainankitk commented Mar 2, 2022 •

edited

Loading

jainankitk left a comment

jpountz commented Mar 3, 2022

dblock commented Mar 3, 2022

LUCENE-10428: Avoid infinite loop under error conditions. #711

LUCENE-10428: Avoid infinite loop under error conditions. #711

Conversation

jpountz commented Feb 24, 2022

dblock commented Mar 1, 2022 • edited Loading

jpountz commented Mar 1, 2022

dblock commented Mar 1, 2022 • edited Loading

dblock commented Mar 1, 2022

jainankitk commented Mar 1, 2022

vigyasharma commented Mar 1, 2022

vigyasharma Mar 1, 2022

Choose a reason for hiding this comment

dblock Mar 2, 2022

Choose a reason for hiding this comment

vigyasharma commented Mar 1, 2022

dblock commented Mar 1, 2022 • edited Loading

msokolov commented Mar 2, 2022

jpountz commented Mar 2, 2022

jpountz commented Mar 2, 2022

dblock commented Mar 2, 2022 • edited Loading

jpountz commented Mar 2, 2022

dblock commented Mar 2, 2022 • edited Loading

dblock commented Mar 2, 2022

jainankitk commented Mar 2, 2022 • edited Loading

jainankitk left a comment

Choose a reason for hiding this comment

jpountz commented Mar 3, 2022

dblock commented Mar 3, 2022

dblock commented Mar 1, 2022 •

edited

Loading

dblock commented Mar 1, 2022 •

edited

Loading

dblock commented Mar 1, 2022 •

edited

Loading

dblock commented Mar 2, 2022 •

edited

Loading

dblock commented Mar 2, 2022 •

edited

Loading

jainankitk commented Mar 2, 2022 •

edited

Loading