-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10428: Avoid infinite loop under error conditions. #711
Conversation
Definitely needs tests. What can we do to get this change into the next version of Lucene? I am worried that we still don't have a consistent way to reproduce this issue. |
I was hoping for a 👍 from Ankit before merging. Regarding tests, I was not able to identify the case that would break the existing logic, so I'm not sure what new test to add. There are existing tests for this class in |
@jpountz I'd write a test that uses invalid input to do more than 2 iterations and expect |
Link: https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-10428 for future reference. cc: @jainankitk |
@dblock I guess @jpountz's is trying to say that we don't have an existing case (set of inputs / repro) which runs for more than 2 iterations. But, I agree with @dblock's point as well to test this code somehow. Should we parameterize the number of iterations parameter and use it as test hook for getting the method to throw IllegalArgumentException? Also, does this exception reflect as 4xx or 5xx on the client side? |
@jpountz Out of curiosity, what is the implication of returning whatever minScore get's computed after 2 iterations? Like we do a best case effort to converge, but return with whatever we get after n attempts? Also, do you have any pointers on what user should look at once she gets this exception? I understand it's too early as we're not able to repro the issue, so is the plan to keep a jira open to collate more data around how frequently people run into this exception? |
// Important: use ulp of minScoreSum and not minScore to make sure that we | ||
// converge quickly. | ||
minScore -= Math.ulp(minScoreSum); | ||
// this should converge in at most two iterations: | ||
// - one because of the subtraction rounding error | ||
// - one because of the error introduced by sumUpperBound | ||
assert ++iters <= 2 : iters; | ||
if (iter > 2) { | ||
throw new IllegalStateException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also publish the last computed value of minScore
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included this part of jpountz#2.
I'm not sure how useful that is as a test. Would also avoid parameterizing a value (maxIterations) that is not supposed to change across different callers. But I like the idea of collating data from users who run into this exception (via the params printed in stacktrace). Maybe add a ToDo to test/repro/fix the edge case later? |
I won't argue on this specific approach, but I have hard time accepting that we can fix a real bug, and write no new tests. |
Please feel free to contribute one, this is open source after all. Nobody is here to do your bidding. |
@vigyasharma This might cause Lucene to miss some top hits. The point of this code is to propagate minimum competitive scores to child clauses. For instance imagine that you have a disjunction with 3 clauses, whose max scores are 1, 2 and 3. If the collector sets a min competitive score of 5.5 on the disjunction at some point, then we can deduce that the first clause must have a score that it at least 5.5-2-3=0.5, the second clause must have a score that is at least 5.5-1-3=1.5 and the 3rd clause must have a score that is at least 5.5-1-2=2.5. This is very simple in principle but the fact that operations on floating-point numbers produce rounded values makes it a bit tricky, hence the current logic. The risk with your proposal is that we would return a score that is too high, e.g. 0.6 on the first clause instead of 0.5, which would in-turn make Lucene skip hits whose score is between 0.5 and 0.6 on the first clause even though some of them might be top hits. The only "safe" return value would be 0, which would in-turn disable skipping on the child clause. |
@dblock Yeah I'm unhappy about this change too. FWIW this isn't really a bug fix, this mostly turns a potential bug that we haven't managed to understand into a different bug that has less severe implications and would make the root cause easier to identify. |
I think we have a case of floating point here. If you change private float getMinCompetitiveScore(double minScoreSum, double sumOfOtherMaxScores) public void testGetMinCompetitiveScore() throws Exception {
List<FakeScorer> scorers = new ArrayList<>();
scorers.add(new FakeScorer(0.39404958f));
scorers.add(new FakeScorer(0.0028006434f));
scorers.add(new FakeScorer(0.24155897f));
float minScoreSum = 2.7810444831848145f;
MaxScoreSumPropagator p = new MaxScoreSumPropagator(scorers);
p.setMinCompetitiveScore(minScoreSum);
}
I wasn't able to trip it with a float
It looks like converging in Any ideas of what we can do next? I will gladly write a unit test on top of the proposed changes in this PR if I can get to reproduce it. |
I certainly wish it was simpler, but it's done this way on purpose. The way that boolean queries combine scores of their sub clauses together is by summing up the float scores into a double and finally casting back that double back to a float. So this method undoes this operation by taking the float score sum and subtracting the sum of other scores as a double as a first approximation of the minimum score that this clause must have.
That's expected. If
Right, it's bad. My current thinking is that there are 3 main options:
|
Signed-off-by: dblock <[email protected]>
Thank you for the detailed explanation, this makes a lot of sense.
This gave me the idea that we can use negative values for a test at least. I PRed jpountz#2 into your branch: it reproduces this issue with negative scores, and can now validate the new expectation of an |
|
Signed-off-by: dblock <[email protected]>
Thats great! and much better than parameterizing for testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all for the feedback and contributions! |
Co-authored-by: dblock <[email protected]>
Thanks for fixing the issue @jpountz and everyone for your support and ideas! |
Co-authored-by: dblock <[email protected]>
No description provided.