-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children #11764
GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children #11764
Conversation
…children instead of all children
topN = Math.min(topN, counts.length); | ||
|
||
PriorityQueue<Entry> pq = | ||
new PriorityQueue<>(topN) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you like to use the variant which takes a supplier of sentinel objects? It will allow you to avoid the null
checks below and in general improve the performance of the algorithm below (unless we don't expect topN
to be much smaller than counts
. The usage pattern then becomes comparing the current count+label to the top()
and if it's better you update the top()
and call updateTop()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think this is a good suggestion. Thanks!
|
||
LabelAndValue[] labelValues = new LabelAndValue[topN]; | ||
for (int i = topN - 1; i >= 0; i--) { | ||
Entry e = pq.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you accept my proposal above, you will need to break the loop when hitting a sentinel value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd actually need to continue
the loop while we have sentinel values on top right? Not break
? Then we'd actually need to truncate those sentinel values off the final array? This actually makes me wonder if we don't have a pre-existing bug in our other faceting implementations in the case that we have fewer non-zero counts than requested top-n (but in that case I think we'd NPE). Hmm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, never mind on the NPE concern. Since we rely on PQ#size as a bound (and since we don't use sentinel values in the other implementations), we're safe there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... and I think we can actually do better at dealing with sentinels than continuing
our loop and truncating the resulting array. We can just pop
them all off initially. I'll update the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I meant continue
:). I forgot that the sentinel values are "on top". And yes, pop()
all the sentinel values before we populate the array is the way to get rid of them.
@@ -46,6 +49,105 @@ public class TestExactFacetSetMatcher extends FacetTestCase { | |||
private static final int[] MANUFACTURER_ORDS = {FORD_ORD, TOYOTA_ORD, CHEVY_ORD, NISSAN_ORD}; | |||
private static final int[] YEARS = {2010, 2011, 2012}; | |||
|
|||
public void testTopChildren() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the change is only in MatchingFacetSetsCounts
and has nothing to do with the matchers, I think the test should be in TestMatchingFacetSetsCounts
and remove the duplication in the "Range" test class. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, makes sense to me. Thanks!
LabelAndValue[] labelValues = new LabelAndValue[Math.min(topN, childCount)]; | ||
for (int i = pq.size() - 1; i >= 0; i--) { | ||
Entry e = pq.pop(); | ||
assert e != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think this assertion contributes much because if it's null
we'll fail in the next line.
|
||
LabelAndValue[] labelValues = new LabelAndValue[topN]; | ||
for (int i = topN - 1; i >= 0; i--) { | ||
Entry e = pq.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I meant continue
:). I forgot that the sentinel values are "on top". And yes, pop()
all the sentinel values before we populate the array is the way to get rid of them.
…children instead of all children (#11764)
* main: (324 commits) Mute TestKnnVectorQuery#testFilterWithSameScore while we work on a fix Guard FieldExistsQuery against null pointers (apache#11794) Fix handling of ghost fields in string sorts. (apache#11792) LUCENE-10365 Wizard changes contributed from Solr (apache#591) GitHub Workflows security hardening (apache#11789) Improve tessellator performance by delaying calls to the method #isIntersectingPolygon (apache#11786) update DOAP and releaseWizard to reflect migration to github (apache#11747) Diversity check bugfix (apache#11781) Fix rare bug in TestKnnVectorQuery when we have multiple segments GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori (apache#11779) LUCENE-10674: Move changes entry to 9.4. apacheGH-11172: remove WindowsDirectory and native subproject. (apache#11774) LUCENE-10674: Ensure BitSetConjDISI returns NO_MORE_DOCS when sub-iterator exhausts. (apache#1068) Removed duplicate check in SpanGradientFormatter (apache#11762) Fix integer overflow in tests. GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children (apache#11764) Retry gradle wrapper download on http 500 and 503. (apache#11766) Fix a typo affecting Luke (apache#11763) Fix IntervalBuilder.NO_INTERVALS docId when unpositioned (apache#11760) LUCENE-10592 Better estimate memory for HNSW graph (apache#11743) ...
Description
See: GITHUB#11742