GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children #11764

gsmiller · 2022-09-12T16:23:55Z

Description

See: GITHUB#11742

…children instead of all children

shaie · 2022-09-12T19:12:45Z

lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java

+    topN = Math.min(topN, counts.length);
+
+    PriorityQueue<Entry> pq =
+        new PriorityQueue<>(topN) {


Would you like to use the variant which takes a supplier of sentinel objects? It will allow you to avoid the null checks below and in general improve the performance of the algorithm below (unless we don't expect topN to be much smaller than counts. The usage pattern then becomes comparing the current count+label to the top() and if it's better you update the top() and call updateTop().

Sure, I think this is a good suggestion. Thanks!

shaie · 2022-09-12T19:13:53Z

lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java

+
+    LabelAndValue[] labelValues = new LabelAndValue[topN];
+    for (int i = topN - 1; i >= 0; i--) {
+      Entry e = pq.pop();


If you accept my proposal above, you will need to break the loop when hitting a sentinel value

I think we'd actually need to continue the loop while we have sentinel values on top right? Not break? Then we'd actually need to truncate those sentinel values off the final array? This actually makes me wonder if we don't have a pre-existing bug in our other faceting implementations in the case that we have fewer non-zero counts than requested top-n (but in that case I think we'd NPE). Hmm...

Ah, never mind on the NPE concern. Since we rely on PQ#size as a bound (and since we don't use sentinel values in the other implementations), we're safe there.

Hmm... and I think we can actually do better at dealing with sentinels than continuing our loop and truncating the resulting array. We can just pop them all off initially. I'll update the PR.

Right, I meant continue :). I forgot that the sentinel values are "on top". And yes, pop() all the sentinel values before we populate the array is the way to get rid of them.

shaie · 2022-09-12T19:18:50Z

lucene/facet/src/test/org/apache/lucene/facet/facetset/TestExactFacetSetMatcher.java

@@ -46,6 +49,105 @@ public class TestExactFacetSetMatcher extends FacetTestCase {
  private static final int[] MANUFACTURER_ORDS = {FORD_ORD, TOYOTA_ORD, CHEVY_ORD, NISSAN_ORD};
  private static final int[] YEARS = {2010, 2011, 2012};

+  public void testTopChildren() throws Exception {


Since the change is only in MatchingFacetSetsCounts and has nothing to do with the matchers, I think the test should be in TestMatchingFacetSetsCounts and remove the duplication in the "Range" test class. WDYT?

Sure, makes sense to me. Thanks!

shaie · 2022-09-13T05:03:08Z

lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java

+    LabelAndValue[] labelValues = new LabelAndValue[Math.min(topN, childCount)];
+    for (int i = pq.size() - 1; i >= 0; i--) {
+      Entry e = pq.pop();
+      assert e != null;


nit: I don't think this assertion contributes much because if it's null we'll fail in the next line.

shaie · 2022-09-13T05:03:14Z

lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java

+
+    LabelAndValue[] labelValues = new LabelAndValue[topN];
+    for (int i = topN - 1; i >= 0; i--) {
+      Entry e = pq.pop();


Right, I meant continue :). I forgot that the sentinel values are "on top". And yes, pop() all the sentinel values before we populate the array is the way to get rid of them.

…children instead of all children (#11764)

* main: (324 commits) Mute TestKnnVectorQuery#testFilterWithSameScore while we work on a fix Guard FieldExistsQuery against null pointers (apache#11794) Fix handling of ghost fields in string sorts. (apache#11792) LUCENE-10365 Wizard changes contributed from Solr (apache#591) GitHub Workflows security hardening (apache#11789) Improve tessellator performance by delaying calls to the method #isIntersectingPolygon (apache#11786) update DOAP and releaseWizard to reflect migration to github (apache#11747) Diversity check bugfix (apache#11781) Fix rare bug in TestKnnVectorQuery when we have multiple segments GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori (apache#11779) LUCENE-10674: Move changes entry to 9.4. apacheGH-11172: remove WindowsDirectory and native subproject. (apache#11774) LUCENE-10674: Ensure BitSetConjDISI returns NO_MORE_DOCS when sub-iterator exhausts. (apache#1068) Removed duplicate check in SpanGradientFormatter (apache#11762) Fix integer overflow in tests. GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children (apache#11764) Retry gradle wrapper download on http 500 and 503. (apache#11766) Fix a typo affecting Luke (apache#11763) Fix IntervalBuilder.NO_INTERVALS docId when unpositioned (apache#11760) LUCENE-10592 Better estimate memory for HNSW graph (apache#11743) ...

GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top …

60a586e

…children instead of all children

gsmiller requested a review from shaie September 12, 2022 18:55

shaie reviewed Sep 12, 2022

View reviewed changes

gsmiller added 3 commits September 12, 2022 13:55

pr feedback

4b98ae6

better approach to extra sentinels

2ab6e81

array size fixup

cb741d0

shaie approved these changes Sep 13, 2022

View reviewed changes

remove assert

be29865

shaie approved these changes Sep 13, 2022

View reviewed changes

gsmiller merged commit 4463a0b into apache:main Sep 13, 2022

gsmiller deleted the GH-11742/facetset-gettopchildren branch September 13, 2022 13:50

gsmiller added a commit that referenced this pull request Sep 13, 2022

GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top …

3334697

…children instead of all children (#11764)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children #11764

GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children #11764

gsmiller commented Sep 12, 2022

shaie Sep 12, 2022

gsmiller Sep 12, 2022

shaie Sep 12, 2022

gsmiller Sep 12, 2022

gsmiller Sep 12, 2022

gsmiller Sep 12, 2022

shaie Sep 13, 2022

shaie Sep 12, 2022

gsmiller Sep 12, 2022

shaie Sep 13, 2022

shaie Sep 13, 2022

GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children #11764

GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children #11764

Conversation

gsmiller commented Sep 12, 2022

Description

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment