Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children #11764

Merged
merged 5 commits into from
Sep 13, 2022

Conversation

gsmiller
Copy link
Contributor

Description

See: GITHUB#11742

@gsmiller gsmiller requested a review from shaie September 12, 2022 18:55
topN = Math.min(topN, counts.length);

PriorityQueue<Entry> pq =
new PriorityQueue<>(topN) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to use the variant which takes a supplier of sentinel objects? It will allow you to avoid the null checks below and in general improve the performance of the algorithm below (unless we don't expect topN to be much smaller than counts. The usage pattern then becomes comparing the current count+label to the top() and if it's better you update the top() and call updateTop().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think this is a good suggestion. Thanks!


LabelAndValue[] labelValues = new LabelAndValue[topN];
for (int i = topN - 1; i >= 0; i--) {
Entry e = pq.pop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you accept my proposal above, you will need to break the loop when hitting a sentinel value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd actually need to continue the loop while we have sentinel values on top right? Not break? Then we'd actually need to truncate those sentinel values off the final array? This actually makes me wonder if we don't have a pre-existing bug in our other faceting implementations in the case that we have fewer non-zero counts than requested top-n (but in that case I think we'd NPE). Hmm...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, never mind on the NPE concern. Since we rely on PQ#size as a bound (and since we don't use sentinel values in the other implementations), we're safe there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... and I think we can actually do better at dealing with sentinels than continuing our loop and truncating the resulting array. We can just pop them all off initially. I'll update the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I meant continue :). I forgot that the sentinel values are "on top". And yes, pop() all the sentinel values before we populate the array is the way to get rid of them.

@@ -46,6 +49,105 @@ public class TestExactFacetSetMatcher extends FacetTestCase {
private static final int[] MANUFACTURER_ORDS = {FORD_ORD, TOYOTA_ORD, CHEVY_ORD, NISSAN_ORD};
private static final int[] YEARS = {2010, 2011, 2012};

public void testTopChildren() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the change is only in MatchingFacetSetsCounts and has nothing to do with the matchers, I think the test should be in TestMatchingFacetSetsCounts and remove the duplication in the "Range" test class. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, makes sense to me. Thanks!

LabelAndValue[] labelValues = new LabelAndValue[Math.min(topN, childCount)];
for (int i = pq.size() - 1; i >= 0; i--) {
Entry e = pq.pop();
assert e != null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think this assertion contributes much because if it's null we'll fail in the next line.


LabelAndValue[] labelValues = new LabelAndValue[topN];
for (int i = topN - 1; i >= 0; i--) {
Entry e = pq.pop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I meant continue :). I forgot that the sentinel values are "on top". And yes, pop() all the sentinel values before we populate the array is the way to get rid of them.

@gsmiller gsmiller merged commit 4463a0b into apache:main Sep 13, 2022
@gsmiller gsmiller deleted the GH-11742/facetset-gettopchildren branch September 13, 2022 13:50
gsmiller added a commit that referenced this pull request Sep 13, 2022
wjp719 added a commit to wjp719/lucene that referenced this pull request Sep 21, 2022
* main: (324 commits)
  Mute TestKnnVectorQuery#testFilterWithSameScore while we work on a fix
  Guard FieldExistsQuery against null pointers (apache#11794)
  Fix handling of ghost fields in string sorts. (apache#11792)
  LUCENE-10365 Wizard changes contributed from Solr (apache#591)
  GitHub Workflows security hardening (apache#11789)
  Improve tessellator performance by delaying calls to the method #isIntersectingPolygon (apache#11786)
  update DOAP and releaseWizard to reflect migration to github (apache#11747)
  Diversity check bugfix (apache#11781)
  Fix rare bug in TestKnnVectorQuery when we have multiple segments
  GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori (apache#11779)
  LUCENE-10674: Move changes entry to 9.4.
  apacheGH-11172: remove WindowsDirectory and native subproject. (apache#11774)
  LUCENE-10674: Ensure BitSetConjDISI returns NO_MORE_DOCS when sub-iterator exhausts. (apache#1068)
  Removed duplicate check in SpanGradientFormatter (apache#11762)
  Fix integer overflow in tests.
  GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children (apache#11764)
  Retry gradle wrapper download on http 500 and 503. (apache#11766)
  Fix a typo affecting Luke (apache#11763)
  Fix IntervalBuilder.NO_INTERVALS docId when unpositioned (apache#11760)
  LUCENE-10592 Better estimate memory for HNSW graph (apache#11743)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants