Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add doc_count field mapper #58339

Closed
Changes from 1 commit
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
4b5fab3
Initial version of doc_count field mapper
csoulios Apr 28, 2020
cd515b3
added tests
csoulios May 12, 2020
655e112
Build fixes
csoulios May 14, 2020
db13d83
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios May 14, 2020
191d793
Added tests for doc_count fieldmapper
csoulios May 14, 2020
5f81bee
doc count tests
csoulios Jun 17, 2020
dab8219
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Jun 23, 2020
ecdc603
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Jun 30, 2020
520ac9a
Resolve conflicts after merge from master
csoulios Jun 30, 2020
676ffc6
Added yaml test for doc_count field type
csoulios Jun 30, 2020
7c7139c
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Jun 30, 2020
d3b9c45
Minor changes to test
csoulios Jun 30, 2020
c36ecac
Fix issue with not-registering field mapper
csoulios Jul 2, 2020
4dca391
Simplify terms agg test
csoulios Jul 2, 2020
912d943
Add doc_count provider in the buckets aggregator
csoulios Jul 9, 2020
be46a00
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Jul 9, 2020
c0f23ae
Initialize doc_count provider once
csoulios Jul 14, 2020
f7b43c1
Added tests for FieldBasedDocCountProvider
csoulios Jul 15, 2020
5e1b96a
Added more tests to DocCountFieldMapper
csoulios Jul 16, 2020
80d832b
Fixed NPE at AggregatorTestCase
csoulios Jul 16, 2020
1e8b472
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Jul 17, 2020
e24d680
Updated branch to fix build after merge
csoulios Jul 17, 2020
74c727b
Added validation for single doc_count field
csoulios Jul 17, 2020
cd2c84d
Added version skips to fix broken tests
csoulios Jul 17, 2020
91246eb
Added documentation for doc_count
csoulios Jul 17, 2020
77aa346
Changes to address review comments:
csoulios Jul 20, 2020
39c43a0
Use _doc_count as Lucene field for doc count
csoulios Jul 20, 2020
8ca3fbc
Minor change: field rename
csoulios Jul 20, 2020
83929cb
Minor change to yml test.
csoulios Jul 27, 2020
848fc77
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Sep 2, 2020
0a1731d
Fix errors from merge
csoulios Sep 2, 2020
82f092a
Converted _doc_count to metadata field type
csoulios Sep 2, 2020
ba92359
Throw an error if parsed value is not a number
csoulios Sep 10, 2020
cb61366
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Sep 21, 2020
522c385
Make _doc_count field a metadata field
csoulios Sep 21, 2020
df2a2eb
Fixed broken tests
csoulios Sep 21, 2020
838436f
Fix bug in low cardinality ordinal terms aggs
csoulios Sep 22, 2020
4a92c80
Update docs that _doc_count is a metadata field
csoulios Sep 22, 2020
5d6d037
Fix broken ML tests
csoulios Sep 23, 2020
0ff6fe1
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Sep 23, 2020
23e4b30
Fix errors after merge
csoulios Sep 23, 2020
b258653
Addressed review comments
csoulios Oct 2, 2020
f5ed1df
Addressed reviewer comments
csoulios Oct 19, 2020
2fcdcf6
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Oct 20, 2020
4138d16
Added DocCountFieldTypeTests
csoulios Oct 21, 2020
5d38b7f
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Oct 27, 2020
654847e
Fix errors after merge
csoulios Oct 27, 2020
7b7ca43
Make composite agg respect _doc_count field
csoulios Oct 27, 2020
ce44e87
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Oct 27, 2020
5621c44
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Oct 27, 2020
1d969a1
DocCountProvider rethrows IOException instead of swallowing it
csoulios Oct 27, 2020
cb05034
Set familyTypeName of _doc_count to integer
csoulios Nov 2, 2020
d7d80f4
Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
csoulios Nov 2, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Fix bug in low cardinality ordinal terms aggs
  • Loading branch information
csoulios committed Sep 22, 2020

Verified

This commit was signed with the committer’s verified signature. The key has expired.
renovate-bot Mend Renovate
commit 838436f85d884feaffa48657bdb973e8011d64b1
Original file line number Diff line number Diff line change
@@ -18,15 +18,15 @@ setup:
refresh: true
body:
- '{"index": {}}'
- '{"_doc_count": 10, "str": "abc", "number" : 500 }'
- '{"_doc_count": 10, "str": "abc", "number" : 500, "unmapped": "abc" }'
- '{"index": {}}'
- '{"_doc_count": 5, "str": "xyz", "number" : 100 }'
- '{"_doc_count": 5, "str": "xyz", "number" : 100, "unmapped": "xyz" }'
- '{"index": {}}'
- '{"_doc_count": 7, "str": "foo", "number" : 100 }'
- '{"_doc_count": 7, "str": "foo", "number" : 100, "unmapped": "foo" }'
- '{"index": {}}'
- '{"_doc_count": 1, "str": "foo", "number" : 200 }'
- '{"_doc_count": 1, "str": "foo", "number" : 200, "unmapped": "foo" }'
- '{"index": {}}'
- '{"str": "abc", "number" : 500 }'
- '{"str": "abc", "number" : 500, "unmapped": "abc" }'

---
"Test numeric terms agg with doc_count":
@@ -50,7 +50,7 @@ setup:


---
"Test string terms agg with doc_count":
"Test keyword terms agg with doc_count":
- skip:
version: " - 7.99.99"
reason: "Doc count fields are only implemented in 8.0"
@@ -67,3 +67,29 @@ setup:
- match: { aggregations.str_terms.buckets.1.doc_count: 8 }
- match: { aggregations.str_terms.buckets.2.key: "xyz" }
- match: { aggregations.str_terms.buckets.2.doc_count: 5 }

---

"Test unmapped string terms agg with doc_count":
- skip:
version: " - 7.99.99"
reason: "Doc count fields are only implemented in 8.0"
- do:
bulk:
index: test_2
refresh: true
body:
- '{"index": {}}'
- '{"_doc_count": 10, "str": "abc" }'
- '{"index": {}}'
- '{"str": "abc" }'
- do:
search:
index: test_2
rest_total_hits_as_int: true
body: { "size" : 0, "aggs" : { "str_terms" : { "terms" : { "field" : "str.keyword" } } } }

- match: { hits.total: 2 }
- length: { aggregations.str_terms.buckets: 1 }
- match: { aggregations.str_terms.buckets.0.key: "abc" }
- match: { aggregations.str_terms.buckets.0.doc_count: 11 }
Original file line number Diff line number Diff line change
@@ -54,7 +54,7 @@ public abstract class BucketsAggregator extends AggregatorBase {
private final BigArrays bigArrays;
private final IntConsumer multiBucketConsumer;
private IntArray docCounts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we switch to long now that a single document can add more than 1 ?
That's probably a follow up though, no need to change in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in this PR

private DocCountProvider docCountProvider;
protected final DocCountProvider docCountProvider;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a NumericDocValues directly and have the logic in a protected function like getDocCountForDoc(int) ? We don't really need the new interface and classes ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method DocCountProvider#getDocCount(int) is used by BucketsAggregator and SortedDocsProducer classes

I think the easier way to share the code of reading _doc_count field is to encapsulate it in a small simple class and call it from both BucketsAggregator and SortedDocsProducer classes. To simplify the class even more, I have modified DocCountProvider interface and made it a concrete class with the implementation.


public BucketsAggregator(String name, AggregatorFactories factories, SearchContext context, Aggregator parent,
CardinalityUpperBound bucketCardinality, Map<String, Object> metadata) throws IOException {
Original file line number Diff line number Diff line change
@@ -305,6 +305,7 @@ public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBucketCol
assert sub == LeafBucketCollector.NO_OP_COLLECTOR;
final SortedDocValues singleValues = DocValues.unwrapSingleton(segmentOrds);
mapping = valuesSource.globalOrdinalsMapping(ctx);
docCountProvider.setLeafReaderContext(ctx);
// Dense mode doesn't support include/exclude so we don't have to check it here.
if (singleValues != null) {
segmentsWithSingleValuedOrds++;
@@ -316,7 +317,8 @@ public void collect(int doc, long owningBucketOrd) throws IOException {
return;
}
int ord = singleValues.ordValue();
segmentDocCounts.increment(ord + 1, 1);
int docCount = docCountProvider.getDocCount(doc);
segmentDocCounts.increment(ord + 1, docCount);
}
});
}
@@ -329,7 +331,8 @@ public void collect(int doc, long owningBucketOrd) throws IOException {
return;
}
for (long segmentOrd = segmentOrds.nextOrd(); segmentOrd != NO_MORE_ORDS; segmentOrd = segmentOrds.nextOrd()) {
segmentDocCounts.increment(segmentOrd + 1, 1);
int docCount = docCountProvider.getDocCount(doc);
segmentDocCounts.increment(segmentOrd + 1, docCount);
}
}
});