feat(aggregators/metric): Add a top_hits aggregator #2198

Merged on Jan 26, 2024 (32 commits)
Changes from 30 commits
Commits
39b1684
feat(aggregators/metric): Implement a top_hits aggregator
ditsuke Sep 29, 2023
453ac23
fix: Expose get_fields
ditsuke Oct 3, 2023
d574384
fix: Serializer for top_hits request
ditsuke Oct 3, 2023
ba5e23f
chore: Avert panick on parsing invalid top_hits query
ditsuke Oct 8, 2023
00cce7c
refactor: Allow multiple field names from aggregations
ditsuke Oct 8, 2023
a7cf3f0
perf: Replace binary heap with TopNComputer
ditsuke Oct 8, 2023
b4de0c6
fix: Avoid comparator inversion by ComparableDoc
ditsuke Oct 10, 2023
ab49acc
fix: Rank missing field values lower than present values
ditsuke Oct 10, 2023
4e1b9c4
refactor: Make KeyOrder a struct
ditsuke Oct 10, 2023
4238c63
feat: Rough attempt at docvalue_fields
ditsuke Oct 15, 2023
9858402
feat: Complete stab at docvalue_fields
ditsuke Oct 18, 2023
21a8b1d
test(unit): Add tests for top_hits aggregator
ditsuke Oct 19, 2023
34df32d
fix: docfield_value field globbing
ditsuke Oct 19, 2023
cdfe4de
test(unit): Include dynamic fields
ditsuke Oct 19, 2023
f9430e6
fix: Value -> OwnedValue
ditsuke Oct 20, 2023
22407c0
fix: Use OwnedValue's native Null variant
ditsuke Oct 20, 2023
0e7dea9
chore: Improve readability of test asserts
ditsuke Oct 23, 2023
0d89133
chore: Remove DocAddress from top_hits result
ditsuke Oct 25, 2023
2fb0018
docs: Update aggregator doc
ditsuke Oct 25, 2023
ae4a9e5
Merge `tantivy/main` into `feat/aggregators/top-hits`
ditsuke Oct 26, 2023
de9f113
revert: accidental doc test
ditsuke Oct 26, 2023
537cebd
chore: enable time macros only for tests
ditsuke Nov 4, 2023
9dea259
chore: Apply suggestions from review
ditsuke Nov 4, 2023
57a811c
chore: Apply suggestions from review
ditsuke Nov 16, 2023
d760c6c
fix: Retrieve all values for fields
ditsuke Nov 16, 2023
bc8a4cf
test(unit): Update for multi-value retrieval
ditsuke Nov 17, 2023
5162e14
chore: Assert term existence
ditsuke Nov 17, 2023
33b12a8
feat: Include all columns for a column name
ditsuke Nov 17, 2023
46d1cf7
fix: Resolve json fields
ditsuke Nov 27, 2023
467b70b
chore: Address review on mutability
ditsuke Jan 16, 2024
da13a1c
chore: s/segment_id/segment_ordinal instances of SegmentOrdinal
ditsuke Jan 18, 2024
e2ba462
chore: Revert erroneous grammar change
ditsuke Jan 18, 2024
1 change: 1 addition & 0 deletions Cargo.toml
@@ -80,6 +80,7 @@ futures = "0.3.21"
 paste = "1.0.11"
 more-asserts = "0.3.1"
 rand_distr = "0.4.3"
+time = { version = "0.3.10", features = ["serde-well-known", "macros"] }
 
 [target.'cfg(not(windows))'.dev-dependencies]
 criterion = "0.5"
4 changes: 2 additions & 2 deletions src/aggregation/agg_limits.rs
@@ -73,9 +73,9 @@ impl AggregationLimits {
     /// Create a new ResourceLimitGuard, that will release the memory when dropped.
     pub fn new_guard(&self) -> ResourceLimitGuard {
         ResourceLimitGuard {
-            /// The counter which is shared between the aggregations for one request.
+            // The counter which is shared between the aggregations for one request.
             memory_consumption: Arc::clone(&self.memory_consumption),
-            /// The memory_limit in bytes
+            // The memory_limit in bytes
             memory_limit: self.memory_limit,
             allocated_with_the_guard: 0,
         }
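
For readers unfamiliar with the guard pattern this hunk touches, here is a minimal sketch of a shared memory counter that is released on drop. It is not tantivy's implementation: `Limits`, `Guard`, `add_memory_consumed`, and `current` are hypothetical names chosen for illustration, and only the three field names are taken from the diff. The `///` to `//` change itself is presumably there because a doc comment on a field of a struct expression documents nothing and trips rustc's `unused_doc_comments` lint.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the limits object: a shared counter plus a cap in bytes.
struct Limits {
    memory_consumption: Arc<AtomicU64>,
    memory_limit: u64,
}

/// Hypothetical guard: remembers how much it added so `Drop` can give it back.
struct Guard {
    memory_consumption: Arc<AtomicU64>,
    memory_limit: u64,
    allocated_with_the_guard: u64,
}

impl Limits {
    fn new(memory_limit: u64) -> Self {
        Limits {
            memory_consumption: Arc::new(AtomicU64::new(0)),
            memory_limit,
        }
    }

    /// Creates a guard that shares this object's counter.
    fn new_guard(&self) -> Guard {
        Guard {
            // Plain `//` comments belong here: rustdoc cannot attach
            // documentation to the fields of a struct expression.
            memory_consumption: Arc::clone(&self.memory_consumption),
            memory_limit: self.memory_limit,
            allocated_with_the_guard: 0,
        }
    }

    /// Current total recorded by all live guards.
    fn current(&self) -> u64 {
        self.memory_consumption.load(Ordering::Relaxed)
    }
}

impl Guard {
    /// Records `bytes` against the shared counter and reports whether the
    /// total now exceeds the limit. The bytes stay recorded either way.
    fn add_memory_consumed(&mut self, bytes: u64) -> Result<(), String> {
        let previous = self.memory_consumption.fetch_add(bytes, Ordering::Relaxed);
        self.allocated_with_the_guard += bytes;
        if previous + bytes > self.memory_limit {
            Err(format!("memory limit of {} bytes exceeded", self.memory_limit))
        } else {
            Ok(())
        }
    }
}

impl Drop for Guard {
    /// Returns everything this guard recorded to the shared counter.
    fn drop(&mut self) {
        self.memory_consumption
            .fetch_sub(self.allocated_with_the_guard, Ordering::Relaxed);
    }
}
```

In this sketch, every guard created from the same `Limits` value sees the same counter, so several aggregations in one request would share a single budget, and dropping a guard returns only what that guard recorded.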
39 changes: 24 additions & 15 deletions src/aggregation/agg_req.rs
@@ -35,7 +35,7 @@ use super::bucket::{
 };
 use super::metric::{
     AverageAggregation, CountAggregation, MaxAggregation, MinAggregation,
-    PercentilesAggregationReq, StatsAggregation, SumAggregation,
+    PercentilesAggregationReq, StatsAggregation, SumAggregation, TopHitsAggregation,
 };
 
 /// The top-level aggregation request structure, which contains [`Aggregation`] and their user
@@ -93,7 +93,12 @@ impl Aggregation {
     }
 
     fn get_fast_field_names(&self, fast_field_names: &mut HashSet<String>) {
-        fast_field_names.insert(self.agg.get_fast_field_name().to_string());
+        fast_field_names.extend(
+            self.agg
+                .get_fast_field_names()
+                .iter()
+                .map(|s| s.to_string()),
+        );
         fast_field_names.extend(get_fast_field_names(&self.sub_aggregation));
     }
 }
@@ -147,23 +152,27 @@ pub enum AggregationVariants {
     /// Computes the sum of the extracted values.
     #[serde(rename = "percentiles")]
     Percentiles(PercentilesAggregationReq),
+    /// Finds the top k values matching some order
+    #[serde(rename = "top_hits")]
+    TopHits(TopHitsAggregation),
 }
 
 impl AggregationVariants {
-    /// Returns the name of the field used by the aggregation.
-    pub fn get_fast_field_name(&self) -> &str {
+    /// Returns the name of the fields used by the aggregation.
+    pub fn get_fast_field_names(&self) -> Vec<&str> {
         match self {
-            AggregationVariants::Terms(terms) => terms.field.as_str(),
-            AggregationVariants::Range(range) => range.field.as_str(),
-            AggregationVariants::Histogram(histogram) => histogram.field.as_str(),
-            AggregationVariants::DateHistogram(histogram) => histogram.field.as_str(),
-            AggregationVariants::Average(avg) => avg.field_name(),
-            AggregationVariants::Count(count) => count.field_name(),
-            AggregationVariants::Max(max) => max.field_name(),
-            AggregationVariants::Min(min) => min.field_name(),
-            AggregationVariants::Stats(stats) => stats.field_name(),
-            AggregationVariants::Sum(sum) => sum.field_name(),
-            AggregationVariants::Percentiles(per) => per.field_name(),
+            AggregationVariants::Terms(terms) => vec![terms.field.as_str()],
+            AggregationVariants::Range(range) => vec![range.field.as_str()],
+            AggregationVariants::Histogram(histogram) => vec![histogram.field.as_str()],
+            AggregationVariants::DateHistogram(histogram) => vec![histogram.field.as_str()],
+            AggregationVariants::Average(avg) => vec![avg.field_name()],
+            AggregationVariants::Count(count) => vec![count.field_name()],
+            AggregationVariants::Max(max) => vec![max.field_name()],
+            AggregationVariants::Min(min) => vec![min.field_name()],
+            AggregationVariants::Stats(stats) => vec![stats.field_name()],
+            AggregationVariants::Sum(sum) => vec![sum.field_name()],
+            AggregationVariants::Percentiles(per) => vec![per.field_name()],
+            AggregationVariants::TopHits(top_hits) => top_hits.field_names(),
         }
     }
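
The shape of this refactor can be sketched in isolation. The snippet below is an illustrative model, not the PR's code: `Variant`, its `sort_fields` and `doc_value_fields` members, and `collect_fast_field_names` are hypothetical names, and how the real `TopHitsAggregation::field_names()` assembles its list is not shown in this hunk. It only demonstrates why the accessor returns a `Vec<&str>` once a variant can read more than one field, and how the caller switches from `insert` to `extend`.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down stand-in for the request enum in the diff.
enum Variant {
    /// A single-field aggregation (terms, avg, and so on).
    Terms { field: String },
    /// A top_hits-like aggregation that reads several fields.
    TopHits {
        sort_fields: Vec<String>,
        doc_value_fields: Vec<String>,
    },
}

impl Variant {
    /// Returns every field name the aggregation needs.
    /// Single-field variants wrap their one name in a `Vec`.
    fn get_fast_field_names(&self) -> Vec<&str> {
        match self {
            Variant::Terms { field } => vec![field.as_str()],
            Variant::TopHits {
                sort_fields,
                doc_value_fields,
            } => sort_fields
                .iter()
                .chain(doc_value_fields.iter())
                .map(|s| s.as_str())
                .collect(),
        }
    }
}

/// Gathers the deduplicated field names of all aggregations,
/// using `extend` where the old single-name code used `insert`.
fn collect_fast_field_names(aggs: &[Variant]) -> HashSet<String> {
    let mut names = HashSet::new();
    for agg in aggs {
        names.extend(agg.get_fast_field_names().iter().map(|s| s.to_string()));
    }
    names
}
```

Because the names land in a `HashSet`, a field that appears both as a sort key and as a retrieved value, or in two different aggregations, is only recorded once.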
