Add size parameter to time_series aggregation #93496

Merged
6 commits merged into elastic:main on Feb 7, 2023

Conversation

tmgordeeva
Contributor

Adds an optional size parameter to the time_series aggregation that caps the number of buckets in responses.

The cap is applied after aggregation, both to the buckets sent back in responses and during reduction.

@elasticsearchmachine elasticsearchmachine added v8.7.0 needs:triage Requires assignment of a team area label labels Feb 6, 2023
@tmgordeeva tmgordeeva added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >feature and removed needs:triage Requires assignment of a team area label labels Feb 6, 2023
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Feb 6, 2023
@tmgordeeva tmgordeeva added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Feb 6, 2023
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Feb 6, 2023
Member

@martijnvg martijnvg left a comment

I left a few comments.

@@ -247,6 +251,9 @@ protected boolean lessThan(IteratorAndCurrent<InternalBucket> a, IteratorAndCurr
BytesRef tsid = reducedBucket.key;
assert prevTsid == null || tsid.compareTo(prevTsid) > 0;
reduced.buckets.add(reducedBucket);
if (size != null && ++count >= size) {
Member

I think reduced.buckets.size() can be used here instead of a separate count variable?
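
A minimal sketch of that suggestion, not the PR's actual code: the hypothetical `CappedReduce.collect` below stops once the result list itself reaches the cap, so no separate counter is needed. The class and method names are assumptions for illustration, and a null size is taken to mean "no cap", mirroring the `size != null` check in the hunk above. As in the hunk, the check runs after the add.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the reduce loop; names are illustrative only.
final class CappedReduce {

    // Copies items in iteration order and stops once the list holds `size` entries.
    // A null size means "no cap".
    static <T> List<T> collect(Iterator<T> sortedBuckets, Integer size) {
        List<T> reduced = new ArrayList<>();
        while (sortedBuckets.hasNext()) {
            reduced.add(sortedBuckets.next());
            // The list's own size replaces the separate count variable.
            if (size != null && reduced.size() >= size) {
                break;
            }
        }
        return reduced;
    }
}
```

The list already tracks how many buckets were added, so the cap and the stored results cannot drift apart.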

public static final InstantiatingObjectParser<TimeSeriesAggregationBuilder, String> PARSER;

private boolean keyed;
private int size;

private static final int DEFAULT_SIZE = 10000;
Member

This should be: private static final int DEFAULT_SIZE = MultiBucketConsumerService.DEFAULT_MAX_BUCKETS

@@ -14,7 +14,7 @@ public class TimeSeriesAggregationBuilderTests extends AggregationBuilderTestCas

@Override
protected TimeSeriesAggregationBuilder createTestAggregatorBuilder() {
- return new TimeSeriesAggregationBuilder(randomAlphaOfLength(10), randomBoolean());
+ return new TimeSeriesAggregationBuilder(randomAlphaOfLength(10), randomBoolean(), 10000);
Member

Maybe randomise size here? randomIntBetween(0, 100_000)
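
A rough, self-contained sketch of what a randomised size could look like. Everything here is a stand-in: the real test would call the test framework's own `randomIntBetween` and construct the real `TimeSeriesAggregationBuilder`, whereas `RandomSizeSketch`, its nested `Builder`, and the local helper are hypothetical. Treating both bounds as inclusive is an assumption of this sketch.

```java
import java.util.Random;

// Hypothetical stand-ins for the test helper and the builder under test.
final class RandomSizeSketch {

    // Simplified builder carrying only the three constructor arguments from the diff.
    static final class Builder {
        final String name;
        final boolean keyed;
        final int size;

        Builder(String name, boolean keyed, int size) {
            this.name = name;
            this.keyed = keyed;
            this.size = size;
        }
    }

    // Returns a value in [min, max]; inclusive bounds are assumed here.
    static int randomIntBetween(Random random, int min, int max) {
        return min + random.nextInt(max - min + 1);
    }

    // Builds a test instance with a random size instead of the fixed 10000.
    static Builder createTestBuilder(Random random) {
        int size = randomIntBetween(random, 0, 100_000);
        return new Builder("ts-" + random.nextInt(1000), random.nextBoolean(), size);
    }
}
```

Randomising the value lets round-trip tests exercise more than the single default.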

@@ -217,6 +217,10 @@ protected boolean lessThan(IteratorAndCurrent<InternalBucket> a, IteratorAndCurr
}

InternalTimeSeries reduced = new InternalTimeSeries(name, new ArrayList<>(initialCapacity), keyed, getMetadata());
Integer size = reduceContext.builder() instanceof TimeSeriesAggregationBuilder
Member

+1, then we don't have to serialise size as part of the InternalTimeSeries class.
Is reduceContext.builder() instanceof TimeSeriesAggregationBuilder always true?
If so, we should throw an IllegalStateException if the builder isn't an instance of TimeSeriesAggregationBuilder.
If not, we should default to TimeSeriesAggregationBuilder#DEFAULT_SIZE.
I think the size variable can just be an int?

Member

"tests may use a fake builder"

I was unaware of this, so I think using Integer as the type here is OK.
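
To illustrate where this thread landed, here is a simplified sketch of resolving the cap from the builder with a nullable Integer. The types below are hypothetical stand-ins, not the real Elasticsearch classes, and returning null for any other builder type is an assumption that matches the `size != null` check in the reduce loop rather than something the truncated hunk confirms.

```java
// Hypothetical, simplified types; not the real Elasticsearch classes.
final class SizeResolution {

    interface AggregationBuilder {}

    // Stand-in for the time series builder that carries the size setting.
    static final class TimeSeriesBuilder implements AggregationBuilder {
        final int size;

        TimeSeriesBuilder(int size) {
            this.size = size;
        }
    }

    // Stand-in for the fake builder that some tests pass in.
    static final class FakeBuilder implements AggregationBuilder {}

    // Returns the configured cap, or null (no cap) when the builder is another type.
    static Integer resolveSize(AggregationBuilder builder) {
        if (builder instanceof TimeSeriesBuilder) {
            return ((TimeSeriesBuilder) builder).size;
        }
        return null;
    }
}
```

Using Integer instead of int keeps "no cap" distinct from any numeric value, which is why throwing on an unexpected builder type is not needed in this variant.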

@@ -66,6 +70,9 @@ public InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws I
);
bucket.bucketOrd = ordsEnum.ord();
buckets.add(bucket);
if (++count >= size) {
Member

I think buckets.size() also works here instead of a count variable?

@tmgordeeva tmgordeeva added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 6, 2023
@elasticsearchmachine elasticsearchmachine removed the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 6, 2023
@tmgordeeva tmgordeeva added :Analytics/Geo Indexing, search aggregations of geo points and shapes and removed needs:triage Requires assignment of a team area label labels Feb 6, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 6, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @tmgordeeva, I've created a changelog YAML for you.

@tmgordeeva tmgordeeva added :StorageEngine/TSDB You know, for Metrics and removed :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Feb 6, 2023
Member

@martijnvg martijnvg left a comment

LGTM

@elasticsearchmachine
Collaborator

Hi @tmgordeeva, I've updated the changelog YAML for you.

@martijnvg martijnvg changed the title from "Size for time series" to "Add size parameter to time_series aggregation" Feb 7, 2023
@martijnvg martijnvg merged commit 5c38d4c into elastic:main Feb 7, 2023
Labels
>non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.7.0