Pipeline metrics aggregations do not recognize composite aggregations as multi-bucket #32692

ghost · 2018-08-07T22:55:32Z

Elasticsearch version (bin/elasticsearch --version): 6.3.0 official docker image

Plugins installed: []

JVM version (java -version): 10.0.1

OS version (uname -a if on a Unix-like system):
Linux 389f11186e5b 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I haven't seen a GitHub issue posted for this thread, so I'm opening one to get the ball rolling, since my team has run into the problem as well.

Pipeline metrics aggregations do not recognize composite aggregations as multi-bucket. However, composite aggregations are multi-bucket, so this should work.

Steps to reproduce:

Template mapping

PUT _template/template_default
{
  "mappings": {
    "_doc": {
      "_all": {
        "enabled": false
      },
      "dynamic": "strict",
      "properties": {
        "itemId": {
          "type": "keyword",
          "norms": false
        },
        "inputQty": {
          "type": "integer",
          "index": false
        },
        "orderQty": {
          "type": "integer",
          "index": false
        },
        "centerId": {
          "type": "keyword",
          "eager_global_ordinals": true,
          "norms": false
        },
        "submittedQty": {
          "type": "integer",
          "index": false
        },
        "confirmedQty": {
          "type": "integer",
          "index": false
        }
      }
    }
  }
}

REST call

POST items-0*/_search?ignore_unavailable=true
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "myBuckets": {
      "composite": {
        "size": 100000,
        "sources": [
          { "center_name": { "terms": { "field": "centerId" } } }
        ]
      },
      "aggs": {
        "requested_units": { "sum": { "field": "inputQty" } },
        "approved_units": { "sum": { "field": "orderQty" } },
        "submitted_quantity": { "sum": { "field": "submittedQty" } },
        "confirmed_quantity": { "sum": { "field": "confirmedQty" } }
      }
    },
    "check_pipeline_agg": {
      "sum_bucket": {
        "buckets_path": "fc_buckets>requested_units"
      }
    }
  }
}

Provide logs (if relevant):
The error that comes back will be similar to:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "1_8dwXRuT565uQg11iZ_SA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
      }
    }
  },
  "status": 400
}


parallelthought · 2018-11-21T22:49:24Z

Observing the same error in v6.4.3.

Without the ability to use the Pipeline aggregation, Composite aggregation loses a lot of flexibility.

I'm willing to work on this missing feature. Looking forward to any hints about the "marker" that is missing as mentioned in the forum link.

There is an error (or typo) in the reproduction query above: the buckets_path does not reference the composite aggregation by its name. The pipeline aggregation should read as follows:

   	  "check_pipeline_agg": {
   		"sum_bucket": {
   		  "buckets_path": "myBuckets>requested_units"
   		}
   	  }
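
A typo like this can be caught before the request is sent. The following is a minimal Python sketch, not part of Elasticsearch: `unresolved_buckets_paths` is a hypothetical helper that only looks at string-valued `buckets_path` entries of aggregations defined at one level, and it assumes aggregation names are matched exactly (case-sensitively).

```python
from typing import Any, Dict, List, Tuple

# Keys of an aggregation definition that are not the aggregation type itself.
NON_TYPE_KEYS = {"aggs", "aggregations", "meta"}


def unresolved_buckets_paths(aggs: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (aggregation name, buckets_path) pairs whose first path element
    does not name an aggregation defined at the same level.

    Hypothetical helper: only string-valued buckets_path entries are checked.
    """
    problems = []
    for name, definition in aggs.items():
        for agg_type, params in definition.items():
            if agg_type in NON_TYPE_KEYS or not isinstance(params, dict):
                continue
            path = params.get("buckets_path")
            if isinstance(path, str):
                first = path.split(">")[0]  # name of the referenced aggregation
                if first not in aggs:
                    problems.append((name, path))
    return problems


# The "aggs" section of the reproduction request, trimmed to one metric.
request_aggs = {
    "myBuckets": {
        "composite": {
            "sources": [{"center_name": {"terms": {"field": "centerId"}}}]
        },
        "aggs": {"requested_units": {"sum": {"field": "inputQty"}}},
    },
    "check_pipeline_agg": {
        "sum_bucket": {"buckets_path": "fc_buckets>requested_units"}
    },
}

print(unresolved_buckets_paths(request_aggs))
```

A path that resolves only removes the typo; the composite aggregation is still rejected with the error described in this issue.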

consulthys · 2018-12-13T05:32:53Z

@jimczi Any updates on this?

jehutywong · 2019-01-21T05:08:56Z

+1

homerogon · 2019-04-25T19:10:18Z

Also having the same issue. Any update on this?

dorr-fg · 2019-05-26T09:15:15Z

Looking for a way to output a report specifying all customers having sum of sales larger than X.
I'm afraid that a regular aggregation would probably not scale to the number of customers I want to output.

The composite aggregation seems to be the right tool, but now I'll have to get all customer aggregated buckets first, and then do application-level filtering for the sum. This isn't terrible, but the amount of traffic might be wasteful.

So, very interested in combining these two great abilities.
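
A minimal Python sketch of the application-level filtering described above. Everything named here is an assumption for illustration: `by_customer`, `customer`, `total_sales`, `customerId` and `sales` are hypothetical names, `search` stands for whatever function sends a request body and returns the parsed response, and `make_fake_search` is an in-memory stand-in so the sketch runs without a cluster. It assumes each composite response carries an `after_key` that can be passed back as `after`.

```python
import copy
from typing import Any, Callable, Dict, Iterator, List

Search = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_composite_buckets(search: Search, body: Dict[str, Any],
                           agg_name: str) -> Iterator[Dict[str, Any]]:
    """Yield every bucket of a composite aggregation, one page at a time."""
    after = None
    while True:
        request = copy.deepcopy(body)  # never mutate the caller's body
        if after is not None:
            request["aggs"][agg_name]["composite"]["after"] = after
        agg = search(request)["aggregations"][agg_name]
        if not agg["buckets"]:
            return
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None:
            return


def buckets_over(search: Search, body: Dict[str, Any], agg_name: str,
                 metric: str, threshold: float) -> List[Dict[str, Any]]:
    """Keep only the buckets whose metric value exceeds the threshold."""
    return [
        {**bucket["key"], metric: bucket[metric]["value"]}
        for bucket in iter_composite_buckets(search, body, agg_name)
        if bucket[metric]["value"] > threshold
    ]


def make_fake_search(totals: Dict[str, float], agg_name: str,
                     source: str, metric: str) -> Search:
    """In-memory stand-in for a real search call (illustration only)."""
    ordered = sorted(totals.items())

    def search(request: Dict[str, Any]) -> Dict[str, Any]:
        composite = request["aggs"][agg_name]["composite"]
        after = composite.get("after", {}).get(source)
        remaining = [(k, v) for k, v in ordered if after is None or k > after]
        page = remaining[:composite["size"]]
        agg: Dict[str, Any] = {
            "buckets": [
                {"key": {source: k}, "doc_count": 1, metric: {"value": v}}
                for k, v in page
            ]
        }
        if page:
            agg["after_key"] = {source: page[-1][0]}
        return {"aggregations": {agg_name: agg}}

    return search


body = {
    "size": 0,
    "aggs": {
        "by_customer": {
            "composite": {
                "size": 2,
                "sources": [{"customer": {"terms": {"field": "customerId"}}}],
            },
            "aggs": {"total_sales": {"sum": {"field": "sales"}}},
        }
    },
}
fake = make_fake_search({"a": 10.0, "b": 250.0, "c": 5.0, "d": 300.0, "e": 99.0},
                        "by_customer", "customer", "total_sales")
print(buckets_over(fake, body, "by_customer", "total_sales", 100.0))
```

Every bucket still crosses the network, which is the wasteful traffic mentioned above; the sketch only keeps the client-side part small.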

jimczi · 2019-07-05T17:04:08Z

While the fix would be simple, it might be misleading for users, since pipeline aggregations are applied to the final buckets. One thing I'd like to understand is why this would be needed, since the idea of the composite aggregation is to paginate over the buckets instead of returning them all in a single response. Applying a bucket pipeline aggregation to a single page of results is not very helpful if the goal is to get the total sum of the buckets.
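
To illustrate the point, here is a simplified Python model (not Elasticsearch code) of what a sum restricted to one page would report versus the total a client accumulates across pages. The bucket values are made up; the response shape, with `aggregations.myBuckets.buckets[].requested_units.value`, is assumed from the query in this issue.

```python
from typing import Any, Dict, List, Sequence

Page = Dict[str, Any]


def page_sum(page: Page, agg_name: str, metric: str) -> float:
    """Sum one metric over the buckets of a single response page."""
    buckets = page["aggregations"][agg_name]["buckets"]
    return sum(bucket[metric]["value"] for bucket in buckets)


def sums_per_page(pages: Sequence[Page], agg_name: str, metric: str) -> List[float]:
    """What a page-local sum would report: one partial sum per page."""
    return [page_sum(page, agg_name, metric) for page in pages]


def total_sum(pages: Sequence[Page], agg_name: str, metric: str) -> float:
    """The total over all buckets, accumulated on the client."""
    return sum(sums_per_page(pages, agg_name, metric))


def make_page(values: Dict[str, float]) -> Page:
    """Build a response page from made-up per-center values (illustration only)."""
    return {
        "aggregations": {
            "myBuckets": {
                "buckets": [
                    {"key": {"center_name": center}, "requested_units": {"value": value}}
                    for center, value in values.items()
                ]
            }
        }
    }


pages = [make_page({"c1": 10.0, "c2": 20.0}), make_page({"c3": 5.0})]
print(sums_per_page(pages, "myBuckets", "requested_units"))  # partial sums
print(total_sum(pages, "myBuckets", "requested_units"))      # overall total
```

No single page's partial sum equals the overall total unless all buckets fit in one page.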

polyfractal · 2019-08-02T19:00:34Z

Forgot to update this ticket. We discussed this in a team meeting. We think it might be possible to support a subset of the pipeline aggs, namely the ones that are "self contained" like a bucket_script. E.g. the pipeline agg is only enriching an existing bucket, and doesn't rely on data from any other buckets. This would be safe and compatible with composite agg.

Other pipelines like a derivative are not safe, because there is no guarantee that the page has all the required data.
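
The distinction can be sketched with a toy Python model. These functions are simplified stand-ins, not the Elasticsearch implementations: `per_bucket` models a self-contained pipeline that looks at one bucket at a time, and `derivative` models one that needs the previous bucket. The values and the page size are made up.

```python
from typing import Callable, List, Optional, Sequence


def paginate(values: Sequence[float], size: int) -> List[List[float]]:
    """Split the full bucket list into pages of at most `size` buckets."""
    return [list(values[i:i + size]) for i in range(0, len(values), size)]


def per_bucket(values: Sequence[float], fn: Callable[[float], float]) -> List[float]:
    """Self-contained pipeline model: each output depends on one bucket only."""
    return [fn(value) for value in values]


def derivative(values: Sequence[float]) -> List[Optional[float]]:
    """Cross-bucket pipeline model: each output needs the previous bucket."""
    return [None if i == 0 else values[i] - values[i - 1] for i in range(len(values))]


values = [1.0, 4.0, 9.0, 16.0]
pages = paginate(values, 2)


def double(value: float) -> float:
    return value * 2


# Per-bucket enrichment gives the same answer page by page or all at once.
script_whole = per_bucket(values, double)
script_paged = [out for page in pages for out in per_bucket(page, double)]

# The derivative loses the value at every page boundary.
deriv_whole = derivative(values)
deriv_paged = [out for page in pages for out in derivative(page)]

print(script_whole == script_paged)
print(deriv_whole, deriv_paged)
```

In this model the paged derivative has a gap at the first bucket of every page, which is the missing-data problem described above.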

Not entirely sure how we'd go about implementing this -- some kind of new marker interface? Or if it's worth the effort to implement for such a limited functionality.

wchaparro · 2024-07-02T19:21:24Z

This is not something we plan to implement in the near future in aggregations, and has been superseded by our focus on ESQL. Closing as not planned. If you feel strongly about this one, please let us know.

jimczi added >bug :Analytics/Aggregations Aggregations labels Aug 9, 2018

jimczi self-assigned this Aug 9, 2018

jehutywong mentioned this issue Jan 21, 2019

achieve unlimited bucket size aggregation #37598

Closed

jimczi assigned jimczi and unassigned jimczi Jul 5, 2019

This was referenced Jul 10, 2019

Document that pipeline aggs are not compatible with composite agg #44180

Merged

Allow composite agg keys to be selected inside bucket selector #36853

Closed

polyfractal mentioned this issue Apr 3, 2020

Composite aggregation doesn't support min_doc_count option #32452

Closed

rjernst added the Team:Analytics label May 4, 2020

This was referenced May 7, 2021

BucketSelector pipeline aggregation extension opendistro-for-elasticsearch/alerting#374

Merged

BucketSelector pipeline aggregation extension opensearch-project/OpenSearch#674

Open

This was referenced Jul 6, 2021

SQL: Cannot use HAVING clause with aggregates and fields #71350

Open

Docs: Mention pipeline aggregations that work together with composite aggregations #74951

Closed

rishabhmaurya mentioned this issue Aug 20, 2021

Add BucketSelector pipeline aggregation extension opensearch-project/alerting#144

Merged

wchaparro added >enhancement and removed >bug labels Jan 11, 2022

not-napoleon mentioned this issue Feb 22, 2022

Meta: Refactor pipeline aggregations #82808

Open

15 tasks

wchaparro closed this as not planned Jul 2, 2024
