Pipeline metrics aggregations do not recognize composite aggregations as multi-bucket #32692

Closed
Tracked by #82808
ghost opened this issue Aug 7, 2018 · 8 comments
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@ghost

ghost commented Aug 7, 2018

Elasticsearch version (bin/elasticsearch --version): 6.3.0 official docker image

Plugins installed: []

JVM version (java -version): 10.0.1

OS version (uname -a if on a Unix-like system):
Linux 389f11186e5b 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I haven't seen a GitHub issue filed for this forum thread, so I'm opening one to get the ball rolling, since my team has run into the problem as well.

Pipeline metrics aggregations do not recognize composite aggregations as multi-bucket. However, composite aggregations are multi-bucket, so this should work.

Steps to reproduce:

  1. Template mapping
PUT _template/template_default
   {
     "mappings": {
       "_doc": {
         "_all": {
           "enabled": false
         },
         "dynamic": "strict",
         "properties": {
           "itemId": {
             "type": "keyword",
             "norms": false
           },
           "inputQty": {
             "type": "integer",
             "index": false
           },
           "orderQty": {
             "type": "integer",
             "index": false
           },
           "centerId": {
             "type": "keyword",
             "eager_global_ordinals": true,
             "norms": false
           },
           "submittedQty": {
             "type": "integer",
             "index": false
           },
           "confirmedQty": {
             "type": "integer",
             "index": false
           }
         }
       }
     }
   }
  2. REST call
POST items-0*/_search?ignore_unavailable=true
   {
     "size": 0,
     "track_total_hits": false,
     "aggs": {
       "myBuckets": {
         "composite": {
           "size": 100000,
           "sources": [
             { "center_name": { "terms": { "field": "centerId" } } }
           ]
         },
         "aggs": {
           "requested_units": { "sum": { "field": "inputQty" } },
           "approved_units": { "sum": { "field": "orderQty" } },
           "submitted_quantity": { "sum": { "field": "submittedQty" } },
           "confirmed_quantity": { "sum": { "field": "confirmedQty" } }
         }
       },
       "check_pipeline_agg": {
         "sum_bucket": {
           "buckets_path": "fc_buckets>requested_units"
         }
       }
     }
   }

Provide logs (if relevant):
The error that comes back will be similar to:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "1_8dwXRuT565uQg11iZ_SA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
      }
    }
  },
  "status": 400
}
@jimczi jimczi self-assigned this Aug 9, 2018
@parallelthought

parallelthought commented Nov 21, 2018

Observing the same error in v6.4.3.

Without the ability to use the Pipeline aggregation, Composite aggregation loses a lot of flexibility.

I'm willing to work on this missing feature. Looking forward to any hints about the "marker" that is missing as mentioned in the forum link.

There is an error (or typo) in the reproduction query above: the `buckets_path` refers to an aggregation name that does not exist in the request. The pipeline aggregation should read as follows:

       "check_pipeline_agg": {
         "sum_bucket": {
           "buckets_path": "myBuckets>requested_units"
         }
       }

@consulthys
Contributor

@jimczi Any updates on this?

@jehutywong

+1

@homerogon

Also having the same issue. Any update on this?

@dorr-fg

dorr-fg commented May 26, 2019

I'm looking for a way to output a report listing all customers whose total sales exceed X.
I'm afraid that a regular aggregation would probably not scale to the number of customers I want to output.

The composite aggregation seems to be the right tool, but now I'll have to get all customer aggregated buckets first, and then do application-level filtering for the sum. This isn't terrible, but the amount of traffic might be wasteful.

So, very interested in combining these two great abilities.
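
The application-level filtering described in this comment could look roughly like the sketch below. It assumes a hypothetical `search(body)` callable that sends the request body to the `_search` endpoint and returns the parsed JSON response. The aggregation names `by_customer` and `total_sales` and the fields `customerId` and `salesAmount` are illustrative and do not come from this thread. Pagination relies on the `after_key` that composite aggregation responses return and the `after` parameter that the next request accepts.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_composite_buckets(search: SearchFn, page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every composite bucket by following after_key page by page."""
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            # Hypothetical source: one bucket per customerId.
            "sources": [{"customer": {"terms": {"field": "customerId"}}}],
        }
        if after is not None:
            composite["after"] = after
        body = {
            "size": 0,
            "aggs": {
                "by_customer": {
                    "composite": composite,
                    "aggs": {"total_sales": {"sum": {"field": "salesAmount"}}},
                }
            },
        }
        agg = search(body)["aggregations"]["by_customer"]
        buckets = agg.get("buckets", [])
        if not buckets:
            return  # an empty page means there is nothing left to read
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return


def customers_over(search: SearchFn, threshold: float, page_size: int = 1000) -> List[str]:
    """Filter client-side: keep customers whose summed sales exceed threshold."""
    return [
        bucket["key"]["customer"]
        for bucket in iter_composite_buckets(search, page_size)
        if bucket["total_sales"]["value"] > threshold
    ]
```

Every bucket still crosses the network, which is the traffic cost mentioned above; the sketch only keeps memory bounded to one page at a time.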

@jimczi jimczi assigned jimczi and unassigned jimczi Jul 5, 2019
@jimczi
Contributor

jimczi commented Jul 5, 2019

While the fix would be simple, it might be misleading for users, since pipeline aggregations are applied to the final buckets. One thing I'd like to understand is why this would be needed, since the idea of the composite aggregation is to paginate over the buckets instead of returning them all in a single response. Applying a bucket pipeline aggregation to a single page of results is not very helpful if the goal is to get the total sum of the buckets.
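
The concern can be illustrated with a small sketch in plain Python. This is not Elasticsearch code: the bucket values are made up, and `paginate` and `sum_bucket_like` are hypothetical helpers that mimic composite paging and a `sum_bucket` pipeline that only sees one page.

```python
from typing import Dict, List

Bucket = Dict[str, float]


def paginate(buckets: List[Bucket], page_size: int) -> List[List[Bucket]]:
    """Split the full bucket list into pages, as composite paging would."""
    return [buckets[i:i + page_size] for i in range(0, len(buckets), page_size)]


def sum_bucket_like(page: List[Bucket], metric: str) -> float:
    """Sum one metric over the buckets of a single page only."""
    return sum(bucket[metric] for bucket in page)


# Made-up values for illustration.
buckets = [{"requested_units": v} for v in (10, 20, 30, 40, 50)]
pages = paginate(buckets, page_size=2)

per_page = [sum_bucket_like(page, "requested_units") for page in pages]
total = sum(per_page)

print(per_page)  # [30, 70, 50]
print(total)     # 150
```

No single page reports 150; the caller would have to add the per-page sums together, which it can already do from the `sum` sub-aggregation values in each bucket.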

@polyfractal
Contributor

Forgot to update this ticket. We discussed this in a team meeting. We think it might be possible to support a subset of the pipeline aggs, namely the ones that are "self contained" like a bucket_script. E.g. the pipeline agg is only enriching an existing bucket, and doesn't rely on data from any other buckets. This would be safe and compatible with composite agg.

Other pipelines like a derivative are not safe, because there is no guarantee that the page has all the required data.

I'm not entirely sure how we'd go about implementing this (some kind of new marker interface?), or whether it's worth the effort to implement for such limited functionality.
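
The distinction between "self contained" and cross-bucket pipelines can be sketched in plain Python with made-up bucket values. This is an illustration, not Elasticsearch code: `bucket_script_like` and `derivative_like` are hypothetical stand-ins for the corresponding pipeline aggregations, and the field names are invented.

```python
from typing import Dict, List, Optional

Bucket = Dict[str, float]


def bucket_script_like(page: List[Bucket]) -> List[float]:
    """Per-bucket ratio: each result depends only on its own bucket."""
    return [bucket["approved"] / bucket["requested"] for bucket in page]


def derivative_like(page: List[Bucket]) -> List[Optional[float]]:
    """Difference to the previous bucket: the first bucket has no predecessor."""
    if not page:
        return []
    values = [bucket["requested"] for bucket in page]
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


# Made-up values for illustration.
buckets = [
    {"requested": 10, "approved": 5},
    {"requested": 20, "approved": 10},
    {"requested": 40, "approved": 30},
    {"requested": 80, "approved": 40},
]
pages = [buckets[:2], buckets[2:]]

# Self-contained: paging does not change the result.
paged_ratios = [r for page in pages for r in bucket_script_like(page)]
print(paged_ratios == bucket_script_like(buckets))  # True

# Cross-bucket: the value at the page boundary is lost.
paged_derivs = [d for page in pages for d in derivative_like(page)]
print(derivative_like(buckets))  # [None, 10, 20, 40]
print(paged_derivs)              # [None, 10, None, 40]
```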

@wchaparro
Member

This is not something we plan to implement in the near future in aggregations, and it has been superseded by our focus on ESQL. Closing as not planned. If you feel strongly about this one, please let us know.

@wchaparro wchaparro closed this as not planned Jul 2, 2024

9 participants