Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scriptable Metrics Aggregation #7075

Merged
merged 1 commit into from
Aug 20, 2014
Merged

Scriptable Metrics Aggregation #7075

merged 1 commit into from
Aug 20, 2014

Conversation

colings86
Copy link
Contributor

A metrics aggregation which runs specified scripts at the init, collect, combine, and reduce phases

Closes #5923

@clintongormley
Copy link
Contributor

Looking forward to reading the docs :)

@colings86
Copy link
Contributor Author

@clintongormley yep. Long way from finished yet as need to write tests and read the docs. Put the PR up as @brwe wanted to have a look at it and so others could have a play around with it but its not ready for review yet

@mattweber
Copy link
Contributor

+1! So cool, can't wait to give this a try. Thanks!

@colings86
Copy link
Contributor Author

Following gist can be followed as an example of using the scriptable metrics aggregation.

https://gist.github.com/colings86/0994f1b188b41be6052a

@@ -156,4 +157,8 @@ public static TopHitsBuilder topHits(String name) {
public static GeoBoundsBuilder geoBounds(String name) {
return new GeoBoundsBuilder(name);
}

public static ScriptBuilder script(String name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the name should reflect the fact that this is a metric aggregation to leave some namespace if we have a scripted bucket agg some day?

@jpountz
Copy link
Contributor

jpountz commented Jul 30, 2014

I left some comments but my major concern is about the client: do we need to expose it to the scripts? I'm worried that it could allow to do crazy things?

@colings86
Copy link
Contributor Author

@jpountz I agree that exposing the client is dangerous and it allows for some crazy things. I spoke to @imotov about this and he said that it was something explicitly requested for the scripted facets. I would be happy to not provide it to the scripts, it would also make the code a bit cleaner

@uboness
Copy link
Contributor

uboness commented Jul 30, 2014

+1 on not exposing it... we shouldn't encourage doing crazy things with scripts

@rjernst
Copy link
Member

rjernst commented Aug 5, 2014

One note: expression scripts can only be used for "search", they do not allow "executable". It looks like only the map phase uses search, and the rest executable?

@colings86
Copy link
Contributor Author

Yeah you are right. Actually since the expression language doesn't support assignment (i think, correct me if I am wrong) I don't think you would be able to use it even in the map script since the map script has to update some state in the params. Also, currently it is assumed that the same language is used for all the scripts in a single aggregations. I think the use cases for this aggregation are almost always going to need more complex scripts than the Lucene expression language can provide, so I see them mostly using groovy and maybe other languages through plugins

@@ -18,6 +18,8 @@
*/
package org.elasticsearch.search.aggregations;

import org.elasticsearch.search.aggregations.metrics.scripted.ScriptedMetricParser;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this import with the other ones?

@jpountz
Copy link
Contributor

jpountz commented Aug 6, 2014

Left some comments but it looks great in general!

@jpountz jpountz removed the review label Aug 6, 2014
"aggs": {
"profit": {
"scripted_metric": {
"init_script" : "aggregation['transactions'] = []",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a shorter variable than aggregation in the script. How about _agg?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@colings86
Copy link
Contributor Author

@jpountz I made the code changes you suggested. I also updated the docs with a worked example and left some replies to your comments. Let me know what you think.

@jpountz jpountz added the review label Aug 20, 2014
@jpountz
Copy link
Contributor

jpountz commented Aug 20, 2014

It looks good in general, docs are great. In my opinion, the only things that need to be done before merging it would be:

  • removing the aggregation context from the script context
  • maybe rename aggregation to _agg in the script context as @clintongormley suggested

@jpountz
Copy link
Contributor

jpountz commented Aug 20, 2014

LGTM

@jpountz jpountz removed the review label Aug 20, 2014
A metrics aggregation which runs specified scripts at the init, collect, combine, and reduce phases

Closes #5923
@colings86 colings86 merged commit 7f943f0 into elastic:master Aug 20, 2014
@colings86 colings86 assigned colings86 and unassigned colings86 Aug 21, 2014
@colings86 colings86 deleted the feature/scriptableMetricsAggregation branch August 21, 2014 15:10
@diegoasth
Copy link

Great News! I am trying to use a scripted metric aggregation inside a date_histogram aggregation but unfortunately it seems to take docs outside the date-bucket into consideration, but this is not what I expected..

@diegoasth
Copy link

{
"size": 0,
"aggs": {
"histo": {
"date_histogram": {
"field": "timestamp",
"interval": "1h",
"min_doc_count": 0,
"format": "yyyy-MM-dd HH:mm"
},
"aggs": {
"profit": {
"scripted_metric": {
"init_script": " _agg['kpiA'] =0;",
"map_script": "_agg.kpiA += doc['kpiA'].value; ",
"combine_script": "teste =0; teste += _agg.kpiA; return teste;",
"reduce_script": "profit = 0; for (a in _aggs) { profit += a;}; return profit;"
}
}
}
}
}
}

@colings86
Copy link
Contributor Author

@diegoasth I have tried to reproduce the issue your are seeing in the gist below [1], in both 1.4 and in the master branch, without success. Could you take a look at the gist and try it on your system to see if it reproduces your issue. If not, could you create a similar gist to illustrate the steps to reproduce the issue you are seeing?

[1] https://gist.github.com/colings86/d1a9c76898b4b2783b89

@diegoasth
Copy link

Hi @colings86 , thanks or your attention.

in deed, our gist does not reproduce the issue, so I created another one that shows it. My goal is to use a scripted_metric inside a date_histogram (let me know if it is not possible). I expect that the scripts (init_script, map_script, combine_script and reduce_script) consider only the documents in that specific date-bucket. For the sake of comparison, I used the native SUM aggregation inside the date_histogram and it worked fine, returning the expected results per bucket.

https://gist.github.com/diegoasth/406148525616f9f527ab

@colings86
Copy link
Contributor Author

@diegoasth this is indeed a bug. Thanks for raising it and for providing the steps to reproduce it. I'll look into getting it fixed

@colings86
Copy link
Contributor Author

@diegoasth I have raised the following issue for this bug:

#8036

@diegoasth
Copy link

thank you very much. The ability to wite my custom aggregations (inside ES) is a very awaited feature.

@clintongormley clintongormley changed the title Aggregations: Scriptable Metrics Aggregation Scriptable Metrics Aggregation Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scriptable Metric Aggregation
7 participants