Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible script query #21479

Closed
nik9000 opened this issue Nov 10, 2016 · 21 comments
Closed

More flexible script query #21479

nik9000 opened this issue Nov 10, 2016 · 21 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement :Search/Search Search-related issues that do not fall into other categories

Comments

@nik9000
Copy link
Member

nik9000 commented Nov 10, 2016

Now that Kibana supports painless, some strange things have come up around filtering. This is a proposal to implement a thing in Elasticsearch so Kibana doesn't have to do upsetting textual manipulation to user generated script fields.

Background: Kibana lets users configure script fields once and reuse them. The specific usage that is troubling is filtering which Kibana implements by creating a query like this:

                "script" : {
                    "script" : {
                        "inline" : "($scriptText) == params.value",
                        "lang"   : "painless",
                        "params" : {
                            "value" : "whatever the user picked"
                        }
                    }
                }

Specifically, the ($scriptText) == params.value part is scary because script fields that are perfectly valid painless most places are not valid in this script context. Scripts that contain return, ;, or variable declarations are in trouble with a construct like this.

I propose a new version of the script query that'd look like this:

                "script" : {
                    "script" : {
                        "inline"  : $scriptText,
                        "lang"    : "painless",
                        "matches" : "whatever the user picked"
                    }
                }

Instead of casting the return of the script to a boolean we cast it to whatever type is sent in the matches field and then check for equality.

@nik9000
Copy link
Member Author

nik9000 commented Nov 10, 2016

@clintongormley I'd like to fast track this a bit because I feel like doing text manipulation on the query is a bit fraught.

I also figure that if Kibana does this thing someone else might be doing it as well so it is likely worth fixing in a nice way.

I certainly don't want to rush it though.

I also think it is a thing we could build with our "contextless" scripts today and we could improve with the context work @jdconrad is doing. With "contextless" scripts we'd do a checked on the result from the script before comparing. With script context we'd force the return types to match so the script engine'd make the user perform the appropriate casts and give compile errors if they don't.

@Bargs
Copy link

Bargs commented Nov 10, 2016

I chatted with @nik9000 about this and agreed it would be nice to have first class support for this in the ES API.

We could work around this issue in Kibana by wrapping the user's script in a function, but as Nik said such textual manipulation is fragile. For one it would only work in painless, or we would have to have to hardcode the syntax for functions in every possible language the user might have enabled. We already do that in some cases but it's hacky and a nice side effect of Nik's proposal is that it would allow us to remove those existing conditionals from the code.

Nik also mentioned that the error reporting would be much worse if we wrapped the script in a function.

@jdconrad
Copy link
Contributor

jdconrad commented Nov 10, 2016

Hey @nik9000, would you mind clarifying what you mind by the following: "Specifically, the ($scriptText) == params.value part is scary because script fields that are perfectly valid painless most places are not valid in this script context." What script context is this, and what makes it invalid? Thanks.

@nik9000
Copy link
Member Author

nik9000 commented Nov 10, 2016

What script context is this, and what makes it invalid?

Say you have a script that looks like return 1;. It is valid painless. But when Kibana builds it as a filter you get a script that looks like (return 1;) == params.value which is all kinds of not going to work.

My point is that you can't do string manipulation to turn a painless script that returns a number into one that compares that number to another number. Sure, it works sometimes, but in general it doesn't. And it is super confusing.

@jdconrad
Copy link
Contributor

Got it. Thanks!

@rjernst
Copy link
Member

rjernst commented Nov 10, 2016

Kibana lets users configure script fields once and reuse them

This sounds like stored scripts. Why aren't they using that?

the ($scriptText) == params.value part is scary because script fields that are perfectly valid painless most places are not valid in this script context

But they will return a compile error if the user has something bogus there like return 1;, so what is the problem?

@Bargs
Copy link

Bargs commented Nov 11, 2016

This sounds like stored scripts. Why aren't they using that?

Not sure about that since I wasn't around when kibana scripted fields were implemented but my guess is that either stored scripts didn't exist at the time, or it was simply easier to store the script in .kibana since we needed to maintain a master list of kibana scripted fields and metadata about each one anyway.

But they will return a compile error if the user has something bogus there like return 1;, so what is the problem?

Think of Kibana scripted fields as a separate, higher level of abstraction than scripted fields in Elasticsearch. The user writes a single script, gives it a name, and we display it alongside all the regular fields anywhere they appear in the UI. From the Kibana user's perspective it's almost like a real field - they can search on it, filter on it, display its values, and aggregate on it. Under the hood we're taking that one script they wrote and using it in a bunch of different APIs, along with a little of the above mentioned magical text manipulation to get filtering to work. That worked fine when we only allowed lucene expressions.

Now with a painless (or groovy, etc) script that's more than a single expression, the user is going to get a nasty compile error when they click the little filter icon in the UI. Users expect that functionality to work and we can't limit them to single expression scripts because that would seriously reduce the benefits of having painless in the first place.

@rjernst
Copy link
Member

rjernst commented Nov 11, 2016

Then it should be done as a function in the generated wrapper script. I don't think we should complicate the script apis for this.

@jdconrad
Copy link
Contributor

jdconrad commented Nov 11, 2016

Trying to reuse a script across multiple contexts doesn't really make sense in a lot of cases. I know this is a bummer for our users, but a user should understand what each script does in each context. Though some scripts may be similar, it makes sense to have one per.

@nik9000
Copy link
Member Author

nik9000 commented Nov 11, 2016

Trying to reuse a script across multiple contexts doesn't really make sense in a lot of cases

Kibana has been doing this for quite a while. It works well in aggregations and script fields, I believe. They do it for filters too but they use this string manipulation hack. Those three cases have fairly similar contexts so long as you stick to doc values.

I want to give Kibana some way so that it can stop this brittle textual manipulation of the scripts. It make filters super brittle with painless. I don't want Kibana to go deeper down the textual manipulation of user script hole because we'll end up with some twisty stuff and it still won't work well.

@rjernst, what do you mean by "generated wrapper script"? Like a script that delegates to another script for one of it's inputs? The sounds fairly complex but might end up being a better solution. Something like:

                "script" : {
                    "script" : {
                        "inline" : "params.dv == params.value",
                        "lang"   : "painless",
                        "params" : {
                            "dv" : {
                              "script": {
                                "inline": $scriptText,
                                "lang": "painless"
                              }
                            },
                            "value" : "whatever the user picked"
                        }
                    }
                }

That'd solve the brittleness problem and be fairly interesting from a compostability perspective. It seems like a fairly heavy, abstract solution to this problem but it'd work. I'm not sure how I feel about it.

@jdconrad
Copy link
Contributor

jdconrad commented Nov 11, 2016

This is exactly the situation I'm talking about. They're using a script here where it doesn't make sense. The user should make a separate script for this that is expected to return a boolean. The value can be passed in automatically as a parameter.

@Bargs
Copy link

Bargs commented Nov 14, 2016

Asking the user to copy paste their script with a slight modification for filtering isn't a viable solution IMO. It'll lead to subtle copy paste errors. It also isn't scalable. What happens when there's another slightly different context we need to support? We ask them to copy/paste 3 times, then 4 times, etc? Not a good user experience.

I've also been playing with the idea of a function wrapper workaround as mentioned in my previous comment and I don't think it's going to work. As far as I can tell Painless requires functions to have return type declarations, and we can't infer that in Kibana so we can't auto generate a wrapper function for the user's script.

I'm not married to the original solution proposed in this ticket but I think we need some way to take a single script which produces a return value and apply it in multiple contexts. From the user's perspective there shouldn't be a difference between a script that has a single expression and one that has some logic and a return statement.

@nik9000
Copy link
Member Author

nik9000 commented Nov 15, 2016

I talked with @jdconrad about this and I convinced him that the places where Kibana is reusing the same script are similar enough that it is pretty reasonable to reuse the scripts. Places like the script field in the terms aggregation. The script filter is the most weird place where the scripts are reused because it returns the wrong thing.

We want to remove the need for the brittle text manipulation but we aren't yet sure of the nicest way to do it.

Bargs added a commit to Bargs/kibana that referenced this issue Nov 15, 2016
Multi-statement scripts are currently incompatible with the way we
create filters on scripted fields. We're investigating the possibility
of enhancing elasticsearch to fix this issue, but we may have to
fallback on showing users a warning if we can't reach an agreement in
the near term.

See elastic#9024
Related elastic/elasticsearch#21479
@Bargs
Copy link

Bargs commented Nov 15, 2016

It might be cool if we could treat the user's script as a named function, so that we could create the filter like so:

"script" : {
    "script" : {
        "inline" : "userScript(doc) == params.value",
        "lang"   : "painless",
        "params" : {
            "value" : "whatever the user picked"
        }
    }
}

That might give us a lot of flexibility to use the provided user script in interesting ways down the road as well. It would be super swell if it didn't matter what language the user script is written in, but without knowing how this is all implemented I'm probably just spewing crazy talk.

@s1monw
Copy link
Contributor

s1monw commented Nov 16, 2016

I might miss a thing but would it make sense to specialize that particular usercase userScript(doc) == params.value in ScriptQuery? we already have this:

  leafScript.setDocument(doc);
Object val = leafScript.run();
if(val instanceof Number) {
  return ((Number) val).longValue() != 0;
}

if this is the usecase we can maybe turn it into:

  leafScript.setDocument(doc);
Object val = leafScript.run();
if(val instanceof Number) {
  return ((Number) val).longValue() == userSpecified; // we gotta make it a bit more complicated by bascially that's it...
}

then we can just use the script and compare it internally without any API being more complicated?

@nik9000
Copy link
Member Author

nik9000 commented Nov 16, 2016

@s1monw I think that fairly similar to what I proposed first....

@jpountz
Copy link
Contributor

jpountz commented Nov 16, 2016

I don't feel good about this feature: making it easy to match scripts against exact values feels wrong to me since it means that the data was not indexed correctly. I think it is fine for one-time usage, but here we are discussing the ability to search on reusable scripts, which means the user should have an index pipeline that creates a proper field based on this script so that querying it is fast.

I think it's great to have the script query exposed in Kibana for these one-time queries that one did not think about at index time. But I don't think we should aim at making it easy to wire existing reusable scripts and the script query. This looks like an anti-pattern to me.

@nik9000
Copy link
Member Author

nik9000 commented Nov 16, 2016

I don't feel good about this feature: making it easy to match scripts against exact values feels wrong to me since it means that the data was not indexed correctly.

Indeed the fact that this is just a couple of clicks in Kibana is a bit troubling. It isn't obvious while you are doing it that this might have non-trivial performance cost. Part of what makes this not so bad in Kibana is that to get to the page where you make the selection we've already shown you an aggregation across the entire date range so the user has already waited once. And we've potentially already lit up the disk cache for the field.

@alexbrasetvik
Copy link
Contributor

Seconding @jpountz here, this looks like an easy way to make a seemingly innocuous query become super-expensive. In my experience, scripts are a common source of unexpected (to the novice user) CPU-cost, as a user doesn't necessarily think of the massive cost difference in execution of {"term": {"foo": "bar"}} vs. "doc['foo'].value == params.value". If this happens behind the scenes in Kibana, it's even less likely that the user considers (or is even aware) of this.

@clintongormley
Copy link
Contributor

Another issue here is that script fields (will) run in a different context to script queries: scripted fields should have access to the _source, while script queries should only have access to doc[], so there is no guarantee that a scripted field would work as a script query in the future.

epixa pushed a commit to elastic/kibana that referenced this issue Nov 17, 2016
Multi-statement scripts are currently incompatible with the way we
create filters on scripted fields. We're investigating the possibility
of enhancing elasticsearch to fix this issue, but we may have to
fallback on showing users a warning if we can't reach an agreement in
the near term.

See #9024
Related elastic/elasticsearch#21479
elastic-jasper added a commit to elastic/kibana that referenced this issue Nov 17, 2016
Backports PR #9090

**Commit 1:**
Add a note on multi-statement scripted field limitation

Multi-statement scripts are currently incompatible with the way we
create filters on scripted fields. We're investigating the possibility
of enhancing elasticsearch to fix this issue, but we may have to
fallback on showing users a warning if we can't reach an agreement in
the near term.

See #9024
Related elastic/elasticsearch#21479

* Original sha: b4aaed8
* Authored by Matthew Bargar <[email protected]> on 2016-11-15T22:47:26Z
elastic-jasper added a commit to elastic/kibana that referenced this issue Nov 17, 2016
Backports PR #9090

**Commit 1:**
Add a note on multi-statement scripted field limitation

Multi-statement scripts are currently incompatible with the way we
create filters on scripted fields. We're investigating the possibility
of enhancing elasticsearch to fix this issue, but we may have to
fallback on showing users a warning if we can't reach an agreement in
the near term.

See #9024
Related elastic/elasticsearch#21479

* Original sha: b4aaed8
* Authored by Matthew Bargar <[email protected]> on 2016-11-15T22:47:26Z
epixa pushed a commit to elastic/kibana that referenced this issue Nov 17, 2016
Backports PR #9090

**Commit 1:**
Add a note on multi-statement scripted field limitation

Multi-statement scripts are currently incompatible with the way we
create filters on scripted fields. We're investigating the possibility
of enhancing elasticsearch to fix this issue, but we may have to
fallback on showing users a warning if we can't reach an agreement in
the near term.

See #9024
Related elastic/elasticsearch#21479

* Original sha: b4aaed8
* Authored by Matthew Bargar <[email protected]> on 2016-11-15T22:47:26Z
epixa pushed a commit to elastic/kibana that referenced this issue Nov 17, 2016
Backports PR #9090

**Commit 1:**
Add a note on multi-statement scripted field limitation

Multi-statement scripts are currently incompatible with the way we
create filters on scripted fields. We're investigating the possibility
of enhancing elasticsearch to fix this issue, but we may have to
fallback on showing users a warning if we can't reach an agreement in
the near term.

See #9024
Related elastic/elasticsearch#21479

* Original sha: b4aaed8
* Authored by Matthew Bargar <[email protected]> on 2016-11-15T22:47:26Z
@jdconrad jdconrad added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Nov 17, 2016
@nik9000 nik9000 removed the discuss label Nov 21, 2016
@nik9000
Copy link
Member Author

nik9000 commented Nov 21, 2016

After an over video discussion we came the to the conclusion that #21635 is good enough for Kibana and hopefully should be good enough for other users. This is an example of doing a comparison. You use text manipulation to wrap the scripted field in a lambda and run the comparison. This isn't perfect - if the script you are wrapping defines a function then it'll fail, but this is much simpler on our side than any of the other proposed solutions and it has the advantage of making it difficult to do a potentially slow thing. And that is kind of a good thing.

@nik9000 nik9000 closed this as completed Nov 21, 2016
airow pushed a commit to airow/kibana that referenced this issue Feb 16, 2017
Backports PR elastic#9090

**Commit 1:**
Add a note on multi-statement scripted field limitation

Multi-statement scripts are currently incompatible with the way we
create filters on scripted fields. We're investigating the possibility
of enhancing elasticsearch to fix this issue, but we may have to
fallback on showing users a warning if we can't reach an agreement in
the near term.

See elastic#9024
Related elastic/elasticsearch#21479

* Original sha: b4aaed8
* Authored by Matthew Bargar <[email protected]> on 2016-11-15T22:47:26Z

Former-commit-id: 24840a9
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

No branches or pull requests

8 participants