Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime fields to optionally ignore script errors #92380

Merged
merged 37 commits into from
Dec 23, 2022

Conversation

cbuescher
Copy link
Member

@cbuescher cbuescher commented Dec 14, 2022

Currently Elasticsearch always returns a shard failure once a runtime error arises from using a runtime field, the exception being script-less runtime fields. This also means that execution of the query for that shard stops, which is okay for development and exploration. In a production scenario, however, it is often desirable to ignore runtime errors and continue with the query execution.

This change adds a new a new on_script_error parameter to runtime field definitions similar to the already existing
parameter for scripted fields.
When on_script_error is set to continue, errors from script execution are effectively ignored. This means affected
documents don't show up in query results, but also don't prevent other matches from the same shard.
Runtime fields accessed through the fields API don't return values on errors, aggregations will ignore documents that
throw errors.

Note that this change affects scripted runtime fields only, while leaving default behaviour untouched. Also, ignored errors are not reported back to users for now.

Relates to #72143

@cbuescher cbuescher added the WIP label Dec 14, 2022
@cbuescher cbuescher force-pushed the errors-on-rtf branch 2 times, most recently from bb23d7c to a913be7 Compare December 15, 2022 16:53
@cbuescher cbuescher changed the title WIP: Allow script errors on runtime fields Add option to allow script errors on runtime fields Dec 16, 2022
@cbuescher cbuescher added :Search/Search Search-related issues that do not fall into other categories >enhancement and removed WIP labels Dec 16, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Dec 16, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @cbuescher, I've created a changelog YAML for you.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments that should simplifying things quite a bit. Could you also leave a summary in the description about current behaviour for script-less runtime fields and mention that it is not affected by your change, as well as clearly state that we are for now not exposing any error reporting but only ignoring errors?

docs/reference/mapping/runtime.asciidoc Outdated Show resolved Hide resolved
docs/reference/mapping/runtime.asciidoc Outdated Show resolved Hide resolved
if (onErrorContinue) {
// activate error tolerance for this field in the error handler
searchLookup.getExceptionHandler().continueOnErrorForField(name());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a bit counter-intuitive. The field type holds the new flag, and is also responsible for creating the concrete script instance using the leafFactory method, and that is where the script gets effectively executed. I feels unnatural to have to hold a centralized set of all the continue on error fields. Could the error skipping logic be somehow shared in the AbstractFieldScript? I think that if we stick to no error reporting for now, search execution context should not need any updating, while it will somehow become the natural place to accumulate errors once we add error reporting, but the error handling logic still may not fit in it.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very close to me. There is one thing to be addressed around the composite runtime field. All the rest looks great. I have not paid a lot of attention to tests but I will have another look tomorrow.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of comments around things that we should iterate on, but we can address them in followup PRs. LGTM!

@@ -139,7 +139,7 @@ private FieldValues<Boolean> scriptValues() {
BooleanFieldScript.Factory scriptFactory = scriptCompiler.compile(script.get(), BooleanFieldScript.CONTEXT);
return scriptFactory == null
? null
: (lookup, ctx, doc, consumer) -> scriptFactory.newFactory(name, script.get().getParams(), lookup)
: (lookup, ctx, doc, consumer) -> scriptFactory.newFactory(name, script.get().getParams(), lookup, OnScriptError.FAIL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gave me a bit of headache. It seems like we may be able to reuse the error handling added at the script level for index time scripts. Their catch is at a higher level (FieldMapper#executeScript), and we may be able to remove that in favour of populating the flag in scriptValues according to the mapping parameter. Not something I would do in this PR though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this, it is possible, yet it is a bit of work. We would need to have a way to provide a callback to accumulate errors or at least field names within the script, which we would call when as part of catching the error. It is not so bad to keep on using FAIL and then have an additional catch only for index time scripted fields.

public LeafFactory newFactory(String field, Map<String, Object> params, SearchLookup lookup) {
return ctx -> new StringFieldScript(field, params, lookup, ctx) {
public LeafFactory newFactory(String field, Map<String, Object> params, SearchLookup lookup, OnScriptError onScriptError) {
return ctx -> new StringFieldScript(field, params, lookup, OnScriptError.FAIL, ctx) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of confusing, but I see why you did it that way. I guess we should look at consolidating the error handling logic in script-less runtime fields, as a follow-up. They have a deeper catch when extracting values from _source, hence effectively will not throw exception ever, so I suspect saying FAIL or CONTINUE won't make a difference. The controversial aspect here is that we are introducing support for the flag for scriptless runtime fields too, yet it has no effect on them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The controversial aspect here is that we are introducing support for the flag for scriptless runtime fields too, yet it has no effect on them.

Scratch that. You can only specify on_script_error when a script is also provided, so we are good. It may be confusing for users that script-less runtime fields can't be configured to throw error, but it kind of makes sense as effectively there is no script provided for them. We may want to clarify the docs around this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised #92550 to clean up some of the existing ad-hoc try catch in favour of reusing the newly added centralized error handling.

Copy link
Member

@javanna javanna Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of documentation, I think it may make sense to explain in the docs what happens when a script fails after some values have been successfully emitted . Currently, the script exits but anything that was emitted before the error is taken into account, which means that having an error for a field on a specific document does not necessarily mean that that doc won't be seen by aggs and queries. Maybe this is ok given that most fields will be in practice single valued, yet it deserves some discussions. I would imagine that this will be good input for how to count errors once we report them back.

@javanna
Copy link
Member

javanna commented Dec 22, 2022

run elasticsearch-ci/part-3

@javanna javanna changed the title Add option to allow script errors on runtime fields Runtime fields to optionally ignore script errors Dec 23, 2022
@javanna javanna merged commit 8067f01 into elastic:main Dec 23, 2022
javanna added a commit that referenced this pull request Dec 23, 2022
Reflect updated PR title in the changelog entry
javanna added a commit to javanna/elasticsearch that referenced this pull request Dec 23, 2022
…ields

We recently introduced configurable error handling for runtime fields through the `on_script_error` mappings parameter (see elastic#92380). Script-less runtime fields are
though lenient and non configurable. This commit removes error handling that's
specific for script-less runtime fields, in favour of reusing the common runtime fields
error handling mechanism and harcoding `continue` as the only option for script-less
runtime fields.
}
}

public final void testOnScriptErrorFail() throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking at the merged changes - I'm trying to understand why this test isn't covered by the second code block in testOnScriptError(), looks identical to me so far. What am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, I am not sure what I was trying to do. It was last year :) I may have wanted to remove that second block in favour of the separate test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants