[pkg/ottl] Improve Errors from Functions #16519

TylerHelmuth · 2022-11-29T15:20:48Z

@bogdandrutu I think OTTL needs a larger discussion about errors since most functions were made before errors could be returned.

One argument is that functions should always return errors when something undesirable occurs, like the wrong type being returned by a Getter. This allows the components that use OTTL to decide what to do with the error.

Another argument is that functions should ignore situations where Getters return the incorrect type and other "lesser" errors. The argument being these aren't problems that should stop processing and the caller doesn't need to worry about them except for a debug log (Debug logging is spotty but the concept is possible).

The OTTl functions are inconsistent right now as the package has grown, but I'd say most follow the second pattern since they were created before errors could be returned.

My personal opinion is that OTTL functions should be returning errors in all undesirable situations. It feels best if the caller handles the error since OTTL doesn't know what's using it and shouldn't make assumptions. I would like to enhance the OTTL to be able to return errors with different levels of severity. I don't know if that's against Go idiosyncrasies, but I like the concept of being able to differentiate between an error caused by a Getter returning the wrong type and something truest bad like trying to parse a string as JSON that isn't JSON.

I will open an issue tomorrow to track this work and continue the discussion.

I will also update this function tomorrow to return an error. At the moment the implication of this is that in the TransformProcessor if you pass a non-string to this function then all processing for the current batch will terminate and the data and error will be returned from the consume function.

Originally posted by @TylerHelmuth in #16444 (comment)

TylerHelmuth · 2022-11-29T15:22:42Z

Continuing conversation from thread:

Not sure I buy this, let's say someone writes ParseJSON("123") is an "error" but ParseJSON(123) is not?

That's fair, being unable to parse a string as json is essentially the same as trying to parse a number. Maybe a better example would be something like an error returned from a Setter/Getter, which would indicate an issue interacting with the underlying telemetry.

It's also possible that errors are just errors and the user should handle them all the same. I think for the transform processor it could be made to be less aggressive and instead of giving up on all telemetry transformation when receiving an error just give up on the current tCtx and move on to the next.

evan-bradley · 2022-11-29T15:59:22Z

I think for the transform processor it could be made to be less aggressive and instead of giving up on all telemetry transformation when receiving an error just give up on the current tCtx and move on to the next.

I think this should probably be the default behavior for the transform processor. The one consideration is that if transformation has important implications later on such as in routing or in rules in the backend, users should be made aware that they will need to perform additional validation, or we should allow configuring what happens if an error occurs.

TylerHelmuth · 2022-11-29T16:02:03Z

@evan-bradley thats a good point. I think the transformprocessor can be made to continue on err by default, but can be configurable if the user wants something stricter.

TylerHelmuth · 2022-12-07T18:44:14Z

Even if we continue to transform data and accrue errors along the way, when the processor returns those errors it will still stop the data from being passed to the next consumer. This means all that extra processing is a waste.

Instead, we can ensure any transform processor errors are logged before we return.

evan-bradley · 2022-12-08T19:54:04Z

In my opinion, the transform processor should by default continue processing in the case of an error, but offer disabling this behavior for a group of statements, possibly with a processor-level override. I think it will likely depend on exact user needs, but in general I think we should avoid dropping entire telemetry items. This would likely mean that by default the transform processor rarely returns a processing error since most errors are type errors.

evan-bradley · 2022-12-09T18:08:28Z

It looks like Stanza also supports configuring error behavior. This is configurable at the Stanza operator level, which I think would roughly translate to an OTTL statement, but I think it would be okay to allow configuring it in the OTTL for a group of statements. All Stanza operators default to allow processing to continue in the case of an error according to the docs.

TylerHelmuth · 2023-01-26T18:28:40Z

@evan-bradley @bogdandrutu @kentquirk I've been thinking about this issue again as I think it is a source of inconsistency within our functions.

Transform Processor Error Handling

I agree with @evan-bradley that the transformprocessor/ottl should support configuring error handling behavior. Since a processor returning an error means that the collector will drop all the data in that batch, it is quite severe if the transformprocessor experiences an error from a function today. I think we should allow users to decide if they want the transformprocessor to:

Halt execution of statements on any error and return the error.
Halt execution of the statements against the current TransformContext, log the error (probably as Debug), and continue to the next TransformContext without returning the error.
log the error (probably as Debug), and continue to the next statement for the current TransformContext without returning the error.

All three of these error behaviors would be changes to how the transformprocessor handles errors returned by statement.Execute or context.Consume[Traces|Metrics|Logs and require no changes to OTTL.

I think that we need to allow configuring error behavior between the Condition and the Invocation differently.

OTTL Error Handling

I think the larger question, and the one that is causing inconsistencies, is how OTTL functions produce errors during hot path execution. In my mind there are 2 core scenarios for us to discuss: type assertion and everything else. I think the "everything else" scenario is pretty easy: they should be returned as errors. I feel type assertion is where most of the inconsistencies arise.

Historically, OTTL functions have been written to be error-adverse concerning type assertion. They prefer to handle unwanted types or nil by exiting the function without. Since OTTL was built with the transform processor in mind, this allowed processing to continue even when an attribute didn't exist or was the wrong type.

This has become a very important capability within Conditions where the lack of a desired attribute or incorrect type of a desire attribute means that the TransformContext can continue to be processed despite an individual statement not being run. With the current handling, a missing attribute or incorrect type means that the Condition evaluates to false instead of error.

Functions themselves are a very difficult place to make error handling discussion, though. They don't know if they are being used in an Invocation, Condition, or as a parameter to another function. But with our current Getter interface, functions end up having to make decisions about nil and type assertions, so we've forced the burden of error handling upon them.

I think we need to come to a consensus about how OTTL is going to handle nil and type assertion. If a function uses a Getter and the returned type is either not the exact expected type or not a type the function knows how to handle what should the function do? As part of that decision I think we need to decide how OTTL the framework will enforce it.

Work in progress that may help:

Replace function interfaces with a factory #14712 (comment). This issue will end up giving our functions more structure.
[pkg/ottl] Add StringGetter #18042. An idea of how to enforce whatever decision we make around type assertion. By moving the assertion logic out of the function and into a specific Getter we can help standardize how functions get the types they expect. The current Getter interface would always stick around to handle hyper-specific types or scenarios where more than 1 type is allowed.

TylerHelmuth · 2023-01-26T18:30:37Z

@swiatekm-sumo @sumo-drosiek @astencel-sumo based on the PR you linked I am curious to hear your opinion as well.

swiatekm · 2023-01-27T15:38:45Z

I don't feel informed enough to speak to OTTL internals. My perspective as a user of processors which incorporate it:

We should not drop data or return errors the rest of the pipeline if we can avoid it or the user explicitly configured it to be so. This is the nuclear option.
By default, we should fail open whenever we can. This includes ignoring errors on processing functions and assuming errored conditions to have returned false.
In terms of configurability, @TylerHelmuth your proposal for transformprocessor looks reasonable. For conditions, being able to switch them to return true on error is probably a feature someone will ask for eventually, but it sounds a bit niche.

In particular, a function like ParseJSON is basically unusuable without being able to ignore parse errors.

TylerHelmuth · 2023-01-27T17:15:39Z

@swiatekm-sumo thank you for your perspective, it is very useful.

TylerHelmuth · 2023-01-27T17:26:19Z

By default, we should fail open whenever we can. This includes ignoring errors on processing functions and assuming errored conditions to have returned false.

I think we will be able to take advantage of the Parser's new Options API and it's Execute function to allow users to specify how it should handle errors. If they want to continue processing, the Execute function could log the error, and then continue. If they want to consume the error themselves, it could return the error like it does today.

An error handling system like this embedded in OTTL itself would then allow more consistent function requirements. Functions could more safely return errors knowing that the Parser will have the opportunity to handle them before a component does. We could make more definitive statements like "Functions that expect a string but receive an int should return an error".

Adds a new concept to Getters that allow specifying type. These type-specific Getters can be used by functions that require a specific type instead of using the normal Getter that always returns an interface. This PR implements a StringGetter, but we can add Getter's for int64, float64, bool, pcommon.Map, etc in the future. With these Getters functions and rely on OTTL for type assertion and with the ongoing error handling discussion these will help standardize how functions handle and return errors. Related to #16519

Updates the type getters to be stricter, erroring on nil instead of returning nil. While working on updating ConvertCase to using a StringGetter I realized that by allowing nil we had not committed to the strictness with typed getters that we should've. The goal of the typed getters and new error mode is to allow functions to be strict with errors without breaking future statements (IgnoreError mode). By allowing nil we put ourselves in a strange situation where ConvertCase(10, "upper") would return an error but ConvertCase(nil, "upper") wouldn't. When used in combination with other functions, our outcome would be inconsistent. If used with an int, set(attributes["test"], ConvertCase(10, "upper")) would cause an error, resulting in the "test" attribute not being set. If used with nil, set(attributes["test"], ConvertCase(nil, "upper")) would not error and would set "test" to nil. But this is inconsistent. nil is not a string and so we can't convert but we don't error and we end up mutating the telemetry. If a real life scenario where you might be converting an attribute that isn't present what should happen. Should set(attributes["an attribute that doesn't exist"], ConvertCase(attributes["an attribute that doesn't exist"], "upper")) actually set the attribute "an attribute that doesn't exist" to nil or should it error? I'm proposing it is more proper to error and not transmute the telemetry. With error_mode=ignore we can still move on to the next statement. When used in a condition it feels inappropriate to allow ConvertCase to return nil, which is what happens if we don't strictly error on nil. If we return nil then we end up doing an invalid comparison in my mind as we have a value we shouldn't have. I propose it is better to let the condition error, evaluate to false, and move on to the next statement. I believe this sentiment is applicable to all Converter functions. Converter functions should reliably returned a changed value as that is their strict intention. Link to tracking Issue: Related to #16519

TylerHelmuth · 2023-03-22T16:25:24Z

With the introduction of ErrorMode and the typed-getters I believe this issue is complete. More typed-getters will need added in the future as more functions get added, but the framework and pattern is in place.

TylerHelmuth added priority:p2 Medium pkg/ottl labels Nov 29, 2022

TylerHelmuth mentioned this issue Nov 29, 2022

[processor/transform] Improve error handling #16524

Closed

TylerHelmuth added the processor/transform Transform processor label Dec 7, 2022

TylerHelmuth mentioned this issue Dec 15, 2022

[pkg/ottl] support substring function #17036

Merged

swiatekm mentioned this issue Jan 10, 2023

feat(logs): parse JSON logs SumoLogic/sumologic-kubernetes-collection#2773

Merged

This was referenced Jan 30, 2023

[pkg/ottl] Add ErrorMode configuration option #18187

Merged

[pkg/ottl] Add StringGetter #18042

Merged

[pkg/ottl] Add NewStatements to each context #18385

Merged

This was referenced Feb 16, 2023

[pkg/ottl] Switch ErrorMode to string #18692

Merged

[processor/transform] Add error mode to transformprocessor #18693

Merged

[pkg/ottl] Improve Statement execution error message #18733

Merged

[pkg/ottl] Update type getters to error on nil #18735

Merged

This was referenced Feb 28, 2023

[pkg/ottl] Update functions to use StringGetter #19137

Merged

[processor/filter] Add error_mode configuration option #19142

Merged

[processor/routing] Add error_mode configuration option #19147

Merged

TylerHelmuth mentioned this issue Mar 13, 2023

[processor/transform] Improve function errror handling #12649

Closed

This was referenced Mar 18, 2023

[pkg/ottl] Add PMapGetter for getting pdata maps #19781

Merged

[pkg/ottl] Add StringLikeGetter for turning types into string #19782

Merged

[pkg/ottl] Allow StringGetter in slices #19783

Merged

TylerHelmuth closed this as completed Mar 22, 2023

evan-bradley mentioned this issue Mar 22, 2023

[pkg/telemetryquerylanguage] Concat factory should log when passed unprintable values #13390

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pkg/ottl] Improve Errors from Functions #16519

[pkg/ottl] Improve Errors from Functions #16519

TylerHelmuth commented Nov 29, 2022 •

edited

Loading

TylerHelmuth commented Nov 29, 2022

evan-bradley commented Nov 29, 2022

TylerHelmuth commented Nov 29, 2022

TylerHelmuth commented Dec 7, 2022

evan-bradley commented Dec 8, 2022

evan-bradley commented Dec 9, 2022

TylerHelmuth commented Jan 26, 2023

TylerHelmuth commented Jan 26, 2023

swiatekm commented Jan 27, 2023 •

edited

Loading

TylerHelmuth commented Jan 27, 2023

TylerHelmuth commented Jan 27, 2023 •

edited

Loading

TylerHelmuth commented Mar 22, 2023

[pkg/ottl] Improve Errors from Functions #16519

[pkg/ottl] Improve Errors from Functions #16519

Comments

TylerHelmuth commented Nov 29, 2022 • edited Loading

TylerHelmuth commented Nov 29, 2022

evan-bradley commented Nov 29, 2022

TylerHelmuth commented Nov 29, 2022

TylerHelmuth commented Dec 7, 2022

evan-bradley commented Dec 8, 2022

evan-bradley commented Dec 9, 2022

TylerHelmuth commented Jan 26, 2023

Transform Processor Error Handling

OTTL Error Handling

TylerHelmuth commented Jan 26, 2023

swiatekm commented Jan 27, 2023 • edited Loading

TylerHelmuth commented Jan 27, 2023

TylerHelmuth commented Jan 27, 2023 • edited Loading

TylerHelmuth commented Mar 22, 2023

TylerHelmuth commented Nov 29, 2022 •

edited

Loading

swiatekm commented Jan 27, 2023 •

edited

Loading

TylerHelmuth commented Jan 27, 2023 •

edited

Loading