Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Improve Errors from Functions #16519

Closed
TylerHelmuth opened this issue Nov 29, 2022 · 12 comments
Closed

[pkg/ottl] Improve Errors from Functions #16519

TylerHelmuth opened this issue Nov 29, 2022 · 12 comments
Labels

Comments

@TylerHelmuth
Copy link
Member

TylerHelmuth commented Nov 29, 2022

@bogdandrutu I think OTTL needs a larger discussion about errors since most functions were made before errors could be returned.

One argument is that functions should always return errors when something undesirable occurs, like the wrong type being returned by a Getter. This allows the components that use OTTL to decide what to do with the error.

Another argument is that functions should ignore situations where Getters return the incorrect type and other "lesser" errors. The argument being these aren't problems that should stop processing and the caller doesn't need to worry about them except for a debug log (Debug logging is spotty but the concept is possible).

The OTTl functions are inconsistent right now as the package has grown, but I'd say most follow the second pattern since they were created before errors could be returned.

My personal opinion is that OTTL functions should be returning errors in all undesirable situations. It feels best if the caller handles the error since OTTL doesn't know what's using it and shouldn't make assumptions. I would like to enhance the OTTL to be able to return errors with different levels of severity. I don't know if that's against Go idiosyncrasies, but I like the concept of being able to differentiate between an error caused by a Getter returning the wrong type and something truest bad like trying to parse a string as JSON that isn't JSON.

I will open an issue tomorrow to track this work and continue the discussion.

I will also update this function tomorrow to return an error. At the moment the implication of this is that in the TransformProcessor if you pass a non-string to this function then all processing for the current batch will terminate and the data and error will be returned from the consume function.

Originally posted by @TylerHelmuth in #16444 (comment)

@TylerHelmuth
Copy link
Member Author

Continuing conversation from thread:

Not sure I buy this, let's say someone writes ParseJSON("123") is an "error" but ParseJSON(123) is not?

That's fair, being unable to parse a string as json is essentially the same as trying to parse a number. Maybe a better example would be something like an error returned from a Setter/Getter, which would indicate an issue interacting with the underlying telemetry.

It's also possible that errors are just errors and the user should handle them all the same. I think for the transform processor it could be made to be less aggressive and instead of giving up on all telemetry transformation when receiving an error just give up on the current tCtx and move on to the next.

@evan-bradley
Copy link
Contributor

I think for the transform processor it could be made to be less aggressive and instead of giving up on all telemetry transformation when receiving an error just give up on the current tCtx and move on to the next.

I think this should probably be the default behavior for the transform processor. The one consideration is that if transformation has important implications later on such as in routing or in rules in the backend, users should be made aware that they will need to perform additional validation, or we should allow configuring what happens if an error occurs.

@TylerHelmuth
Copy link
Member Author

@evan-bradley thats a good point. I think the transformprocessor can be made to continue on err by default, but can be configurable if the user wants something stricter.

@TylerHelmuth
Copy link
Member Author

Even if we continue to transform data and accrue errors along the way, when the processor returns those errors it will still stop the data from being passed to the next consumer. This means all that extra processing is a waste.

Instead, we can ensure any transform processor errors are logged before we return.

@TylerHelmuth TylerHelmuth added the processor/transform Transform processor label Dec 7, 2022
@evan-bradley
Copy link
Contributor

In my opinion, the transform processor should by default continue processing in the case of an error, but offer disabling this behavior for a group of statements, possibly with a processor-level override. I think it will likely depend on exact user needs, but in general I think we should avoid dropping entire telemetry items. This would likely mean that by default the transform processor rarely returns a processing error since most errors are type errors.

@evan-bradley
Copy link
Contributor

It looks like Stanza also supports configuring error behavior. This is configurable at the Stanza operator level, which I think would roughly translate to an OTTL statement, but I think it would be okay to allow configuring it in the OTTL for a group of statements. All Stanza operators default to allow processing to continue in the case of an error according to the docs.

@TylerHelmuth
Copy link
Member Author

@evan-bradley @bogdandrutu @kentquirk I've been thinking about this issue again as I think it is a source of inconsistency within our functions.

Transform Processor Error Handling

I agree with @evan-bradley that the transformprocessor/ottl should support configuring error handling behavior. Since a processor returning an error means that the collector will drop all the data in that batch, it is quite severe if the transformprocessor experiences an error from a function today. I think we should allow users to decide if they want the transformprocessor to:

  • Halt execution of statements on any error and return the error.
  • Halt execution of the statements against the current TransformContext, log the error (probably as Debug), and continue to the next TransformContext without returning the error.
  • log the error (probably as Debug), and continue to the next statement for the current TransformContext without returning the error.

All three of these error behaviors would be changes to how the transformprocessor handles errors returned by statement.Execute or context.Consume[Traces|Metrics|Logs and require no changes to OTTL.

I think that we need to allow configuring error behavior between the Condition and the Invocation differently.

OTTL Error Handling

I think the larger question, and the one that is causing inconsistencies, is how OTTL functions produce errors during hot path execution. In my mind there are 2 core scenarios for us to discuss: type assertion and everything else. I think the "everything else" scenario is pretty easy: they should be returned as errors. I feel type assertion is where most of the inconsistencies arise.

Historically, OTTL functions have been written to be error-adverse concerning type assertion. They prefer to handle unwanted types or nil by exiting the function without. Since OTTL was built with the transform processor in mind, this allowed processing to continue even when an attribute didn't exist or was the wrong type.

This has become a very important capability within Conditions where the lack of a desired attribute or incorrect type of a desire attribute means that the TransformContext can continue to be processed despite an individual statement not being run. With the current handling, a missing attribute or incorrect type means that the Condition evaluates to false instead of error.

Functions themselves are a very difficult place to make error handling discussion, though. They don't know if they are being used in an Invocation, Condition, or as a parameter to another function. But with our current Getter interface, functions end up having to make decisions about nil and type assertions, so we've forced the burden of error handling upon them.

I think we need to come to a consensus about how OTTL is going to handle nil and type assertion. If a function uses a Getter and the returned type is either not the exact expected type or not a type the function knows how to handle what should the function do? As part of that decision I think we need to decide how OTTL the framework will enforce it.

Work in progress that may help:

  • Replace function interfaces with a factory #14712 (comment). This issue will end up giving our functions more structure.
  • [pkg/ottl] Add StringGetter #18042. An idea of how to enforce whatever decision we make around type assertion. By moving the assertion logic out of the function and into a specific Getter we can help standardize how functions get the types they expect. The current Getter interface would always stick around to handle hyper-specific types or scenarios where more than 1 type is allowed.

@TylerHelmuth
Copy link
Member Author

@swiatekm-sumo @sumo-drosiek @astencel-sumo based on the PR you linked I am curious to hear your opinion as well.

@swiatekm
Copy link
Contributor

swiatekm commented Jan 27, 2023

I don't feel informed enough to speak to OTTL internals. My perspective as a user of processors which incorporate it:

  • We should not drop data or return errors the rest of the pipeline if we can avoid it or the user explicitly configured it to be so. This is the nuclear option.
  • By default, we should fail open whenever we can. This includes ignoring errors on processing functions and assuming errored conditions to have returned false.
  • In terms of configurability, @TylerHelmuth your proposal for transformprocessor looks reasonable. For conditions, being able to switch them to return true on error is probably a feature someone will ask for eventually, but it sounds a bit niche.

In particular, a function like ParseJSON is basically unusuable without being able to ignore parse errors.

@TylerHelmuth
Copy link
Member Author

@swiatekm-sumo thank you for your perspective, it is very useful.

@TylerHelmuth
Copy link
Member Author

TylerHelmuth commented Jan 27, 2023

By default, we should fail open whenever we can. This includes ignoring errors on processing functions and assuming errored conditions to have returned false.

I think we will be able to take advantage of the Parser's new Options API and it's Execute function to allow users to specify how it should handle errors. If they want to continue processing, the Execute function could log the error, and then continue. If they want to consume the error themselves, it could return the error like it does today.

An error handling system like this embedded in OTTL itself would then allow more consistent function requirements. Functions could more safely return errors knowing that the Parser will have the opportunity to handle them before a component does. We could make more definitive statements like "Functions that expect a string but receive an int should return an error".

codeboten pushed a commit that referenced this issue Feb 10, 2023
Adds a new concept to Getters that allow specifying type. These type-specific Getters can be used by functions that require a specific type instead of using the normal Getter that always returns an interface. This PR implements a StringGetter, but we can add Getter's for int64, float64, bool, pcommon.Map, etc in the future.

With these Getters functions and rely on OTTL for type assertion and with the ongoing error handling discussion these will help standardize how functions handle and return errors.

Related to #16519
codeboten pushed a commit that referenced this issue Feb 17, 2023
Updates the type getters to be stricter, erroring on nil instead of returning nil.

While working on updating ConvertCase to using a StringGetter I realized that by allowing nil we had not committed to the strictness with typed getters that we should've.

The goal of the typed getters and new error mode is to allow functions to be strict with errors without breaking future statements (IgnoreError mode). By allowing nil we put ourselves in a strange situation where ConvertCase(10, "upper") would return an error but ConvertCase(nil, "upper") wouldn't. When used in combination with other functions, our outcome would be inconsistent.

If used with an int, set(attributes["test"], ConvertCase(10, "upper")) would cause an error, resulting in the "test" attribute not being set. If used with nil, set(attributes["test"], ConvertCase(nil, "upper")) would not error and would set "test" to nil. But this is inconsistent. nil is not a string and so we can't convert but we don't error and we end up mutating the telemetry.

If a real life scenario where you might be converting an attribute that isn't present what should happen. Should set(attributes["an attribute that doesn't exist"], ConvertCase(attributes["an attribute that doesn't exist"], "upper")) actually set the attribute "an attribute that doesn't exist" to nil or should it error? I'm proposing it is more proper to error and not transmute the telemetry. With error_mode=ignore we can still move on to the next statement.

When used in a condition it feels inappropriate to allow ConvertCase to return nil, which is what happens if we don't strictly error on nil. If we return nil then we end up doing an invalid comparison in my mind as we have a value we shouldn't have. I propose it is better to let the condition error, evaluate to false, and move on to the next statement.

I believe this sentiment is applicable to all Converter functions. Converter functions should reliably returned a changed value as that is their strict intention.

Link to tracking Issue:

Related to #16519
@TylerHelmuth
Copy link
Member Author

With the introduction of ErrorMode and the typed-getters I believe this issue is complete. More typed-getters will need added in the future as more functions get added, but the framework and pattern is in place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants