-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/ottl] Improve Errors from Functions #16519
Comments
Continuing conversation from thread:
That's fair, being unable to parse a string as json is essentially the same as trying to parse a number. Maybe a better example would be something like an error returned from a Setter/Getter, which would indicate an issue interacting with the underlying telemetry. It's also possible that errors are just errors and the user should handle them all the same. I think for the transform processor it could be made to be less aggressive and instead of giving up on all telemetry transformation when receiving an error just give up on the current tCtx and move on to the next. |
I think this should probably be the default behavior for the transform processor. The one consideration is that if transformation has important implications later on such as in routing or in rules in the backend, users should be made aware that they will need to perform additional validation, or we should allow configuring what happens if an error occurs. |
@evan-bradley thats a good point. I think the transformprocessor can be made to continue on err by default, but can be configurable if the user wants something stricter. |
Even if we continue to transform data and accrue errors along the way, when the processor returns those errors it will still stop the data from being passed to the next consumer. This means all that extra processing is a waste. Instead, we can ensure any transform processor errors are logged before we return. |
In my opinion, the transform processor should by default continue processing in the case of an error, but offer disabling this behavior for a group of statements, possibly with a processor-level override. I think it will likely depend on exact user needs, but in general I think we should avoid dropping entire telemetry items. This would likely mean that by default the transform processor rarely returns a processing error since most errors are type errors. |
It looks like Stanza also supports configuring error behavior. This is configurable at the Stanza operator level, which I think would roughly translate to an OTTL statement, but I think it would be okay to allow configuring it in the OTTL for a group of statements. All Stanza operators default to allow processing to continue in the case of an error according to the docs. |
@evan-bradley @bogdandrutu @kentquirk I've been thinking about this issue again as I think it is a source of inconsistency within our functions. Transform Processor Error HandlingI agree with @evan-bradley that the transformprocessor/ottl should support configuring error handling behavior. Since a processor returning an error means that the collector will drop all the data in that batch, it is quite severe if the transformprocessor experiences an error from a function today. I think we should allow users to decide if they want the transformprocessor to:
All three of these error behaviors would be changes to how the transformprocessor handles errors returned by I think that we need to allow configuring error behavior between the Condition and the Invocation differently. OTTL Error HandlingI think the larger question, and the one that is causing inconsistencies, is how OTTL functions produce errors during hot path execution. In my mind there are 2 core scenarios for us to discuss: type assertion and everything else. I think the "everything else" scenario is pretty easy: they should be returned as errors. I feel type assertion is where most of the inconsistencies arise. Historically, OTTL functions have been written to be error-adverse concerning type assertion. They prefer to handle unwanted types or This has become a very important capability within Conditions where the lack of a desired attribute or incorrect type of a desire attribute means that the TransformContext can continue to be processed despite an individual statement not being run. With the current handling, a missing attribute or incorrect type means that the Condition evaluates to Functions themselves are a very difficult place to make error handling discussion, though. They don't know if they are being used in an Invocation, Condition, or as a parameter to another function. But with our current I think we need to come to a consensus about how OTTL is going to handle Work in progress that may help:
|
@swiatekm-sumo @sumo-drosiek @astencel-sumo based on the PR you linked I am curious to hear your opinion as well. |
I don't feel informed enough to speak to OTTL internals. My perspective as a user of processors which incorporate it:
In particular, a function like |
@swiatekm-sumo thank you for your perspective, it is very useful. |
I think we will be able to take advantage of the Parser's new Options API and it's An error handling system like this embedded in OTTL itself would then allow more consistent function requirements. Functions could more safely return errors knowing that the Parser will have the opportunity to handle them before a component does. We could make more definitive statements like "Functions that expect a string but receive an int should return an error". |
Adds a new concept to Getters that allow specifying type. These type-specific Getters can be used by functions that require a specific type instead of using the normal Getter that always returns an interface. This PR implements a StringGetter, but we can add Getter's for int64, float64, bool, pcommon.Map, etc in the future. With these Getters functions and rely on OTTL for type assertion and with the ongoing error handling discussion these will help standardize how functions handle and return errors. Related to #16519
Updates the type getters to be stricter, erroring on nil instead of returning nil. While working on updating ConvertCase to using a StringGetter I realized that by allowing nil we had not committed to the strictness with typed getters that we should've. The goal of the typed getters and new error mode is to allow functions to be strict with errors without breaking future statements (IgnoreError mode). By allowing nil we put ourselves in a strange situation where ConvertCase(10, "upper") would return an error but ConvertCase(nil, "upper") wouldn't. When used in combination with other functions, our outcome would be inconsistent. If used with an int, set(attributes["test"], ConvertCase(10, "upper")) would cause an error, resulting in the "test" attribute not being set. If used with nil, set(attributes["test"], ConvertCase(nil, "upper")) would not error and would set "test" to nil. But this is inconsistent. nil is not a string and so we can't convert but we don't error and we end up mutating the telemetry. If a real life scenario where you might be converting an attribute that isn't present what should happen. Should set(attributes["an attribute that doesn't exist"], ConvertCase(attributes["an attribute that doesn't exist"], "upper")) actually set the attribute "an attribute that doesn't exist" to nil or should it error? I'm proposing it is more proper to error and not transmute the telemetry. With error_mode=ignore we can still move on to the next statement. When used in a condition it feels inappropriate to allow ConvertCase to return nil, which is what happens if we don't strictly error on nil. If we return nil then we end up doing an invalid comparison in my mind as we have a value we shouldn't have. I propose it is better to let the condition error, evaluate to false, and move on to the next statement. I believe this sentiment is applicable to all Converter functions. Converter functions should reliably returned a changed value as that is their strict intention. Link to tracking Issue: Related to #16519
With the introduction of |
@bogdandrutu I think OTTL needs a larger discussion about errors since most functions were made before errors could be returned.
One argument is that functions should always return errors when something undesirable occurs, like the wrong type being returned by a Getter. This allows the components that use OTTL to decide what to do with the error.
Another argument is that functions should ignore situations where Getters return the incorrect type and other "lesser" errors. The argument being these aren't problems that should stop processing and the caller doesn't need to worry about them except for a debug log (Debug logging is spotty but the concept is possible).
The OTTl functions are inconsistent right now as the package has grown, but I'd say most follow the second pattern since they were created before errors could be returned.
My personal opinion is that OTTL functions should be returning errors in all undesirable situations. It feels best if the caller handles the error since OTTL doesn't know what's using it and shouldn't make assumptions. I would like to enhance the OTTL to be able to return errors with different levels of severity. I don't know if that's against Go idiosyncrasies, but I like the concept of being able to differentiate between an error caused by a Getter returning the wrong type and something truest bad like trying to parse a string as JSON that isn't JSON.
I will open an issue tomorrow to track this work and continue the discussion.
I will also update this function tomorrow to return an error. At the moment the implication of this is that in the TransformProcessor if you pass a non-string to this function then all processing for the current batch will terminate and the data and error will be returned from the consume function.
Originally posted by @TylerHelmuth in #16444 (comment)
The text was updated successfully, but these errors were encountered: