-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Errors are being raised in API and SDK #2148
Comments
Given the dynamic nature of python, I think we need to clearly define what is considered as a incorrect use by users (and make sure the overhead introduced by this should be kept low and insignificant). I personally don't think the example highlighted in the issue description (and other cases along similar lines) is something we should bother about and handle gracefully. |
The example highlighted above is significant, it can cause the application code to crash. As stated here: We assume that users would prefer to lose telemetry data rather than have the library significantly change the behavior of the instrumented application. |
I agree with @lonewolf3739. The way I understand this, we need to guard against unexpected failures crashing services. Things like network and I/O operations failing. We should defensively program such operations to ensure any failures arising from them do not bubble up and crash a service. I don't think it means we make it technically impossible for an exception to be raised especially from obvious wrong usage. I don't think we should protect against users passing wrong number of arguments to a function or trying to call an attribute as if it was a function. IMO wrong usage is something like a user passing a span to a function that expect a span context or passing an incompatible callback pre/post request to let's say the Flask instrumentation. We need to guard against any exceptions raised by such usage. In all such cases, IMO, the functions themselves should be coded defensively to guard against such cases. |
We can make our guards less guarding, we can make it possible to protect against some errors and not against some others. But why? The prototype in #2152 already guards against bad arguments being passed and any error happening in the inside of functions. The intention of OpenTelemetry is clearly stated here: We assume that users would prefer to lose telemetry data rather than have the library significantly change the behavior of the instrumented application. I say we guard against all errors we can possibly guard. |
Hi all! For clarity, never throwing exceptions or returning errors is a strict requirement for the OpenTelemetry API. Errors should definitely be piped to some form of diagnostic output so that developers can debug their code. But they should never have to guard an OpenTelemetry API call. There are two reasons for this:
In the case of bad parameters, methods that return an object should still return something functional. They should just use default parameters instead of the missing/malformed input. The default parameters can still indicate an error in some way - naming the span "UNNAMED SPAN" for example. Hope that makes sense. How to handle missing/bad parameters should be clarified more in the spec. It would be best if all implementations did the same thing. We also need to clarify how diagnostic data works in general - where should it be sent, what should it include, etc. My plan was to form a working group on diagnostics once we're finished with the current metrics and instrumentation work. But if others want to bottom line that work, by all means. :) |
I have been researching how to implement this mechanism in Python. For the time being, let's forget about the discussion regarding if we should or should not catch exceptions caused by wrong parameters being passed to methods or classes, let's consider only a mechanism that deals with exceptions raised by the code "inside" a method or function. So far I don't have any proof that it is impossible to implement, but I am starting to notice that several restrictions will probably have to be applied in the implementation and design of our project. I'll be listing the issues I find here: First, we need to define what our API is. I am defining it like so:
Second, we need to define what this mechanism does, I understand this mechanism as something that does the following when an exception is raised in a method:
This mechanism is being implemented in the @safety("predefined_string")
def function(a, b, c) -> str:
# do something that may raise an exception Pretty much the same happens for class methods: @safety("predefined_string")
def function(a, b, c) -> str:
# do something that may raise an exception So, if the code in I am trying to apply this mechanism to our code to find out what issues arise: Unsafe parent classesConsider this scenario:
Calling |
This problem happened to me while trying to make We do have code in the API package that is not part of the spec. I understand the reason why it is there, I believe this code is there because it is being used in several places around the API and SDK, it is utilitarian code. In order to not duplicate it, we added it into the API package because the API package is the root dependency, so it would be available for every other component of our project. I now think that was not the right approach. I am not sure where this code should be, I am now inclined to think this code should be in the SDK and each SDK should have its own utilitarian library. Code may end up being repeated, yes, but it is one for the other and safety is a specification requirement, code economy is not. The central design principle here is that every call that the user makes is done by calling an API function/method. Those functions/methods are safe. Those functions/methods pass the arguments to the SDK. If some exception is raised in the SDK code, it will be caught by the API safety mechanism. When an SDK function/method returns an OpenTelemetry defined object an SDK object is not returned, an API object is returned. Why? Because it is the API objects the ones who have the safety mechanism in their methods, so if a user is to call one of these methods, it must be an API one. The API object method then redirects the call to an underlying SDK object. The user only knows API proxies. |
Our most basic example is this one:
If we remove the argument
"foo"
in this line:then running this example fails with this error:
This should not happen, the spec states here:
API methods MUST NOT throw unhandled exceptions when used incorrectly by end users.
Our SDK methods are also unsafe, we even raise exceptions directly. Even if we did not raise an exception there, and instead we logged an error we must end up returning a No Op object, as required by the spec (which we don't do):
Whenever API call returns values that is expected to be non-null value - in case of error in processing logic - SDK MUST return a "no-op" or any other "default" object that was (ideally) pre-allocated and readily available. This way API call sites will not crash on attempts to access methods and properties of a null objects.
That means that just wrapping the code of every method of every class or every function in a
try
except
and logging an error is not enough. We need a way to define these NoOps and return them when necessary.The text was updated successfully, but these errors were encountered: