(exporter-jaeger-thrift): Exception in forked environments #2837
Comments
BatchSpanExporter in forked environments
BatchSpanProcessor in forked environments
I believe the problem comes from the fact that, even if the I believe the solution here would be the processor receiving a callable that instantiates the exporter, so that it can create a new exporter after a fork. Currently, it receives a ready exporter instance that can't be easily recreated after the fork by the One specific instance, which is what causes the above exception, is the |
Or an exporter and any other component in the pipeline implements the necessary hooks itself |
I personally wouldn't implement the solution that way, because of the separation of concerns principle. We only need to make the processor fork-ready because it uses threads. A fork only forks the thread that calls the The fact that the batch processor uses threads is completely outside the exporter's concern. The exporter does not care which processor is being used; it just gets the traces from any processor, composes the message to be sent over the wire, and sends it. So, in my opinion, implementing fork-handling in exporters is, from an architecture point of view, a mistake. I still believe that the correct treatment needs to reside entirely in the processor itself, which is where the threads in this scenario come from, and that the solution is to teach the processor how to instantiate the exporter so it can do that when it needs to: after it notices a fork. But I'll leave the judgement call to whoever decides to tackle the problem. [1] Barring some options, but in most systems that's the default, and in most cases that's what happens. |
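To make the proposal above concrete, here is a minimal sketch of a processor that is handed a callable instead of a ready exporter instance, so that it can build a fresh exporter in the child after a fork. `FactoryBackedProcessor` and `ListExporter` are hypothetical names made up for illustration; this is not the OpenTelemetry `BatchSpanProcessor` API, and a real batch processor would also have to restart its worker thread. The only library call relied on is the standard `os.register_at_fork` (POSIX, Python 3.7+).

```python
import os
import threading
import weakref


class ListExporter:
    """Hypothetical stand-in exporter that just records what it is given."""

    def __init__(self):
        self.exported = []

    def export(self, spans):
        self.exported.extend(spans)


class FactoryBackedProcessor:
    """Hypothetical processor that owns the recipe for its exporter."""

    def __init__(self, exporter_factory):
        self._exporter_factory = exporter_factory
        self._exporter = exporter_factory()
        self._lock = threading.Lock()
        self._pending = []
        if hasattr(os, "register_at_fork"):
            # Weak reference so the hook does not keep the processor alive.
            ref = weakref.WeakMethod(self._reinit_after_fork)

            def _hook():
                method = ref()
                if method is not None:
                    method()

            os.register_at_fork(after_in_child=_hook)

    def _reinit_after_fork(self):
        # State copied from the parent may be mid-update, so rebuild all of it.
        self._lock = threading.Lock()
        self._pending = []
        self._exporter = self._exporter_factory()

    def on_end(self, span):
        with self._lock:
            self._pending.append(span)

    def force_flush(self):
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._exporter.export(batch)
```

With this shape, `FactoryBackedProcessor(ListExporter)` never shares an exporter instance between parent and child; the trade-off is that every caller must supply a factory rather than a configured object.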
What do you mean it only forks the thread that calls the fork?
If I understand you correctly, the UDP client used by the specific exporter will end up in some inconsistent state when the fork happens. The exception will be raised regardless of the processor? |
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html Emphasis mine. I mean that.
Yes, but it is not the concern of the exporter to work correctly in forked environments, because it is not expecting that and it is not forking anything. The concern belongs to the processor to make everything it needs work in the forked environment it is creating. |
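The POSIX behaviour referred to above (only the calling thread exists in the child) can be observed with a small experiment. This is an illustrative sketch, assuming a POSIX system where `os.fork` is available; the function name `threads_in_child_after_fork` is made up for this example. Note that Python 3.12 and later emit a `DeprecationWarning` when forking a multi-threaded process, which is the situation being demonstrated here.

```python
import os
import threading


def threads_in_child_after_fork():
    """Fork while a background thread is running.

    Returns (thread count seen by the parent, thread count seen by the child).
    """
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report how many threads Python sees, then exit
        # immediately without running any parent cleanup code.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent process: read the child's report and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_count = int(reader.read())
    os.waitpid(pid, 0)

    parent_count = threading.active_count()
    stop.set()
    worker.join()
    return parent_count, child_count
```

The parent count includes the worker thread, while the child should report a single thread: the worker is simply absent there, along with whatever it was in the middle of doing.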
Just want to make sure we are on the same page: is there a misunderstanding that the batch processor is forking? No, it doesn't fork, but its implementation is made fork-aware so that it works in programs which use fork. |
My understanding of the issue so far is, that |
I don't remember saying the processor was forking. And it seems like GitHub doesn't either. That said, I did say the processor should be the one worried about making things work in a forked environment, and I said that because it is the processor creating the thing that causes the trouble, which is the thread that uses the exporter. If that thread didn't exist, this problem also probably wouldn't exist. For example, if the exporter were used in the main thread of the process, there wouldn't be a fork at the wrong time (by definition, since the exporter itself doesn't fork). There is plenty of code all over the place that doesn't work in forked environments. That is okay. In fact, most code does not work in forked environments. You can't pick any piece of software, insert a
That is very much not the case. If there is not a separate thread, there won't be a fork in the middle of the execution of the exporter, because the exporter itself doesn't fork, and so the exporter will work just fine in both the parent and the child processes. The problem only happens if the processor itself creates separate threads that operate the exporter, which is the case with the But let's say that there's a process using the exporter, and it calls something that begins a thrift struct, and forks before calling the function that ends the struct. Well, that is misuse of the exporter, and it's not the exporter's job to fix itself. It is the job of the process to make sure that, if it needs to do that, it reinstantiates or otherwise fixes the exporter after the fork in the child process. Or maybe the process just wants to let the child run and end up with duplicate messages. 🤷♂️ I don't know. Either way, my original point stands: the problem stems from the existence of a separate thread using the exporter, and that thread comes from the |
This is where we have a difference of opinion. I believe the exporter should either fix itself so that it works, or document that it doesn't. It shouldn't be the telemetry processor's (or any other component's) concern to ensure that the exporter (or another component in the trace pipeline) works in different environments. That's what I think; I will let other members chime in and share what they have to say about this. |
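For comparison, the exporter-side approach could look roughly like the sketch below, where the transport resets its own state in the child. `ForkAwareUdpSender` is a hypothetical class written for this illustration; it is not the Jaeger exporter's actual UDP client, and the buffering is only a stand-in for the partially written thrift message discussed earlier in the thread. It relies only on the standard `socket` module and `os.register_at_fork` (POSIX, Python 3.7+).

```python
import os
import socket
import weakref


class ForkAwareUdpSender:
    """Hypothetical UDP transport that repairs itself after a fork."""

    def __init__(self, host, port):
        self._address = (host, port)
        self._buffer = bytearray()
        self._socket = self._new_socket()
        if hasattr(os, "register_at_fork"):
            # Weak reference so the hook does not keep the sender alive.
            ref = weakref.WeakMethod(self._reset_after_fork)

            def _hook():
                method = ref()
                if method is not None:
                    method()

            os.register_at_fork(after_in_child=_hook)

    @staticmethod
    def _new_socket():
        return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _reset_after_fork(self):
        # Discard any message a parent thread had only half written, and
        # replace the inherited socket with one owned by this process.
        self._buffer = bytearray()
        self._socket.close()
        self._socket = self._new_socket()

    def write(self, data):
        self._buffer += data

    def flush(self):
        payload, self._buffer = bytes(self._buffer), bytearray()
        if payload:
            self._socket.sendto(payload, self._address)
```

Here the processor needs no knowledge of how the exporter is built, at the cost of every stateful exporter having to carry its own hook.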
Any update on this? I have the same problem. I get the exporter exception when the application starts and again from time to time. My application uses forks. |
Currently, the
What "causes trouble" is when an exporter is used in conjunction that does NOT support being in a forked environment. If an exporter is designed so that it can be used in a forked environment, or if it does not hold any state (it simply does the exporting, as you've mentioned above), this "troubled state" would not happen either. It is not solely due to the thread existing in the batch span processor. This seems to me (as @srikanthccv pointed out) to be a responsibility of the exporter. I do not believe it is the responsibility of the span processor to make sure that its components (the exporter) support being in a forked environment, just that it does itself. With that being said, to say that "my OpenTelemetry telemetry pipeline works in a forked environment" is a bold statement that has not been promised. This probably would extend to other custom components as well (if they ever have workers), not just the Any other opinions? |
Is this issue specific to the Jaeger exporter? If so, could we update the title to clarify that? (Even if this only affects that exporter under the |
In my case, it is in the otel exporter, and I still have the problem with the latest version. |
@falsedlah please could you clarify which exporter class you're using? (There are lots of exporters provided as part of Open Telemetry) |
What is the exception? You should at minimum share the stack trace. |
In our case, we are using opentelemetry-exporter-jaeger-thrift==1.10.0 code snippet in
We occasionally ran into the exact same exception as reported. This forced us to set the sampling rate VERY low, at "0.0001", to prevent the exception from recurring. While we still get the spans, it's not very helpful due to the low sampling rate. We hope to get some thoughts on the implementation above. Perhaps we are not supposed to run the trace with multiple threads this way? |
@kayleejacques, you confirmed the point I made in the thread above, i.e. it doesn't matter whether the used span processor is
|
@srikanthccv much appreciate your speedy response. I will consult with the team to switch to |
@srikanthccv if the issue is not with BatchSpanProcessor, would you mind updating the issue title to reflect the issue better? |
A fix could have been made to make the exporter fork-aware. The Jaeger exporters have been removed from the baseline. |
Describe your environment
OTel packages v1.11.1 and v0.30b1.
Python 3.8.11 but Python version should be irrelevant to this problem.
Bug spotted on macOS, but should happen on all systems.
Steps to reproduce
This is really hard to reproduce because it depends heavily on timing. It's probably a race condition among threads.
What is the expected behavior?
The exception should not be raised.
What is the actual behavior?
This is the exception:
Additional context
None.