Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opencensus-shim / api: GlobalOpenTelemetry.get() should not call set() #4180

Closed
gfelbing opened this issue Feb 17, 2022 · 14 comments
Closed
Labels
Bug Something isn't working

Comments

@gfelbing
Copy link

Describe the bug

We are currently migrating our microservices from OpenCensus to OpenTelemetry, using opencensus-shim to preserve traces from gcloud/grpc.
On startup, we are running into

Exception in thread "main" java.lang.IllegalStateException: GlobalOpenTelemetry.set has already been called. GlobalOpenTelemetry.set must be called only once before any calls to GlobalOpenTelemetry.get. If you are using the OpenTelemetrySdk, use OpenTelemetrySdkBuilder.buildAndRegisterGlobal instead. Previous invocation set to cause of this exception.

Looking at the stacktrace, this happens because we are trying to access the gcloud project id before setting up the global opentelemetry using ServiceOptions from google-cloud-core:

Caused by: java.lang.Throwable
    at io.opentelemetry.api.GlobalOpenTelemetry.set(GlobalOpenTelemetry.java:97)
    at io.opentelemetry.api.GlobalOpenTelemetry.get(GlobalOpenTelemetry.java:67)
    at io.opentelemetry.api.GlobalOpenTelemetry.getTracer(GlobalOpenTelemetry.java:116)
    at io.opentelemetry.opencensusshim.OpenTelemetrySpanBuilderImpl.<clinit>(OpenTelemetrySpanBuilderImpl.java:52)
    at io.opentelemetry.opencensusshim.OpenTelemetryTracerImpl.spanBuilderWithExplicitParent(OpenTelemetryTracerImpl.java:42)
    at io.opencensus.trace.Tracer.spanBuilder(Tracer.java:308)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:865)
    at com.google.cloud.ServiceOptions.getAppEngineProjectIdFromMetadataServer(ServiceOptions.java:503)
    at com.google.cloud.ServiceOptions.getAppEngineProjectId(ServiceOptions.java:472)
    at com.google.cloud.ServiceOptions.getDefaultProjectId(ServiceOptions.java:379)
    [...application stacktrace part...]

Steps to reproduce

Our specific case:

  • Call ServiceOptions.getDefaultProjectId() when running in google cloud

General case:

  • Call GlobalOpenTelemetry.get / GlobalOpenTelemetry.getTracer before buildAndRegisterGlobal

Latter one is important as this can happen when using libraries that instrument OpenCensus, accessing the tracer e.g. in static fields.

What did you expect to see?

GlobalOpenTelemetry.get should not set the GlobalOpenTelemetry.noop() in the fallback case but just return it.
(→ remove line GlobalOpenTelemetry.java:67)

What did you see instead?

GlobalOpenTelemetry.get has a side effect of setting the noop telemetry causing exception above.

What version and what artifacts are you using?
Artifacts: opentelemetry-api, opentelemetry-sdk, opentelemetry-opencensus-shim
Version: 1.10.1
How did you reference these artifacts? (excerpt from your build.gradle, pom.xml, etc)

In gradle subproject, used by all services:

    api('io.opentelemetry:opentelemetry-api:1.10.1')
    api("io.opentelemetry:opentelemetry-opencensus-shim:1.10.1-alpha")
    implementation("io.opentelemetry:opentelemetry-sdk:1.10.1")

Environment
Compiler: OpenJDK 11
OS: docker image gcr.io/distroless/java

Additional context
...

@jack-berg
Copy link
Member

GlobalOpenTelemetry.get has a side effect of setting the noop telemetry causing exception above.

If it didn't have this side effect then an application might be started with libraries containing a mix of noop and properly configured OpenTelemetry instances, depending on the order in which they called OpenTelemetry.

this happens because we are trying to access the gcloud project id before setting up the global opentelemetry using ServiceOptions from google-cloud-core

Not familiar with google-cloud-core, but I assume that you're trying to access the gcloud project id before setting up GlobalOpenTelemetry because the gcloud project id is used in setting up GlobalOpenTelemetry in some capacity? If its not, you should setup GlobalOpenTelemetry before that call.

@jsuereth any input?

@anuraaga
Copy link
Contributor

Unfortunately opencensus automatically initializes on first call and has no way of controlling initialization order - this sort of issue is why in OTel we discourage the global.

I agree that our current behavior is better for non-shim users. Because the shim is a transitional mechanism perhaps it's ok for this sort of case to rely on resetForTest, the name is mostly to prevent confusing app behavior but when it is known to be used in a "safe" way then it's still ok. Hopefully it would be removed after code migrates to OTel.

@gfelbing
Copy link
Author

Thanks @anuraaga, I actually had the workaround with resetForTest already in place but it just felt so wrong.
The thing is, that this will stay there until all dependencies are migrated away from OpenCensus, which can take a while at least for big ones like gcloud or grpc libraries.

@anuraaga
Copy link
Contributor

What if we add a class to the opencensus-shim, OpenCensusShim.resetOpenTelemetry(), which just calls resetForTest? It will also nicely allow the method to be naturally removed when finally removing the shim.

By the way, we have gRPC instrumentation already

https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/grpc-1.6

Not sure about plans for gcloud sdk instrumentation, I'm surprised not to see anything like that in DataDog (the original code that our instrumentation derives from)

https://github.com/DataDog/dd-trace-java/tree/master/dd-java-agent/instrumentation

Perhaps the gRPC instrumentation works well enough for gcloud SDK? @jsuereth may be able to give more thoughts on the instrumentation story for that too.

@barend
Copy link

barend commented Mar 11, 2022

I run into this same problem.

Conditions similar/same as what gfelbing describes: attempting to use opentelemetry-opencensus-shim to get tracing from the BigTable client library and inadvertently registering the global noop before my intended (programmatic) OpenTelemetry configuration is enacted.

Between OpenTelemetry, OpenCensus and the various Google SDK components, the amount of stuff happening during <clinit> makes it difficult to initialise everything in the right order (I also understand the need to avoid locks on the main execution path). → I think it would be a helpful diagnostic aid to have GlobalOpenTelemetry.set() log a stack trace, toggled by a JVM system property, so that you can see who called it without attaching a debugger.

@anuraaga
Copy link
Contributor

@anuraaga
Copy link
Contributor

anuraaga commented Mar 14, 2022

@punya Do you happen to know anyone that would be able to provide information on the plans for OpenTelemetry integration for gcloud SDK Java users?

For the record, I tried writing some library instrumentation for google-http-client-java but wasn't able to since it doesn't present IOExceptions in a way that instrumentation would be able to access, while handling them natively with OpenCensus inline. Anyways I'd generally expect the future instrumentation story to be at the gcloud SDK level itself rather than its transport implementations.

@punya
Copy link
Member

punya commented Mar 14, 2022

Thanks for reporting this issue and digging into the details.

I'm not aware of specific plans at the Google Cloud SDK level. My rough understanding is:

  • Google Cloud SDK usage is viewed as an instance of gRPC usage
  • gRPC is currently instrumented using OpenCensus
  • They'll prioritize adopting/supporting OpenTelemetry when metrics are stable (in addition to traces)

If there's anything that OpenCensus can do to ease this transition (without breaking existing users), please let us know -- patches or suggestions of this type are very welcome.

@barend
Copy link

barend commented Mar 15, 2022

Hi @barend - we do already have the stack trace recording, is it not working for you?

https://github.com/open-telemetry/opentelemetry-java/blob/main/api/all/src/main/java/io/opentelemetry/api/GlobalOpenTelemetry.java#L105

Hi @anuraaga, I managed to completely overlook that 🤦‍♂️ , thank you for pointing it out to me.

@barend
Copy link

barend commented Mar 15, 2022

@punya I've only scratched the surface of the Google Cloud SDK. What I didn't recognise last week is that the GCP SDK telemetry uses a layer of indirection.

I can't seem to recover the original page of documentation where I found it, but I remember that I configured my BigTable client to use OpenCensusTracerFactory based on a page of GCP documentation. I followed those instructions without digging any deeper.

Having now dug a bit deeper, this class implements ApiTracerFactory which is defined entirely without reference to any particular tracing tech (as the whole abstract factory pattern indeed implies).

It looks like the simplest way around problems on the interaction between the GCP SDKs, OpenCensus and the OpenTelemetry-OpenCensus-Shim is to implement these interfaces directly on top of OpenTelemetry, taking OpenCensus out of the picture. All of these are heavily annotated as internal and subject to change though.

@punya
Copy link
Member

punya commented Mar 16, 2022

I wasn't aware of that layer of indirection myself, thanks @barend for pointing it out. If you do end up creating an OTel-based implementation of ApiTracerFactory, I'd be happy to dig up who owns that component within Google and ask about getting your implementation into gax-java (assuming that's something you want to pursue).

@barend
Copy link

barend commented Mar 16, 2022

I'm afraid I can't put that on top of my list right now, but I'll keep that in mind. Of course, I'd be just as happy if someone beat me to it.

@jack-berg
Copy link
Member

This has been partially addressed in #5010: the side affect in which AutoConfiguredOpenTelemetrySdk.initialize() is called if GlobalOpenTelemetry has not been set when GlobalOpenTelemetry.get() is called is now opt in.

However even without opting in, if GlobalOpenTelemetry has not been set when GlobalOpenTelemetry.get() is called, GlobalOpenTelemetry will still be set to OpenTelemetry.noop().

As discussed here the advantage of this is that you have an invariant that every call to GlobalOpenTelemetry.get() gets the same response, regardless of ordering. While not perfect in every case, it simplifies debugging initialization issues and is pretty good in most situations.

I'm going to close this issue. Feel free to reopen it if there are new arguments to change the behavior.

@yzhaoa
Copy link

yzhaoa commented Apr 30, 2023

For posterity - resetForTest doesn't actually work if you need to capture the opencensus traces. By the time you call resetForTest the opencensus-shim has already captured in a static variable a tracer derived from OpenTelemetry.noop(), so any subsequent attempts to change GlobalOpenTelemetry will not be used by the opencensus shim and thus not produce any traces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants