Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logging: OpenTelemetry spans are ignored as parents for LogRecords #9302

Closed
dashpole opened this issue Jan 25, 2024 · 15 comments · Fixed by #10030
Closed

logging: OpenTelemetry spans are ignored as parents for LogRecords #9302

dashpole opened this issue Jan 25, 2024 · 15 comments · Fixed by #10030
Assignees
Labels
api: logging Issues related to the Cloud Logging API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@dashpole
Copy link

dashpole commented Jan 25, 2024

Client

logging: https://github.com/googleapis/google-cloud-go/tree/main/logging

Current behavior

The current implementation of populateTraceInfo always uses a remote span context as the parent

func populateTraceInfo(e *Entry, req *http.Request) bool {
if req == nil {
if e.HTTPRequest != nil && e.HTTPRequest.Request != nil {
req = e.HTTPRequest.Request
} else {
return false
}
}
header := req.Header.Get("Traceparent")
if header != "" {
// do not use traceSampled flag defined by traceparent because
// flag's definition differs from expected by Cloud Tracing
traceID, spanID, _ := deconstructTraceParent(header)
if traceID != "" {
e.Trace = traceID
e.SpanID = spanID
return true
}
}
header = req.Header.Get("X-Cloud-Trace-Context")
if header != "" {
traceID, spanID, traceSampled := deconstructXCloudTraceContext(header)
if traceID != "" {
e.Trace = traceID
e.SpanID = spanID
// enforce sampling if required
e.TraceSampled = e.TraceSampled || traceSampled
return true
}
}
return false
}

By "Remote span context", I mean a trace ID and span ID that originate from outside the application. The code above always uses a remote span context because it explicitly parses the trace id and span id from HTTP headers.

When a user has instrumented their application with OpenTelemetry, the trace ID and span ID have already been parsed from the HTTP headers, and used to start a span. If they are using the otelhttp library, it:

  1. extracts the context from req.Header into a golang context.Context
  2. uses the context to start a span
  3. updates the context.Context on the req using req.WithContext

Expected behavior

If the user is using opentelemetry instrumentation, we want to use the traceID and spanID of the span that OpenTelemetry created. This way, when a user looks for logs associated with that span, the logs produced by this logging library will appear. The traceID and spanID are easy to get since otelhttp stores them in req.Context().

We can do something close to this:

func populateTraceInfo(e *Entry, req *http.Request) bool {
	if req == nil {
		if e.HTTPRequest != nil && e.HTTPRequest.Request != nil {
			req = e.HTTPRequest.Request
		} else {
			return false
		}
	}
        // New handing code
        otelSpanContext := trace.SpanContextFromContext(req.Context())
	if otelSpanContext.IsValid() {
	        e.Trace := otelSpanContext.TraceID()
	        e.SpanID := otelSpanContext.SpanID()
                e.TraceSampled = e.TraceSampled | otelSpanContext.IsSampled()
                return true
	}
        // end of new code
	header := req.Header.Get("Traceparent")
         if header != "" { 
 		// do not use traceSampled flag defined by traceparent because 
 		// flag's definition differs from expected by Cloud Tracing 
 		traceID, spanID, _ := deconstructTraceParent(header) 
 		if traceID != "" { 
 			e.Trace = traceID 
 			e.SpanID = spanID 
 			return true 
 		} 
 	} 
 	header = req.Header.Get("X-Cloud-Trace-Context") 
 	if header != "" { 
 		traceID, spanID, traceSampled := deconstructXCloudTraceContext(header) 
 		if traceID != "" { 
 			e.Trace = traceID 
 			e.SpanID = spanID 
 			// enforce sampling if required 
 			e.TraceSampled = e.TraceSampled || traceSampled 
 			return true 
 		} 
 	} 
 	return false 
 } 

To test this, we should make sure an http request with a context from a span creation ends up in the LogEntry:

    req := &http.Request{}
    ctx := context.Background()
    ctx, _ = trace.NewNoopTracerProvider().Tracer("test instrumentation").Start(ctx, "span name")
    req = req.WithContext(ctx)
    e := &Entry{}
    got := populateTraceInfo(e, req)
    assert.True(t, got)
    assert.Len(t, e.Trace, 32)
    assert.Len(t, e.SpanID, 16)
    // TODO test sampled somehow
@dashpole dashpole added the triage me I really want to be triaged. label Jan 25, 2024
@product-auto-label product-auto-label bot added the api: logging Issues related to the Cloud Logging API. label Jan 25, 2024
@quartzmo quartzmo assigned quartzmo and unassigned gkevinzheng Jan 26, 2024
@quartzmo quartzmo added triage me I really want to be triaged. and removed triage me I really want to be triaged. labels Jan 26, 2024
@quartzmo quartzmo assigned gkevinzheng and unassigned quartzmo Jan 26, 2024
@quartzmo
Copy link
Member

Sorry for the label churn, I misunderstood where this function was.

@gkevinzheng
Copy link
Contributor

@dashpole how can we get this to work both in the case where the user is using otel libraries and the case where they're not using otel libraries?

@dashpole
Copy link
Author

dashpole commented Jan 30, 2024

I'm suggesting you add a case that handles otel libraries in addition to the current parsing for traceparent and XCTC. If they aren't using OTel libraries, it should fall back to the existing logic.

Edit: I've updated the sample code above to show the full function, with the addition of OTel-handling logic.

@gkevinzheng
Copy link
Contributor

Ahh, ok. Are there any other libraries customers are using besides this one that we might have to support in the future?

@dashpole
Copy link
Author

There are some projects that predated OpenTelemetry, such as OpenCensus or OpenTracing that had similar functionality, but both projects are archived in favor of OpenTelemetry. For Go, OpenTelemetry is really the only in-process tracing instrumentation that has broad adoption.

@cindy-peng
Copy link
Contributor

cindy-peng commented Jan 30, 2024

Hi @dashpole , do you know if this logic works the same for other language version of otelhttp?

When a user has instrumented their application with OpenTelemetry, the trace ID and span ID have already been parsed from the HTTP headers, and used to start a span. If they are using the otelhttp library, it:

extracts the context from req.Header into a golang context.Context
uses the context to start a span
updates the context.Context on the req using req.WithContext

Can we assume the suggested fix will be similar for nodejs, java and python?

@dashpole
Copy link
Author

Yes, although context implementations differ between languages. @aabmass might be able to help with the other languages if you have questions.

@noahdietz noahdietz added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p3 Desirable enhancement or fix. May not be included in next release. and removed triage me I really want to be triaged. labels Jan 30, 2024
@noahdietz
Copy link
Contributor

It sounds like there is some unexpected behavior re:tracing contexts used so I added the type: bug label as part of issue triage, feel free to change that, just trying to organize things.

@cindy-peng
Copy link
Contributor

cindy-peng commented Feb 6, 2024

Hi @aabmass , I am currently working on similar fixes for this issue in nodejs-logging and java-logging repos. I am looking for similar approaches as following to to get SpanContext in node and java to propagate the correct traceId/span Ids:

        // New handing code
        otelSpanContext := trace.SpanContextFromContext(req.Context())
	if otelSpanContext.IsValid() {
	        e.Trace := otelSpanContext.TraceID()
	        e.SpanID := otelSpanContext.SpanID()
                e.TraceSampled = e.TraceSampled | otelSpanContext.IsSampled()
                return true
	}
        // end of new code

Seems like there are quite a few OTEL instrumentation libraries. Could you point me to the right OTEL nodejs and java libraries/dependencies that I can use to retrieve the span context?

Here are where the traceId handled in logging repos currently:

  1. nodejs-logging: https://github.com/googleapis/nodejs-logging/blob/main/src/entry.ts#L235
  2. Java-logging: https://github.com/googleapis/java-logging/blob/main/google-cloud-logging/src/main/java/com/google/cloud/logging/LoggingImpl.java#L841

@gkevinzheng
Copy link
Contributor

@aabmass For Python integrations like Django/Flask, should we use the otel library for Python, or try to integrate the otel Django/Flask integrations with our logging library?

@dashpole
Copy link
Author

dashpole commented Feb 6, 2024

@cindy-peng @gkevinzheng You should just need to use the trace API library for each language, rather than a specific instrumentation library. I was just trying to illustrate how they would work together, but the implementation I wrote above for Go will work no matter what library they are using.

I think for JS, you can get the current span context like this: https://github.com/open-telemetry/opentelemetry-js/blob/f86251d40fbf615be87319c8a1f5643afb820076/doc/propagation.md?plain=1#L38

    const spanContext = trace.getSpanContext(context);
    // Skip if the current span context is not valid or suppressed.
    if (
      !spanContext ||
      !isSpanContextValid(spanContext) ||
      isTracingSuppressed(context)
    ) {
      // fall-back to parsing traceparent or XCTC headers
      return;
    }
    // use spanContext.traceId, spanContext.spanId and spanContext.traceFlags & TraceFlags.SAMPLED

@dashpole
Copy link
Author

dashpole commented Feb 6, 2024

For java, it would look similar to this: https://github.com/open-telemetry/opentelemetry-java/blob/3e7302e742227d4526b765621a933d748012c906/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/TraceBasedExemplarFilter.java#L35

    spanContext SpanContext = Span.fromContext(context).getSpanContext()
    // make sure it is valid, and then use spanContext's trace id, span id, and sampled flag

@dashpole
Copy link
Author

dashpole commented Feb 6, 2024

Looks like in python it is something like this: https://github.com/open-telemetry/opentelemetry-python/blob/66e7d61ceba839c16d51b9297334ec8615cdf6a0/opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py#L479

    span_context = get_current_span().get_span_context()
    // make sure it is valid?  Then use spanContext's  trace id, span id, and sampled flag

@cindy-peng
Copy link
Contributor

cindy-peng commented Feb 6, 2024

Thanks @dashpole! There is another impacted scenario for App Engines, Cloud Functions and Cloud Runs with automatic tracing(https://cloud.google.com/trace/docs/overview#configurations_with_automatic_tracing). As a fix for these environments, do you think we should generate our own new Span Id instead of retrieving it form headers?

More generically, for customers using tracing via Tracing Client libraries instead of Open Telemetry, do you have any suggestions for Span Id processing?

@dashpole
Copy link
Author

dashpole commented Feb 6, 2024

Parsing the XCTC or traceparent headers as a fallback if OpenTelemetry span context is not found (as the logging library in Go does today) will handle the automatic tracing from Cloud Run, app engine, etc. even for users not using OpenTelemetry, since those platforms set the XCTC header.

I would not recommend integrating with the non-OpenTelemetry client libraries unless there is a significant need. It should be possible to do something similar to what i'm suggesting for OTel (find the "get current span context" method in those libraries), but we expect to change the recommendation to use OTel for tracing in those languages very soon.

@gkevinzheng gkevinzheng added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: logging Issues related to the Cloud Logging API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
5 participants