Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spring Boot WebFlux + open-telemetry java agent, use Baggage and makeCurrent() in Filter, traceId is duplicated for many times. #12085

Closed
giantoneday opened this issue Aug 22, 2024 · 6 comments
Labels
bug Something isn't working needs triage New issue that requires triage

Comments

@giantoneday
Copy link

giantoneday commented Aug 22, 2024

Describe the bug

Hi,
I use Spring Boot WebFlux + open-telemetry java agent, and add Baggage to the Context, but I found that if I add baggage.storeInContext(Context.current()).makeCurrent() on Filter,

  1. Call a normal block http api, everything is fine.
  2. Call a flux http api, each traceId is duplicated for many times.
  3. If I don't add baggage to Context and don't makeCurrent(), then flux api is also fine.

My test project is here: flux-open-telemetry.zip

Uploading flux-open-telemetry.zip…

And please read the README to reproduce my issue. Both opentelemetry-javaagent 2.7.0 or 2.4.0 have the same issues.
Could you please show me how to fix this problem? Because it's important for me.

some similar issues reported by others:

  1. optl-instrumentation with spring-webflux generate duplicated traceId #8125
  2. Many different requests with a same traceId, seems a bug? #3168
  3. Duplicate Trace ID opentelemetry-java#1531

Thanks.

Steps to reproduce

My test project is attached, and please read the README to reproduce my issue.
Could you please show me how to fix this problem? Because it's important for me.

Expected behavior

traceId is not duplicated

Actual behavior

traceId is duplicated

Javaagent or library instrumentation version

2.4.0 or 2.7.0

Environment

JDK: 17
OS: mac/linux

Additional context

No response

@giantoneday giantoneday added bug Something isn't working needs triage New issue that requires triage labels Aug 22, 2024
@giantoneday giantoneday changed the title Spring Boot WebFlux + open-telemetry java agent, use Baggage, traceId is duplicated for many times. Spring Boot WebFlux + open-telemetry java agent, use Baggage and makeCurrent() in Filter, traceId is duplicated for many times. Aug 23, 2024
@giantoneday
Copy link
Author

@zeitlinger
Copy link
Member

I was able to reproduce the issue - it was not introduced by #12003

@laurit
Copy link
Contributor

laurit commented Aug 23, 2024

@giantoneday After starting the application do only one request. Now if you examine the log file you'll notice that

[INFO ] [2024-08-23 14:32:42.088] [logid: traceId=] [uid:] [http-nio-8080-exec-1] [TID: N/A] [c.f.c.CommonUtil:print:25] [default] [http-nio-8080-exec-1] com.fluxtest.controller.CommonUtil 25  telemetry: tag: ReqLogConfig.beforeRequest traceId: f4b349fa0f4b8b240f7c1adeca12e49e

before request is logged on thread http-nio-8080-exec-1 and

[INFO ] [2024-08-23 14:32:43.179] [logid: traceId=] [uid:] [http-nio-8080-exec-2] [TID: N/A] [c.f.c.ReqLogConfig$RequestLoggingFilter:afterRequest:49] [default] [http-nio-8080-exec-2] c.fluxtest.controller.ReqLogConfig 49  afterRequest: message:[REQUEST: [GET /test/stream?reqId=11111, client=0:0:0:0:0:0:0:1, headers=[host:"localhost:8080", user-agent:"curl/8.6.0", accept:"text/event-stream"]]] traceId: f4b349fa0f4b8b240f7c1adeca12e49e

after request is logged on a different thread http-nio-8080-exec-2. There is a requirement that scope must be closed on the same thread that opened it. Essentially when you call context.makeCurrent() you bind the context to the current thread, failure to close it means that thread will remain associated with that context. Closing the scope on wrong thread doesn't do anything. If a thread already is associated with context then a new trace won't be created when there is an incoming request and you can observe that different requests use the same trace when they run on the same thread.

I was able to reproduce the issue - it was not introduced by #12003

@zeitlinger since the original report mentions that it already does not work in 2.4.0 I find it hard to believe that the PR you singled out could be the culprit as it was merged only in 2.7.0

@laurit laurit added the needs author feedback Waiting for additional feedback from the author label Aug 23, 2024
@zeitlinger
Copy link
Member

@zeitlinger since the original report mentions that it already does not work in 2.4.0 I find it hard to believe that the PR you singled out could be the culprit as it was merged only in 2.7.0

I said it did not introduce it.

Looking at the user code, I can't see what's wrong with it.
In other words: how can you write this filter to avoid the issue?

 private static class RequestLoggingFilter extends AbstractRequestLoggingFilter {

        public RequestLoggingFilter() {
            setAfterMessagePrefix("REQUEST: [");
        }

        @Override
        protected void beforeRequest(HttpServletRequest request, String message) {
            Baggage baggage = Baggage.builder()
                    .put(BAGGAGE_USER_ID, String.valueOf(System.currentTimeMillis()))
                    .build();
            Scope scope = baggage.storeInContext(Context.current()).makeCurrent();
            request.setAttribute(SCOPE, scope);
            print("ReqLogConfig.beforeRequest", true);
        }

        @Override
        protected void afterRequest(HttpServletRequest request, String message) {
            //print("ReqLogConfig.afterRequest", false);
            log.info("afterRequest: message:[{}] traceId: {}", message, getTraceId(false));
            Scope scope = (Scope) request.getAttribute(SCOPE);
            if (scope != null) {
                log.info("close scope: {} {}", scope.getClass(), scope);
                scope.close();
            }
            print("afterRequest finish", false);
            //Span.current().end();
        }
    }

@laurit
Copy link
Contributor

laurit commented Aug 23, 2024

Looking at the user code, I can't see what's wrong with it.
In other words: how can you write this filter to avoid the issue?

Have a look at how AbstractRequestLoggingFilter is implemented

        if (shouldLog && isFirstRequest) {
            this.beforeRequest((HttpServletRequest)requestToUse, this.getBeforeMessage((HttpServletRequest)requestToUse));
        }

        try {
            filterChain.doFilter((ServletRequest)requestToUse, response);
        } finally {
            if (shouldLog && !this.isAsyncStarted((HttpServletRequest)requestToUse)) {
                this.afterRequest((HttpServletRequest)requestToUse, this.getAfterMessage((HttpServletRequest)requestToUse));
            }
        }

The isAsyncStarted check is what messes things up. You should close the scope regardless of whether request is handled asynchronously or not. I'd assume that instead of spring helper classes you could use a plain filter implementation or perhaps just extend the OncePerRequestFilter so that you could write something like

try (Scope ignore = baggage.makeCurrent()) {
  filterChain.doFilter(request, response);
}

@giantoneday
Copy link
Author

@laurit @zeitlinger tks for your reply. I use a plain filter(tomcat level's filter) to fix this problem, tks.

@github-actions github-actions bot removed the needs author feedback Waiting for additional feedback from the author label Aug 26, 2024
@laurit laurit closed this as completed Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage New issue that requires triage
Projects
None yet
Development

No branches or pull requests

3 participants