Remove Observation.Context from WebClient request #32198
Comments
Thanks for the report. I'm using the following controller to run many requests:

```java
package com.example.webclientleak;

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;

@RestController
public class TestController {

    private final WebClient webClient;

    public TestController(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("http://localhost:8080").build();
    }

    @GetMapping("/test")
    public String test() {
        return "TEST";
    }

    @GetMapping("/")
    public Mono<String> index(@RequestParam int count) {
        return Flux.range(1, count)
                .flatMap(i -> this.webClient.get().uri("/test").retrieve().bodyToMono(String.class))
                .last();
    }

}
```

I ran many queries against the main endpoint (so millions of WebClient requests) and I don't see any memory leak in my tools. The Java heap is not growing and I can consistently get it down by triggering manual GCs.

I don't understand why #31702 would change anything, as this commit doesn't change the nature of this "cycle": the observation context was already pointing to the request builder and to the request (which itself contains the context as an attribute). The change merely reorders things. I don't think cycles like this cause memory leaks in the first place, as they're not reachable from GC roots once the request is sent. Do you have a different perspective on this? It would be interesting to know why those context instances are still reachable in your case and what's holding a reference to them. In my case, I only had a couple of those in memory, as they were held by the observation registry, waiting to be sent/processed.

Additionally, any indication of how to reproduce this locally would be very much welcome. Thanks! |
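For reference, a standalone driver along these lines (hypothetical, not part of the thread; the URL, counts, and concurrency are assumptions) could be used to hammer the index endpoint and generate millions of WebClient requests:

```java
import reactor.core.publisher.Flux;

import org.springframework.web.reactive.function.client.WebClient;

public class LoadDriver {

    public static void main(String[] args) {
        WebClient client = WebClient.create("http://localhost:8080");

        // 1,000 calls to "/", each of which fans out into 1,000 WebClient requests server-side
        Flux.range(1, 1_000)
                .flatMap(i -> client.get()
                        .uri(uriBuilder -> uriBuilder.path("/").queryParam("count", 1_000).build())
                        .retrieve()
                        .bodyToMono(String.class), 16)
                .blockLast();
    }
}
```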
Not really sure either, but the heap dump indicates some memory is retained, and the change happened in the classes I can see there. That was my only lead: in 6.1.2 the context is created with the request builder:

```java
ClientRequestObservationContext observationContext = new ClientRequestObservationContext(requestBuilder);
```

whereas in 6.1.1 it was created without the request:

```java
ClientRequestObservationContext observationContext = new ClientRequestObservationContext();
```

Also, this is in 6.1.2:

```java
ClientRequest request = requestBuilder
        .attribute(ClientRequestObservationContext.CURRENT_OBSERVATION_CONTEXT_ATTRIBUTE, observationContext)
        .build();
```

At least it feels related, given the version bump and what I can see from the dump. Will give it a try locally in my integration test if I can hammer one endpoint; I haven't been able to reproduce it locally so far either. |
Did the reactor-netty dependency version change at all between both? |
Only change was WebFlux. I did a lot of tests to be able to isolate which change caused the issue:

```xml
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webflux</artifactId>
    <version>6.1.2</version>
</dependency>
```
|
What do you mean, like a test app? Or the amount of traffic? |
I created a small project on start.spring.io and added the controller I shared in a comment above. This doesn't reproduce the memory leak after millions of requests. I'm thinking that I'm probably missing some part of the setup. Maybe a specific Reactor version is required, or a specific configuration property? Or maybe the server part needs to be using MVC, not WebFlux as I was using in my sample? Maybe this only happens on a particular JVM version? Maybe an actual observation registry implementation needs to be involved?

I'm not saying this is not a bug on our side, but it's hard to fix a memory leak without reproducing it and seeing where the allocation happens. If you can reproduce the issue on a smaller sample, I'd be happy to have a look and work on a fix. Without that, my best bet would be to revert the change I introduced in 6.1.2, but this would give zero guarantee (you could try it for yourself in your application by using spring-webflux 6.1.2 and adding the former |
Totally understand. I tried to recreate it locally as well, without success, so creating a smaller sample is not easy; I have only seen this in live traffic after 2-3 hours of running. Also using WebFlux in the server app. I could try your suggestion to include the former impl, but even if that is successful it's not the best approach going forward... |
Your previous screenshot points to
I'm not suggesting that as a solution, but rather as another step to ensure that this change is the culprit, even though we can't reproduce it locally. To be honest, I'm not sure how the change in 6.1.2 could create a problem like this, as the "reference cycle" existed already. |
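As an aside, one hypothetical way to isolate the new request attribute without downgrading (not something proposed in the thread, and it assumes exchange filters see the request after the attribute has been attached, as the 6.1.2 snippet above suggests) would be a filter that simply removes it:

```java
import reactor.core.publisher.Mono;

import org.springframework.web.reactive.function.client.ClientRequest;
import org.springframework.web.reactive.function.client.ClientRequestObservationContext;
import org.springframework.web.reactive.function.client.ClientResponse;
import org.springframework.web.reactive.function.client.ExchangeFilterFunction;
import org.springframework.web.reactive.function.client.ExchangeFunction;

public class StripObservationContextAttributeFilter implements ExchangeFilterFunction {

    @Override
    public Mono<ClientResponse> filter(ClientRequest request, ExchangeFunction next) {
        // Rebuild the request without the observation-context attribute introduced in 6.1.2,
        // so the outgoing ClientRequest no longer references the context directly.
        ClientRequest stripped = ClientRequest.from(request)
                .attributes(attributes -> attributes.remove(
                        ClientRequestObservationContext.CURRENT_OBSERVATION_CONTEXT_ATTRIBUTE))
                .build();
        return next.exchange(stripped);
    }
}
```

Registered via `WebClient.Builder#filter`, this would only break the request-to-context link; the context itself would still reference the request builder.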
I have run with lots of different versions of the other dependencies (Reactor, Netty, and so on), but it seems like this patch version is the one that triggers the behaviour. I guess it could be some other component that causes the issue; hard to tell. |
Some update: We have managed to avoid the memory leak by instead of using
|
In the snippet you shared here, it does not seem like you are using Spring Boot's managed WebClient builder. Per the reference documentation, the out-of-the-box managed builder has the observation component pre-attached. It's a long shot, but I wonder whether the issue is still reproducible if you use the Spring-managed builder. Of course, the other challenge is that there is no reproducer app (yet). |
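A minimal sketch of that distinction (the base URL and bean names are assumptions, not taken from the thread):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    // Spring Boot's managed builder: Boot's WebClientCustomizers (including the
    // observation registry wiring) have already been applied to the injected builder.
    @Bean
    WebClient managedWebClient(WebClient.Builder builder) {
        return builder.baseUrl("http://localhost:8080").build();
    }

    // A builder created from scratch: no Boot customizers are applied, so there is no
    // observation instrumentation unless it is registered explicitly.
    @Bean
    WebClient plainWebClient() {
        return WebClient.builder().baseUrl("http://localhost:8080").build();
    }
}
```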
I think this currently points to a "connection leak", or at least a setup where many connections are cancelled or closed and the connection pool takes some time to invalidate and clean up resources. The change in Spring Framework likely makes this behavior more obvious by attaching more memory to the connection attributes. I'll close this issue for now since I still haven't found a way to reproduce this. We'll reopen as soon as we have a reproducer so we can investigate. |
Hi all, I upgraded our application from Spring Boot 3.0.13 to 3.2.2 and I see the same issue as @erimolNordnet reported. At the moment the issue appears only in our background job, where we make a lot of HTTP requests and the CPU usage is very high. Before we make a request we load some data, which can be multiple megabytes. In the screenshot, the CancelableContinuationImpl contains the data that was loaded before. It looks like the Netty client underneath is not releasing the memory after the request. Our WebClient config uses a connection pool with the lifo setting, and the evictInBackground setting is set to 45 seconds. In our API instance we couldn't see this issue. Maybe this helps to get an idea why this is happening. Our quick fix is to set spring-webflux to 6.1.1. Best, Konstantin |
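For illustration, a pool configured roughly as described above might look like this (the pool name and maxConnections value are assumptions; only the LIFO strategy and the 45-second background eviction come from the comment):

```java
import java.time.Duration;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;

public class PooledWebClientFactory {

    static WebClient create() {
        // Connection pool with LIFO leasing and background eviction every 45 seconds
        ConnectionProvider provider = ConnectionProvider.builder("background-job")
                .maxConnections(100)                         // assumed value, not stated in the thread
                .lifo()
                .evictInBackground(Duration.ofSeconds(45))
                .build();

        HttpClient httpClient = HttpClient.create(provider);
        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}
```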
Can you provide a minimal sample application that demonstrates the problem? You seem to imply that the connection pool does not release the connection, but this is not managed by Spring, so I don't see the link. I think that the leak might already be there and that reverting the Spring version might just make the problem less visible. I'll revisit this anyway to check whether we can improve things, but just so you know, we have not found any proof of a leak in Spring Framework so far. |
Unfortunately I was not able to reproduce it locally, but I'll keep trying. Do you know if there is a place where the request attributes are cleared before the connection is released to the pool? |
The request attributes were never cleared, as the request object instance simply gets GC'd. Since this bothers people that much, we might remove the observation as a request attribute and just let people get it from the observation context. This probably won't solve the actual memory management issues, but it might make them less visible. |
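For reference, accessing the request from the observation side (rather than via a request attribute) can be done with an ObservationHandler registered on the ObservationRegistry; a minimal sketch, assuming Micrometer's observation API:

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationHandler;

import org.springframework.web.reactive.function.client.ClientRequestObservationContext;

public class ClientRequestLoggingHandler implements ObservationHandler<ClientRequestObservationContext> {

    @Override
    public boolean supportsContext(Observation.Context context) {
        return context instanceof ClientRequestObservationContext;
    }

    @Override
    public void onStart(ClientRequestObservationContext context) {
        // The request is reachable from the context itself, no request attribute needed:
        // context.getCarrier() is the ClientRequest.Builder, context.getResponse() the
        // ClientResponse once it is available.
        System.out.println("Starting client request observation: " + context.getName());
    }
}
```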
We have had an app running in production for 3 years without any issue. We experienced a memory leak after bumping Spring dependencies and have now traced it down to a single library: spring-webflux 6.1.1 -> 6.1.2 (no other code change).
It seems like there is a circular dependency between ClientRequestObservationContext and ClientRequest.Builder that cannot be garbage collected. The behaviour is deterministic in the sense that I managed to recreate the issue multiple times; it takes a couple of hours before running out of heap space. One instance of the application is quite heavily used, with 50-100 requests/s and lots of downstream calls to other microservices using WebClient, which is where the issue seems to be.
Changes that I think could be related:
Please let me know if I can provide more information; this is my first time posting a bug report here.
// Erik
I have done a heap dump multiple times and the retained memory is increasing for the specified types. Attaching a screenshot of the heap dump:
Heap before update
Heap after update