Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebClient observations should not record CANCEL signals as aborted if response was received #30070

Closed
rubasace opened this issue Mar 6, 2023 · 0 comments
Assignees
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) theme: observability An issue related to observability and tracing type: enhancement A general enhancement
Milestone

Comments

@rubasace
Copy link

rubasace commented Mar 6, 2023

A couple years ago I opened an issue about how the cancellation signal was not being properly handled: spring-projects/spring-boot#18444

On it, we ended up implementing a solution where the cancellation signal would only be processed in case it was received without any onNext signal being emitted before:

@bclozel what's the current implementation behaviour when we are consuming the body as a Flux and we limit it with something like next() or take(n). I implemented something very similar on a custom ExchangeFilterFunction for logging and in those scenarios, I was logging false positives for timeouts as after receiving the requested elements a cancellation signal is sent upstream to stop the producer. I think the same might happen here with the current implementation.

The way I solved it was simply flagging somehow on the doOnNext() operations that we already received something, so if that happens I just ignore the cancellation signal assuming it's desired behaviour. Of course, there's the edge case where we are actually timing out on a Flux if a particular element takes too long, but I think it's an acceptable tradeoff given that we don't know the reason of the cancellation.

What do you think?

That was the way it was approached on Spring Boot 2.x:

@rubasace this feature is not about recording timeouts, but recording cancelled requests. This can happen for various operators like timeout or take, and in all cases we can't know whether this is the intended behavior or an error.

From a pure WebClient point of view, we're recording cancellations because the request did not complete and we cannot record tags with the response (status, execution time).

The workaround you're describing might work, but we'd still get "false negatives" if the client receives the response headers but no body, or only a partial body.
You're right that in this case we could still record metrics twice for the same request: once when receiving the response, and another time if a cancel event happens while reading the body.

I think it's still a good compromise and I'll adapt these changes to only record cancellations if we did not receive any response at all.

With the new Observation API implementation, the onCancel signal is being processed without checking if the onNext signal has already being processed, creating the same problems I was describing on the old issue:

return responseMono.doOnNext(observationContext::setResponse)
						.doOnError(observationContext::setError)
						.doOnCancel(() -> {
							observationContext.setAborted(true);
							observation.stop();
						})
						.doOnTerminate(observation::stop);

I believe it would make sense to introduce a check on the onCancel() as, right now, a WebClient call that receives a Flux and converts and calls a method like .next() or take(n) registers each call twice (one as SUCCESS and one as CLIENT_ERROR).

EDIT: sorry, missclicked when creating the issue, before adding any description or a proper title

@rubasace rubasace changed the title Observation handling of cancellation esignal Observation handling of cancellation has changed (broken?) on Spring Boot 3 Mar 6, 2023
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Mar 6, 2023
@rubasace rubasace changed the title Observation handling of cancellation has changed (broken?) on Spring Boot 3 Observation handling of cancellation for Flux.next() and Flux.take(n) records each call twice (as both SUCCESS and ERROR) Mar 6, 2023
@bclozel bclozel transferred this issue from spring-projects/spring-boot Mar 6, 2023
@bclozel bclozel self-assigned this Mar 6, 2023
@bclozel bclozel added in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement theme: observability An issue related to observability and tracing and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Apr 7, 2023
@bclozel bclozel added this to the 6.0.8 milestone Apr 7, 2023
@bclozel bclozel changed the title Observation handling of cancellation for Flux.next() and Flux.take(n) records each call twice (as both SUCCESS and ERROR) WebClient observations should not record CANCEL signals as aborted if response was received Apr 7, 2023
@bclozel bclozel closed this as completed in 01f9788 Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) theme: observability An issue related to observability and tracing type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

3 participants