Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDI context propagation improvements for the reactive stack #27443

Merged
merged 3 commits into from
Sep 8, 2022

Conversation

mkouba
Copy link
Contributor

@mkouba mkouba commented Aug 23, 2022

This PR implements a new way to store the ArC request scope. Instead of storing that object in a ThreadLocal (which is problematic in an event loop-based architecture), the request scope is stored in the Vert.x duplicated context which is used for the whole reactive/async processing. If there is no duplicated context, it falls back to a thread-local.

This PR makes accessing request-scope beans a lot simpler in reactive applications. The request scope is available as soon as the continuations are executed on the same duplicated context. We aware that if this access happens after the termination of the scope, an error is reported (as it's clearly misuing the concepts).

ArC

Introduce the VertxCurrentContextFactory so that the Vert.x duplicated context can be used to store the "current context" for normal scopes. If we're not on the Vert.x duplicated context we fallback to the ThreadLocal (or FastThreadLocal respectively).

resteasy-reactive

Don't clear our CDI current request when suspending.

This was done to prevent subsequent requests running on the same thread as a suspended request from accessing the former's data. With the advent of DuplicatedContext-backed storage, there is no longer any chance of mixing data, so there is no need to clear it out.

Furthermore, by not clearing out current requests, the code that accesses the request scoped CurrentVertxRequest
that is executed while the request is suspended can now work even if context propagation is not in play.

Tests

Add leak detection tests for resteasy reactive, graphql, and reactive rest client. Also, improve the OpenTelemetry reactive tests.

OpenTelemetry

Only register the OpenTelemetryClientFilter for RESTEasy Client Classic. Use Capabilities to determine when to register the OpenTelemetryClientFilter.

@mkouba
Copy link
Contributor Author

mkouba commented Aug 24, 2022

@Sanne would you be able to benchmark this PR (with CP disabled) to get some actual numbers?

@quarkus-bot

This comment has been minimized.

@mkouba
Copy link
Contributor Author

mkouba commented Aug 26, 2022

@radcortez @brunobat Hey guys, the OpenTelemetryReactiveTest#multiple() test fails in this WIP because it expects 7 spans but it only gets 6. Unfortunately, the test does not contain any description and my OTel tracing knowledge is minimal. Would you care to take a look? Any details would be appreciated...

@cescoffier may also add some relevant info because he fixed some OTel reactive problem recently.

@cescoffier
Copy link
Member

I didn't fix anything; I've found some code that would need to be changed to support this PR. OTel is duplicating duplicated context to create a new span. With this PR, you cannot do that anymore. We need to find another way, like marking the context or something like this.

@mkouba
Copy link
Contributor Author

mkouba commented Aug 26, 2022

I didn't fix anything

Eh, yes that commit is actually pretty old :D

@geoand
Copy link
Contributor

geoand commented Aug 26, 2022

Also, please rebase onto main in order to pick up a CI fix.

@mkouba mkouba force-pushed the vertx-context-reference-factory branch from 6893a45 to 297cb93 Compare August 26, 2022 12:11
@brunobat
Copy link
Contributor

brunobat commented Aug 26, 2022

@radcortez @brunobat Hey guys, the OpenTelemetryReactiveTest#multiple() test fails in this WIP because it expects 7 spans but it only gets 6. Unfortunately, the test does not contain any description and my OTel tracing knowledge is minimal. Would you care to take a look? Any details would be appreciated...

@cescoffier may also add some relevant info because he fixed some OTel reactive problem recently.

I believe we are missing a span like this, for the 2nd client call, with QueryParam Goku:
SdkSpan{traceId=f6bccae8b9c40899056cfb2236529f82, spanId=2d51ec72a1e13750, parentSpanContext=ImmutableSpanContext{traceId=f6bccae8b9c40899056cfb2236529f82, spanId=f67817406f93a6ac, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, name=HTTP GET, kind=CLIENT, attributes=AttributesMap{data={http.method=GET, http.url=http://localhost:8081/reactive?name=Naruto, http.user_agent=Resteasy Reactive Client, http.status_code=200, http.response_content_length=12}, capacity=128, totalAddedValues=5}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1661524694185603792, endEpochNanos=1661524696249212208}

@cescoffier
Copy link
Member

cescoffier commented Aug 26, 2022

@brunobat yes, because the initial code was switching to another duplicated context without going back to it after the operation.

So either we need to implement this second switch or we find another way to trigger the creation of the span.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@mkouba
Copy link
Contributor Author

mkouba commented Aug 31, 2022

@cescoffier ^ problem with formatting...

@cescoffier cescoffier force-pushed the vertx-context-reference-factory branch from 3edf30d to 7382676 Compare August 31, 2022 06:50
@cescoffier
Copy link
Member

@mkouba fixed. I added the RR and a GrapQL tests as discussed.
I'm now looking at the OTel issue.

@quarkus-bot

This comment has been minimized.

@radcortez
Copy link
Member

I've changed the registration of the OpenTelemetryClientFilter only to be applied in the RESTEasy Classic Client. This avoids the double spans for the Reactive Client, which also uses the Vert.x Tracer.

@radcortez
Copy link
Member

I would also like to add some additional tests on the OTel side.

@cescoffier
Copy link
Member

@radcortez what about testing that the span "tree" is correct (what I did manually yesterday)?

The purpose would be to detect changes in the structure when we add more OTel support in Mutiny (because typically, for mutiny, the tree is slightly different with the last span having two parents).

@mkouba mkouba force-pushed the vertx-context-reference-factory branch from 1f7d1d2 to 00ce709 Compare September 7, 2022 06:04
@rsvoboda rsvoboda added the triage/qe? Issue could use a quality focused review to further harden it label Sep 7, 2022
@quarkus-bot
Copy link

quarkus-bot bot commented Sep 7, 2022

/cc @mjurc, @rsvoboda

@quarkus-bot

This comment has been minimized.

@mkouba mkouba force-pushed the vertx-context-reference-factory branch from 00ce709 to 6bd931c Compare September 7, 2022 13:06
@quarkus-bot quarkus-bot bot added the area/lra label Sep 7, 2022
@gsmet gsmet dismissed their stale review September 7, 2022 15:11

Addressed!

@mkouba mkouba added the triage/waiting-for-ci Ready to merge when CI successfully finishes label Sep 7, 2022
stuartwdouglas and others added 3 commits September 7, 2022 18:56
== ArC
Introduce the VertxCurrentContextFactory so that the Vert.x duplicated
context can be used to store the "current context" for normal scopes.

== resteasy-reactive
Don't clear our CDI current request when suspending.

This used to be done in order to prevent subsequent
requests running on the same thread as a suspended
request from accessing the former's data.
With the advent of DuplicatedContext backed
storage, there is no longer any chance of mixing
data so there is no need to clear it out.

Furthermore, by not clearing out current request,
code that accesses the request scoped CurrentVertxRequest
that is executed while the request is suspended, can now
work even if context propagation is not in play.

== Tests
Add leak detection tests for resteasy reactive, graphql and reactive rest client. Also improve the OpenTelemetry reactive tests.

== OpenTelemetry
Only register the OpenTelemetryClientFilter for RESTEasy Client Classic. Use Capabilities to determine when to register the OpenTelemetryClientFilter.

Co-authored-by: Georgios Andrianakis <[email protected]>
Co-authored-by: Clement Escoffier <[email protected]>
Co-authored-by: Roberto Cortez <[email protected]>
Co-authored-by: brunobat <[email protected]>
Co-authored-by: Martin Kouba <[email protected]>
@mkouba mkouba force-pushed the vertx-context-reference-factory branch from 55877b0 to 2b64b81 Compare September 7, 2022 16:57
@mkouba mkouba merged commit 79e9f85 into quarkusio:main Sep 8, 2022
@quarkus-bot quarkus-bot bot added this to the 2.13 - main milestone Sep 8, 2022
@quarkus-bot quarkus-bot bot removed the triage/waiting-for-ci Ready to merge when CI successfully finishes label Sep 8, 2022
@rsvoboda
Copy link
Member

rsvoboda commented Sep 8, 2022

Triggered adhoc GitHub Action Daily run to check possible side-effects and the run was green.
No regression was detected with our TS (https://github.com/quarkus-qe/quarkus-test-suite).

michael-simons added a commit to quarkiverse/quarkus-neo4j that referenced this pull request Sep 19, 2022
Fixes failing CI tests. Somehow being related to this change here quarkusio/quarkus#27443. I don't think that our use of `CompleteableFuture` is wrong, it's what has been suggested a while back to chose over our own drivers reactive types and we am not gonna change that. The regex is now more lenient. FWIW if anything should change than Quarkus and the CI system should make sure it works both the same way in the CI in *this* project here and the overall integration project running on the whole of Quarkivere.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.