Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amazon Services: no current Vertx context found for an S3 call with reactive resteasy and panache reactive #34687

Open
scrocquesel opened this issue Jul 11, 2023 · 26 comments
Labels
area/panache kind/bug Something isn't working

Comments

@scrocquesel
Copy link
Contributor

scrocquesel commented Jul 11, 2023

Describe the bug

A user reported an issue using AWS extension and reactive panache at the same time. Trying to reproduce the issue with s3 quickstart and I'm stuck trying to make it work.

First, adding quarkus-hibernate-reactive-panache and calling a JAX-RS endpoint that do not use Panache entity or repository but make S3 calls fails, trying to autoclose an on demand Session.

I see two issues:

  • Could it be possible to narrow the detection of panache usage per method in a resource class.
  • How to run a continuation on the Vertx ThreadPool after the s3 completionstage ? This is a bit off the issue but if narrowing is not possible this could solve this issue and explain usage of both s3 and panache in the same method

Expected behavior

Request should return a 200 response. At least, it should not try to close a session. Looking at #34671, it seems that the WithSessionOnDemand annotation is added even if the Resource does not inject a Panache entity nor repository. The detection mechanism is a bit too wide and it detects static usage of entity/repo in others methods.

Actual behavior

Request failed with

2023-07-11 22:59:56,208 INFO  [io.quarkus] (Quarkus Main Thread) Installed features: [amazon-s3, cdi, hibernate-orm, hibernate-reactive, hibernate-reactive-panache, reactive-pg-client, resteasy-reactive, resteasy-reactive-jackson, smallrye-context-propagation, vertx]
2023-07-11 23:00:17,048 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (vert.x-eventloop-thread-1) HTTP Request to /async-s3 failed, error id: 35acc389-f5b0-4f34-bc59-b3506f412974-1: java.lang.IllegalStateException: No current Vertx context found
        at io.quarkus.hibernate.reactive.panache.common.runtime.SessionOperations.vertxContext(SessionOperations.java:186)
        at io.quarkus.hibernate.reactive.panache.common.runtime.SessionOperations.closeSession(SessionOperations.java:191)
        at io.quarkus.hibernate.reactive.panache.common.runtime.SessionOperations.lambda$withSessionOnDemand$0(SessionOperations.java:73)
        at io.smallrye.mutiny.Uni.lambda$eventually$1(Uni.java:580)
        at io.smallrye.context.impl.wrappers.SlowContextualBiFunction.apply(SlowContextualBiFunction.java:21)
        at io.smallrye.mutiny.groups.UniOnItemOrFailure.lambda$call$2(UniOnItemOrFailure.java:68)
        at io.smallrye.context.impl.wrappers.SlowContextualBiFunction.apply(SlowContextualBiFunction.java:21)
        at io.smallrye.mutiny.operators.uni.UniOnItemOrFailureFlatMap$UniOnItemOrFailureFlatMapProcessor.performInnerSubscription(UniOnItemOrFailureFlatMap.java:86)
        at io.smallrye.mutiny.operators.uni.UniOnItemOrFailureFlatMap$UniOnItemOrFailureFlatMapProcessor.onItem(UniOnItemOrFailureFlatMap.java:54)
        at io.smallrye.mutiny.operators.uni.UniOnItemTransform$UniOnItemTransformProcessor.onItem(UniOnItemTransform.java:43)
        at io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription.forwardResult(UniCreateFromCompletionStage.java:63)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallMetricCollectionStage.lambda$execute$0(AsyncApiCallMetricCollectionStage.java:56)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallTimeoutTrackingStage.lambda$execute$2(AsyncApiCallTimeoutTrackingStage.java:69)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:177)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:105)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.completeResponseFuture(MakeAsyncHttpRequestStage.java:238)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:163)
        at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
        at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

How to Reproduce?

With this repro : https://github.com/scrocquesel/quarkus-quickstarts/tree/s3_with_panache

  1. inside amazon-s3-quickstart
  2. quarkus dev
  3. curl http://localhost:8080/async-s3.html
    (4. you can also try to upload a file)

Output of uname -a or ver

No response

Output of java -version

No response

GraalVM version (if different from Java)

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@scrocquesel scrocquesel added the kind/bug Something isn't working label Jul 11, 2023
@quarkus-bot
Copy link

quarkus-bot bot commented Jul 11, 2023

/cc @FroMage (panache), @loicmathieu (panache)

@geoand
Copy link
Contributor

geoand commented Jul 12, 2023

cc @Sgitario as well

@mkouba
Copy link
Contributor

mkouba commented Jul 12, 2023

Could it be possible to narrow the detection of panache usage per method in a resource class.

I'm afraid that it would be too costly. Currently, we only inspect the contstant pool to detect the usage of an entity (and repository after #34671 is merged).

Still, I'm trying to understand the problem. It seems that there's no current Vert.x context when the attempt to close the session... Maybe we should use the captured context instead. @cescoffier WDYT?

@mkouba mkouba changed the title No current Vertx context found with reactive resteasy and panache Amazon Services: no current Vertx context found for an S3 call with reactive resteasy and panache reactive Jul 12, 2023
@cescoffier
Copy link
Member

I think all the s3 callbacks will be invoked on random thread (random in the unmanaged sense), which mean: no context. That could be the issue.

Even with context propagation, it would not work as the CS must be decorated by context propagation.

Is there a way to specify on which executor the completion stages from the AWS sdk are resolved?

@mkouba
Copy link
Contributor

mkouba commented Jul 12, 2023

Is there a way to specify on which executor the completion stages from the AWS sdk are resolved?

I think that we need someone from the io.quarkiverse.amazonservices:quarkus-amazon-services team...

@gsmet
Copy link
Member

gsmet commented Jul 12, 2023

@scrocquesel who created the issue is the maintainer of the Amazon Services extensions so we should be good :)

@scrocquesel
Copy link
Contributor Author

I think all the s3 callbacks will be invoked on random thread (random in the unmanaged sense), which mean: no context. That could be the issue.

Even with context propagation, it would not work as the CS must be decorated by context propagation.

Is there a way to specify on which executor the completion stages from the AWS sdk are resolved?

@cescoffier I recently changed that so that it can now terminate on the Executor provided by io.quarkus.deployment.builditem.ExecutorBuildItem. It is a client configuration and AFAIK you can't override this on a per request basis. The client is configured no matter the context in which it will be consumed as a singleton.

The code looks like this right now

    @GET
    public Uni<List<FileObject>> listFiles() {
        ListObjectsRequest listRequest = ListObjectsRequest.builder()
                .bucket(bucketName)
                .build();
        return Uni.createFrom().completionStage(() -> s3.listObjects(listRequest))
                .onItem().transform(result -> toFileItems(result));
    }

I tried something like

        var e = ContextInternal.current().executor();
        return Uni.createFrom().completionStage(() -> s3.listObjects(listRequest)).emitOn(e)
                .onItem().transform(result -> toFileItems(result));

but it didn't work.

@cescoffier
Copy link
Member

Ok, this way it cannot go back to the context.

We need to capture it and restore it.

So, there are multiple ways to do this:

  • we can decorate the executor to capture the context and restore it
  • we can use the next-to-be vertx utilities which will provide an executor doing ^ (CC @jponge)

We can do the first approach for now. If someone tell me where to look I can have a look (next week unfortunately).

@cescoffier
Copy link
Member

cescoffier commented Jul 12, 2023

I need to check one think about your second approach. I believe it's retrouve the netty executor, not the duplicated context one.

So you execute on the root context, not on the duplicated context.

@scrocquesel
Copy link
Contributor Author

scrocquesel commented Jul 12, 2023

Ok, this way it cannot go back to the context.

We need to capture it and restore it.

So, there are multiple ways to do this:

  • we can decorate the executor to capture the context and restore it
  • we can use the next-to-be vertx utilities which will provide an executor doing ^ (CC @jponge)

We can do the first approach for now. If someone tell me where to look I can have a look (next week unfortunately).
This is where the executor is set on the client and you can test with this integration test.

I need to check one think about your second approach. I believe it's retrouve the netty executor, not the duplicated context one.

So you execute on the root context, not on the duplicated context.

With

@GET
    public Uni<List<FileObject>> listFiles() {
        ListObjectsRequest listRequest = ListObjectsRequest.builder()
                .bucket(bucketName)
                .build();

        System.out.println("capture isOnVertxThread:" + Context.isOnVertxThread());
        System.out.println("capture isOnEventLoopThread:" + Context.isOnEventLoopThread());
        System.out.println("capture isOnWorkerThread:" + Context.isOnWorkerThread());
        var e = ContextInternal.current().executor();
        return Uni.createFrom().completionStage(() -> s3.listObjects(listRequest)).emitOn(e)
                .onItem().transform(result -> {
                    System.out.println("transform isOnVertxThread:" + Context.isOnVertxThread());
                    System.out.println("transform isOnEventLoopThread:" + Context.isOnEventLoopThread());
                    System.out.println("transform isOnWorkerThread:" + Context.isOnWorkerThread());   
                    return toFileItems(result);
                });
    }
2023-07-12 22:03:48,064 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-1) capture isOnVertxThread:true
2023-07-12 22:03:48,066 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-1) capture isOnEventLoopThread:true
2023-07-12 22:03:48,068 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-1) capture isOnWorkerThread:false
2023-07-12 22:03:48,113 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-1) transform isOnVertxThread:true
2023-07-12 22:03:48,114 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-1) transform isOnEventLoopThread:true
2023-07-12 22:03:48,114 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-1) transform isOnWorkerThread:false
2023-07-12 22:03:48,116 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (vert.x-eventloop-thread-1) HTTP Request to /async-s3 failed, error id: dc954144-2a20-435e-b2f1-d6c6f197fedc-3: java.lang.IllegalStateException: No current Vertx context found

and without the emitOn(e)

2023-07-12 22:03:11,931 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-2) capture isOnVertxThread:true
2023-07-12 22:03:11,932 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-2) capture isOnEventLoopThread:true
2023-07-12 22:03:11,934 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (vert.x-eventloop-thread-2) capture isOnWorkerThread:false
2023-07-12 22:03:11,971 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (executor-thread-1) transform isOnVertxThread:true
2023-07-12 22:03:11,973 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (executor-thread-1) transform isOnEventLoopThread:false
2023-07-12 22:03:11,974 INFO  [sof.ama.aws.ser.s3.int.res.S3Resource] (executor-thread-1) transform isOnWorkerThread:true
2023-07-12 22:03:11,977 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (vert.x-eventloop-thread-2) HTTP Request to /async-s3 failed, error id: dc954144-2a20-435e-b2f1-d6c6f197fedc-2: java.lang.IllegalStateException: No current Vertx context found

I also tried with var e = ((ContextInternal) VertxContext.getOrCreateDuplicatedContext(ContextInternal.current())).executor();

@scrocquesel
Copy link
Contributor Author

I edit my previous comment to provide log trace with thread names

@cescoffier
Copy link
Member

Thread names are not enough, duplicated contexts are not threads.

You cannot use var e = ContextInternal.current().executor(); - this is running on the root context, not on the duplicated one.

You need to do var e = runnable -> captured.runOnContext(runnable)

@scrocquesel
Copy link
Contributor Author

Thread names are not enough, duplicated contexts are not threads.

You cannot use var e = ContextInternal.current().executor(); - this is running on the root context, not on the duplicated one.

You need to do var e = runnable -> captured.runOnContext(runnable)

Thanks, to compile I had to do

Context context = Vertx.currentContext();
Executor e = command -> context.runOnContext(x -> command.run());

I guess the sample at

[source, java]
----
public Uni<String> invoke() {
Context context = Vertx.currentContext();
return invokeRemoteService()
.emitOn(runnable -> context.runOnContext(runnable)); <1>
}
----
is also not correct or does runOnContext recently changed ?

Is there an issue tracking the "next-to-be vertx utilities which will provide an executor doing ^" ?
I'll try to hook that directly in the extension but it is not using Vert.x and I would rather not add a dependency only for that.

@scrocquesel
Copy link
Contributor Author

scrocquesel commented Jul 14, 2023

Trying to wrap the Executor is not working. By the time, the completion stage is executed, aws netty client has already moved to another thread and the Vertx.currentContext() is null in the executor execute method.

https://github.com/aws/aws-sdk-java-v2/blob/4639d41ee6bc835ea2e53bb27ebb0d977c292d6c/http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/internal/NettyRequestExecutor.java#L91-L97

    public CompletableFuture<Void> execute() {
        Promise<Channel> channelFuture = context.eventLoopGroup().next().newPromise();
        executeFuture = createExecutionFuture(channelFuture);
        acquireChannel(channelFuture);
        channelFuture.addListener((GenericFutureListener) this::makeRequestListener);
        return executeFuture;
    }

where context.eventLoopGroup() is the EventLoopGroup supplied by EventLoopGroupBuildItem`.

@cescoffier
Copy link
Member

You cannot use the netty executor or the EventLoopGroupBuildItem - because you will always have a root context, and not a duplicated context. Thus it breaks the propagation.

You will have only one way or another to use Vert.x.

The PR adding the next-to-be executor is this one: smallrye/smallrye-mutiny-vertx-bindings#733

@manor-parmar
Copy link

manor-parmar commented Sep 5, 2023

try to do as below, capture the context and than restore it while emitting the completionStage

// capture context Context context = Vertx.currentContext(); Uni.createFrom() .completionStage( () -> { FileUpload fileUpload = s3TransferManager.uploadFile(...); return fileUpload.completionFuture(); }) .emitOn(command -> context.runOnContext(x -> command.run())) // Restore the context .onItem() .invoke( i -> { var status = i.response(); log.info("s3 upload response:{}", status); }) .onFailure() .invoke(e -> log.error("error:", e)) .eventually(...);

@adampoplawski
Copy link

adampoplawski commented Oct 21, 2023

Hello
We have a similar problem. After reactive transactional method with Panache, the next flatmap that sends data to SNS gives "No current Vertx context found". We are using Quarkus 3.4. Is there a recommended walkaround?

Currently, we have

repository.transactionalMethod()
.flatMap{ service.useSns() }

walkaround we are using

val context = Vertx.currentContext()
repository.transactionalMethod()
.flatMap{ service.useSns() }
.emitOn { context.runOnContext { it.run() } }

@FroMage
Copy link
Member

FroMage commented Oct 24, 2023

Losing the Vert.x context is problematic, because CP doesn't propagate it, and indeed, delegates to it. How can we fix this @cescoffier @mkouba?

@cescoffier
Copy link
Member

Unfortunately, there is no magic solution. You need to capture the context and restore it to emit the result produced by the completion stage of AWS.

@FroMage
Copy link
Member

FroMage commented Oct 24, 2023

Well, there could be magic, because Mutiny uses CP, and if CP captured the current vert.x context, it could propagate it :)

@markusdlugi
Copy link
Contributor

markusdlugi commented Mar 14, 2024

We have actually been using the workaround mentioned above of capturing & restoring the context via emitOn for quite some time with the SqsAsyncClient. However, while it works at first glance, we just noticed that this breaks down badly once there is any kind of parallelism in the system, e.g., other REST requests being processed simultaneously and forcing Vertx to schedule stuff on different threads. We end up seeing:

java.lang.IllegalStateException: Illegal pop() with non-matching JdbcValuesSourceProcessingState

We also tried using the suggested context-aware scheduler as follows:

SqsAsyncClient.builder().asyncConfiguration(
                b -> b.advancedOption(SdkAdvancedAsyncClientOption.FUTURE_COMPLETION_EXECUTOR,
                        ContextAwareScheduler.delegatingTo(Infrastructure.getDefaultWorkerPool())
                                .withGetOrCreateContext(vertx)));

But that also only works if there is no parallelism, with other requests running in parallel we observe

java.lang.IllegalStateException: HR000069: Detected use of the reactive Session from a different Thread than the one which was used to open the reactive Session - this suggests an invalid integration; original thread [164]: 'vert.x-eventloop-thread-4' current Thread [161]: 'vert.x-eventloop-thread-1'

Only "workaround" we found for now, if you want to call it that, is to set quarkus.vertx.event-loops-pool-size to 1. But I'm not sure it's 100% bullet-proof, and it doesn't seem like a good idea to put your application on crutches like that just to work around the AWS/Vertx incompatibility.

Any ideas how we could properly work around these issues?

Edit: I can kinda work around the last exception I got with the ContextAwareScheduler using -Dorg.hibernate.reactive.common.InternalStateAssertions.ENFORCE=false, but that feels wrong and I'm wondering if there isn't a better way to make Vertx return to the original thread?

@markusdlugi
Copy link
Contributor

For completeness sake: we investigated the issues described above more thoroughly and it was actually related to the context being lost by the Quarkus Cache which we were using to reduce the number of calls to getQueueUrl. So it was actually not related to the AWS integration. I opened #39515 for that.

@scrocquesel
Copy link
Contributor Author

Is ContextAwareScheduler.delegatingTo(Infrastructure.getDefaultWorkerPool()).withGetOrCreateContext(vertx) the correct approach?

In the Quarkus AWS extension, I inject the executor provided by the ExecutorBuildItem. The only issue I encounter was to deal with the TCCL because it is lost when using the AWS CRT-based HTTP client, which utilizes JNI. Thus, in Dev/Test mode, I wrap it with an executor that captures the TCCL.

Still, I am seeking a solution to capture the Vert.x context and restore it. One potential solution could involve having an AWS ExecutorInterceptor that captures the context before the request and restores it after the request when back on the thread provided by the configured thread executor. However, as far as I know, the Vert.x context can only be restored by switching to another thread, which is not feasible in this case.

@jponge
Copy link
Member

jponge commented Mar 25, 2024

Is ContextAwareScheduler.delegatingTo(Infrastructure.getDefaultWorkerPool()).withGetOrCreateContext(vertx) the correct approach?

I'd say so

@markusdlugi
Copy link
Contributor

Is ContextAwareScheduler.delegatingTo(Infrastructure.getDefaultWorkerPool()).withGetOrCreateContext(vertx) the correct approach?

I'd say so

From my tests, I think that this unfortunately doesn't work as expected, or at least I was unable to configure it correctly. The configuration I mentioned above with the ContextAwareScheduler would in fact result in the context not being preserved, regardless of whether using the AWS CRT-based client or the Netty NIO client or whether I was using Infrastructure.getDefaultWorkerPool() or vertx.nettyEventLoopGroup(). We got some dedicated unit tests for ensuring the Vertx context is preserved in our application now, and while they're all green with the emitOn approach, I can't get it to work with the ContextAwareScheduler.

Maybe someone who has a working configuration with ContextAwareScheduler could show how it's done, because I'd actually love to get rid of the tedious emitOn workaround. While that one is working reliably (if you make sure to wrap all CompletionStage usages properly cough cough), it's certainly cumbersome and error-prone...

@jponge
Copy link
Member

jponge commented Mar 25, 2024

ScheduledExecutorService supports different strategies to specify which Vert.x context shall be used. The key for withGetOrCreateContext to work is to make sure it gets called from a thread that has the Vert.x context that you need, otherwise it will create a new context, but it may not have the correct context for propagation to happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/panache kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

10 participants