[BUG] ReactorDispatcher instance is closed. Connection is lost #19698

Closed
the-mod opened this issue Mar 8, 2021 · 13 comments
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs pillar-reliability The issue is related to reliability, one of our core engineering pillars. (includes stress testing)



the-mod commented Mar 8, 2021

Describe the bug
I am using azure-messaging-eventhubs:5.4 to spawn EventProcessingHost instances on multiple nodes, and I hit this issue after the service has been running fine for several days.
My apps then stop consuming events.
It seems something is off with the AccessToken (see the second exception). I also don't have key rotation applied on my Event Hubs.

Exception or Stack Trace

java.util.concurrent.RejectedExecutionException:
   at com.azure.core.amqp.implementation.ReactorDispatcher.throwIfSchedulerError (ReactorDispatcher.java93)
   at com.azure.core.amqp.implementation.ReactorDispatcher.invoke (ReactorDispatcher.java68)
   at com.azure.core.amqp.implementation.ReactorReceiver.dispose (ReactorReceiver.java190)
   at com.azure.core.amqp.implementation.ReactorSession$LinkSubscription.dispose (ReactorSession.java555)
   at com.azure.core.amqp.implementation.ReactorSession.lambda$dispose$1 (ReactorSession.java149)
   at java.util.concurrent.ConcurrentHashMap.forEach
   at com.azure.core.amqp.implementation.ReactorSession.dispose (ReactorSession.java149)
   at com.azure.core.amqp.implementation.ReactorConnection$SessionSubscription.dispose (ReactorConnection.java421)
   at com.azure.core.amqp.implementation.ReactorConnection.removeSession (ReactorConnection.java322)
   at com.azure.core.amqp.implementation.ReactorConnection.dispose (ReactorConnection.java266)
   at com.azure.core.amqp.implementation.ReactorConnection.dispose (ReactorConnection.java250)
   at com.azure.messaging.eventhubs.implementation.EventHubReactorAmqpConnection.dispose (EventHubReactorAmqpConnection.java144)
   at com.azure.core.amqp.implementation.AmqpChannelProcessor.close (AmqpChannelProcessor.java308)
   at com.azure.core.amqp.implementation.AmqpChannelProcessor.setAndClearChannel (AmqpChannelProcessor.java297)
   at com.azure.core.amqp.implementation.AmqpChannelProcessor.lambda$onNext$2 (AmqpChannelProcessor.java103)
   at reactor.core.publisher.LambdaSubscriber.onError (LambdaSubscriber.java149)
   at reactor.core.publisher.FluxReplay$SizeBoundReplayBuffer.replayNormal (FluxReplay.java802)
   at reactor.core.publisher.FluxReplay$SizeBoundReplayBuffer.replay (FluxReplay.java898)
   at reactor.core.publisher.ReplayProcessor.onError (ReplayProcessor.java440)
   at com.azure.core.amqp.implementation.ReactorConnection$ReactorExceptionHandler.onConnectionError (ReactorConnection.java380)
   at com.azure.core.amqp.implementation.ReactorExecutor.run (ReactorExecutor.java126)
   at reactor.core.scheduler.SchedulerTask.call (SchedulerTask.java68)
   at reactor.core.scheduler.SchedulerTask.call (SchedulerTask.java28)
   at java.util.concurrent.FutureTask.run
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
   at java.util.concurrent.ThreadPoolExecutor.runWorker
   at java.util.concurrent.ThreadPoolExecutor$Worker.run
   at java.lang.Thread.run
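
For readers following the trace: the top frames show a dispose call being handed to a dispatcher that is already closed, which surfaces as a `RejectedExecutionException`. The sketch below reproduces only that general pattern with a plain JDK executor. `ClosedDispatcherDemo` is a hypothetical stand-in, not the SDK's `ReactorDispatcher`, and the SDK's internals may differ.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical stand-in for a dispatcher backed by an executor; not SDK code.
class ClosedDispatcherDemo {

    /** Returns true if work scheduled on an already shut-down executor is rejected. */
    static boolean isRejectedAfterClose() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown(); // assumption: models the dispatcher closing after a connection error
        try {
            // Cleanup work arrives late, after the executor stopped accepting tasks.
            executor.execute(() -> System.out.println("disposing link"));
            return false;
        } catch (RejectedExecutionException e) {
            return true; // same exception type as the first line of the trace
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after close: " + isRejectedAfterClose());
    }
}
```

The point of the sketch is the ordering: the cleanup runs after the thing it needs has already shut down, so the cleanup itself fails.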

The SDK itself logs (via com.azure.core.util.logging.ClientLogger) this message, with no stack trace available, coming from com.azure.core.amqp.implementation.ActiveClientTokenManager:

Error occurred while refreshing token that is not retriable. Not scheduling refresh task. Use ActiveClientTokenManager.authorize() to schedule task again. audience[amqp://host/eh/ConsumerGroups/cg/Partitions/9] scopes[amqp://host/eh/ConsumerGroups/cg/Partitions/9]
ReactorDispatcher instance is closed.
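
The log message describes a refresh task that stops rescheduling itself after a non-retriable error and only resumes after an explicit `authorize()` call. A minimal model of that behavior, assuming nothing about the real `ActiveClientTokenManager` beyond what the message says, could look like this. `TokenRefreshSketch`, `Result`, and `onTimer()` are hypothetical names, and the scripted results stand in for the service's responses.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a self-rescheduling token refresh; not SDK API.
class TokenRefreshSketch {
    enum Result { OK, RETRIABLE, NOT_RETRIABLE }

    private final Deque<Result> scriptedResults; // stands in for the service's responses
    private boolean refreshScheduled;
    private int refreshCount;

    TokenRefreshSketch(List<Result> results) {
        this.scriptedResults = new ArrayDeque<>(results);
    }

    /** Explicit (re)authorization: restarts the refresh loop once it has stopped. */
    void authorize() {
        refreshScheduled = true;
    }

    /** Simulates the refresh timer firing once. Returns true if a refresh was attempted. */
    boolean onTimer() {
        if (!refreshScheduled) {
            return false; // loop has stopped; nothing refreshes the token anymore
        }
        refreshCount++;
        Result result = scriptedResults.isEmpty() ? Result.OK : scriptedResults.poll();
        if (result == Result.NOT_RETRIABLE) {
            refreshScheduled = false; // mirrors "Not scheduling refresh task."
        }
        return true;
    }

    boolean isRefreshScheduled() {
        return refreshScheduled;
    }

    int refreshCount() {
        return refreshCount;
    }
}
```

In this model, a single non-retriable result leaves the client without refreshes until something calls `authorize()` again, which would be consistent with a consumer going quiet after days of normal operation.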

To Reproduce
Implement the EventProcessorHost and wait

Setup (please complete the following information):

  • OS: Win10
  • Version of the Library used: 5.4
@joshfree joshfree added Client This issue points to a problem in the data-plane of the library. Event Hubs bug This issue requires a change to an existing behavior in the product in order to be resolved. customer-reported Issues that are reported by GitHub users external to the Azure organization. pillar-reliability The issue is related to reliability, one of our core engineering pillars. (includes stress testing) labels Mar 8, 2021
Member

joshfree commented Mar 8, 2021

Thanks for filing this azure-messaging-eventhubs github issue, @the-mod. @srnagar / @conniey could you please follow up?

Member

conniey commented Mar 10, 2021

This is related to, or a duplicate of, #19753. We should close the link appropriately and recreate it, though.

Author

the-mod commented Mar 10, 2021

@conniey thanks for the info. Today I updated all my apps to azure-messaging-eventhubs:5.5.0 in order to get rid of this, but it seems the problem still exists.

Author

the-mod commented Mar 15, 2021

@conniey Just to give you an update: it's also happening with azure-messaging-eventhubs:5.5.0. It's not only the EventProcessingHost that is affected; it also happens with my EventHubProducerAsyncClient implementations.

It is giving me this error (without a stack trace):

partitionId[null]: Sending messages timed out. ReactorDispatcher instance is closed.

coming from com.azure.core.amqp.implementation.RetryUtil

in combination with

ReactorDispatcher instance is closed.

coming from com.azure.core.amqp.implementation.ReactorDispatcher

The latest error happened just now; the one before that was on Saturday, so roughly 48 hours apart. Maybe that helps with the investigation.

Author

the-mod commented Mar 17, 2021

@conniey @srnagar any news on this?

Member

conniey commented Mar 18, 2021

Hey,

I am currently looking at another reliability issue #18070 which may be related to your issue. But I will get to this issue afterwards because we should be creating a new link if there is an auth issue.

Author

the-mod commented Mar 23, 2021

@conniey thanks for the update.

I just saw in my logs that a partition was closed and reopened again and again while failing with the error below.
(I'm not sure whether it is related to this issue.)

Exception: reactor.core.Exceptions$OverflowException

Message: The receiver is overrun by more signals than expected (bounded queue...)

Stacktrace:

reactor.core.Exceptions$OverflowException:
   at reactor.core.Exceptions.failWithOverflow (Exceptions.java221)
   at reactor.core.publisher.FluxWindowTimeout$WindowTimeoutSubscriber.onNext (FluxWindowTimeout.java233)
   at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext (FluxPeekFuseable.java203)
   at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onNext (FluxDoFinally.java123)
   at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync (FluxPublishOn.java439)
   at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run (FluxPublishOn.java526)
   at reactor.core.scheduler.WorkerTask.call (WorkerTask.java84)
   at reactor.core.scheduler.WorkerTask.call (WorkerTask.java37)
   at java.util.concurrent.FutureTask.run
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
   at java.util.concurrent.ThreadPoolExecutor.runWorker
   at java.util.concurrent.ThreadPoolExecutor$Worker.run
   at java.lang.Thread.run
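
The message says the receiver was handed more signals than it had asked for, and the trace places this in `FluxWindowTimeout`. As a rough analogy only (this is plain JDK code, not Reactor's implementation), a bounded buffer with no consumer draining it starts rejecting items once it is full. Where this sketch counts rejected items, Reactor raises an `OverflowException` instead. `BoundedBufferSketch` and `countOverflow` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Analogy for a bounded downstream buffer; not Reactor code.
class BoundedBufferSketch {

    /**
     * Offers `produced` items to a buffer of size `capacity` while nothing consumes,
     * and returns how many items could not be accepted.
     */
    static int countOverflow(int capacity, int produced) {
        ArrayBlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < produced; i++) {
            if (!buffer.offer(i)) { // offer() returns false rather than blocking when full
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countOverflow(4, 10));
    }
}
```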

Member

conniey commented Apr 13, 2021

@the-mod

We released 5.7.0 this morning. This should fix the unauthorized issue. It will close the send link on an unauthorized error and try to reauthorize the next time you send a message, by creating another link.
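
The behavior described above (drop the link on an unauthorized error, then create a fresh one on the next send) can be sketched as follows. This is an illustration of the pattern only, under the assumption that creating a link is what triggers reauthorization. `RecreatingSenderSketch`, `Link`, and `UnauthorizedException` are hypothetical names and do not correspond to SDK types.

```java
import java.util.function.Supplier;

// Hypothetical sender that recreates its link after an unauthorized error; not SDK code.
class RecreatingSenderSketch {

    /** Hypothetical link abstraction. */
    interface Link {
        void send(String message);
        void close();
    }

    /** Hypothetical error type for a rejected or expired authorization. */
    static class UnauthorizedException extends RuntimeException {
        UnauthorizedException(String message) {
            super(message);
        }
    }

    private final Supplier<Link> linkFactory;
    private Link current;

    RecreatingSenderSketch(Supplier<Link> linkFactory) {
        this.linkFactory = linkFactory;
    }

    void send(String message) {
        if (current == null) {
            // Assumption: creating a link also performs authorization.
            current = linkFactory.get();
        }
        try {
            current.send(message);
        } catch (UnauthorizedException e) {
            current.close();
            current = null; // the next send() starts from a fresh link
            throw e;        // the failed send is still reported to the caller
        }
    }
}
```

The failed send is not retried inside `send()`; the caller sees the error, and recovery happens on the following call.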

https://repo1.maven.org/maven2/com/azure/azure-messaging-eventhubs/5.7.0/
https://repo1.maven.org/maven2/com/azure/azure-messaging-eventhubs-checkpointstore-blob/1.6.0/

Cheers,
Connie

Author

the-mod commented Apr 14, 2021

@conniey thank you for the information. I will test it over the next few days and provide feedback.

Author

the-mod commented Apr 15, 2021

@conniey
In my project, I replaced
azure-messaging-eventhubs: 5.6.0 and azure-messaging-eventhubs-checkpointstore-blob: 1.5.0
with
azure-messaging-eventhubs: 5.7.0 and azure-messaging-eventhubs-checkpointstore-blob: 1.6.0
and unfortunately I see a performance decrease of up to 50%.

I also tested
azure-messaging-eventhubs: 5.6.0 and azure-messaging-eventhubs-checkpointstore-blob: 1.6.0
as well as
azure-messaging-eventhubs: 5.7.0 and azure-messaging-eventhubs-checkpointstore-blob: 1.5.0
with the same performance decrease.

I haven't dug into this in detail yet.
It would be nice if you could check this behavior on your side.

Best

Member

conniey commented Apr 15, 2021

Fascinating. We did not run performance tests on either solution, but several race conditions were fixed, so I can see that affecting it. It may be best to create a new bug to address this.

/cc @YijunXieMS

Member

conniey commented Apr 20, 2021

Performance investigations #20791

Member

conniey commented Apr 20, 2021

Created an issue to track the performance degradation. #20841

@conniey conniey closed this as completed Apr 20, 2021
azure-sdk pushed a commit to azure-sdk/azure-sdk-for-java that referenced this issue Sep 19, 2022
Network 2022 05 01 (Azure#20695)

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023
4 participants