Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] EventHubs EventProcessorClient doesn't recover event receiving from a partition for about 30 minutes #19753

Closed
YijunXieMS opened this issue Mar 10, 2021 · 0 comments
Assignees
Labels
blocking-customer-adoption Issue is blocking the migration from Track 1 to Track 2 Client This issue points to a problem in the data-plane of the library. Event Grid
Milestone

Comments

@YijunXieMS
Copy link
Contributor

Describe the bug

A customer who have been evaluating EventHubs Java SDK reported that they found a partition might not process events for long time then it recovered.
The problem reproduced in my test. My test consumes events from a 16-partition EventHub with one event processor client. Occasionally a partition connection / link is detatched. In the following occurence, the SDK stopped processing any events from partition 10 for 26 minutes. The following is the summary of what happened in my test. I've more detailed logs.

  1. Partition 10 stops receiving after the following event processed at 02/22 18:30:55. The event handler isn’t blocked by the event handler because updateCheckpoint log is printed out.
    Line 757582: 2021-02-22 18:30:55,191 INFO TestEventHubHandler [boundedElastic-26] Processing event from partition 10 with sequence number 257489, offset 34369397888, and enqueued time 2021-02-22T10:25:01.279Z
    Line 757630: 2021-02-22 18:30:55,292 INFO TestEventHubHandler [boundedElastic-26] send 1 events to Model Matching Stream
    Line 757638: 2021-02-22 18:30:55,298 INFO TestEventHubHandler [boundedElastic-26] execution time = 107
    Line 757639: 2021-02-22 18:30:55,298 INFO TestEventHubHandler [boundedElastic-26] Begin updateCheckpoint time = 1614018655298
    Line 757651: 2021-02-22 18:30:55,306 INFO TestEventHubHandler [boundedElastic-26] Finish updateCheckpoint time = 1614018655306
  2. The next log for receiving an event from this partition is 02/22 18:56:29. There is a 26-minute gap between two events.
    2021-02-22 18:56:29,619 INFO TestEventHubHandler [boundedElastic-311] Processing event from partition 10 with sequence number 257490, offset 34369398968, and enqueued time 2021-02-22T10:25:01.295Z
  3. “ReactorDispatcher instance is closed” happened between the above mentioned two events.
    a. 2021-02-22 18:48:21,172 INFO c.a.c.a.i.ReactorReceiver [parallel-2] connectionId[MF_21b512_1613987026165], path[test-activity/ConsumerGroups/$default/Partitions/10], linkName[10_2462b4_1613987026166] - tokenRenewalFailure[ReactorDispatcher instance is closed.]
    b. 2021-02-22 18:55:21,965 INFO c.a.c.a.i.h.ReceiveLinkHandler [reactor-executor-12] onLinkRemoteClose connectionId[MF_21b512_1613987026165], linkName[10_2462b4_1613987026166], errorCondition[amqp:link:detach-forced], errorDescription[The link '10_2462b4_1613987026166' is force detached. Code: RenewToken. Details: Unauthorized access. 'Listen' claim(s) are required to perform this operation. Resource: 'sb://yijxie-eventhubs.servicebus.windows.net/test-activity/consumergroups/$default/partitions/10'.. TrackingId:45d6b49d67fe42e3b87b29866a348933_G7, SystemTracker:gateway5, Timestamp:2021-02-22T18:55:21]
    c. 2021-02-22 18:55:21,967 INFO c.a.c.a.i.h.ReceiveLinkHandler [reactor-executor-12] processOnClose connectionId[MF_21b512_1613987026165], linkName[10_2462b4_1613987026166], errorCondition[amqp:link:detach-forced], errorDescription[The link '10_2462b4_1613987026166' is force detached. Code: RenewToken. Details: Unauthorized access. 'Listen' claim(s) are required to perform this operation. Resource: 'sb://yijxie-eventhubs.servicebus.windows.net/test-activity/consumergroups/$default/partitions/10'.. TrackingId:45d6b49d67fe42e3b87b29866a348933_G7, SystemTracker:gateway5, Timestamp:2021-02-22T18:55:21]
    d. 2021-02-22 18:55:21,967 INFO c.a.c.a.i.ReactorSession [reactor-executor-12] linkName[10_2462b4_1613987026166] entityPath[test-activity/ConsumerGroups/$default/Partitions/10]: Error occurred. Removing receive link.
    e. 2021-02-22 18:55:21,967 ERROR c.a.c.a.i.ReactorDispatcher [reactor-executor-12] ReactorDispatcher instance is closed.
    f. 2021-02-22 18:55:21,967 WARN c.a.c.a.i.ReactorExecutor [reactor-executor-12] connectionId[MF_21b512_1613987026165], message[Unhandled exception while processing events in reactor, report this error.]
    g. java.util.concurrent.RejectedExecutionException: ReactorDispatcher instance is closed.

Exception or Stack Trace
Contact me to get the detailed logs.

To Reproduce
Steps to reproduce the behavior:
Run a long test with with 16 partitions.

Code Snippet
Contact me to get the code snippet.

Expected behavior
It's normal that the service closes a connection/link occasionally for some reasons. The client SDK should detect this disconnection/broken link immediately and recover event consuming.

Screenshots
None

Setup (please complete the following information):

  • OS: Azure VM Standard DS2 v2 (2 vcpus, 7 GiB memory). Ubuntu 18.04
  • IDE : [e.g. IntelliJ]
  • Version of the Library used: azure-messaging-eventhubs 5.5.0

Additional context
Add any other context about the problem here.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • [ x] Bug Description Added
  • [ x] Repro Steps Added
  • [ x] Setup information Added
@YijunXieMS YijunXieMS added Event Grid Client This issue points to a problem in the data-plane of the library. blocking-customer-adoption Issue is blocking the migration from Track 1 to Track 2 labels Mar 10, 2021
@YijunXieMS YijunXieMS added this to the [2021] April milestone Mar 10, 2021
@conniey conniey closed this as completed Mar 10, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
blocking-customer-adoption Issue is blocking the migration from Track 1 to Track 2 Client This issue points to a problem in the data-plane of the library. Event Grid
Projects
None yet
Development

No branches or pull requests

2 participants