[BUG] EventHubs EventProcessorClient doesn't recover event receiving from a partition for about 30 minutes #19753
Labels
blocking-customer-adoption
Issue is blocking the migration from Track 1 to Track 2
Client
This issue points to a problem in the data-plane of the library.
Event Grid
Milestone
Describe the bug
A customer who have been evaluating EventHubs Java SDK reported that they found a partition might not process events for long time then it recovered.
The problem reproduced in my test. My test consumes events from a 16-partition EventHub with one event processor client. Occasionally a partition connection / link is detatched. In the following occurence, the SDK stopped processing any events from partition 10 for 26 minutes. The following is the summary of what happened in my test. I've more detailed logs.
Line 757582: 2021-02-22 18:30:55,191 INFO TestEventHubHandler [boundedElastic-26] Processing event from partition 10 with sequence number 257489, offset 34369397888, and enqueued time 2021-02-22T10:25:01.279Z
Line 757630: 2021-02-22 18:30:55,292 INFO TestEventHubHandler [boundedElastic-26] send 1 events to Model Matching Stream
Line 757638: 2021-02-22 18:30:55,298 INFO TestEventHubHandler [boundedElastic-26] execution time = 107
Line 757639: 2021-02-22 18:30:55,298 INFO TestEventHubHandler [boundedElastic-26] Begin updateCheckpoint time = 1614018655298
Line 757651: 2021-02-22 18:30:55,306 INFO TestEventHubHandler [boundedElastic-26] Finish updateCheckpoint time = 1614018655306
2021-02-22 18:56:29,619 INFO TestEventHubHandler [boundedElastic-311] Processing event from partition 10 with sequence number 257490, offset 34369398968, and enqueued time 2021-02-22T10:25:01.295Z
a. 2021-02-22 18:48:21,172 INFO c.a.c.a.i.ReactorReceiver [parallel-2] connectionId[MF_21b512_1613987026165], path[test-activity/ConsumerGroups/$default/Partitions/10], linkName[10_2462b4_1613987026166] - tokenRenewalFailure[ReactorDispatcher instance is closed.]
b. 2021-02-22 18:55:21,965 INFO c.a.c.a.i.h.ReceiveLinkHandler [reactor-executor-12] onLinkRemoteClose connectionId[MF_21b512_1613987026165], linkName[10_2462b4_1613987026166], errorCondition[amqp:link:detach-forced], errorDescription[The link '10_2462b4_1613987026166' is force detached. Code: RenewToken. Details: Unauthorized access. 'Listen' claim(s) are required to perform this operation. Resource: 'sb://yijxie-eventhubs.servicebus.windows.net/test-activity/consumergroups/$default/partitions/10'.. TrackingId:45d6b49d67fe42e3b87b29866a348933_G7, SystemTracker:gateway5, Timestamp:2021-02-22T18:55:21]
c. 2021-02-22 18:55:21,967 INFO c.a.c.a.i.h.ReceiveLinkHandler [reactor-executor-12] processOnClose connectionId[MF_21b512_1613987026165], linkName[10_2462b4_1613987026166], errorCondition[amqp:link:detach-forced], errorDescription[The link '10_2462b4_1613987026166' is force detached. Code: RenewToken. Details: Unauthorized access. 'Listen' claim(s) are required to perform this operation. Resource: 'sb://yijxie-eventhubs.servicebus.windows.net/test-activity/consumergroups/$default/partitions/10'.. TrackingId:45d6b49d67fe42e3b87b29866a348933_G7, SystemTracker:gateway5, Timestamp:2021-02-22T18:55:21]
d. 2021-02-22 18:55:21,967 INFO c.a.c.a.i.ReactorSession [reactor-executor-12] linkName[10_2462b4_1613987026166] entityPath[test-activity/ConsumerGroups/$default/Partitions/10]: Error occurred. Removing receive link.
e. 2021-02-22 18:55:21,967 ERROR c.a.c.a.i.ReactorDispatcher [reactor-executor-12] ReactorDispatcher instance is closed.
f. 2021-02-22 18:55:21,967 WARN c.a.c.a.i.ReactorExecutor [reactor-executor-12] connectionId[MF_21b512_1613987026165], message[Unhandled exception while processing events in reactor, report this error.]
g. java.util.concurrent.RejectedExecutionException: ReactorDispatcher instance is closed.
Exception or Stack Trace
Contact me to get the detailed logs.
To Reproduce
Steps to reproduce the behavior:
Run a long test with with 16 partitions.
Code Snippet
Contact me to get the code snippet.
Expected behavior
It's normal that the service closes a connection/link occasionally for some reasons. The client SDK should detect this disconnection/broken link immediately and recover event consuming.
Screenshots
None
Setup (please complete the following information):
Additional context
Add any other context about the problem here.
Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report
The text was updated successfully, but these errors were encountered: