Messages from a few partitions are getting delayed #18467
Thank you for your feedback. Tagging and routing to the team best able to assist.
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @samuelkoppes.
Update: one more thing we observed. Say pod A holds the lease for partition 1. If pod A fails to renew the lease, ownership moves to pod B, but pod B does not pull any messages even though it now owns the partition. Later, pod A reclaims the lease and resumes processing partition 1. No messages were processed for the entire time pod B held the lease, hence the latency. Is this a known issue in the v4 client, and has it been addressed in the v5 client?
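To confirm this pattern from logs, you can line up lease ownership changes against processed events for one partition and look for ownership intervals in which the new owner processed nothing. This is a minimal sketch, assuming you already log each ownership change (for example from the IEventProcessor OpenAsync/CloseAsync callbacks) and each processed event with its partition and pod name. OwnershipChange and idle_lease_intervals are hypothetical names, and times are plain seconds for simplicity. It is written in Python for brevity and is not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class OwnershipChange:
    time: float         # seconds since an arbitrary reference point (hypothetical log field)
    partition_id: str
    new_owner: str      # pod that holds the lease from `time` onward


def idle_lease_intervals(changes, processed, partition_id, end_time, min_gap):
    """Return (owner, start, end) for ownership intervals on `partition_id`
    that lasted at least `min_gap` seconds without the owner processing any event.

    `processed` is an iterable of (time, partition_id, owner) tuples taken from
    the application's own logs (an assumption of this sketch).
    """
    processed = list(processed)
    mine = sorted((c for c in changes if c.partition_id == partition_id),
                  key=lambda c: c.time)
    result = []
    for i, change in enumerate(mine):
        start = change.time
        # An interval ends at the next ownership change, or at the end of the log window.
        end = mine[i + 1].time if i + 1 < len(mine) else end_time
        did_work = any(
            p_part == partition_id and owner == change.new_owner and start <= t < end
            for t, p_part, owner in processed
        )
        if not did_work and end - start >= min_gap:
            result.append((change.new_owner, start, end))
    return result
```

With pod-a → pod-b → pod-a handovers on partition "1" and events processed only by pod-a, the function reports the pod-b interval as idle.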
I have investigated a very similar issue with another customer on K8s, and it turned out to be a downstream write that got stuck. Can you please add tracing to your ProcessEventsAsync code that logs each call going in and coming out? Check whether you have a blocking call that is causing the stuck behavior. Each 'in' should have a corresponding 'out' within a reasonable delay.
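One way to do this in/out tracing is to record a timestamp when each handler call starts and another when it finishes, keyed by partition, and periodically flag calls that have an 'in' but no 'out' after some threshold. This is a minimal sketch in Python for brevity, not tied to the Event Hubs SDK. CallTracer, trace and stuck_calls are hypothetical names. In the .NET processor you would wrap the body of ProcessEventsAsync the same way.

```python
import threading
import time
from contextlib import contextmanager


class CallTracer:
    """Records 'in'/'out' timestamps for each handler call, keyed by partition id."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._open = {}        # call id -> (partition_id, start time) for calls still running
        self._next_id = 0
        self.completed = []    # (partition_id, duration in seconds) for finished calls

    @contextmanager
    def trace(self, partition_id):
        # 'in': register the call before the handler body runs.
        with self._lock:
            call_id = self._next_id
            self._next_id += 1
            self._open[call_id] = (partition_id, self._clock())
        try:
            yield
        finally:
            # 'out': runs even if the handler raises, so every 'in' gets an 'out'.
            end = self._clock()
            with self._lock:
                pid, start = self._open.pop(call_id)
                self.completed.append((pid, end - start))

    def stuck_calls(self, threshold_seconds):
        """Partition ids whose current call has been 'in' longer than the threshold."""
        now = self._clock()
        with self._lock:
            return sorted(pid for pid, start in self._open.values()
                          if now - start > threshold_seconds)
```

Usage: wrap the handler body in `with tracer.trace(partition_id):` and call `stuck_calls(...)` from a timer; a partition that keeps showing up there points at a blocking call inside the handler rather than at the receive path.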
We are using Microsoft.Azure.EventHubs.Processor version 4.2.0 on .NET Core 3.1, running as a pod on an OpenShift cluster.
We have 8 pods listening to an event hub with 32 partitions.
Quite often a pod is unable to receive messages from a few partitions while it can still pull messages from the other partitions. The delay is sometimes as high as 10 minutes. The problem resolves itself and we then receive all the backlogged messages in a burst. However, we have no visibility into why a few partitions see such a large delay. Is there a trace or log that shows what is happening behind the scenes while the processor polls the event hub?
Is this the same issue that was fixed in #12691, which shipped in the latest version, Microsoft.Azure.EventHubs.Processor 4.3.1?
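For per-partition visibility on the application side, one option is to compare each event's enqueued timestamp (assigned by Event Hubs) with the time your code received it, and report partitions whose worst delay exceeds a threshold. This is a minimal sketch, assuming you collect (partition id, enqueued time, received time) samples yourself. max_delay_per_partition and lagging_partitions are hypothetical helpers, and any clock skew between the service and the pod shifts the absolute numbers.

```python
from collections import defaultdict
from datetime import timedelta


def max_delay_per_partition(samples):
    """samples: iterable of (partition_id, enqueued_utc, received_utc) datetimes.
    Returns {partition_id: worst observed delay}; negative delays (clock skew) count as zero."""
    worst = defaultdict(timedelta)  # timedelta() == zero delay
    for pid, enqueued, received in samples:
        delay = received - enqueued
        if delay > worst[pid]:
            worst[pid] = delay
    return dict(worst)


def lagging_partitions(samples, threshold):
    """Sorted partition ids whose worst delay exceeds `threshold` (a timedelta)."""
    return sorted(pid for pid, delay in max_delay_per_partition(samples).items()
                  if delay > threshold)
```

Logging the output of lagging_partitions per pod at a fixed interval would show which partitions stall and which pod owned them at the time.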