[QUERY] Azure.Messaging.EventHub PartitionReceiver/EventHubConsumerClient High CPU Usage? #21099
Comments
Could this be related? Microsoft.Azure.Amqp: lock contention is extremely high when the request rate is high
It's not possible to make any definitive statements with the current context and available information, so the best that I can do is generalize and speculate a bit. Generally speaking, it sounds like you've hit a point where you're doing too much on a single machine and should consider a different distribution of work. The best practice for consuming from many partitions at a high rate is to spread the partitions out among different machines. Each partition requires a dedicated AMQP link to read from the service, and how that link is managed differs between the clients:
It's difficult to say what the optimal number of partitions for a given machine is, as it will vary quite a bit with the size of the machine, the size of events, the work being done, the hosting environment, and other factors. My advice would be to start with 2-4 partitions per CPU thread, then measure and experiment to tune from there. Some other potential things that you can consider:
Some wider speculation on causes:
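Concretely, the partition-distribution advice above might look something like the following sketch. This is not from the original thread; the connection string, names, and machineIndex/machineCount scheme are placeholder assumptions, and the 2-4 partitions per CPU thread figure is only the rough heuristic suggested above.

```csharp
// Hypothetical sketch: split the Event Hub partitions across a fixed number of
// consumer machines instead of reading all of them from one box. Connection string,
// names, and the machineIndex/machineCount scheme are placeholder assumptions.
using System;
using System.Linq;
using Azure.Messaging.EventHubs.Consumer;

var connectionString = "<< EVENT HUBS CONNECTION STRING >>";
var eventHubName = "<< EVENT HUB NAME >>";

await using var consumer = new EventHubConsumerClient(
    EventHubConsumerClient.DefaultConsumerGroupName, connectionString, eventHubName);

string[] allPartitions = await consumer.GetPartitionIdsAsync();

// Placeholder: assume this process knows it is machine N of M (from configuration,
// a deployment manifest, etc.) and takes every M-th partition starting at N.
int machineIndex = 0;
int machineCount = 4;

string[] ownedPartitions = allPartitions
    .Where((_, i) => i % machineCount == machineIndex)
    .ToArray();

// Rough ceiling from the advice above: ~2-4 partitions per CPU thread.
int suggestedCeiling = Environment.ProcessorCount * 4;

Console.WriteLine(
    $"This machine would own {ownedPartitions.Length} of {allPartitions.Length} partitions " +
    $"(suggested ceiling here: {suggestedCeiling}).");
```

Each machine would then create receivers only for its `ownedPartitions`, which keeps the number of dedicated AMQP links, and the work of servicing them, bounded per box.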
Thank you @jsquire, I appreciate the detailed response, which provided an excellent starting point for my investigation. Here are a couple of follow-up observations:
Using PerfView, I dug a bit deeper into the .NET 5 and .NET Framework differences for the The inclusion time for this method call wasn't so large when using Could such a large gap be tied to the allocation and networking improvements of .NET 5, like you mentioned? We have some scenarios that read from remote event hubs (compute on the west coast, event hub on the east coast) that benefit greatly from the
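As a rough complement to PerfView traces, a small harness along these lines could be compiled once for .NET Framework and once for .NET 5 to compare CPU time per received event. This is a sketch only, not the project's actual test code; the connection string, names, partition id, and thresholds are placeholder assumptions.

```csharp
// Hypothetical harness: compile the same code for .NET Framework and for .NET 5 and
// compare the CPU time spent per received event. Connection string, names, partition
// id, and thresholds are placeholders.
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Consumer;
using Azure.Messaging.EventHubs.Primitives;

public static class ReceiveCpuProbe
{
    public static async Task Main()
    {
        var receiver = new PartitionReceiver(
            EventHubConsumerClient.DefaultConsumerGroupName,
            "0",                                           // placeholder partition id
            EventPosition.Earliest,
            "<< EVENT HUBS CONNECTION STRING >>",
            "<< EVENT HUB NAME >>",
            new PartitionReceiverOptions
            {
                TrackLastEnqueuedEventProperties = true,
                ConnectionOptions = new EventHubConnectionOptions
                {
                    TransportType = EventHubsTransportType.AmqpWebSockets
                }
            });

        var process = Process.GetCurrentProcess();
        var cpuBefore = process.TotalProcessorTime;
        var wallClock = Stopwatch.StartNew();
        long received = 0;

        // Read until a fixed number of events or a wall-clock cutoff is reached.
        while (received < 100_000 && wallClock.Elapsed < TimeSpan.FromMinutes(5))
        {
            var batch = await receiver.ReceiveBatchAsync(500, TimeSpan.FromSeconds(5));
            received += batch.Count;
        }

        process.Refresh();
        var cpuUsed = process.TotalProcessorTime - cpuBefore;

        Console.WriteLine(
            $"{RuntimeInformation.FrameworkDescription}: {received} events in {wallClock.Elapsed}, " +
            $"CPU time {cpuUsed} (~{cpuUsed.TotalMilliseconds / Math.Max(received, 1):F3} ms CPU per event)");

        await receiver.CloseAsync();
    }
}
```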
It is very possible and, in this case, quite likely. What you're observing is the same code running on two different host frameworks with different performance characteristics. We use very few compiler-constant branches to sniff frameworks, and those are intended only to work around compatibility issues. A great deal of effort went into reducing allocations in .NET 5, along with a large focus on performance tuning the networking components used by ASP.NET. Though it's a bit outdated by now, this blog post by Stephen Toub highlights some of the significant areas, which have been further improved since. I don't want to link to non-authoritative sources, but there are plenty of more recent articles around performance testing. The .NET team may have additional resources to share if you decide to reach out to them directly.
The AMQP library was developed alongside the Azure Messaging services and was written at a time when there was less focus on allocations. There are definitely code paths within it that could be improved, but the networking primitives that it uses are provided by .NET itself. In the case of web sockets, the
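One way to gauge how much of the gap comes from the web socket layer is to run the same read test over both transports the client supports. The following is only a sketch of the relevant options, not code from the thread; everything else about the test is assumed to stay identical.

```csharp
// Hypothetical comparison: identical receiver options except for the transport, so
// the same read test (for example, the harness sketched earlier in the thread) can
// be run once over web sockets and once over plain AMQP/TCP.
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Primitives;

var overWebSockets = new PartitionReceiverOptions
{
    TrackLastEnqueuedEventProperties = true,
    ConnectionOptions = new EventHubConnectionOptions
    {
        // AMQP tunneled through the web socket support that .NET provides.
        TransportType = EventHubsTransportType.AmqpWebSockets
    }
};

var overTcp = new PartitionReceiverOptions
{
    TrackLastEnqueuedEventProperties = true,
    ConnectionOptions = new EventHubConnectionOptions
    {
        // AMQP directly over a TCP connection, removing the web socket layer.
        TransportType = EventHubsTransportType.AmqpTcp
    }
};

// If the .NET Framework run narrows toward the .NET 5 run when using overTcp,
// that points at the web socket path; if not, the cost is elsewhere in the stack.
```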
Thanks for the detailed explanation and references! I got help parsing the traces today and observed that a lot of the extra time spent by the .NET Framework version is in
Context
When consuming from a dedicated event hub containing more than 200 partitions, I use an instance of the SDK's PartitionReceiver with TransportType = EventHubsTransportType.AmqpWebSockets and TrackLastEnqueuedEventProperties = true for each partition. I've observed high CPU usage when a single box consumes from a large number of partitions at a high event rate (9k - 70k events per second).
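Roughly, the setup described above corresponds to something like the following sketch. It is not the original repro code; the connection string, names, and batch sizes are placeholder assumptions.

```csharp
// Rough sketch of the setup described in Context, not the original repro:
// one PartitionReceiver (and therefore one AMQP link) per partition, all on this
// machine. Connection string, names, and batch sizes are placeholder assumptions.
using System;
using System.Linq;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Consumer;
using Azure.Messaging.EventHubs.Primitives;

var connectionString = "<< EVENT HUBS CONNECTION STRING >>";
var eventHubName = "<< EVENT HUB NAME >>";

string[] partitionIds;
await using (var consumer = new EventHubConsumerClient(
    EventHubConsumerClient.DefaultConsumerGroupName, connectionString, eventHubName))
{
    partitionIds = await consumer.GetPartitionIdsAsync();
}

var receiverOptions = new PartitionReceiverOptions
{
    TrackLastEnqueuedEventProperties = true,
    ConnectionOptions = new EventHubConnectionOptions
    {
        TransportType = EventHubsTransportType.AmqpWebSockets
    }
};

// With 200+ partitions this creates 200+ receivers and read loops on one box,
// which is the scenario where the high CPU usage was observed.
Task[] readers = partitionIds.Select(async partitionId =>
{
    await using var receiver = new PartitionReceiver(
        EventHubConsumerClient.DefaultConsumerGroupName,
        partitionId,
        EventPosition.Latest,
        connectionString,
        eventHubName,
        receiverOptions);

    while (true)
    {
        var batch = await receiver.ReceiveBatchAsync(100, TimeSpan.FromSeconds(1));

        foreach (var eventData in batch)
        {
            // ... application-specific processing would go here ...
        }
    }
}).ToArray();

await Task.WhenAll(readers);
```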
Query/Question
What's the best practice when consuming from many partitions and/or at a high event rate? I ran the same test targeting .NET 5 and it was not an issue: CPU stayed around 20% while consuming ~50k events per second.
Environment: