ClientClusterManifestProvider has a serious performance problem: when there are multiple silos and clients, network IO and CPU usage increase in proportion to the number of clients. #8722
Comments
This is equivalent to adding a busy loop that calls GetClusterManifest() every 500 ms, which generates a large amount of wasted network IO.
A dictionary of manifest versions should be maintained so that version changes can be detected, and the manifest should only be updated when the version has actually changed. The comparison should also be made against the version from the same silo, rather than against the version of whichever silo in the cluster happens to be polled. Otherwise, the more silos there are, the higher the network overhead and CPU usage.
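The idea described above can be illustrated with a minimal sketch. This is not the code from the PR and not the Orleans API: it is written in Python for brevity, and `ManifestCache`, `fetch_version`, and `fetch_manifest` are hypothetical names. It assumes the client can ask a silo for its manifest version cheaply and only downloads the full manifest when that silo's version differs from the one last seen from the same silo.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ManifestCache:
    """Hypothetical helper: remembers the last manifest version seen per silo."""

    # Assumed cheap call: returns only the manifest version of a silo.
    fetch_version: Callable[[str], int]
    # Assumed expensive call: returns (version, full manifest payload).
    fetch_manifest: Callable[[str], Tuple[int, bytes]]
    versions: Dict[str, int] = field(default_factory=dict)
    current: Optional[bytes] = None
    full_fetches: int = 0

    def poll(self, silo: str) -> bool:
        """Poll one silo; return True only if the full manifest was downloaded."""
        remote_version = self.fetch_version(silo)
        # Compare against the version last seen from this same silo,
        # not against a version obtained from a different silo.
        if self.versions.get(silo) == remote_version:
            return False  # unchanged: skip the expensive transfer
        version, manifest = self.fetch_manifest(silo)
        self.versions[silo] = version
        self.current = manifest
        self.full_fetches += 1
        return True
```

With three silos polled in rotation, this sketch downloads the full manifest once per silo after startup and then again only when a silo reports a new version, instead of on every 500 ms tick.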
This is our modified code:
@ReubenBond We have submitted the PR. Please review it, merge it into the 7.x branch as soon as possible, and release a fixed version. Thank you.
@benjaminpetit We have submitted the optimized code. Please review it, merge it into the 7.x branch as soon as possible, and release a fixed version. Thank you.
Did this only start with v7.2.3, or did you see it with v7.2.2 and prior as well?
@ReubenBond
There are also serious performance problems with CPU, memory, and network IO in v7.2.2.
In short, all versions above 7.0 have this problem.
Thank you for your investigation and the PR. I am looking into this. I may push some changes to your PR branch before we merge it.
In other words, your PR fixes your issues, greatly reducing CPU usage for your hosts?
Yes, CPU and memory usage have both returned to the levels we saw with the previous 3.x version. After the fix, everything is back to normal.
After fixing this issue, CPU usage dropped from 18% to 5%. In addition, the client's memory usage was reduced by half.
Thanks, I pushed changes to the PR and will probably merge this if our tests pass. Thank you again for the fix and the investigation. I hope we can give you a new release soon.
fixed dotnet#8722
For example:
The cluster has 3 silos and 3 clients.
After the cluster is started, CPU usage and network IO increase rapidly, especially network IO: the amount of ClusterManifest data transmitted reaches 80 MB per second. The CPU is burdened by this network traffic, so its usage rises along with it.
After analysis, we have identified the problem.