Improve performance of Cat Nodes API #99744
Comments
Pinging @elastic/es-distributed (Team:Distributed) |
Pinging @elastic/es-data-management (Team:Data Management) |
The
This seems like a nice idea but it would be fragile, we'd need to keep track of the columns that included stats to know whether or not the stats request was needed, and make sure to keep that list up to date as the columns change over time. Also the
IMO this is a valid point, although I would not want to solve it as described. Today every node-level |
Just to add: another idea would be to indicate in the stats request that we don't care about stats for individual shards, we are only going to use a summary. That'd save a bunch of effort and network traffic with the nodes stats and indices stats APIs too. |
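To illustrate the summary idea, here is a minimal sketch, assuming each data node can fold its per-shard stats into one total before responding. `ShardStat`, `Summary` and `summarize` are hypothetical names made up for this example and are not Elasticsearch classes; the two fields are placeholders for whatever the real stats contain.

```java
import java.util.List;

// Hypothetical sketch, not Elasticsearch code: fold per-shard stats into a
// single node-level summary so the response size does not grow with the
// number of shards on the node.
class ShardStatsSummarySketch {

    // Assumed per-shard stats: a document count and a store size in bytes.
    static final class ShardStat {
        final long docs;
        final long storeBytes;

        ShardStat(long docs, long storeBytes) {
            this.docs = docs;
            this.storeBytes = storeBytes;
        }
    }

    // Node-level totals: the only thing a summary-only caller would receive.
    static final class Summary {
        final int shardCount;
        final long docs;
        final long storeBytes;

        Summary(int shardCount, long docs, long storeBytes) {
            this.shardCount = shardCount;
            this.docs = docs;
            this.storeBytes = storeBytes;
        }
    }

    // Runs on the data node: one pass over the local shards, one small result.
    static Summary summarize(List<ShardStat> shards) {
        long docs = 0;
        long storeBytes = 0;
        for (ShardStat shard : shards) {
            docs += shard.docs;
            storeBytes += shard.storeBytes;
        }
        return new Summary(shards.size(), docs, storeBytes);
    }

    public static void main(String[] args) {
        Summary s = summarize(List.of(new ShardStat(10, 100), new ShardStat(5, 50)));
        System.out.println(s.shardCount + " shards, " + s.docs + " docs, " + s.storeBytes + " bytes");
    }
}
```

With something like this the coordinating node would deserialize one small object per node instead of one object per shard.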
@DaveCTurner Thanks for the reply. In most scenarios we will use the JSON APIs instead of
This is a great idea, more elegant than mine I think, and it avoids unnecessary serialization of the request content from
This is also a great idea that would trim useless shard stats from responses and avoid unnecessary deserialization of
I think implementing these ideas could solve this issue. |
I don't quite understand how the indices stats APIs can be optimized. As I see it, the coordinating node needs shard stats to build indices stats.
|
I would like to do it, do you think I could give it a try?
|
I think we can reduce the work here if the user specifies
Sure, go for it. I recommend you don't do it all in one PR tho, try and separate the independent changes out to make them easier to review. |
@DaveCTurner I've opened a pull request (#99938) for this idea. Could you please have a look when you have some time? Thanks
|
…equest (#99938) There's no need to include the whole top-level `NodesInfoRequest` in the requests for info from individual nodes, and this can add substantial overhead if there are lots of nodes in the cluster. With this commit we drop the wrapper in favour of just the parts of the top-level request needed for the node-level processing. Relates #99744
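A rough way to see why the wrapper matters: the sketch below compares the bytes written for a node-level request that carries the full list of target nodes with one that carries only the requested metrics. It is an assumption-laden toy, not the actual change in the PR. `writeWrapped`, `writeSlim` and the string node names are hypothetical, and real `DiscoveryNode` objects serialize far more than a name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Elasticsearch code: compare the serialized size of
// a per-node request that embeds the whole top-level request (including every
// target node) with one that carries only what the receiving node needs.
class NodeRequestSizeSketch {

    private static void writeMetrics(DataOutputStream out, Set<String> metrics) throws IOException {
        out.writeInt(metrics.size());
        for (String metric : metrics) {
            out.writeUTF(metric);
        }
    }

    // "Wrapped" form: the metrics plus the full list of resolved target nodes.
    static byte[] writeWrapped(Set<String> metrics, List<String> concreteNodes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeMetrics(out, metrics);
            out.writeInt(concreteNodes.size());
            for (String node : concreteNodes) {
                out.writeUTF(node);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Slim" form: only the metrics, so the size is independent of cluster size.
    static byte[] writeSlim(Set<String> metrics) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeMetrics(out, metrics);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Set<String> metrics = Set.of("jvm", "os");
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            nodes.add("node-" + i);
        }
        System.out.println("wrapped bytes per request: " + writeWrapped(metrics, nodes).length);
        System.out.println("slim bytes per request:    " + writeSlim(metrics).length);
    }
}
```

Since one such request is sent to every node, the wrapped form costs roughly nodes × nodes entries to serialize per call, while the slim form stays linear in the number of nodes.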
I've opened a pull request for this idea in #100466. |
Thanks @NEUpanning, I'll take a look next week. You might also be interested in #90631 which is kind of the same thing but for the |
I would like to resolve this issue. After that PR is merged, I will try it using a similar approach. |
After implementing the ideas mentioned above, the CPU usage and the time taken to fetch nodes stats (without shard-level stats) dropped to 1/1000th of their original levels in a cluster with 200 data nodes and 140k shards. So this issue is closed as completed. |
Nice work @NEUpanning, thanks for the report, the fixes, and for confirming that the problems are fixed. |
Thanks again David. Thanks for your help and patience in code review. |
Description
We found that executing the Cat Nodes API (query parameters do not matter) on the coordinating node of a large cluster can require a huge amount of CPU. This could have a significant impact on cluster stability. I reproduced this problem in a cluster with 200 data nodes and 140k shards.
When I checked with the `top` command, CPU usage fluctuated between 726% and 1173% for 3 seconds.
Most of the CPU usage comes from `ShardStats.<init>` and `DiscoveryNode.writeTo`. `ShardStats.<init>` is called when the coordinating node deserializes the responses sent by other nodes. `DiscoveryNode.writeTo` is called when the coordinating node serializes the requests that will be sent to other nodes.
Here is the flame graph (image omitted).
Some rough ideas for solving this issue:
1. Set `NodesStatsRequest#indices` based on the query parameters to filter indices stats, rather than calling `NodesStatsRequest.indices(true)`, which requests all indices stats. For instance, if users call `_cat/nodes?h=m`, the coordinating node should not fetch indices stats from other nodes. This would avoid a lot of unnecessary deserialization of the response content in `ShardStats.<init>`.
2. Set `NodeInfoRequest#concreteNodes` to null after using `NodeInfoRequest#concreteNodes` to build the iterator used to send requests. This would avoid unnecessary serialization of the request content in `DiscoveryNode.writeTo`.
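For the first idea, a minimal sketch of the column check might look like the following. Everything here is hypothetical: `CatNodesColumns`, `needsIndicesStats` and the column set are made up for illustration, and the listed columns are only assumed examples of values derived from indices stats. The fragility concern raised in the comments (keeping such a list up to date) applies directly to this approach.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Elasticsearch code: decide from the requested
// _cat/nodes columns (the h= parameter) whether indices stats are needed.
class CatNodesColumns {

    // Assumed, illustrative set of columns backed by indices (shard) stats.
    // A real implementation would have to keep this in sync with the API.
    static final Set<String> INDICES_STATS_COLUMNS =
        Set.of("indexing.index_total", "search.query_total", "segments.count");

    static boolean needsIndicesStats(List<String> requestedColumns) {
        // No explicit h= parameter: be conservative and fetch everything.
        if (requestedColumns.isEmpty()) {
            return true;
        }
        for (String column : requestedColumns) {
            if (INDICES_STATS_COLUMNS.contains(column)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // _cat/nodes?h=m asks only for the master marker.
        System.out.println(needsIndicesStats(List.of("m")));
        System.out.println(needsIndicesStats(List.of("name", "segments.count")));
    }
}
```

The result would then be passed to something like `NodesStatsRequest.indices(...)` instead of a hard-coded `true`.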