Remove redundant response of empty result in AsyncShardFetch to avoid OOM issue #84010
❌ The author of the following commits did not sign a Contributor Agreement. Please read and sign the above-mentioned agreement if you want to contribute to this project.
@DaveCTurner We have already backported #77991, but the OOM issue still exists. Most of the memory is occupied not by DiscoveryNode but by StoreFilesMetadata, which is part of each node's response and is relatively large.
Branch force-pushed from b515a21 to e37d3e1.
Pinging @elastic/es-distributed (Team:Distributed)
Also, would you sign the CLA?
@DaveCTurner CLA done.
@DaveCTurner As you can see in the screenshot, the dump contained 32,927,024 objects in total, occupying more than 15 GB.
That's about 450 bytes per object, which is a lot more than I'd expect. Can you explain why it's so much?
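The per-object figure comes from dividing the reported heap usage by the object count. Taking 15 GB as 15 × 10^9 bytes (a lower bound, since the dump shows "more than" 15 GB):

$$\frac{15 \times 10^{9}\ \text{bytes}}{32\,927\,024\ \text{objects}} \approx 456\ \text{bytes per object}$$

If GB is read as GiB (2^30 bytes), the same division gives about 489 bytes per object.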
Are you sure you signed the CLA with the address |
On reflection, I'd prefer to avoid allocating these objects entirely, instead of creating them and then filtering them out as proposed here. #84034 should do what we want.
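One general way to avoid allocating these objects at all, rather than filtering them afterwards, is to hand out a single shared immutable instance for every "no copy of this shard here" answer, for example from the code path that builds or deserializes each node's response. The sketch below assumes a hypothetical NodeStoreFiles class; it is not the real StoreFilesMetadata, and it is not necessarily how #84034 implements the change.

```java
import java.util.List;

/**
 * Hypothetical stand-in for a per-node store-files response (NOT the real
 * StoreFilesMetadata). Every empty answer shares one immutable instance, so
 * nodes without a shard copy do not each cost a fresh allocation.
 */
final class NodeStoreFiles {
    // Single shared instance reused for every "no files for this shard" answer.
    static final NodeStoreFiles EMPTY = new NodeStoreFiles(List.of());

    private final List<String> fileNames;

    private NodeStoreFiles(List<String> fileNames) {
        this.fileNames = fileNames;
    }

    /** Returns the shared EMPTY instance when there are no files; otherwise allocates. */
    static NodeStoreFiles of(List<String> fileNames) {
        if (fileNames.isEmpty()) {
            return EMPTY;
        }
        // Defensive, immutable copy for the non-empty case.
        return new NodeStoreFiles(List.copyOf(fileNames));
    }

    boolean isEmpty() {
        return fileNames.isEmpty();
    }

    List<String> fileNames() {
        return fileNames;
    }
}
```

With this shape, `NodeStoreFiles.of(List.of()) == NodeStoreFiles.EMPTY` holds, so thousands of empty answers retain only one object, while non-empty answers still get their own instance.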
Sounds good, thanks for raising the issue. Closing this, will proceed with #84034 instead. |
@maosuhan please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation. |
During a full cluster restart we often encounter OOM problems. Analysing a heap dump, we found the following problem.
While fetching shard data, the master asks each data node whether it holds a copy of the shard, and stores every node's response in a map<nodeId, nodeResponse> in AsyncShardFetch. In most cases a shard has data on only a few nodes, but this intermediate map still holds the responses of all nodes.
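A rough model of the behaviour described above, assuming hypothetical NodeAnswer and StoreAllShardFetch classes (not the real AsyncShardFetch API): because every node's answer is kept, the number of retained entries grows as shards × nodes, regardless of how many nodes actually hold a copy.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-node answer for one shard; not an Elasticsearch class. */
final class NodeAnswer {
    final String nodeId;
    final boolean hasShardCopy;

    NodeAnswer(String nodeId, boolean hasShardCopy) {
        this.nodeId = nodeId;
        this.hasShardCopy = hasShardCopy;
    }
}

/**
 * Hypothetical model of the unfiltered behaviour: one map per shard, keyed by
 * node ID, holding every node's answer whether or not it has the shard.
 */
final class StoreAllShardFetch {
    private final Map<String, NodeAnswer> responses = new HashMap<>();

    void onResponse(NodeAnswer answer) {
        // Stored unconditionally, even when hasShardCopy is false.
        responses.put(answer.nodeId, answer);
    }

    int storedEntries() {
        return responses.size();
    }

    /**
     * Simulates fetching `shards` shards from `nodes` nodes, where each shard
     * has a copy on the first `copiesPerShard` nodes, and returns the total
     * number of entries retained across all per-shard maps.
     */
    static long simulateStoredEntries(int shards, int nodes, int copiesPerShard) {
        long total = 0;
        for (int s = 0; s < shards; s++) {
            StoreAllShardFetch fetch = new StoreAllShardFetch();
            for (int n = 0; n < nodes; n++) {
                fetch.onResponse(new NodeAnswer("node-" + n, n < copiesPerShard));
            }
            total += fetch.storedEntries();
        }
        return total;
    }
}
```

For example, with a hypothetical 1,000 shards on 100 nodes and 2 copies of each shard, `simulateStoredEntries(1000, 100, 2)` returns 100,000 retained entries, although only 2,000 of them describe an actual shard copy.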
In this PR, we store only the useful node responses and ignore responses from nodes that do not hold the shard at all.
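A minimal sketch of the filtering idea, assuming hypothetical NodeShardResponse and FilteringShardFetch classes rather than the real AsyncShardFetch code. One assumption worth flagging: a fetcher also needs to know when every node has answered, so this sketch tracks completion in a separate set of pending node IDs instead of relying on one map entry per node; that set shrinks as answers arrive, and only responses from nodes that hold the shard are retained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-node answer: does this node hold a copy of the shard? */
final class NodeShardResponse {
    final String nodeId;
    final boolean hasShardCopy;

    NodeShardResponse(String nodeId, boolean hasShardCopy) {
        this.nodeId = nodeId;
        this.hasShardCopy = hasShardCopy;
    }
}

/**
 * Hypothetical per-shard fetch state that keeps only useful responses.
 * Completion is tracked via pending node IDs, so discarding empty answers
 * does not lose track of which nodes have already replied.
 */
final class FilteringShardFetch {
    private final Set<String> pendingNodes;
    private final Map<String, NodeShardResponse> usefulResponses = new HashMap<>();

    FilteringShardFetch(Set<String> nodeIds) {
        this.pendingNodes = new HashSet<>(nodeIds);
    }

    void onResponse(NodeShardResponse response) {
        // Ignore answers from nodes we did not ask, or that already answered.
        if (!pendingNodes.remove(response.nodeId)) {
            return;
        }
        // Retain the response only if the node actually holds the shard.
        if (response.hasShardCopy) {
            usefulResponses.put(response.nodeId, response);
        }
    }

    boolean isDone() {
        return pendingNodes.isEmpty();
    }

    Map<String, NodeShardResponse> usefulResponses() {
        return usefulResponses;
    }
}
```

The trade-off is that a discarded empty answer cannot be inspected later; if callers need to distinguish "node has no copy" from "node has not answered yet", that information must live in the completion tracking, as the pending set does here.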
This has proved successful in our company: the OOM issue is gone after the optimization.