Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Optimization] Reduce ShardId key size in AsyncShardFetch Revamp strategy #12010

Closed
amkhar opened this issue Jan 24, 2024 · 3 comments
Closed
Assignees
Labels
bug Something isn't working Cluster Manager

Comments

@amkhar
Copy link
Contributor

amkhar commented Jan 24, 2024

Describe the bug

AsyncShardFetch Revamp strategy (explained here ) uses ShardId object directly in the key to store metadata of all the shards received from all the nodes. So overall memory usage goes in the factor of

ShardId object size * shard_count * node_count = this goes in GBs as ShardId object contains more data.

Related component

Cluster Manager

To Reproduce

  1. Use the latest code of this project
  2. spin up a new cluster with 500 nodes and 500K shards
  3. Restart all cluster manager nodes at once
  4. We'll see heap getting full, can go up to 50GB

Expected behavior

We should reduce the key size and not use full object, ideally use smaller sized key to reduce overall heap usage.

Additional Details

Screenshots
async-shard-fetch-batch-dominator-tree

heap dump collected from a cluster where batch size is 4000 and one batch is taking more than 500MBs.

@peternied
Copy link
Member

[Triage - attendees 1 2 3]
@amkhar Thanks for filing, we'd welcome a pull request to address this issue.

@amkhar
Copy link
Contributor Author

amkhar commented Jan 25, 2024

@peternied - Sure. I'm working on making a change, it requires a bit more exhaustive testing. I'll share once that is completed.

Change will be blocked on the rest of the PRs mentioned in the META issue #8098. Once initial PRs are merged, then only this change can be merged.

@amkhar
Copy link
Contributor Author

amkhar commented Mar 20, 2024

Not needed as cache restructuring is being done as part of #12504 and ShardId will not get repeated.

@amkhar amkhar closed this as completed Mar 20, 2024
@amkhar amkhar self-assigned this Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Cluster Manager
Projects
None yet
Development

No branches or pull requests

2 participants