Batch async fetch shards data to reduce memory consumption. #81081
base: main
Conversation
This seems like the right sort of idea. I left some comments inline. I think we should do this for replica allocations too.
```diff
@@ -56,6 +56,7 @@
      */
     public interface Lister<NodesResponse extends BaseNodesResponse<NodeResponse>, NodeResponse extends BaseNodeResponse> {
         void list(ShardId shardId, @Nullable String customDataPath, DiscoveryNode[] nodes, ActionListener<NodesResponse> listener);
+        void flush();
```
Rather than introducing this method to the lister (and the corresponding flag passed in to `fetchData`), could we have the allocator directly indicate the end of an allocation round, which triggers the flush?
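As a rough illustration of this suggestion (the interface and method names below are invented for the sketch, not taken from the PR), the allocator itself, rather than each `Lister`, could own the flush:

```java
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.index.shard.ShardId;

// Hypothetical sketch: the per-shard fetch API stays as it is, but the allocator
// explicitly marks the end of an allocation round, and that is what triggers the
// flush of any queued batched requests. DiscoveryNode and ShardId are real
// Elasticsearch types; the interface itself is invented for illustration.
public interface BatchedShardFetcher {

    // Queue a fetch for one shard; nothing is sent over the wire yet.
    void enqueueFetch(ShardId shardId, String customDataPath, DiscoveryNode[] nodes);

    // Called by the allocator once per allocation round, after the last shard has been
    // visited; implementations group the queued fetches by node and send them now.
    void onAllocationRoundEnd();
}
```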
```java
for (ShardId shardId : requestMap.keySet()) {
    ShardRequestInfo shardRequest = requestMap.get(shardId);
    shards.put(shardRequest.shardId(), shardRequest.getCustomDataPath());
    if (node.getVersion().before(Version.V_7_16_0)) {
```
The version in `master` is now 8.1.0; it's unlikely we'll backport this to an earlier version.
```java
};

client.executeLocally(
    TransportNodesListGatewayStartedShards.TYPE,
```
I'm undecided about re-using the same action type for both kinds of request here. I think it'd be cleaner to introduce a new one (and to name it something better than `internal:gateway/local/started_shards`) given how big a difference in behaviour we are making.
Hi @DaveCTurner, if we introduce a new action, then we need to refactor some logic in `GatewayAllocator`, like the following structures; it seems that would be a big change for the high-level allocators. What do you think?
elasticsearch/server/src/main/java/org/elasticsearch/gateway/GatewayAllocator.java (lines 57 to 60 in 2629c32):

```java
private final ConcurrentMap<ShardId, AsyncShardFetch<NodeGatewayStartedShards>> asyncFetchStarted = ConcurrentCollections
    .newConcurrentMap();
private final ConcurrentMap<ShardId, AsyncShardFetch<NodeStoreFilesMetadata>> asyncFetchStore = ConcurrentCollections
    .newConcurrentMap();
```
I'm not sure this is true; I think we could keep pretty much the same interface from the point of view of `GatewayAllocator`. It should be possible to implement a batching `Lister` which reworks the batched responses into a `BaseNodesResponse<NodeGatewayStartedShards>`.
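A minimal, generic sketch of such a batching `Lister` (string shard ids and `List<String>` results stand in for `ShardId` and `BaseNodesResponse<NodeGatewayStartedShards>`; this is not the real Elasticsearch API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Simplified sketch: the per-shard Lister contract stays, but calls are buffered and each
// per-shard listener is answered from a batched, per-node response once it arrives.
public final class BatchingLister {

    // shard id -> listeners still waiting for that shard's result
    private final Map<String, List<BiConsumer<String, List<String>>>> pendingByShard = new HashMap<>();

    // Mirrors Lister#list: record the request instead of sending it over the wire immediately.
    public void list(String shardId, BiConsumer<String, List<String>> perShardListener) {
        pendingByShard.computeIfAbsent(shardId, k -> new ArrayList<>()).add(perShardListener);
    }

    // Called once the batched node responses have been regrouped by shard: complete each
    // original per-shard listener, so callers never notice that the transport was batched.
    public void onBatchedResponses(Map<String, List<String>> resultsByShard) {
        resultsByShard.forEach((shardId, results) -> {
            for (BiConsumer<String, List<String>> listener : pendingByShard.getOrDefault(shardId, List.of())) {
                listener.accept(shardId, results);
            }
            pendingByShard.remove(shardId);
        });
    }

    public static void main(String[] args) {
        BatchingLister lister = new BatchingLister();
        lister.list("[index][0]", (shard, results) -> System.out.println(shard + " -> " + results));
        // ...later, once the batched node-level responses have been split back out per shard:
        lister.onBatchedResponses(Map.of("[index][0]", List.of("primary copy on node-1")));
    }
}
```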
```diff
-    protected NodeGatewayStartedShards nodeOperation(NodeRequest request, Task task) {
+    protected NodeGroupedGatewayStartedShards nodeOperation(NodeRequest request, Task task) {
+        NodeGroupedGatewayStartedShards groupedStartedShards = new NodeGroupedGatewayStartedShards(clusterService.localNode());
+        for (Map.Entry<ShardId, String> entry : request.getShards().entrySet()) {
```
When sending these requests per-shard we execute them in parallel across the `FETCH_SHARD_STARTED` threadpool. I think we should continue to parallelise them at the shard level like that.
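For illustration, shard-level parallelism can be kept inside a batched node-level handler along these lines; a plain executor stands in for the `FETCH_SHARD_STARTED` threadpool and the shard-state lookup is faked:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: the node receives one batched request, but each shard's on-disk
// lookup is still submitted as its own task, preserving today's shard-level parallelism.
public class BatchedNodeOperationSketch {

    public static Map<String, String> loadAllShards(Map<String, String> shardToCustomDataPath,
                                                    ExecutorService fetchShardStartedPool) throws InterruptedException {
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(shardToCustomDataPath.size());
        shardToCustomDataPath.forEach((shardId, customDataPath) -> fetchShardStartedPool.execute(() -> {
            try {
                results.put(shardId, "started-shard-state-of-" + shardId); // stand-in for loading the shard's on-disk state
            } finally {
                done.countDown();
            }
        }));
        done.await(); // only answer the batched request once every per-shard task has finished
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stand-in for FETCH_SHARD_STARTED
        System.out.println(loadAllShards(Map.of("index[0]", "", "index[1]", ""), pool));
        pool.shutdown();
    }
}
```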
Thanks for the suggestion. I am going to complete the optimization.
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-distributed (Team:Distributed)
Thanks. I will try to break this down into smaller steps. Any suggestions would be appreciated.
IMO the trickiest part is that today there's no well-defined "end" to the shard-by-shard fetching process, and there needs to be an end so that we know when to flush the last batch of requests. This PR adds this event (and a lot of other things besides). I think making that change in isolation would be worth trying as a first step. This change won't mean very much on its own, because we won't actually be doing any flushing at the end, but once we have this flushing event I believe it'll be easier to review the change that adds batching on top of it. Independently, I think you could detach …
Hi @DaveCTurner, in the following logic of the above PR, we first iterate over all the unassigned primaries, each …
- No need to use an `AsyncShardFetch` here, there is no caching
- Response may be very large, introduce chunking
- Fan-out may be very large, introduce throttling
- Processing time may be nontrivial, introduce cancellability
- Eliminate many unnecessary intermediate data structures
- Do shard-level response processing more eagerly
- Determine allocation from `RoutingTable` not `RoutingNodes`
- Add tests

Relates #81081
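As a generic illustration of the "introduce throttling" bullet above (not the code from that change), the fan-out can be capped with a semaphore so only a bounded number of node-level requests are in flight at once:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Generic sketch: cap how many node-level requests are outstanding instead of
// fanning out to every node at once. Node names and the "request" are stand-ins.
public class ThrottledFanOutSketch {

    public static void fanOut(List<String> nodes, int maxInFlight, ExecutorService executor) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        for (String node : nodes) {
            permits.acquire(); // blocks once maxInFlight requests are outstanding
            executor.execute(() -> {
                try {
                    System.out.println("fetching shard metadata from " + node); // stand-in for the real request
                } finally {
                    permits.release(); // a completed request frees a slot for the next node
                }
            });
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(8);
        fanOut(List.of("node-1", "node-2", "node-3", "node-4"), 2, executor);
        executor.shutdown();
    }
}
```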
Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
This commit is going to fix #80694. The shard-level async fetch requests are grouped into a node-to-shard map, and the cached listeners are called back once the node-level fetch requests return. Async shard fetch requests before/after optimization:
