
Simplify IndicesShardStoresAction #94507

Conversation

DaveCTurner
Contributor

  • No need to use an `AsyncShardFetch` here, there is no caching
  • Response may be very large, introduce chunking
  • Fan-out may be very large, introduce throttling
  • Processing time may be nontrivial, introduce cancellability
  • Eliminate many unnecessary intermediate data structures
  • Do shard-level response processing more eagerly
  • Determine allocation from `RoutingTable` not `RoutingNodes`
  • Add tests

Relates #81081
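To make the chunking bullet concrete: the sketch below is a minimal, generic illustration (plain Java, invented names, not the actual `ChunkedToXContent` plumbing used in the PR) of exposing a large response as an iterator of small fragments so the transport layer can write it out incrementally instead of serialising one huge object in a single pass.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: the real change uses Elasticsearch's chunked
// XContent infrastructure, but the idea is the same: expose the response as a
// sequence of small, independently serialisable pieces.
class ChunkedShardStores {
    private final Map<String, List<String>> storesByIndex;

    ChunkedShardStores(Map<String, List<String>> storesByIndex) {
        this.storesByIndex = storesByIndex;
    }

    // One chunk per index; callers write chunks out one at a time rather than
    // building the whole serialised response body up front.
    Iterator<String> chunks() {
        return storesByIndex.entrySet().stream()
            .map(entry -> entry.getKey() + " -> " + entry.getValue())
            .iterator();
    }
}
```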

@DaveCTurner added the >non-issue, :Distributed Coordination/Allocation, and v8.8.0 labels on Mar 14, 2023
@elasticsearchmachine added the Team:Distributed (Obsolete) label on Mar 14, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

private final String[] concreteIndices;
private final RoutingTable routingTable;
private final Metadata metadata;
private final Map<String, Map<Integer, List<StoreStatus>>> indicesStatuses;
Contributor

Why not `Map<ShardId, List<StoreStatus>>`?

Contributor Author

Really just because that's what the response wants.
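For context, a sketch of why that shape is convenient: the response groups store statuses by index name and then by shard number, so accumulating results directly in that shape avoids a `ShardId`-keyed intermediate that would need regrouping later. This is an illustrative fragment only (it assumes Elasticsearch's `ShardId` and `IndicesShardStoresResponse.StoreStatus` types and omits the surrounding class and imports), not the code from this PR.

```java
// Illustrative fragment, not the actual implementation: accumulate per-shard results
// directly in the index-name -> shard-number -> statuses shape the response wants,
// so no regrouping is needed when the response is built.
private final Map<String, Map<Integer, List<StoreStatus>>> indicesStatuses = new ConcurrentHashMap<>();

private void addStoreStatus(ShardId shardId, StoreStatus storeStatus) {
    indicesStatuses
        .computeIfAbsent(shardId.getIndexName(), indexName -> new ConcurrentHashMap<>())
        .computeIfAbsent(shardId.getId(), shardNumber -> new CopyOnWriteArrayList<>())
        .add(storeStatus);
}
```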

this.failures = new ConcurrentLinkedQueue<>();
this.outerListener = new RefCountingListener(1, listener.map(ignored -> {
task.ensureNotCancelled();
return new IndicesShardStoresResponse(Map.copyOf(indicesStatuses), List.copyOf(failures));
Contributor

Oh, was `IndicesShardStoresResponse` requiring a map of maps before?

@@ -66,6 +63,8 @@ public class TransportIndicesShardStoresAction extends TransportMasterNodeReadAc

private static final Logger logger = LogManager.getLogger(TransportIndicesShardStoresAction.class);

static final int CONCURRENT_REQUESTS_LIMIT = 100; // TODO configurable?
Contributor

Should it be converted to a `Setting`, or was it intentionally left as a TODO?
Not sure if this is important, though, as it was not limited previously.

Contributor Author

Well it's worth a discussion. Bounding concurrency here might make things slower. 100 feels high enough to me to have limited impact, whilst being low enough to avoid this API being actively harmful to an enormous cluster.

Contributor

Seems it could also depend on CPU/network. Have we ever recorded this being an issue in the wild when it was unlimited?
If not, I am leaning towards making it configurable with a huge default (1000?) and letting users set it lower if/when an issue is detected.
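As a rough illustration of what bounding the fan-out means here, a minimal generic sketch (invented names, a plain `Semaphore` and `CompletableFuture`; the PR itself uses Elasticsearch's non-blocking listener infrastructure rather than blocking a dispatch loop like this): at most `limit` shard-level requests are in flight at once, and each completion frees a slot for the next.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

class ThrottledFanOut {
    // Dispatch one store-listing request per shard, but never more than `limit` at a time.
    static void fetchAll(List<String> shardIds, int limit, ExecutorService executor) throws InterruptedException {
        Semaphore permits = new Semaphore(limit);
        for (String shardId : shardIds) {
            permits.acquire(); // blocks the dispatch loop once `limit` requests are in flight
            CompletableFuture
                .runAsync(() -> fetchShardStores(shardId), executor)
                .whenComplete((ignored, error) -> permits.release());
        }
    }

    static void fetchShardStores(String shardId) {
        // placeholder for the per-shard "list shard stores" call
    }
}
```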

idegtiarenko previously approved these changes on Mar 24, 2023
@DaveCTurner dismissed idegtiarenko's stale review on March 27, 2023 09:10

I'm going to mark this as not-approved while I think about the concurrent-requests-limit part. Otherwise I'll forget that this is outstanding.

@DaveCTurner
Contributor Author

Ok, I pushed d0333bf, which makes the limit configurable on a request-by-request basis (using the same parameter as we use in the search API: `max_concurrent_shard_requests`). I've left the default at 100.
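For reference, a hypothetical usage sketch with the low-level REST client (the index name is invented, and exposure of the parameter on the shard stores endpoint is assumed from the comment above rather than verified against the docs):

```java
import java.io.IOException;

import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

class ShardStoresExample {
    // Hypothetical usage: the index name is made up, and this assumes the endpoint
    // accepts max_concurrent_shard_requests as described in the comment above.
    static Response listShardStores(RestClient client) throws IOException {
        Request request = new Request("GET", "/my-index/_shard_stores");
        request.addParameter("max_concurrent_shard_requests", "100"); // 100 is the default mentioned in the PR
        return client.performRequest(request);
    }
}
```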

@DaveCTurner added the auto-merge-without-approval label on Mar 27, 2023
@DaveCTurner
Contributor Author

@elasticmachine please run elasticsearch-ci/docs

@elasticsearchmachine merged commit e377a86 into elastic:main on Mar 27, 2023
@DaveCTurner deleted the 2023-03-13-IndicesShardStores-simplify branch on March 27, 2023 15:34
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 28, 2023
Added in elastic#94507 but without the comment from elastic#93101, which this commit
fixes.
DaveCTurner added a commit that referenced this pull request Apr 6, 2023
Added in #94507 but without the comment from #93101, which this commit
fixes.