-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit the number of concurrent requests per node #31206
Conversation
With `max_concurrent_shard_requests` we used to throttle / limit the number of concurrent shard requests a high level search request can execute per node. This had several problems since it limited the number on a global level based on the number of nodes. This change now throttles the number of concurrent requests per node while still allowing concurrency across multiple nodes. Closes elastic#31192
Pinging @elastic/es-search-aggs |
…oose it in a race
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you had a search/replace issue but I like it in general. Given that it is an expert option I don't think we need sophisticated backward compatibility, let's just add a note about it in the breaking changes for 7.0?
|
||
synchronized void add(Runnable runnable) { | ||
if (queue == null) { // create this lazily | ||
queue = new LinkedList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not that it matters much but ArrayDeque should be better?
} | ||
|
||
synchronized void add(Runnable runnable) { | ||
if (queue == null) { // create this lazily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does it need to be lazy?
@@ -258,7 +254,7 @@ protected void doExecute(Task task, SearchRequest searchRequest, ActionListener< | |||
} | |||
String[] aliases = aliasFilter.getAliases(); | |||
String[] finalIndices = aliases.length == 0 ? new String[] {shardId.getIndexName()} : aliases; | |||
// here we have to map the filters to the UUID since from now on we use the uuid for the lookup | |||
// here we have to pendingExecutionsPerNode the filters to the UUID since from now on we use the uuid for the lookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search/replace issue?
try { | ||
executePhaseOnShard(shardIt, shard, new SearchActionListener<FirstResult>(new SearchShardTarget(shard | ||
.currentNodeId(), | ||
shardIt.shardId(), shardIt.getClusterAlias(), shardIt.getOriginalIndices()), shardIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation of the two above lines doesn't help readability
@@ -199,7 +199,7 @@ public void scrollId(String scrollId) { | |||
* If profiling was enabled, this returns an object containing the profile results from | |||
* each shard. If profiling was not enabled, this will return null | |||
* | |||
* @return The profile results or an empty map | |||
* @return The profile results or an empty pendingExecutionsPerNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search/replace issue?
@@ -90,7 +90,7 @@ public final int getNumReducePhases() { | |||
|
|||
/** | |||
* Returns the profile results for this search response (including all shards). | |||
* An empty map is returned if profiling was not enabled | |||
* An empty pendingExecutionsPerNode is returned if profiling was not enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search/replace issue?
@@ -203,8 +203,8 @@ public RemoteClusterService getRemoteClusterService() { | |||
} | |||
|
|||
/** | |||
* Return a map of nodeId to pending number of search requests. | |||
* This is a snapshot of the current pending search and not a live map. | |||
* Return a pendingExecutionsPerNode of nodeId to pending number of search requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search/replace issue?
fork(() -> onShardFailure(shardIndex, shard, shard.currentNodeId(), shardIt, e)); | ||
} | ||
}; | ||
if (pendingExecutions == null || pendingExecutions.tryAcquire()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe remove this tryAcquire and go through the else block directly? I think it's simpler if the management of the semaphore permits are contained in PendingExecutions?
@jpountz I addressed your comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* master: Remove RestGetAllAliasesAction (#31308) Temporary fix for broken build Reenable Checkstyle's unused import rule (#31270) Remove remaining unused imports before merging #31270 Fix non-REST doc snippet [DOC] Extend SQL docs Immediately flush channel after writing to buffer (#31301) [DOCS] Shortens ML API intros Use quotes in the call invocation (#31249) move security ingest processors to a sub ingest directory (#31306) Add 5.6.11 version constant. Fix version detection. SQL: Whitelist SQL utility class for better scripting (#30681) [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (#31299) CCS: don't proxy requests for already connected node (#31273) Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap [test] opensuse packaging turn up debug logging Add unreleased version 6.3.1 Removes experimental tag from scripted_metric aggregation (#31298) [Rollup] Metric config parser must use builder so validation runs (#31159) [ML] Check licence when datafeeds use cross cluster search (#31247) Add notion of internal index settings (#31286) Test: Remove broken yml test feature (#31255) REST hl client: cluster health to default to cluster level (#31268) [ML] Update test thresholds to account for changes to memory control (#31289) Log warnings when cluster state publication failed to some nodes (#31233) Fix AntFixture waiting condition (#31272) Ignore numeric shard count if waiting for ALL (#31265) [ML] Implement new rules design (#31110) index_prefixes back-compat should test 6.3 (#30951) Core: Remove plain execute method on TransportAction (#30998) Update checkstyle to 8.10.1 (#31269) Set analyzer version in PreBuiltAnalyzerProviderFactory (#31202) Modify pipelining handlers to require full requests (#31280) Revert upgrade to Netty 4.1.25.Final (#31282) Use armored input stream for reading public key (#31229) Fix Netty 4 Server Transport tests. Again. REST hl client: adjust wait_for_active_shards param in cluster health (#31266) REST high-level Client: remove deprecated API methods (#31200) [DOCS] Mark SQL feature as experimental [DOCS] Updates machine learning custom URL screenshots (#31222) Fix naming conventions check for XPackTestCase Fix security Netty 4 transport tests Fix race in clear scroll (#31259) [DOCS] Clarify audit index settings when remote indexing (#30923) Delete typos in SAML docs (#31199) REST high-level client: add Cluster Health API (#29331) [ML][TEST] Mute tests using rules (#31204) Support RequestedAuthnContext (#31238) SyncedFlushResponse to implement ToXContentObject (#31155) Add Get Aliases API to the high-level REST client (#28799) Remove some line length supressions (#31209) Validate xContentType in PutWatchRequest. (#31088) [INGEST] Interrupt the current thread if evaluation grok expressions take too long (#31024) Suppress extras FS on caching directory tests Revert "[DOCS] Added 6.3 info & updated the upgrade table. (#30940)" Revert "Fix snippets in upgrade docs" Fix snippets in upgrade docs [DOCS] Added 6.3 info & updated the upgrade table. (#30940) LLClient: Support host selection (#30523) Upgrade to Netty 4.1.25.Final (#31232) Enable custom credentials for core REST tests (#31235) Move ESIndexLevelReplicationTestCase to test framework (#31243) Encapsulate Translog in Engine (#31220) HLRest: Add get index templates API (#31161) Remove all unused imports and fix CRLF (#31207) [Tests] Fix self-referencing tests [TEST] Fix testRecoveryAfterPrimaryPromotion [Docs] Remove mention pattern files in Grok processor (#31170) Use stronger write-once semantics for Azure repository (#30437) Don't swallow exceptions on replication (#31179) Limit the number of concurrent requests per node (#31206) Call ensureNoSelfReferences() on _agg state variable after scripted metric agg script executions (#31044) Move java version checker back to its own jar (#30708) [test] add fix for rare virtualbox error (#31212)
Some of the docs were outdated as they did not mention that the limit is not per node. Also, The default value changed. Relates to elastic#31206
Some of the docs were outdated as they did not mention that the limit is not per node. Also, The default value changed. Relates to #31206
Some of the docs were outdated as they did not mention that the limit is not per node. Also, The default value changed. Relates to elastic#31206
Some of the docs were outdated as they did not mention that the limit is not per node. Also, The default value changed. Relates to elastic#31206
Some of the docs were outdated as they did not mention that the limit is not per node. Also, The default value changed. Relates to #31206
Some of the docs were outdated as they did not mention that the limit is not per node. Also, The default value changed. Relates to #31206
Some of the docs were outdated as they did not mention that the limit is not per node. Also, The default value changed. Relates to #31206
Some of the docs were outdated as they did not mention that the limit is not per node. Also, The default value changed. Relates to #31206
With
max_concurrent_shard_requests
we used to throttle / limitthe number of concurrent shard requests a high level search request
can execute per node. This had several problems since it limited the
number on a global level based on the number of nodes. This change
now throttles the number of concurrent requests per node while still
allowing concurrency across multiple nodes.
Closes #31192