-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Transform] implement throttling in indexer #55011
[Transform] implement throttling in indexer #55011
Conversation
Pinging @elastic/es-analytics-geo (:Analytics/Rollup) |
Pinging @elastic/ml-core (:ml/Transform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the change. Test coverage is nice :D.
I am a bit concerned around synchronization with the scheduledNextSearch
.
...ck/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexing/AsyncTwoPhaseIndexer.java
Outdated
Show resolved
Hide resolved
5693925
to
7c55d35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings about this change. On the one hand I understand why this could be useful but on the other hand I wonder if this is the right place/method to do it.
I wonder for instance if we could use the search option called max_concurrent_shard_requests
to limit the impact on the cluster ? So instead of throttling the search requests entirely, this option could control the number of shard requests per node that can be executed concurrently.
That seems easier for users than setting a number of requests per second since this number will depend greatly on the number of unique docs per bucket.
If this is not enough I think we should at least ensure that the cancel of the scheduled search is handled properly. Without clear synchronization, I am not sure how we can ensure that another scheduled search is created while calling triggerThrottledSearchNow
?
I disagree with that, I agree that In transform you can retrieve the current requests per second using
There are some ideas for "smart throttling", 1 way could be to automate the 3 steps above. But this is not planned for the first version, a low hanging fruit we thought about: suggest a value for Speaking of alternatives, I think the dream solution is this: #37867 as it would implement load shaping instead of brute force throttling. Another alternative that would be great from a usage perspective: budgeting the search call. Instead of using There are definitely better solutions than requests per second, but for now we have to use what's available.
The answer is: we do not ensure. This works optimistic (to avoid overhead) and to the best of my knowledge should work: Assume thread A runs the indexer and a request for stop comes in from thread B, its possible that although thread B sets the state to Note that there is always only 1 thread A, but there can be multiple thread B's. |
1a4c37f
to
35a9b74
Compare
I implemented some improvements after offline discussion:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concurrency concerns. I THINK cancel
will simply return true if the thread is already executing, which might cause issues.
@@ -178,6 +208,35 @@ public synchronized boolean maybeTriggerAsyncJob(long now) { | |||
} | |||
} | |||
|
|||
/** | |||
* Cancels a scheduled search request and issues the search request immediately |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Cancels a scheduled search request and issues the search request immediately | |
* Cancels a scheduled search request |
* Cancels a scheduled search request and issues the search request immediately | ||
*/ | ||
private synchronized void stopThrottledSearch() { | ||
if (scheduledNextSearch != null && scheduledNextSearch.cancel()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand around cancel
, the only times it will return false
are:
- If the action has already been completed
- If the action has already been cancelled.
This means it will return true
if the thread is executing.
threadPool.executor(executorName).execute(() -> checkState(getState()));
Could happen in the middle of triggerNextSearch
. This MIGHT be ok, but the call to checkState(getState())
might transition from STOPPING -> STOPPED
while a search is still in flight. I am not sure this is intended behavior.
@@ -461,4 +562,37 @@ private boolean checkState(IndexerState currentState) { | |||
} | |||
} | |||
|
|||
private synchronized void reQueueThrottledSearch() { | |||
if (scheduledNextSearch != null && scheduledNextSearch.cancel()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment to above, a current search could be inflight. This means that if the next delay is 0L
, we could have two searches occurring in parallel.
As long as this method is NEVER called out of band, I think this might be ok.
Do you have a link? My source (code) says:
(same as https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html#cancel(boolean), we call it with So my understanding is that |
@hendrikmuhs here is nothing in the JavaDOC that gives guarantees around if the thread is already executing and we are not interrupting.
This seems to indicate to me that IF the action is already running, it will NOT be interrupted and cancel will return This seems to be validated here (unless my code is wrong): https://repl.it/@ben_w_trent/WhenIsCancelTrue |
Your code obviously proves it. However I think I am not the only one, thinking that the documentation sucks, e.g. https://bugs.openjdk.java.net/browse/JDK-8022624. SO has some posts, too. |
I addressed the test problems in the separate PR #55666, to avoid making this one more complicated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems OK, now that rescheduling won't kick off another search request if one is in flight.
Would be good to get another approval before merge.
...ck/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexing/AsyncTwoPhaseIndexer.java
Outdated
Show resolved
Hide resolved
improve tests related to stopping using a client that answers and can be synchronized with the test thread in order to test special situations relates #55011
improve tests related to stopping using a client that answers and can be synchronized with the test thread in order to test special situations relates #55011
9a247d1
to
df0b527
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw a couple of nits but I stopped reviewing when I saw the throttling formula because it seems to highlight a fundamental issue: are we intending to throttle based on search requests per second or documents retrieved per second? What changes are required will depend on the answer to that.
...ugin/core/src/test/java/org/elasticsearch/xpack/core/indexing/AsyncTwoPhaseIndexerTests.java
Outdated
Show resolved
Hide resolved
...ugin/core/src/test/java/org/elasticsearch/xpack/core/indexing/AsyncTwoPhaseIndexerTests.java
Outdated
Show resolved
Hide resolved
if (requestsPerSecond <= 0) { | ||
return TimeValue.ZERO; | ||
} | ||
float timeToWaitNanos = (docCount / requestsPerSecond) * TimeUnit.SECONDS.toNanos(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formula implies that requestsPerSecond
is really desiredDocsPerSecond
. If that's correct then requestsPerSecond
seems like it will cause confusion in the future because I would assume requestsPerSecond
referred to the number of searches, each of which could return many documents.
For example, if I saw a configuration parameter requests_per_second
I might decide to set it to 2 so that I'd get a maximum of 2 search requests per second from this functionality. But then if one of my searches returns 1000 documents then I get a 500 second wait until the next search.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw a couple of nits but I stopped reviewing when I saw the throttling formula because it seems to highlight a fundamental issue: are we intending to throttle based on search requests per second or documents retrieved per second? What changes are required will depend on the answer to that.
This design and the reasons for that are discussed in the tracking issue: #54862. In a nutshell: You are right that requests_per_second
is misleading. The name has still been chosen, because its a existing concept from reindex
. It's wrong there, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I missed that requests_per_second
is used in this way for reindex already. In that case consistency with reindex is more important.
run elasticsearch-ci/2 |
status update: We are evaluating/discussing about renaming |
b712882
to
20125a3
Compare
The setting got renamed to @droberts195 @jimczi would be nice to get your input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The setting got renamed to
docs_per_second
.
I think that is much better as it now reflects how the throttling is done. It would be nice if reindex v2 used the same setting for consistency between similar features.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
implement throttling in async-indexer used by rollup and transform. The added docs_per_second parameter is used to calculate a delay before the next search request is send. With re-throttle its possible to change the parameter at runtime. When stopping a running job, its ensured that despite throttling the indexer stops in reasonable time. This change contains the groundwork, but does not expose the new functionality. relates elastic#54862 # Conflicts: # x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java # x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupJobTask.java # x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/job/RollupIndexerIndexingTests.java # x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/job/RollupIndexerStateTests.java
implement throttling in async-indexer used by rollup and transform. The added docs_per_second parameter is used to calculate a delay before the next search request is send. With re-throttle its possible to change the parameter at runtime. When stopping a running job, its ensured that despite throttling the indexer stops in reasonable time. This change contains the groundwork, but does not expose the new functionality. relates #54862 backport: #55011
implement throttling in async-indexer used by rollup and transform. The added
docs_per_second
parameter is used to calculate a delay before the nextsearch request is send. With re-throttle its possible to change the parameter at
runtime, at stop its ensured that despite throttling the indexer stops in
reasonable time
relates #54862
This PR adds the basics to use throttling, usage/exposure of this feature is planned for separate PR's, that's why I label this as
non-issue
.