From 23970e7679dee1862e48a9c920f72a94f7cf9bbb Mon Sep 17 00:00:00 2001
From: Nik Everett
Date: Tue, 15 Aug 2017 15:53:29 -0400
Subject: [PATCH] Further improve docs for requests_per_second

In #26185 we made the description of `requests_per_second` sane for reindex.
This improves on the description by using some more common vocabulary ("batch
size", etc) and improving the formatting of the example calculation so it
stands out and doesn't require scrolling.
---
 docs/reference/docs/delete-by-query.asciidoc | 27 ++++++++++++------
 docs/reference/docs/reindex.asciidoc         | 26 +++++++++++--------
 docs/reference/docs/update-by-query.asciidoc | 27 ++++++++++++------
 3 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/docs/reference/docs/delete-by-query.asciidoc b/docs/reference/docs/delete-by-query.asciidoc
index 1e26aac6d612f..d3a1b00b6513a 100644
--- a/docs/reference/docs/delete-by-query.asciidoc
+++ b/docs/reference/docs/delete-by-query.asciidoc
@@ -164,14 +164,25 @@ shards to become available. Both work exactly how they work in the
 <>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the delete-by-query
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the rate at which `_delete_by_query` issues batches of
+delete operations by padding each batch with a wait time. The throttling can be
+disabled by setting `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the scroll that
+`_delete_by_query` uses internally can be given a timeout that takes into
+account the padding. The padding time is the difference between the batch size
+divided by the `requests_per_second` and the time spent writing. By default the
+batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
+
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]
 === Response body
diff --git a/docs/reference/docs/reindex.asciidoc b/docs/reference/docs/reindex.asciidoc
index ebb30750d14db..2e64db6adb0ae 100644
--- a/docs/reference/docs/reindex.asciidoc
+++ b/docs/reference/docs/reindex.asciidoc
@@ -534,20 +534,24 @@ shards to become available. Both work exactly how they work in the
 <>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of batches that the reindex issues by
-padding each batch with a wait time. The throttling can be disabled by
-setting `requests_per_second` to `-1`.
+`1000`, etc) and throttles the rate at which reindex issues batches of index
+operations by padding each batch with a wait time. The throttling can be
+disabled by setting `requests_per_second` to `-1`.
 
-The throttling is done waiting between bulk batches so that it can manipulate the
-scroll timeout. The wait time is the difference between the request scroll search
-size divided by the `requests_per_second` and the `batch_write_time`. By default
-the scroll batch size is `1000`, so if the `requests_per_second` is set to `500`:
+The throttling is done by waiting between batches so that the scroll that reindex
+uses internally can be given a timeout that takes into account the padding.
+The padding time is the difference between the batch size divided by the
+`requests_per_second` and the time spent writing. By default the batch size is
+`1000`, so if the `requests_per_second` is set to `500`:
 
-`target_total_time` = `1000` / `500 per second` = `2 seconds` +
-`wait_time` = `target_total_time` - `batch_write_time` = `2 seconds` - `.5 seconds` = `1.5 seconds`
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
 
-Since the batch isn't broken into multiple bulk requests large batch sizes will
-cause Elasticsearch to create many requests and then wait for a while before
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
 starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]
diff --git a/docs/reference/docs/update-by-query.asciidoc b/docs/reference/docs/update-by-query.asciidoc
index 28c250dcfe193..a14647441f499 100644
--- a/docs/reference/docs/update-by-query.asciidoc
+++ b/docs/reference/docs/update-by-query.asciidoc
@@ -221,14 +221,25 @@ shards to become available. Both work exactly how they work in the
 <>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the update-by-query
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the rate at which `_update_by_query` issues batches of
+index operations by padding each batch with a wait time. The throttling can be
+disabled by setting `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the scroll that
+`_update_by_query` uses internally can be given a timeout that takes into
+account the padding. The padding time is the difference between the batch size
+divided by the `requests_per_second` and the time spent writing. By default the
+batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
+
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]
 [[docs-update-by-query-response-body]]
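
The wait-time calculation described in all three hunks can also be expressed as a short sketch. This is only an illustration of the documented formula, not Elasticsearch source: the function name `wait_time_seconds` and the clamping of negative results to zero are assumptions made for the example.

[source,python]
--------------------------------------------------
# Minimal sketch (not Elasticsearch code) of the padding the docs describe:
# how long to wait after a scroll batch so the rate stays at or below
# requests_per_second. The clamp at zero is an assumption, not documented.

def wait_time_seconds(batch_size: int, requests_per_second: float,
                      write_time: float) -> float:
    """Seconds to pause before the next batch."""
    if requests_per_second <= 0:  # -1 (or any non-positive value) disables throttling
        return 0.0
    target_time = batch_size / requests_per_second
    # Assumed: if the batch already took longer than target_time, do not wait.
    return max(0.0, target_time - write_time)

# Worked example from the docs: batch of 1000, requests_per_second=500,
# 0.5 seconds spent writing -> target_time = 2 s, wait_time = 1.5 s.
print(wait_time_seconds(1000, 500, 0.5))  # 1.5
--------------------------------------------------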