Further improve docs for requests_per_second

In #26185 we made the description of `requests_per_second` sane for reindex. This improves on the description by using some more common vocabulary ("batch size", etc) and improving the formatting of the example calculation so it stands out and doesn't require scrolling.
elastic · Aug 15, 2017 · 6de0afd · 6de0afd
1 parent 3aab818
commit 6de0afd

Showing 3 changed files with 53 additions and 27 deletions.
diff --git a/docs/reference/docs/delete-by-query.asciidoc b/docs/reference/docs/delete-by-query.asciidoc
@@ -164,14 +164,25 @@ shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the delete-by-query
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the rate at which `_delete_by_query` issues batches
+of delete operations by padding each batch with a wait time. The throttling can
+be disabled by setting `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the scroll that
+`_delete_by_query` uses internally can be given a timeout that takes into
+account the padding. The padding time is the difference between the batch size
+divided by the `requests_per_second` and the time spent writing. By default the
+batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
+
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]
 === Response body

diff --git a/docs/reference/docs/reindex.asciidoc b/docs/reference/docs/reindex.asciidoc
@@ -535,20 +535,24 @@ shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of batches that the reindex issues by 
-padding each batch with a wait time. The throttling can be disabled by 
-setting `requests_per_second` to `-1`.
+`1000`, etc) and throttles the rate at which reindex issues batches of index
+operations by padding each batch with a wait time. The throttling can be
+disabled by setting `requests_per_second` to `-1`.
 
-The throttling is done waiting between bulk batches so that it can manipulate the 
-scroll timeout. The wait time is the difference between the request scroll search
-size divided by the `requests_per_second` and the `batch_write_time`. By default 
-the scroll batch size is `1000`, so if the `requests_per_second` is set to `500`:
+The throttling is done by waiting between batches so that the scroll that reindex
+uses internally can be given a timeout that takes into account the padding.
+The padding time is the difference between the batch size divided by the
+`requests_per_second` and the time spent writing. By default the batch size is
+`1000`, so if the `requests_per_second` is set to `500`:
 
-`target_total_time` = `1000` / `500 per second` = `2 seconds` +
-`wait_time` = `target_total_time` - `batch_write_time` = `2 seconds` - `.5 seconds` = `1.5 seconds`
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
 
-Since the batch isn't broken into multiple bulk requests large batch sizes will 
-cause Elasticsearch to create many requests and then wait for a while before 
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
 starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]

diff --git a/docs/reference/docs/update-by-query.asciidoc b/docs/reference/docs/update-by-query.asciidoc
@@ -221,14 +221,25 @@ shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the update-by-query
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the rate at which `_update_by_query` issues batches
+of index operations by padding each batch with a wait time. The throttling can
+be disabled by setting `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the scroll that
+`_update_by_query` uses internally can be given a timeout that takes into
+account the padding. The padding time is the difference between the batch size
+divided by the `requests_per_second` and the time spent writing. By default the
+batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
+
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]
 [[docs-update-by-query-response-body]]
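
For readers who want to check the example calculation in these three files, here is a minimal Python sketch of the padding arithmetic as the new text describes it. It is not taken from the Elasticsearch source: the function name `throttle_wait_seconds` and its parameters are hypothetical, and clamping the result at zero when a batch takes longer than its target time is an assumption the docs do not state.

```python
def throttle_wait_seconds(batch_size, requests_per_second, write_seconds):
    """Return how long to wait after a batch, per the documented formula.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    # -1 disables throttling, so there is no padding at all.
    if requests_per_second == -1:
        return 0.0
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive or -1")

    # target_time = batch size / requests_per_second
    target_seconds = batch_size / requests_per_second

    # wait_time = target_time - write_time.
    # Assumption: never wait a negative amount if the write ran long.
    return max(0.0, target_seconds - write_seconds)


if __name__ == "__main__":
    # The documented example: batch size 1000, 500 requests per second,
    # and half a second spent writing the batch.
    print(throttle_wait_seconds(1000, 500, 0.5))
```

With the documented defaults this reproduces the `1.5 seconds` figure from the example. It also makes the "bursty" behavior visible: a larger `batch_size` raises the target time, so each pause between bulk requests gets longer.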