Further improve docs for requests_per_second
In #26185 we made the description of `requests_per_second` sane
for reindex. This improves on the description by using some more
common vocabulary ("batch size", etc) and improving the formatting
of the example calculation so it stands out and doesn't require
scrolling.
nik9000 committed Aug 15, 2017
1 parent 3aab818 commit 6de0afd
Showing 3 changed files with 53 additions and 27 deletions.
27 changes: 19 additions & 8 deletions docs/reference/docs/delete-by-query.asciidoc
@@ -164,14 +164,25 @@ shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>.

`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
`1000`, etc) and throttles the number of requests per second that the delete-by-query
issues, or it can be set to `-1` to disable throttling. The throttling is done by
waiting between bulk batches so that it can manipulate the scroll timeout. The
wait time is the difference between the time it took the batch to complete and
the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
broken into multiple bulk requests large batch sizes will cause Elasticsearch
to create many requests and then wait for a while before starting the next set.
This is "bursty" instead of "smooth". The default is `-1`.
`1000`, etc) and throttles the rate at which `_delete_by_query` issues batches of
delete operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.

The throttling is done by waiting between batches so that the scroll that
`_delete_by_query` uses internally can be given a timeout that takes into
account the padding. The padding time is the difference between the batch size
divided by the `requests_per_second` and the time spent writing. By default the
batch size is `1000`, so if the `requests_per_second` is set to `500`:

[source,txt]
--------------------------------------------------
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------

Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
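
The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Elasticsearch's implementation: the function name `padding_seconds` and its parameters are hypothetical, and clamping the result at zero for a batch that overruns its target is an assumption.

```python
def padding_seconds(batch_size, requests_per_second, write_seconds):
    """Wait time added after one batch, following the formula in the text."""
    if requests_per_second == -1:
        # -1 disables throttling, so no padding is added.
        return 0.0
    # Time the batch is "allowed" to take at the requested rate.
    target_seconds = batch_size / requests_per_second
    # Assumption: a batch that already took longer than its target gets no padding.
    return max(0.0, target_seconds - write_seconds)


# Default batch size of 1000, requests_per_second of 500, half a second of writing.
wait = padding_seconds(1000, 500, 0.5)
print(f"wait_time = {wait} seconds")
```

With the values from the example, the target time is 2 seconds and the padding is 1.5 seconds.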

[float]
=== Response body
26 changes: 15 additions & 11 deletions docs/reference/docs/reindex.asciidoc
@@ -535,20 +535,24 @@ shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>.

`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
`1000`, etc) and throttles the number of batches that the reindex issues by
padding each batch with a wait time. The throttling can be disabled by
setting `requests_per_second` to `-1`.
`1000`, etc) and throttles the rate at which reindex issues batches of index
operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.

The throttling is done by waiting between bulk batches so that it can manipulate the
scroll timeout. The wait time is the difference between the request scroll search
size divided by the `requests_per_second` and the `batch_write_time`. By default
the scroll batch size is `1000`, so if the `requests_per_second` is set to `500`:
The throttling is done by waiting between batches so that the scroll that reindex
uses internally can be given a timeout that takes into account the padding.
The padding time is the difference between the batch size divided by the
`requests_per_second` and the time spent writing. By default the batch size is
`1000`, so if the `requests_per_second` is set to `500`:

`target_total_time` = `1000` / `500 per second` = `2 seconds` +
`wait_time` = `target_total_time` - `batch_write_time` = `2 seconds` - `.5 seconds` = `1.5 seconds`
[source,txt]
--------------------------------------------------
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------

Since the batch isn't broken into multiple bulk requests large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
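
To make the "bursty" behaviour concrete, the sketch below compares the pause after each batch for several batch sizes at the same `requests_per_second`. It rests on a simplifying assumption that is not from the docs: write time grows linearly with batch size at a made-up rate of 2000 documents per second. The names `pause_per_batch` and `docs_written_per_second` are hypothetical.

```python
def pause_per_batch(batch_size, requests_per_second, docs_written_per_second):
    """Idle time after each batch under a simple linear write-time model."""
    # Hypothetical model: writing takes time proportional to the batch size.
    write_seconds = batch_size / docs_written_per_second
    # Time the batch should take overall at the requested rate.
    target_seconds = batch_size / requests_per_second
    # Assumption: no padding if the write already exceeded the target.
    return max(0.0, target_seconds - write_seconds)


for batch_size in (100, 1000, 10000):
    pause = pause_per_batch(batch_size, 500, 2000)
    print(f"batch_size={batch_size}: idle for {pause:.2f}s after each batch")
```

Under this model the average rate is the same for every batch size, but the idle gap grows in proportion to the batch size, so larger batches produce longer bursts of work followed by longer pauses.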

[float]
27 changes: 19 additions & 8 deletions docs/reference/docs/update-by-query.asciidoc
@@ -221,14 +221,25 @@ shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>.

`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
`1000`, etc) and throttles the number of requests per second that the update-by-query
issues, or it can be set to `-1` to disable throttling. The throttling is done by
waiting between bulk batches so that it can manipulate the scroll timeout. The
wait time is the difference between the time it took the batch to complete and
the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
broken into multiple bulk requests large batch sizes will cause Elasticsearch
to create many requests and then wait for a while before starting the next set.
This is "bursty" instead of "smooth". The default is `-1`.
`1000`, etc) and throttles the rate at which `_update_by_query` issues batches of
index operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.

The throttling is done by waiting between batches so that the scroll that
`_update_by_query` uses internally can be given a timeout that takes into
account the padding. The padding time is the difference between the batch size
divided by the `requests_per_second` and the time spent writing. By default the
batch size is `1000`, so if the `requests_per_second` is set to `500`:

[source,txt]
--------------------------------------------------
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------

Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
starting the next set. This is "bursty" instead of "smooth". The default is `-1`.

[float]
[[docs-update-by-query-response-body]]
