Commit

Docs disambiguate reindex's requests_per_second (#26185)
Reindex's docs were somewhere between unclear and
inaccurate around `requests_per_second`. This makes
them much more clear and accurate.
berglh authored and nik9000 committed Aug 15, 2017
1 parent b22ee2e commit 3aab818
Showing 1 changed file with 15 additions and 8 deletions.
23 changes: 15 additions & 8 deletions docs/reference/docs/reindex.asciidoc
@@ -535,14 +535,21 @@ shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>.

`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the reindex
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the number of batches that the reindex issues by
+padding each batch with a wait time. The throttling can be disabled by
+setting `requests_per_second` to `-1`.
+
+The throttling is done waiting between bulk batches so that it can manipulate the
+scroll timeout. The wait time is the difference between the request scroll search
+size divided by the `requests_per_second` and the `batch_write_time`. By default
+the scroll batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+`target_total_time` = `1000` / `500 per second` = `2 seconds` +
+`wait_time` = `target_total_time` - `batch_write_time` = `2 seconds` - `.5 seconds` = `1.5 seconds`
+
+Since the batch isn't broken into multiple bulk requests large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.

[float]
[[docs-reindex-response-body]]
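
The wait-time arithmetic described in the added lines can be sketched in Python. This is only an illustration of the documented formula, not Elasticsearch's implementation: the function name `reindex_wait_time` is hypothetical, and clamping the result to zero when a batch takes longer than its target time is an assumption the docs do not state.

```python
def reindex_wait_time(batch_size, requests_per_second, batch_write_time):
    """Return the seconds to wait after one batch, per the documented formula.

    batch_size          -- scroll batch size (the docs give 1000 as the default)
    requests_per_second -- throttle setting; -1 disables throttling
    batch_write_time    -- seconds the bulk write of this batch took
    """
    if requests_per_second == -1:
        # Throttling disabled: no padding between batches.
        return 0.0
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive or -1")

    # target_total_time = batch size / requests_per_second
    target_total_time = batch_size / requests_per_second

    # wait_time = target_total_time - batch_write_time
    # Assumption: never wait a negative amount if the write ran long.
    return max(0.0, target_total_time - batch_write_time)


# The worked example from the docs: 1000 / 500 = 2 s target, minus a 0.5 s write.
print(reindex_wait_time(1000, 500, 0.5))  # 1.5
```

With a large batch size the whole batch is written at once and the wait comes afterwards, which is the "bursty" behaviour the text mentions.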