
Reindex API: Disambiguation of requests_per_second #26185

Merged (2 commits) on Aug 15, 2017
Conversation

@berglh (Contributor) commented Aug 14, 2017

Proposal for disambiguation of requests_per_second as discussed in Reindex API: Reindex task es_rejected_execution_exception search queue failure #26153.

@nik9000 As per our discussion in the aforementioned elasticsearch issue, I am looking to disambiguate the requests_per_second wait function. I compared some of my results and have a proposal for updating the instructions to reflect my experience and your brief explanation. Here are the results of an unsuccessful Reindex task with a scroll batch size of 10000 and requests_per_second of 10000:

| Item | Result |
| --- | --- |
| Total Docs | 26690000 |
| Total Batches | 2669 |
| Task Completion Time | 6195 s |
| Time per Batch | 2.32 s |
| Overall EPS | 4308 |
| Time Throttled | 2668 s |
| Time Throttled per Batch | 1 s |
| Time Working | 3527 s |
| Time Working per Scroll | 1.32 s |
| Working EPS | 7567.3 |
| Time Throttle to Work Ratio | 0.75 |

@nik9000 said:

> So `.5` would write all 10000 documents and then sleep 20000 seconds - the amount of time that the write took.

I interpret this formula as `size / requests_per_second = wait_time` in seconds, which does roughly match the timing in the original post.

So in this case I perform the wait_time calculation and add it to the Time Working per Scroll:

1. `10000 / 10000 = 1 s`
2. `1 s + 1.32 s = 2.32 s`
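The calculation above can be sketched in a few lines of plain arithmetic (the function name is illustrative, not Elasticsearch code):

```python
# Sketch of the inferred behaviour: each batch takes its working time
# plus a throttle of batch_size / requests_per_second seconds.

def time_per_batch(batch_size, requests_per_second, working_time_s):
    wait_time_s = batch_size / requests_per_second
    return wait_time_s + working_time_s

# Failed task: batches of 10000 at requests_per_second=10000,
# with an observed working time of 1.32 s per scroll.
print(round(time_per_batch(10000, 10000, 1.32), 2))  # 2.32
```

The same function reproduces the successful task below (`time_per_batch(10000, 5000, 1.33)` gives 3.33 s).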

That fits nicely. Now I will test my interpretation of the documentation:

> The wait time is the difference between the time it took the batch to complete and the time `requests_per_second * requests_in_the_batch`

I interpret this formula as either:

1. `batch_time_read_write - (requests_per_second * requests_in_the_batch)`
2. `(requests_per_second * requests_in_the_batch) - batch_time_read_write`

Plugging in the numbers:

1. `1.32 - (10000 * 10000) = -99999998.68`
2. `(10000 * 10000) - 1.32 = 99999998.68`

Neither of these really matches; even converting the seconds to nanoseconds doesn't produce anything meaningful. Of course, this would be different if you are indeed timing the bulk write (you said this was the case).

10000 Failed Task Output

```json
{
  "completed": true,
  "task": {
    "node": "fmVI6xlZQCmhqZqVPIjfXA",
    "id": 84157930,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 279063633,
      "updated": 0,
      "created": 26690000,
      "deleted": 0,
      "batches": 2669,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 2667995,
      "requests_per_second": 10000,
      "throttled_until_millis": 0
    },
    "description": "reindex from [largeindex] to [largeindex.es5]",
    "start_time_in_millis": 1502431979655,
    "running_time_in_nanos": 6195367033709,
    "cancellable": true
  },
  "response": {
    "took": 6195366,
    "timed_out": false,
    "total": 279063633,
    "updated": 0,
    "created": 26690000,
    "deleted": 0,
    "batches": 2669,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
      "bulk": 0,
      "search": 0
    },
    "throttled_millis": 2667995,
    "requests_per_second": 10000,
    "throttled_until_millis": 0,
    "failures": [
      {
        "shard": -1,
        "reason": {
          "type": "es_rejected_execution_exception",
          "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@48752bce on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@53fe6f13[Running, pool size = 49, active threads = 49, queued tasks = 999, completed tasks = 7727372]]"
        }
      }
    ]
  }
}
```

Now let's consider the results of a successful Reindex task with a scroll batch size of 10000 and requests_per_second of 5000:

| Item | Result |
| --- | --- |
| Total Docs | 133317017 |
| Total Batches | 13332 |
| Task Completion Time | 44458.6 s |
| Time per Batch | 3.33 s |
| Overall EPS | 2998.7 |
| Time Throttled | 26663 s |
| Time Throttled per Batch | 1.99 s |
| Time Working | 17795 s |
| Time Working per Batch | 1.33 s |
| Working EPS | 7491.6 |
| Time Throttle to Work Ratio | 1.49 |

`size / requests_per_second = wait_time` in seconds between each bulk:

1. `10000 / 5000 = 2 s`
2. `2 s + 1.33 s = 3.33 s`

Again, the formula fits the metrics perfectly. You'll also notice that the Working EPS is very similar, as you'd expect with the same scroll size of 10000. However, applying my interpretation of the formula in the documentation:

1. `batch_time_read_write - (requests_per_second * requests_in_the_batch)`
2. `(requests_per_second * requests_in_the_batch) - batch_time_read_write`

Plugging in the numbers:

1. `1.33 - (5000 * 10000) = -49999998.67`
2. `(5000 * 10000) - 1.33 = 49999998.67`

Same kind of result again, not what I am experiencing.
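To make the mismatch concrete, here is a small sketch (hypothetical helper names, plain Python arithmetic, not Elasticsearch code) contrasting the literal documented formula with the formula the measurements actually fit:

```python
# Hypothetical helpers contrasting the two readings of the docs.

def documented_wait(requests_per_second, requests_in_batch, batch_time_s):
    # Literal reading: requests_per_second * requests_in_the_batch,
    # minus the batch time -- the units don't even come out as seconds.
    return requests_per_second * requests_in_batch - batch_time_s

def observed_wait(requests_per_second, requests_in_batch):
    # What the measurements fit: size / requests_per_second.
    return requests_in_batch / requests_per_second

print(documented_wait(5000, 10000, 1.33))  # a huge number, meaningless as a wait
print(observed_wait(5000, 10000))          # 2.0 s, matching the measured throttle
```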

5000 Successful Task Output

```json
{
  "_index": ".tasks",
  "_type": "task",
  "_id": "fmVI6xlZQCmhqZqVPIjfXA:81294668",
  "_score": 1,
  "_source": {
    "completed": true,
    "task": {
      "node": "fmVI6xlZQCmhqZqVPIjfXA",
      "id": 81294668,
      "type": "transport",
      "action": "indices:data/write/reindex",
      "status": {
        "total": 133317017,
        "updated": 0,
        "created": 133317017,
        "deleted": 0,
        "batches": 13332,
        "version_conflicts": 0,
        "noops": 0,
        "retries": {
          "bulk": 0,
          "search": 0
        },
        "throttled_millis": 26663379,
        "requests_per_second": 5000,
        "throttled_until_millis": 0
      },
      "description": "reindex from [largeindex] to [largeindex.es5]",
      "start_time_in_millis": 1502414174752,
      "running_time_in_nanos": 44458656303656,
      "cancellable": true
    },
    "response": {
      "took": 44458656,
      "timed_out": false,
      "total": 133317017,
      "updated": 0,
      "created": 133317017,
      "deleted": 0,
      "batches": 13332,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 26663379,
      "requests_per_second": 5000,
      "throttled_until_millis": 0,
      "failures": []
    }
  }
}
```
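As a sanity check, the per-batch figures quoted earlier can be recovered directly from the counters in the successful task output above (plain arithmetic on the reported values, no Elasticsearch API involved):

```python
# Counters taken from the successful task output above.
throttled_millis = 26663379
batches = 13332
running_time_nanos = 44458656303656

throttle_per_batch_s = throttled_millis / 1000 / batches
total_per_batch_s = running_time_nanos / 1e9 / batches
working_per_batch_s = total_per_batch_s - throttle_per_batch_s

print(round(throttle_per_batch_s, 2))  # 2.0, matching 10000 / 5000
print(round(total_per_batch_s, 2))     # 3.33
print(round(working_per_batch_s, 2))   # 1.33
```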

Proposal for disambiguation of requests_per_second as discussed in Reindex API: Reindex task es_rejected_execution_exception search queue failure (elastic#26153).
@elasticmachine (Collaborator) commented:

Since this is a community-submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@rjernst (Member) commented Aug 14, 2017

I'm not sure about going into so much detail about the internals of how throttling is implemented, but @nik9000 should probably look at this.

@rjernst rjernst requested a review from nik9000 August 14, 2017 18:23
@rjernst rjernst added >docs General docs changes review labels Aug 14, 2017
@nik9000 (Member) left a comment

I like this a lot better than what I had. I left a note about changing the calculation to make it clearer that the batch write time counts against the wait time. Other things also count against the wait time, but they are mostly very fast.

if the `requests_per_second` is set to `500`:

`wait_time_in_seconds` = `1000` / `500` = `2` seconds

@nik9000 (Member):

What about something like:

`target_total_time` = `1000` / `500 per second` = `2 seconds`
`wait_time` = `target_total_time` - `batch_write_time` = `2 seconds` - `.5 seconds` = `1.5 seconds`
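The suggested calculation can be sketched as a small function (an illustrative sketch of the formula in the review comment, not the actual Elasticsearch implementation):

```python
# Sketch of the proposed formula: the batch write time counts against
# the wait, and the wait never goes negative if the write is slow.

def wait_time(batch_size, requests_per_second, batch_write_time_s):
    target_total_time = batch_size / requests_per_second
    return max(target_total_time - batch_write_time_s, 0.0)

print(wait_time(1000, 500, 0.5))  # 1.5, matching the worked example above
```

Clamping at zero is an assumption here; a batch that takes longer than its target total time simply would not be throttled further.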

@nik9000 nik9000 merged commit 3aab818 into elastic:5.5 Aug 15, 2017
nik9000 added a commit that referenced this pull request Aug 15, 2017
In #26185 we made the description of `requests_per_second` sane
for reindex. This improves on the description by using some more
common vocabulary ("batch size", etc) and improving the formatting
of the example calculation so it stands out and doesn't require
scrolling.
nik9000 pushed a commit that referenced this pull request Aug 15, 2017
Reindex's docs were somewhere between unclear and
inaccurate around `requests_per_second`. This makes
them much more clear and accurate.
@nik9000 (Member) commented Aug 15, 2017

Thanks for rewriting the requests_per_second docs @berglh! I've pushed 5fcf5e6, which tweaks the wording a bit, ports the docs changes you made to _delete_by_query and _update_by_query, and reworks the formatting of the math so it fits in the box that renders the snippets.

Thanks for doing this. I really didn't have a good way to talk about requests_per_second until you did this.

jasontedor added a commit to glefloch/elasticsearch that referenced this pull request Aug 16, 2017
* master: (458 commits)
  Prevent cluster internal `ClusterState.Custom` impls to leak to a client (elastic#26232)
  Add packaging test for systemd runtime directive
  [TEST] Reenable RareClusterStateIt#testDeleteCreateInOneBulk
  Serialize and expose timeout of acknowledged requests in REST layer (elastic#26189)
  (refactor) some opportunities to use diamond operator (elastic#25585)
  [DOCS] Clarified readme for testing a single page
  Settings: Add keystore.seed auto generated secure setting (elastic#26149)
  Update version information (elastic#25226)
  "result" : created -> "result" : "created" (elastic#25446)
  Set RuntimeDirectory (elastic#23526)
  Drop upgrade from full cluster restart tests (elastic#26224)
  Further improve docs for requests_per_second
  Docs disambiguate reindex's requests_per_second (elastic#26185)
  [DOCS] Cleanup link for ec2 discovery (elastic#26222)
  Fix document field equals and hash code test
  Use holder pattern for lazy deprecation loggers
  Settings: Add keystore creation to add commands (elastic#26126)
  Docs: Cleanup docs for ec2 discovery (elastic#26065)
  Fix NPE when `values` is omitted on percentile_ranks agg (elastic#26046)
  Several internal improvements to internal test cluster infra (elastic#26214)
  ...
Labels
>docs General docs changes