Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs - hints for handling errors and identifying queries and responses #1049

Merged
merged 4 commits into from
Aug 21, 2020
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
111 changes: 111 additions & 0 deletions docs/recipes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -172,3 +172,114 @@ Rally will push metrics to the metric store configured in 1. and they can be vis
To tear down everything issue ``./stop.sh``.

It is possible to specify a different version of Elasticsearch for step 3. by setting ``export ES_VERSION=<the_desired_version>``.

Identifying when errors have been encountered
--------------------------------------------------------------

Custom track development can be error prone especially if you are testing a new query. A number of reasons can lead to queries returning errors.

Consider a simple example Rally operation::

{
"name": "geo_distance",
"operation-type": "search",
"index": "logs-*",
"body": {
"query": {
"geo_distance": {
"distance": "12km",
"source.geo.location": "40,-70"
}
}
}
}

This query requires the field ``source.geo.location`` to be mapped as a ``geo_point`` type. If incorrectly mapped, Elasticsearch will respond with an error.

Rally will not exit on errors (unless fatal e.g. [http://man7.org/linux/man-pages/man2/connect.2.html](ECONNREFUSED)) by default, instead reporting errors in the summary report via the [Error Rate](https://esrally.readthedocs.io/en/stable/summary_report.html?highlight=on-error#error-rate) statistic. This can potentially leading to misleading results. This behavior is by design and consistent with other load testing tools such as JMeter i.e. In most cases it is desirable that a large long running benchmark should not fail because of a single error response.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Links in rst should look like:

`ECONNREFUSED <https://man7.org/linux/man-pages/man2/connect.2.html>`_

See https://raw.githubusercontent.com/elastic/rally/master/docs/adding_tracks.rst for examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the internal link to the summary_report.html page, one way to do it is to add a cross-reference link just before the Error rate\n------- in https://raw.githubusercontent.com/elastic/rally/master/docs/summary_report.rst like below: (see also an existing example in install.rst)

$ git diff docs/summary_report.rst
diff --git a/docs/summary_report.rst b/docs/summary_report.rst
index 157a9e2..58891f6 100644
--- a/docs/summary_report.rst
+++ b/docs/summary_report.rst
@@ -187,6 +187,8 @@ Rally reports several percentile numbers for each task. Which percentiles are sh
 * **Definition**: Time period between start of request processing and receiving the complete response. This metric can easily be mixed up with ``latency`` but does not include waiting time. This is what most load testing tools refer to as "latency" (although it is incorrect).
 * **Corresponding metrics key**: ``service_time``

+.. _summary_report_error_rate:
+
 Error rate
 ---------- 

With that done, you can simple reference it using:

instead reporting errors in the summary report via the :ref:`Error Rate <summary_report_error_rate>` statistic.

By the way you can easily preview your docs by using make docs and then preview the locally generated files under docs/_build/html/recipes.html.


This behavior can also be changed, by invoking Rally with the [--on-error](https://esrally.readthedocs.io/en/stable/command_line_reference.html?highlight=on-error#on-error) switch e.g.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier comments on how to build internal cross-references.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there is a leading whitespace here, ruining formatting.


esrally --track=geonames --on-error=abort
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renders weirdly (try make docs as mentioned above), what if we just merge it with the previous line like

... switch e.g. ``esrally --track=geonames --on-error=abort``.


Errors can also be investigated if you have configured a [dedicated Elasticsearch metrics store](https://esrally.readthedocs.io/en/stable/configuration.html#advanced-configuration).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for cross-reference links.


Checking Queries and Responses
--------------------------------------------------------------

As described above, errors can lead to misleading benchmarking results. Some issues, however, are more subtle and the result of queries not behaving and matching as intended.

Consider the following simple Rally operation::

{
"name": "geo_distance",
"operation-type": "search",
"index": "logs-*",
"body": {
"query": {
"term": {
"http.request.method": {
"value": "GET"
}
}
}
}
}

For this term query to match the field ``http.request.method`` needs to be type `keyword`. Should this field be [dynamically mapped](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html), its default type will be ``text`` causing the value `GET` to be [analyzed](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html), and indexed as `get`. The above query will in turn return `0` hits. The field should either be correctly mapped or the query modified to match on `http.request.method.keyword`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier comment (with ECONNREFUSED ) about how to use external links. Should have mentioned earlier that https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html will be useful. For code samples we use:

``code sample``


Issues such as this can lead to misleading benchmarking results. Prior to running any benchmarks for analysis, we therefore recommended users ascertain whether queries are behaving as intended. Rally provides several tools to assist with this.

Firstly, users can modify the [logging level](https://esrally.readthedocs.io/en/stable/configuration.html?highlight=logging#logging) of Rally to `DEBUG`. Specifically, modify the ``elasticsearch`` logger i.e.::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment like above for cross-references.


"loggers": {
"elasticsearch": {
"handlers": ["rally_log_handler"],
"level": "DEBUG",
"propagate": false
},
"rally.profile": {
"handlers": ["rally_profile_handler"],
"level": "INFO",
"propagate": false
}
}

This will inturn ensure logs include the Elasticsearch query and accompanying response e.g.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/inturn/in turn ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you need to end the line in :: so that the next lines are properly formatted.


2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG > {"sort":[{"geonameid":"asc"}],"query":{"match_all":{}}}
2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG < {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1000,"relation":"eq"},"max_score":null,"hits":[{"_index":"geonames","_type":"_doc","_id":"Lb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2986043, "name": "Pic de Font Blanca", "asciiname": "Pic de Font Blanca", "alternatenames": "Pic de Font Blanca,Pic du Port", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2860", "timezone": "Europe/Andorra", "location": [1.53335, 42.64991]},"sort":[2986043]},

Users should discard any performance metrics collected from a benchmark with DEBUG logging. This will likely cause a client-side bottleneck so once the correctness of the queries have been established, disable this setting and re-run any benchmarks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/DEBUG/``DEBUG``?

also once the correctness of the queries have been established -> once the correctness of the queries has been established?


The number of hits from queries can also be investigated if you have configured a [dedicated Elasticsearch metrics store](https://esrally.readthedocs.io/en/stable/configuration.html#advanced-configuration). Specifically, documents within the index pattern ``rally-metrics-*`` contain a ``meta`` field with summary of individual responses e.g.::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above for cross-references.


{
"@timestamp" : 1597681313435,
"relative-time" : 130273374,
"race-id" : "452ad9d7-9c21-4828-848e-89974af3230e",
"race-timestamp" : "20200817T160412Z",
"environment" : "Personal",
"track" : "geonames",
"challenge" : "append-no-conflicts",
"car" : "defaults",
"name" : "latency",
"value" : 270.77871300025436,
"unit" : "ms",
"sample-type" : "warmup",
"meta" : {
"source_revision" : "757314695644ea9a1dc2fecd26d1a43856725e65",
"distribution_version" : "7.8.0",
"distribution_flavor" : "oss",
"pages" : 25,
"hits" : 11396503,
"hits_relation" : "eq",
"timed_out" : false,
"took" : 110,
"success" : true
},
"task" : "scroll",
"operation" : "scroll",
"operation-type" : "Search"
}