Docs - hints for handling errors and identifying queries and responses (#1049)

* Hints for handling errors and identifying queries and responses

* Fix formatting errors

* Fix links and formatting

* Grammatical fixes
Dale McDiarmid authored Aug 21, 2020
1 parent be8622a commit 030f71b
Showing 4 changed files with 119 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/command_line_reference.rst
@@ -672,6 +672,8 @@ Client certificates can be presented regardless of the ``verify_certs`` setting,
* Enable SSL, verify server certificates using private CA: ``--client-options="use_ssl:true,verify_certs:true,ca_certs:'/path/to/cacert.pem'"``
* Enable SSL, verify server certificates using private CA, present client certificates: ``--client-options="use_ssl:true,verify_certs:true,ca_certs:'/path/to/cacert.pem',client_cert:'/path/to/client_cert.pem',client_key:'/path/to/client_key.pem'"``

.. _command_line_reference_on_error:

``on-error``
~~~~~~~~~~~~

4 changes: 4 additions & 0 deletions docs/configuration.rst
@@ -39,6 +39,8 @@ Let's go through an example step by step: First run ``esrally``::

Congratulations! Time to :doc:`run your first benchmark </race>`.

.. _advanced_configuration:

Advanced Configuration
----------------------

@@ -112,6 +114,8 @@ To verify that Rally will connect via the proxy server you can check the log fil

Rally will use this proxy server only for downloading benchmark-related data. It will not use this proxy for the actual benchmark.

.. _logging:

Logging
-------

111 changes: 111 additions & 0 deletions docs/recipes.rst
@@ -172,3 +172,114 @@ Rally will push metrics to the metric store configured in 1. and they can be vis
To tear down everything issue ``./stop.sh``.

It is possible to specify a different version of Elasticsearch for step 3. by setting ``export ES_VERSION=<the_desired_version>``.

Identifying when errors have been encountered
--------------------------------------------------------------

Custom track development can be error-prone, especially if you are testing a new query. Queries can return errors for a number of reasons.

Consider a simple example Rally operation::

    {
      "name": "geo_distance",
      "operation-type": "search",
      "index": "logs-*",
      "body": {
        "query": {
          "geo_distance": {
            "distance": "12km",
            "source.geo.location": "40,-70"
          }
        }
      }
    }

This query requires the field ``source.geo.location`` to be mapped as a ``geo_point`` type. If incorrectly mapped, Elasticsearch will respond with an error.
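
One way to catch this before a benchmark is to inspect the index mapping. The following is a minimal sketch, assuming you have already retrieved the ``mappings`` object of one index as a Python dict; the helper ``field_type`` and the ``sample_mapping`` data are hypothetical and not part of Rally:

```python
from typing import Optional


def field_type(mapping: dict, dotted_field: str) -> Optional[str]:
    """Return the mapped type of a dotted field name, or None if it is unmapped.

    ``mapping`` is assumed to be the ``mappings`` object of a single index,
    i.e. a dict with a top-level ``properties`` key.
    """
    node = mapping
    for part in dotted_field.split("."):
        props = node.get("properties", {})
        if part not in props:
            return None
        node = props[part]
    # Object fields carry no explicit "type"; they are reported as "object" here.
    return node.get("type", "object")


# Hypothetical mapping in which the location field was mapped incorrectly.
sample_mapping = {
    "properties": {
        "source": {
            "properties": {
                "geo": {
                    "properties": {
                        "location": {"type": "text"}
                    }
                }
            }
        }
    }
}

# The geo_distance query needs this comparison to be True.
print(field_type(sample_mapping, "source.geo.location") == "geo_point")
```

With the sample mapping above the check prints ``False``, which signals that the ``geo_distance`` query would be rejected.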

By default, Rally will not exit on errors (unless they are fatal, e.g. `ECONNREFUSED <http://man7.org/linux/man-pages/man2/connect.2.html>`_); instead, it reports them in the summary report via the :ref:`Error Rate <summary_report_error_rate>` statistic. This can potentially lead to misleading results. This behavior is by design and consistent with other load testing tools such as JMeter: in most cases it is desirable that a large, long-running benchmark does not fail because of a single error response.

This behavior can be changed by invoking Rally with the :ref:`--on-error <command_line_reference_on_error>` switch, e.g.::

    esrally --track=geonames --on-error=abort

Errors can also be investigated if you have configured a :ref:`dedicated Elasticsearch metrics store <advanced_configuration>`.
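
As a rough illustration of such an investigation, the sketch below computes a per-task error rate from metrics documents that have already been fetched from the metrics store. It assumes each sample document carries a ``task`` name and a boolean ``meta.success`` flag; the function ``error_rate_by_task`` and the sample data are hypothetical:

```python
from collections import defaultdict


def error_rate_by_task(docs):
    """Return {task: fraction of samples whose meta.success is False}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for doc in docs:
        task = doc.get("task", "unknown")
        totals[task] += 1
        # Assumption: a sample without the flag is counted as successful.
        if not doc.get("meta", {}).get("success", True):
            failures[task] += 1
    return {task: failures[task] / totals[task] for task in totals}


# Hypothetical, heavily trimmed metrics documents.
samples = [
    {"task": "geo_distance", "meta": {"success": False}},
    {"task": "geo_distance", "meta": {"success": True}},
    {"task": "scroll", "meta": {"success": True}},
]

print(error_rate_by_task(samples))
```

For the sample data this prints ``{'geo_distance': 0.5, 'scroll': 0.0}``.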

Checking Queries and Responses
--------------------------------------------------------------

As described above, errors can lead to misleading benchmarking results. Some issues, however, are more subtle: a query can succeed without matching documents as intended.

Consider the following simple Rally operation::

    {
      "name": "geo_distance",
      "operation-type": "search",
      "index": "logs-*",
      "body": {
        "query": {
          "term": {
            "http.request.method": {
              "value": "GET"
            }
          }
        }
      }
    }

For this term query to match, the field ``http.request.method`` needs to be of type ``keyword``. Should this field be `dynamically mapped <https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html>`_, its default type will be ``text``, causing the value ``GET`` to be `analyzed <https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html>`_ and indexed as ``get``. The above query will in turn return ``0`` hits. The field should either be correctly mapped or the query modified to match on ``http.request.method.keyword``.
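
The mismatch can be pictured with a toy model. The sketch below is not how Elasticsearch is implemented; ``analyze_text`` is a hypothetical stand-in that only lowercases and splits on whitespace, which is enough to show why an unanalyzed term lookup for ``GET`` misses a ``text`` field:

```python
def analyze_text(value: str) -> list:
    """Toy stand-in for analysis of a text field: lowercase, split on whitespace."""
    return value.lower().split()


def term_matches(query_value: str, indexed_terms: list) -> bool:
    """A term query compares the query value to indexed terms without analyzing it."""
    return query_value in indexed_terms


text_terms = analyze_text("GET")  # what a text field would index: ["get"]
keyword_terms = ["GET"]           # a keyword field indexes the value unchanged

print(term_matches("GET", text_terms))     # text mapping
print(term_matches("GET", keyword_terms))  # keyword mapping
```

The first call prints ``False`` and the second ``True``, mirroring the zero-hit result described above.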

Issues such as this can lead to misleading benchmarking results. Prior to running any benchmarks for analysis, we therefore recommend that users verify that queries are behaving as intended. Rally provides several tools to assist with this.

First, users can set the :ref:`log level <logging>` for the Elasticsearch client to ``DEBUG``, i.e.::

    "loggers": {
      "elasticsearch": {
        "handlers": ["rally_log_handler"],
        "level": "DEBUG",
        "propagate": false
      },
      "rally.profile": {
        "handlers": ["rally_profile_handler"],
        "level": "INFO",
        "propagate": false
      }
    }
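
If you prefer to toggle this setting from a script rather than by hand, the change amounts to a small dictionary update. The sketch below assumes the logging configuration has already been loaded into a Python dict; the helper ``with_client_debug`` is hypothetical, and reading or writing the actual configuration file is deliberately left out:

```python
import copy
import json


def with_client_debug(config: dict, logger_name: str = "elasticsearch") -> dict:
    """Return a copy of a logging configuration with one logger set to DEBUG."""
    updated = copy.deepcopy(config)
    # Create the logger entry if it does not exist yet, then raise its level.
    logger = updated.setdefault("loggers", {}).setdefault(logger_name, {})
    logger["level"] = "DEBUG"
    return updated


# Hypothetical starting configuration with the client logger at WARNING.
config = {
    "loggers": {
        "elasticsearch": {
            "handlers": ["rally_log_handler"],
            "level": "WARNING",
            "propagate": False,
        }
    }
}

print(json.dumps(with_client_debug(config)["loggers"]["elasticsearch"]))
```

Working on a copy keeps the original configuration available, which makes it easy to restore the previous level before collecting real measurements.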

This will in turn ensure that the logs include each Elasticsearch query and the accompanying response, e.g.::

    2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG > {"sort":[{"geonameid":"asc"}],"query":{"match_all":{}}}
    2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG < {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1000,"relation":"eq"},"max_score":null,"hits":[{"_index":"geonames","_type":"_doc","_id":"Lb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2986043, "name": "Pic de Font Blanca", "asciiname": "Pic de Font Blanca", "alternatenames": "Pic de Font Blanca,Pic du Port", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2860", "timezone": "Europe/Andorra", "location": [1.53335, 42.64991]},"sort":[2986043]},

Users should discard any performance metrics collected from a benchmark run with ``DEBUG`` logging enabled, as it will likely cause a client-side bottleneck. Once the correctness of the queries has been established, disable this setting and re-run any benchmarks.
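
Reading large ``DEBUG`` logs by eye is tedious, so it can help to extract the hit counts programmatically. The following is a sketch only: it assumes response lines contain ``elasticsearch DEBUG <`` followed by a complete JSON body on the same line, as in the excerpt above, and the function ``hits_from_log`` and the sample lines are hypothetical:

```python
import json
import re

# Response lines are marked with "<", request lines with ">".
RESPONSE_LINE = re.compile(r"elasticsearch DEBUG < (\{.*\})\s*$")


def hits_from_log(lines):
    """Yield the total hit count of each search response found in the log lines."""
    for line in lines:
        match = RESPONSE_LINE.search(line)
        if not match:
            continue
        try:
            body = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON responses
        total = body.get("hits", {}).get("total")
        # Elasticsearch 7 reports an object here, older versions a bare number.
        if isinstance(total, dict):
            total = total.get("value")
        if total is not None:
            yield total


# Hypothetical log excerpt: one request and one response with zero hits.
log_lines = [
    '2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG > {"query":{"match_all":{}}}',
    '2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG < {"took":1,"timed_out":false,"hits":{"total":{"value":0,"relation":"eq"},"hits":[]}}',
]

print(list(hits_from_log(log_lines)))
```

For the sample lines this prints ``[0]``, which would flag a query that matches nothing.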

The number of hits returned by queries can also be investigated if you have configured a :ref:`dedicated Elasticsearch metrics store <advanced_configuration>`. Specifically, documents within the index pattern ``rally-metrics-*`` contain a ``meta`` field with a summary of individual responses, e.g.::

    {
      "@timestamp" : 1597681313435,
      "relative-time" : 130273374,
      "race-id" : "452ad9d7-9c21-4828-848e-89974af3230e",
      "race-timestamp" : "20200817T160412Z",
      "environment" : "Personal",
      "track" : "geonames",
      "challenge" : "append-no-conflicts",
      "car" : "defaults",
      "name" : "latency",
      "value" : 270.77871300025436,
      "unit" : "ms",
      "sample-type" : "warmup",
      "meta" : {
        "source_revision" : "757314695644ea9a1dc2fecd26d1a43856725e65",
        "distribution_version" : "7.8.0",
        "distribution_flavor" : "oss",
        "pages" : 25,
        "hits" : 11396503,
        "hits_relation" : "eq",
        "timed_out" : false,
        "took" : 110,
        "success" : true
      },
      "task" : "scroll",
      "operation" : "scroll",
      "operation-type" : "Search"
    }
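
Once such documents have been retrieved, spotting queries that match nothing is a simple filter. The sketch below assumes documents shaped like the example above, with ``task`` and ``meta.hits`` fields; the function ``tasks_with_zero_hits`` and the sample data are hypothetical:

```python
def tasks_with_zero_hits(docs):
    """Return the sorted names of tasks that recorded at least one zero-hit sample."""
    flagged = set()
    for doc in docs:
        hits = doc.get("meta", {}).get("hits")
        # Documents without a hit count (e.g. non-search operations) are ignored.
        if hits == 0:
            flagged.add(doc.get("task", "unknown"))
    return sorted(flagged)


# Hypothetical, heavily trimmed metrics documents.
metrics_docs = [
    {"task": "scroll", "meta": {"hits": 11396503}},
    {"task": "term", "meta": {"hits": 0}},
    {"task": "index-append", "meta": {}},
]

print(tasks_with_zero_hits(metrics_docs))
```

For the sample data this prints ``['term']``.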

2 changes: 2 additions & 0 deletions docs/summary_report.rst
@@ -187,6 +187,8 @@ Rally reports several percentile numbers for each task. Which percentiles are sh
* **Definition**: Time period between start of request processing and receiving the complete response. This metric can easily be mixed up with ``latency`` but does not include waiting time. This is what most load testing tools refer to as "latency" (although it is incorrect).
* **Corresponding metrics key**: ``service_time``

.. _summary_report_error_rate:

Error rate
----------
