
Document an example how to extract HTTP responses during a Rally benchmark #791

Closed
jtibshirani opened this issue Oct 12, 2019 · 9 comments · Fixed by #1049
Labels: :Docs (Changes to the documentation), enhancement (Improves the status quo), good first issue (Small, contained changes that are good for newcomers), help wanted (We'd be happy about a community contribution)
Milestone: 1.x

@jtibshirani commented Oct 12, 2019

As part of developing a new search feature or optimization, I often run benchmarks by adding a new search operation to an existing rally track. I've found that it can be easy to make mistakes in setting up the benchmark:

  • A few times I accidentally had an error in the search operation definition; for example, I misspelled the name of a field or structured the query incorrectly. Both the contender and baseline were returning no results, but I simply thought my optimization wasn't working.
  • Before benchmarking I write unit tests for the feature/optimization in Elasticsearch. But it's still possible that I have an error in the search definitions so that the baseline and contender aren't performing the same operation -- for example, the contender could be returning a smaller result set.

For this use case, I wonder if it'd be useful for rally to return the number of hits from a search operation, or even the whole search response. Seeing that the baseline and contender return the same number of results (and return more than 0 results) would give confidence that I've set up the benchmark correctly and help avoid some manual testing.
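
As a concrete illustration, here is a hypothetical search operation for a rally track, loosely modelled on the geonames track's phrase operation (the shape follows the usual track format; the misspelled field namee is the deliberate bug):

{
  "name": "phrase",
  "operation-type": "search",
  "body": {
    "query": {
      "match_phrase": {
        "namee": "Sankt Georgen"
      }
    }
  }
}

Both baseline and contender would run this query without any error and simply report zero hits.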

@danielmitterdorfer added the feedback needed (An open question blocks progress) label on Oct 18, 2019
@danielmitterdorfer (Member) commented:

A few times I accidentally had an error in the search operation definition; for example, I misspelled the name of a field or structured the query incorrectly. Both the contender and baseline were returning no results, but I simply thought my optimization wasn't working.

By default Rally will just continue on errors and report an error rate at the end of the benchmark, but I think you can already achieve what you're after by specifying --on-error=abort. It aborts the benchmark as soon as Elasticsearch returns an error and provides more context (i.e. the response body from Elasticsearch). Would that help you?
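
For example, a minimal sketch of such a run (the distribution version and track here are illustrative; --on-error=abort is the relevant flag):

esrally --distribution-version=7.4.0 --track=geonames --on-error=abort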

I wonder if it'd be useful for rally to return the number of hits from a search operation, or even the whole search response

Rally already returns the number of hits in the metrics metadata, but you need to configure a (dedicated) Elasticsearch metrics store for that. I think it would cover most of your needs. Here is an example from our nightly benchmarks:

{
  "environment": "nightly",
  "race-id": "6cac8c34-93b4-416e-a1ea-6c4e1b87f9fd",
  "race-timestamp": "20191005T200042Z",
  "@timestamp": 1570306375355,
  "relative-time": 41959,
  "track": "geonames",
  "challenge": "append-no-conflicts",
  "car": "defaults",
  "operation": "phrase",
  "task": "phrase",
  "name": "service_time",
  "operation-type": "Search",
  "sample-type": "warmup",
  "value": 3.7241727113723755,
  "unit": "ms",
  "meta": {
    "success": true,
    "distribution_flavor": "oss",
    "source_revision": "d598083051d46e7d44f577473dde52c4a32f6776",
    "distribution_version": "8.0.0-SNAPSHOT",
    "tag_license": "oss",
    "timed_out": false,
    "took": 1,
    "hits": 58,
    "hits_relation": "eq"
  }
}

The meta.hits property already tracks the number of hits, and the meta property probably contains a few more things you're interested in.
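
For reference, a dedicated metrics store is configured in ~/.rally/rally.ini. A minimal sketch, assuming an unsecured Elasticsearch instance on localhost (key names as documented for Rally; values are illustrative):

[reporting]
; illustrative values -- point this at your own metrics cluster
datastore.type = elasticsearch
datastore.host = localhost
datastore.port = 9200
datastore.secure = false
datastore.user =
datastore.password =

Once samples land there, you can read the hit counts for a race back out with an ordinary search. A sketch against Rally's rally-metrics-* indices, reusing the race-id and task from the sample document above (field names as in that document):

GET rally-metrics-*/_search
{
  "size": 1,
  "_source": ["meta.hits", "meta.hits_relation", "meta.success"],
  "query": {
    "bool": {
      "filter": [
        { "term": { "race-id": "6cac8c34-93b4-416e-a1ea-6c4e1b87f9fd" } },
        { "term": { "task": "phrase" } },
        { "term": { "name": "service_time" } }
      ]
    }
  }
}

Running it once for the baseline race and once for the contender race lets you check that both report the same, non-zero meta.hits.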

@jtibshirani (Author) commented:

By default Rally will just continue on errors and report an error rate at the end of the benchmark, but I think you can already achieve what you're after by specifying --on-error=abort.

In my use case (benchmarking different search strategies) this wouldn't help, since we usually don't error on unmapped fields. For example, if you perform a term query on a non-existent field, we return a successful response with no hits. This is also very common behavior for aggregations. It also doesn't cover the case where I've structured the query slightly incorrectly and am not performing the operation I intended, which can happen surprisingly often given the complexity of the search DSL.
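
To illustrate: a term query on a field that is not in the mapping (here against the geonames index, with the deliberately misspelled namee) comes back as a perfectly successful, empty response:

GET geonames/_search
{
  "query": {
    "term": {
      "namee": "Pic de Font Blanca"
    }
  }
}

Elasticsearch answers with HTTP 200 and "hits": { "total": { "value": 0, "relation": "eq" } }, so an error-based abort never triggers.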

Rally already returns the number of hits in the metrics metadata, but you need to configure a (dedicated) Elasticsearch metrics store for that

Thanks, I wasn't aware we already tracked that information in the metadata. The next time I'm benchmarking, I will try out my workflow using a metrics store and loop back if I have further suggestions.

@talevy commented Dec 13, 2019

I, too, would like additional information about the response. When benchmarking new aggregations, I've sometimes accidentally benchmarked buggy implementations that returned incorrect results (sometimes no results at all) and showed really fast performance. Since these runs are black boxes, it is tough to know whether I am benchmarking what I intended; the mistakes may be due to bugs in the implementation or bugs in the rally track.

What if this search-response information came in the form of simply outputting a sample response from a single iteration, maybe with a flag like --print-sample-search-responses? This way it would be straightforward to verify that the benchmark results actually measure what you intended.

P.S. I had no idea about --on-error=abort! Should this be the default?

@danielmitterdorfer (Member) commented Dec 16, 2019

Thanks for the feedback, both. You can already log full requests and responses with Rally today: set the log level to "DEBUG" for the "elasticsearch" logger in ~/.rally/logging.json (but beware that this will very likely cause a client-side bottleneck, so double-check that you reset it once you're interested in the performance metrics).
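
The relevant excerpt of ~/.rally/logging.json would then look roughly like this (the handler name below is taken from Rally's default configuration and may differ in your copy; changing the level is the only edit needed):

"loggers": {
  "elasticsearch": {
    "handlers": ["rally_log_handler"],
    "level": "DEBUG",
    "propagate": false
  }
}

If you then run your benchmark, you'll see output like the following: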

2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG > {"sort":[{"geonameid":"asc"}],"query":{"match_all":{}}}
2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG < {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1000,"relation":"eq"},"max_score":null,"hits":[{"_index":"geonames","_type":"_doc","_id":"Lb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2986043, "name": "Pic de Font Blanca", "asciiname": "Pic de Font Blanca", "alternatenames": "Pic de Font Blanca,Pic du Port", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2860", "timezone": "Europe/Andorra", "location": [1.53335, 42.64991]},"sort":[2986043]},{"_index":"geonames","_type":"_doc","_id":"Lr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2993838, "name": "Pic de Mil-Menut", "asciiname": "Pic de Mil-Menut", "alternatenames": "Pic de Mil-Menut", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "B3", "admin2_code": "09", "admin3_code": "091", "admin4_code": "09024", "population": 0, "dem": "2138", "timezone": "Europe/Andorra", "location": [1.65, 42.63333]},"sort":[2993838]},{"_index":"geonames","_type":"_doc","_id":"L781D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2994701, "name": "Roc Mélé", "asciiname": "Roc Mele", "alternatenames": "Roc Mele,Roc Meler,Roc Mélé", "feature_class": "T", "feature_code": "MT", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "00", "population": 0, "dem": "2803", "timezone": "Europe/Andorra", "location": [1.74028, 42.58765]},"sort":[2994701]},{"_index":"geonames","_type":"_doc","_id":"ML81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3007683, "name": "Pic des Langounelles", "asciiname": "Pic des Langounelles", "alternatenames": "Pic des Langounelles", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "00", "population": 0, "dem": "2685", "timezone": "Europe/Andorra", "location": [1.47364, 42.61203]},"sort":[3007683]},{"_index":"geonames","_type":"_doc","_id":"Mb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3017832, "name": "Pic de les Abelletes", "asciiname": "Pic de les Abelletes", "alternatenames": "Pic de la Font-Negre,Pic de la Font-Nègre,Pic de les Abelletes", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "FR", "admin1_code": "A9", "admin2_code": "66", "admin3_code": "663", "admin4_code": "66146", "population": 0, "dem": "2411", "timezone": "Europe/Andorra", "location": [1.73343, 42.52535]},"sort":[3017832]},{"_index":"geonames","_type":"_doc","_id":"Mr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3017833, "name": "Estany de les Abelletes", "asciiname": "Estany de les Abelletes", "alternatenames": "Estany de les Abelletes,Etang de Font-Negre,Étang de Font-Nègre", "feature_class": "H", "feature_code": "LK", "country_code": "AD", "cc2": "FR", "admin1_code": "A9", "population": 0, "dem": "2260", "timezone": "Europe/Andorra", "location": [1.73362, 42.52915]},"sort":[3017833]},{"_index":"geonames","_type":"_doc","_id":"M781D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3023203, "name": "Port Vieux de la Coume d’Ose", "asciiname": "Port Vieux de la Coume d'Ose", "alternatenames": "Port Vieux de Coume d'Ose,Port Vieux de Coume d’Ose,Port Vieux de la Coume d'Ose,Port Vieux de la Coume d’Ose", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2687", "timezone": "Europe/Andorra", "location": [1.61823, 
42.62568]},"sort":[3023203]},{"_index":"geonames","_type":"_doc","_id":"NL81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3029315, "name": "Port de la Cabanette", "asciiname": "Port de la Cabanette", "alternatenames": "Port de la Cabanette,Porteille de la Cabanette", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "B3", "admin2_code": "09", "admin3_code": "091", "admin4_code": "09139", "population": 0, "dem": "2379", "timezone": "Europe/Andorra", "location": [1.73333, 42.6]},"sort":[3029315]},{"_index":"geonames","_type":"_doc","_id":"Nb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3034945, "name": "Port Dret", "asciiname": "Port Dret", "alternatenames": "Port Dret,Port de Bareites,Port de las Bareytes,Port des Bareytes", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2660", "timezone": "Europe/Andorra", "location": [1.45562, 42.60172]},"sort":[3034945]},{"_index":"geonames","_type":"_doc","_id":"Nr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3038814, "name": "Costa de Xurius", "asciiname": "Costa de Xurius", "feature_class": "T", "feature_code": "SLP", "country_code": "AD", "admin1_code": "07", "population": 0, "dem": "1839", "timezone": "Europe/Andorra", "location": [1.47569, 42.50692]},"sort":[3038814]}]}}

P.S. I had no idea about --on-error=abort! Should this be the default?

It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections, and we just want to record that we went too far but not necessarily abort the benchmark.

In any case I sense we should improve the docs to include examples for such scenarios?

@talevy commented Dec 16, 2019

In any case I sense we should improve the docs to include examples for such scenarios?

Having this in the docs would have prevented my mistakes in usage. +1 to examples in the docs. Thanks for the info!

@talevy commented Dec 16, 2019

It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections

Is this the norm rather than the exception?

@danielmitterdorfer (Member) commented:

It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections
Is this the norm rather than the exception?

Honestly, I am hesitant to change the default behavior for this flag. It is common among load-testing tools to continue on errors (JMeter does this by default as well), and in our experiments we usually want to continue a benchmark even when an error is indicated (by an error rate > 0 in the report): if one out of many benchmarked queries fails, e.g. due to changed syntax or tighter restrictions, all of the other benchmark results are usually still fine and we would not want to lose them. (Incidentally, this is the reason we run Rally's integration tests in CI with the stricter --on-error=abort: to spot such issues before they occur in the nightly benchmarks.) However, we then also investigate and try to eliminate or understand the cause.

@danielmitterdorfer added the :Docs (Changes to the documentation) and enhancement (Improves the status quo) labels and removed the feedback needed (An open question blocks progress) label on Dec 18, 2019
@danielmitterdorfer added this to the 1.x milestone on Dec 18, 2019
@danielmitterdorfer added the help wanted (We'd be happy about a community contribution) label on Dec 18, 2019
@danielmitterdorfer (Member) commented:

"Implementation" notes: We should add a new page to our documentation that describes tips & tricks when benchmarking with specific examples (in a "recipe" style). The purpose of this example is to explain how to get request and response when running a benchmark with Rally (see my comment above how it's done). We should also mention the caveats (debug logging should not be enabled in a "real" benchmark).

@danielmitterdorfer changed the title from "Return information from the search response in rally output" to "Document an example how to extract HTTP responses during a Rally benchmark" on Dec 18, 2019
@danielmitterdorfer added the good first issue (Small, contained changes that are good for newcomers) label on Jan 13, 2020
@gingerwizard commented:
Propose adding the following to the recipes:

  1. Identifying when errors have been encountered
  • Include options such as configuring a metrics store and how to determine whether errors were encountered (see the sketch after this list).
  • Why we don't fail by default and how this behaviour can be induced.
  2. How to capture query responses and hit counts
  • Setting the debug log level, emphasising that this may impact the efficacy of benchmarking metrics.
  • Configuring a metrics store and how to determine hit counts.
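
As a sketch of the error check in recipe 1, assuming a dedicated Elasticsearch metrics store (the race-id is reused from the sample document earlier in this thread; the meta.success flag is recorded per sample):

GET rally-metrics-*/_count
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "race-id": "6cac8c34-93b4-416e-a1ea-6c4e1b87f9fd" } },
        { "term": { "meta.success": false } }
      ]
    }
  }
}

A non-zero count means at least one request in that race failed, even though the benchmark itself kept running.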
