
Document an example how to extract HTTP responses during a Rally benchmark #791

Closed
jtibshirani opened this issue Oct 12, 2019 · 9 comments · Fixed by #1049
Labels: :Docs (Changes to the documentation), enhancement (Improves the status quo), good first issue (Small, contained changes that are good for newcomers), help wanted (We'd be happy about a community contribution)
Milestone: 1.x

@jtibshirani commented Oct 12, 2019

As part of developing a new search feature or optimization, I often run benchmarks by adding a new search operation to an existing rally track. I've found that it can be easy to make mistakes in setting up the benchmark:

  • A few times I accidentally had an error in the search operation definition; for example, I misspelled the name of a field or structured the query incorrectly. Both the contender and baseline were returning no results, but I simply thought my optimization wasn't working.
  • Before benchmarking I write unit tests for the feature/optimization in Elasticsearch. But it's still possible that I have an error in the search definitions so that the baseline and contender aren't performing the same operation -- for example, the contender could be returning a smaller result set.

For this use case, I wonder if it'd be useful for rally to return the number of hits from a search operation, or even the whole search response. Seeing that the baseline and contender return the same number of results (and return more than 0 results) would give confidence that I've set up the benchmark correctly and help avoid some manual testing.
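
As a concrete illustration, here is a hypothetical search operation for a rally track, loosely modelled on the geonames track's phrase operation (the shape follows the usual track format; the misspelled field namee is the deliberate bug):

{
  "name": "phrase",
  "operation-type": "search",
  "body": {
    "query": {
      "match_phrase": {
        "namee": "Sankt Georgen"
      }
    }
  }
}

Both baseline and contender would run this query without any error and simply report zero hits.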

@danielmitterdorfer added the feedback needed (An open question blocks progress) label on Oct 18, 2019
@danielmitterdorfer (Member) commented:

A few times I accidentally had an error in the search operation definition; for example, I misspelled the name of a field or structured the query incorrectly. Both the contender and baseline were returning no results, but I simply thought my optimization wasn't working.

By default Rally will just continue on errors and report an error rate at the end of the benchmark, but I think you can already achieve what you're after by specifying --on-error=abort. It aborts the benchmark as soon as Elasticsearch returns an error and provides more context (i.e. the response body from Elasticsearch). Would that help you?
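
For example, a minimal sketch of such a run (the distribution version and track here are illustrative; --on-error=abort is the relevant flag):

esrally --distribution-version=7.4.0 --track=geonames --on-error=abort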

I wonder if it'd be useful for rally to return the number of hits from a search operation, or even the whole search response

Rally already returns the number of hits in the metrics metadata, but you need to configure a (dedicated) Elasticsearch metrics store for that. I think it would cover most of your needs. Here is an example from our nightly benchmarks:

{
  "environment": "nightly",
  "race-id": "6cac8c34-93b4-416e-a1ea-6c4e1b87f9fd",
  "race-timestamp": "20191005T200042Z",
  "@timestamp": 1570306375355,
  "relative-time": 41959,
  "track": "geonames",
  "challenge": "append-no-conflicts",
  "car": "defaults",
  "operation": "phrase",
  "task": "phrase",
  "name": "service_time",
  "operation-type": "Search",
  "sample-type": "warmup",
  "value": 3.7241727113723755,
  "unit": "ms",
  "meta": {
    "success": true,
    "distribution_flavor": "oss",
    "source_revision": "d598083051d46e7d44f577473dde52c4a32f6776",
    "distribution_version": "8.0.0-SNAPSHOT",
    "tag_license": "oss",
    "timed_out": false,
    "took": 1,
    "hits": 58,
    "hits_relation": "eq"
  }
}

The meta.hits property already tracks the number of hits, and the meta property probably contains a few more things you're interested in.
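
For reference, a dedicated metrics store is configured in ~/.rally/rally.ini. A minimal sketch, assuming an unsecured Elasticsearch instance on localhost (key names as documented for Rally; values are illustrative):

[reporting]
; illustrative values -- point this at your own metrics cluster
datastore.type = elasticsearch
datastore.host = localhost
datastore.port = 9200
datastore.secure = false
datastore.user =
datastore.password =

Once samples land there, you can read the hit counts for a race back out with an ordinary search. A sketch against Rally's rally-metrics-* indices, reusing the race-id and task from the sample document above (field names as in that document):

GET rally-metrics-*/_search
{
  "size": 1,
  "_source": ["meta.hits", "meta.hits_relation", "meta.success"],
  "query": {
    "bool": {
      "filter": [
        { "term": { "race-id": "6cac8c34-93b4-416e-a1ea-6c4e1b87f9fd" } },
        { "term": { "task": "phrase" } },
        { "term": { "name": "service_time" } }
      ]
    }
  }
}

Running it once for the baseline race and once for the contender race lets you check that both report the same, non-zero meta.hits.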

@jtibshirani (Author) commented:

By default Rally will just continue on errors and report an error rate at the end of the benchmark, but I think you can already achieve what you're after by specifying --on-error=abort.

In my use case (benchmarking different search strategies) this wouldn't help, since we usually don't error on unmapped fields. For example, if you perform a term query on a non-existent field, we return a successful response with no hits. This is also very common behavior for aggregations. It also doesn't cover the case where I've structured the query slightly incorrectly and am not performing the operation I intended, which can happen surprisingly often given the complexity of the search DSL.
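
To illustrate: a term query on a field that is not in the mapping (here against the geonames index, with the deliberately misspelled namee) comes back as a perfectly successful, empty response:

GET geonames/_search
{
  "query": {
    "term": {
      "namee": "Pic de Font Blanca"
    }
  }
}

Elasticsearch answers with HTTP 200 and "hits": { "total": { "value": 0, "relation": "eq" } }, so an error-based abort never triggers.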

Rally already returns the number of hits in the metrics metadata, but you need to configure a (dedicated) Elasticsearch metrics store for that

Thanks, I wasn't aware we already tracked that information in the metadata. The next time I'm benchmarking, I will try out my workflow using a metrics store and loop back if I have further suggestions.

@talevy commented Dec 13, 2019

I, too, would like additional information about the response. When benchmarking new aggregations, I've sometimes accidentally benchmarked buggy implementations that returned incorrect results (sometimes no results at all) and showed really fast performance. Since these runs are black boxes, it is tough to know whether I am benchmarking what I intended; the mistakes may be due to bugs in the implementation or bugs in the rally track.

What if this search-response information came in the form of simply outputting a sample response from a single iteration, maybe with a flag like --print-sample-search-responses? This way it would be straightforward to verify that the benchmark results actually measure what you intended.

P.S. I had no idea about --on-error=abort! Should this be the default?

@danielmitterdorfer (Member) commented Dec 16, 2019

Thanks for the feedback, both. You can already log full requests and responses with Rally today: set the log level to "DEBUG" for the "elasticsearch" logger in ~/.rally/logging.json (but beware that this will very likely cause a client-side bottleneck, so double-check that you reset it once you're interested in the performance metrics).
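
The relevant excerpt of ~/.rally/logging.json would then look roughly like this (the handler name below is taken from Rally's default configuration and may differ in your copy; changing the level is the only edit needed):

"loggers": {
  "elasticsearch": {
    "handlers": ["rally_log_handler"],
    "level": "DEBUG",
    "propagate": false
  }
}

If you then run your benchmark, you'll see output like the following: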

2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG > {"sort":[{"geonameid":"asc"}],"query":{"match_all":{}}}
2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG < {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1000,"relation":"eq"},"max_score":null,"hits":[{"_index":"geonames","_type":"_doc","_id":"Lb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2986043, "name": "Pic de Font Blanca", "asciiname": "Pic de Font Blanca", "alternatenames": "Pic de Font Blanca,Pic du Port", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2860", "timezone": "Europe/Andorra", "location": [1.53335, 42.64991]},"sort":[2986043]},{"_index":"geonames","_type":"_doc","_id":"Lr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2993838, "name": "Pic de Mil-Menut", "asciiname": "Pic de Mil-Menut", "alternatenames": "Pic de Mil-Menut", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "B3", "admin2_code": "09", "admin3_code": "091", "admin4_code": "09024", "population": 0, "dem": "2138", "timezone": "Europe/Andorra", "location": [1.65, 42.63333]},"sort":[2993838]},{"_index":"geonames","_type":"_doc","_id":"L781D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2994701, "name": "Roc Mélé", "asciiname": "Roc Mele", "alternatenames": "Roc Mele,Roc Meler,Roc Mélé", "feature_class": "T", "feature_code": "MT", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "00", "population": 0, "dem": "2803", "timezone": "Europe/Andorra", "location": [1.74028, 42.58765]},"sort":[2994701]},{"_index":"geonames","_type":"_doc","_id":"ML81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3007683, "name": "Pic des Langounelles", "asciiname": "Pic des Langounelles", "alternatenames": "Pic des Langounelles", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "00", "population": 0, "dem": "2685", "timezone": "Europe/Andorra", "location": [1.47364, 42.61203]},"sort":[3007683]},{"_index":"geonames","_type":"_doc","_id":"Mb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3017832, "name": "Pic de les Abelletes", "asciiname": "Pic de les Abelletes", "alternatenames": "Pic de la Font-Negre,Pic de la Font-Nègre,Pic de les Abelletes", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "FR", "admin1_code": "A9", "admin2_code": "66", "admin3_code": "663", "admin4_code": "66146", "population": 0, "dem": "2411", "timezone": "Europe/Andorra", "location": [1.73343, 42.52535]},"sort":[3017832]},{"_index":"geonames","_type":"_doc","_id":"Mr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3017833, "name": "Estany de les Abelletes", "asciiname": "Estany de les Abelletes", "alternatenames": "Estany de les Abelletes,Etang de Font-Negre,Étang de Font-Nègre", "feature_class": "H", "feature_code": "LK", "country_code": "AD", "cc2": "FR", "admin1_code": "A9", "population": 0, "dem": "2260", "timezone": "Europe/Andorra", "location": [1.73362, 42.52915]},"sort":[3017833]},{"_index":"geonames","_type":"_doc","_id":"M781D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3023203, "name": "Port Vieux de la Coume d’Ose", "asciiname": "Port Vieux de la Coume d'Ose", "alternatenames": "Port Vieux de Coume d'Ose,Port Vieux de Coume d’Ose,Port Vieux de la Coume d'Ose,Port Vieux de la Coume d’Ose", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2687", "timezone": "Europe/Andorra", "location": [1.61823, 
42.62568]},"sort":[3023203]},{"_index":"geonames","_type":"_doc","_id":"NL81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3029315, "name": "Port de la Cabanette", "asciiname": "Port de la Cabanette", "alternatenames": "Port de la Cabanette,Porteille de la Cabanette", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "B3", "admin2_code": "09", "admin3_code": "091", "admin4_code": "09139", "population": 0, "dem": "2379", "timezone": "Europe/Andorra", "location": [1.73333, 42.6]},"sort":[3029315]},{"_index":"geonames","_type":"_doc","_id":"Nb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3034945, "name": "Port Dret", "asciiname": "Port Dret", "alternatenames": "Port Dret,Port de Bareites,Port de las Bareytes,Port des Bareytes", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2660", "timezone": "Europe/Andorra", "location": [1.45562, 42.60172]},"sort":[3034945]},{"_index":"geonames","_type":"_doc","_id":"Nr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3038814, "name": "Costa de Xurius", "asciiname": "Costa de Xurius", "feature_class": "T", "feature_code": "SLP", "country_code": "AD", "admin1_code": "07", "population": 0, "dem": "1839", "timezone": "Europe/Andorra", "location": [1.47569, 42.50692]},"sort":[3038814]}]}}

P.S. I had no idea about --on-error=abort! Should this be the default?

It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections, and we just want to record that we went too far but not necessarily abort the benchmark.

In any case I sense we should improve the docs to include examples for such scenarios?

@talevy commented Dec 16, 2019

In any case I sense we should improve the docs to include examples for such scenarios?

Having this in the docs would have prevented my mistakes in usage. +1 to examples in the docs. Thanks for the info!

@talevy commented Dec 16, 2019

It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections

Is this the norm rather than the exception?

@danielmitterdorfer (Member) commented:

It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections
Is this the norm rather than the exception?

Honestly, I am hesitant to change the default behavior for this flag. It is common among load-testing tools to continue on errors (JMeter does this by default as well), and in our experiments we usually want to continue a benchmark even when an error is indicated (by an error rate > 0 in the report): if one out of many benchmarked queries fails, e.g. due to changed syntax or tighter restrictions, all of the other benchmark results are usually still fine and we would not want to lose them. (Incidentally, this is the reason we run Rally's integration tests in CI with the stricter --on-error=abort: to spot such issues before they occur in the nightly benchmarks.) However, we then also investigate and try to eliminate or understand the cause.

@danielmitterdorfer added the :Docs (Changes to the documentation) and enhancement (Improves the status quo) labels and removed the feedback needed (An open question blocks progress) label on Dec 18, 2019
@danielmitterdorfer added this to the 1.x milestone on Dec 18, 2019
@danielmitterdorfer added the help wanted (We'd be happy about a community contribution) label on Dec 18, 2019
@danielmitterdorfer (Member) commented:

"Implementation" notes: We should add a new page to our documentation that describes tips & tricks when benchmarking with specific examples (in a "recipe" style). The purpose of this example is to explain how to get request and response when running a benchmark with Rally (see my comment above how it's done). We should also mention the caveats (debug logging should not be enabled in a "real" benchmark).

@danielmitterdorfer changed the title from "Return information from the search response in rally output" to "Document an example how to extract HTTP responses during a Rally benchmark" on Dec 18, 2019
@danielmitterdorfer added the good first issue (Small, contained changes that are good for newcomers) label on Jan 13, 2020
@gingerwizard commented:
Propose adding the following to the recipes:

  1. Identifying when errors have been encountered
  • Include options such as configuring a metrics store and how to determine whether errors were encountered (see the sketch after this list).
  • Why we don't fail by default and how this behaviour can be induced.
  2. How to capture query responses and hit counts
  • Setting the debug log level, emphasising that this may impact the efficacy of benchmarking metrics.
  • Configuring a metrics store and how to determine hit counts.
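
As a sketch of the error check in recipe 1, assuming a dedicated Elasticsearch metrics store (the race-id is reused from the sample document earlier in this thread; the meta.success flag is recorded per sample):

GET rally-metrics-*/_count
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "race-id": "6cac8c34-93b4-416e-a1ea-6c4e1b87f9fd" } },
        { "term": { "meta.success": false } }
      ]
    }
  }
}

A non-zero count means at least one request in that race failed, even though the benchmark itself kept running.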
