Remove shortcutTotalHitCount optimization #89047

Merged

@javanna javanna commented Aug 2, 2022

Our TopDocsCollectorContext has an optimization that tries to avoid counting total hits for queries such as match all docs, term query and field exists query, relying on the statistics from each segment instead. This optimization was recently streamlined in Lucene through the introduction of Weight#count, which TotalHitCountCollector now leverages directly as of https://issues.apache.org/jira/browse/LUCENE-10620 , and it was later complemented by #88396 within Elasticsearch.

With this, we can remove the internal optimization and instead rely on the default Lucene behaviour, which covers more queries and may be expanded further in the future.
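
To illustrate the idea behind the shortcut, here is a minimal toy sketch in plain Java. It does not use Lucene or Elasticsearch classes: `Segment`, `Result` and `countHits` are hypothetical names invented for this example. The only convention borrowed from Lucene is that a cheap per-segment count of -1 means the count cannot be determined without visiting documents, which is how Weight#count signals that case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: these types are hypothetical, not Lucene or Elasticsearch classes.
final class HitCountShortcutSketch {

    /** Hypothetical segment: its size, an optional precomputed match count, and a per-doc matcher. */
    static final class Segment {
        final int maxDoc;
        final int cheapCount; // -1 means "not known without visiting documents"
        final IntPredicate matches;

        Segment(int maxDoc, int cheapCount, IntPredicate matches) {
            this.maxDoc = maxDoc;
            this.cheapCount = cheapCount;
            this.matches = matches;
        }
    }

    /** Total hits plus the number of documents that had to be visited to compute them. */
    static final class Result {
        final int totalHits;
        final int docsVisited;

        Result(int totalHits, int docsVisited) {
            this.totalHits = totalHits;
            this.docsVisited = docsVisited;
        }
    }

    /** Uses the cheap per-segment count when available, otherwise falls back to iterating. */
    static Result countHits(List<Segment> segments) {
        int total = 0;
        int visited = 0;
        for (Segment segment : segments) {
            if (segment.cheapCount != -1) {
                total += segment.cheapCount; // shortcut: no documents visited
                continue;
            }
            for (int doc = 0; doc < segment.maxDoc; doc++) {
                visited++;
                if (segment.matches.test(doc)) {
                    total++;
                }
            }
        }
        return new Result(total, visited);
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment(100, 100, doc -> true),        // count known from statistics
            new Segment(10, -1, doc -> doc % 2 == 0)); // count must be computed by iterating
        Result result = countHits(segments);
        // prints "105 hits, 10 docs visited"
        System.out.println(result.totalHits + " hits, " + result.docsVisited + " docs visited");
    }
}
```

In this toy model only the second segment is iterated, which is the saving that both the removed internal optimization and the Lucene default aim for.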

Closes #81034

@javanna javanna added the :Search/Search and >refactoring labels Aug 2, 2022
@javanna javanna requested review from jpountz and dnhatn August 2, 2022 19:44
@javanna javanna marked this pull request as ready for review August 2, 2022 19:45
@elasticsearchmachine elasticsearchmachine added the Team:Search label Aug 2, 2022
@elasticsearchmachine

Pinging @elastic/es-search (Team:Search)

@dnhatn dnhatn left a comment

LGTM. Thanks @javanna!

totalHitsSupplier = () -> topDocsSupplier.get().totalHits;
} else {
// don't compute hit counts via the collector
topDocsCollector = createCollector(sortAndFormats, numHits, searchAfter, 1);

I like this change. This is the only place where I think we may see regressions. Before your change, we would tell the top docs collector that it doesn't have to count hits at all (the 1 here), since we had computed the count up-front. With your change, we would always count up to trackTotalHitsUpTo documents, which slightly delays skipping.

Maybe we could run benchmarks with the geonames track to quantify the impact. I'd be especially interested in the impact on the default and term queries.
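
A rough way to picture the concern about the hit-count threshold is the toy model below. It is a simplification under stated assumptions, not how Lucene collectors are implemented: `hitsCollectedBeforeSkipping` is a hypothetical helper, and the value 10,000 merely stands in for trackTotalHitsUpTo. The model assumes skipping can only begin once the top-hits queue is full and the number of collected hits exceeds the threshold.

```java
// Toy model only: a hypothetical helper, not a Lucene or Elasticsearch API.
final class HitThresholdSketch {

    /**
     * Returns how many matching documents are collected before skipping may begin,
     * assuming skipping requires a full queue of numHits entries and a hit count
     * that exceeds totalHitsThreshold.
     */
    static int hitsCollectedBeforeSkipping(int matchingDocs, int numHits, int totalHitsThreshold) {
        int collected = 0;
        for (int doc = 0; doc < matchingDocs; doc++) {
            collected++;
            boolean queueFull = collected >= numHits;
            boolean countSatisfied = collected > totalHitsThreshold;
            if (queueFull && countSatisfied) {
                break; // from here on, non-competitive documents could be skipped
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        // Threshold of 1: the count was computed up-front, so only the queue needs filling.
        System.out.println(hitsCollectedBeforeSkipping(1_000_000, 10, 1));      // prints 10
        // Threshold of 10,000 (illustrative stand-in for trackTotalHitsUpTo).
        System.out.println(hitsCollectedBeforeSkipping(1_000_000, 10, 10_000)); // prints 10001
    }
}
```

Whether the extra collected hits translate into a measurable slowdown depends on the query and the data, which is what the benchmark is meant to show.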

@csoulios csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022
@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022

javanna commented Dec 20, 2022

I revived this PR and ran the geonames benchmarks. Nothing in the benchmark results caught my eye; can you double-check too, @jpountz? Would you like me to run other benchmarks?

Baseline (current main without my change):

|                                                         Metric |                           Task |           Value |    Unit |
|---------------------------------------------------------------:|-------------------------------:|----------------:|--------:|
|                     Cumulative indexing time of primary shards |                                |    14.7545      |     min |
|             Min cumulative indexing time across primary shards |                                |     2.9129      |     min |
|          Median cumulative indexing time across primary shards |                                |     2.96518     |     min |
|             Max cumulative indexing time across primary shards |                                |     2.99017     |     min |
|            Cumulative indexing throttle time of primary shards |                                |     0           |     min |
|    Min cumulative indexing throttle time across primary shards |                                |     0           |     min |
| Median cumulative indexing throttle time across primary shards |                                |     0           |     min |
|    Max cumulative indexing throttle time across primary shards |                                |     0           |     min |
|                        Cumulative merge time of primary shards |                                |     0.167567    |     min |
|                       Cumulative merge count of primary shards |                                |     6           |         |
|                Min cumulative merge time across primary shards |                                |     0.0027      |     min |
|             Median cumulative merge time across primary shards |                                |     0.00398333  |     min |
|                Max cumulative merge time across primary shards |                                |     0.143083    |     min |
|               Cumulative merge throttle time of primary shards |                                |     0.03035     |     min |
|       Min cumulative merge throttle time across primary shards |                                |     0           |     min |
|    Median cumulative merge throttle time across primary shards |                                |     0           |     min |
|       Max cumulative merge throttle time across primary shards |                                |     0.03035     |     min |
|                      Cumulative refresh time of primary shards |                                |     1.65617     |     min |
|                     Cumulative refresh count of primary shards |                                |    48           |         |
|              Min cumulative refresh time across primary shards |                                |     0.271283    |     min |
|           Median cumulative refresh time across primary shards |                                |     0.306783    |     min |
|              Max cumulative refresh time across primary shards |                                |     0.445217    |     min |
|                        Cumulative flush time of primary shards |                                |     0.959783    |     min |
|                       Cumulative flush count of primary shards |                                |    10           |         |
|                Min cumulative flush time across primary shards |                                |     0.140083    |     min |
|             Median cumulative flush time across primary shards |                                |     0.199433    |     min |
|                Max cumulative flush time across primary shards |                                |     0.225283    |     min |
|                                        Total Young Gen GC time |                                |     1.505       |       s |
|                                       Total Young Gen GC count |                                |    46           |         |
|                                          Total Old Gen GC time |                                |     0           |       s |
|                                         Total Old Gen GC count |                                |     0           |         |
|                                                     Store size |                                |     2.96172     |      GB |
|                                                  Translog size |                                |     2.56114e-07 |      GB |
|                                         Heap used for segments |                                |     0           |      MB |
|                                       Heap used for doc values |                                |     0           |      MB |
|                                            Heap used for terms |                                |     0           |      MB |
|                                            Heap used for norms |                                |     0           |      MB |
|                                           Heap used for points |                                |     0           |      MB |
|                                    Heap used for stored fields |                                |     0           |      MB |
|                                                  Segment count |                                |    87           |         |
|                                    Total Ingest Pipeline count |                                |     0           |         |
|                                     Total Ingest Pipeline time |                                |     0           |       s |
|                                   Total Ingest Pipeline failed |                                |     0           |         |
|                                                     error rate |                   index-append |     0           |       % |
|                                       100th percentile latency |            refresh-after-index | 10493.2         |      ms |
|                                  100th percentile service time |            refresh-after-index | 10493.2         |      ms |
|                                                     error rate |            refresh-after-index |   100           |       % |
|                                                 Min Throughput |                    index-stats |    89.95        |   ops/s |
|                                                Mean Throughput |                    index-stats |    89.98        |   ops/s |
|                                              Median Throughput |                    index-stats |    89.98        |   ops/s |
|                                                 Max Throughput |                    index-stats |    89.99        |   ops/s |
|                                        50th percentile latency |                    index-stats |     2.87281     |      ms |
|                                        90th percentile latency |                    index-stats |     3.78207     |      ms |
|                                        99th percentile latency |                    index-stats |     4.25682     |      ms |
|                                      99.9th percentile latency |                    index-stats |     4.55479     |      ms |
|                                       100th percentile latency |                    index-stats |     4.58303     |      ms |
|                                   50th percentile service time |                    index-stats |     1.65261     |      ms |
|                                   90th percentile service time |                    index-stats |     1.89706     |      ms |
|                                   99th percentile service time |                    index-stats |     2.16371     |      ms |
|                                 99.9th percentile service time |                    index-stats |     2.88629     |      ms |
|                                  100th percentile service time |                    index-stats |     3.40735     |      ms |
|                                                     error rate |                    index-stats |     0           |       % |
|                                                 Min Throughput |                     node-stats |    89.78        |   ops/s |
|                                                Mean Throughput |                     node-stats |    89.92        |   ops/s |
|                                              Median Throughput |                     node-stats |    89.93        |   ops/s |
|                                                 Max Throughput |                     node-stats |    89.97        |   ops/s |
|                                        50th percentile latency |                     node-stats |     2.99969     |      ms |
|                                        90th percentile latency |                     node-stats |     4.06034     |      ms |
|                                        99th percentile latency |                     node-stats |     5.01334     |      ms |
|                                      99.9th percentile latency |                     node-stats |     6.46364     |      ms |
|                                       100th percentile latency |                     node-stats |     6.596       |      ms |
|                                   50th percentile service time |                     node-stats |     2.0371      |      ms |
|                                   90th percentile service time |                     node-stats |     2.42635     |      ms |
|                                   99th percentile service time |                     node-stats |     4.08543     |      ms |
|                                 99.9th percentile service time |                     node-stats |     4.85342     |      ms |
|                                  100th percentile service time |                     node-stats |     4.86696     |      ms |
|                                                     error rate |                     node-stats |     0           |       % |
|                                                 Min Throughput |                        default |    49.99        |   ops/s |
|                                                Mean Throughput |                        default |    49.99        |   ops/s |
|                                              Median Throughput |                        default |    49.99        |   ops/s |
|                                                 Max Throughput |                        default |    50           |   ops/s |
|                                        50th percentile latency |                        default |     3.06582     |      ms |
|                                        90th percentile latency |                        default |     4.31493     |      ms |
|                                        99th percentile latency |                        default |     4.75226     |      ms |
|                                      99.9th percentile latency |                        default |     7.30601     |      ms |
|                                       100th percentile latency |                        default |     9.2979      |      ms |
|                                   50th percentile service time |                        default |     2.03371     |      ms |
|                                   90th percentile service time |                        default |     2.34846     |      ms |
|                                   99th percentile service time |                        default |     2.72376     |      ms |
|                                 99.9th percentile service time |                        default |     6.79293     |      ms |
|                                  100th percentile service time |                        default |     8.94201     |      ms |
|                                                     error rate |                        default |     0           |       % |
|                                                 Min Throughput |                           term |    99.88        |   ops/s |
|                                                Mean Throughput |                           term |    99.92        |   ops/s |
|                                              Median Throughput |                           term |    99.93        |   ops/s |
|                                                 Max Throughput |                           term |    99.95        |   ops/s |
|                                        50th percentile latency |                           term |     2.59975     |      ms |
|                                        90th percentile latency |                           term |     3.03838     |      ms |
|                                        99th percentile latency |                           term |     3.4622      |      ms |
|                                      99.9th percentile latency |                           term |     7.59964     |      ms |
|                                       100th percentile latency |                           term |    11.1005      |      ms |
|                                   50th percentile service time |                           term |     1.80133     |      ms |
|                                   90th percentile service time |                           term |     2.06376     |      ms |
|                                   99th percentile service time |                           term |     2.28196     |      ms |
|                                 99.9th percentile service time |                           term |     6.80313     |      ms |
|                                  100th percentile service time |                           term |    10.8138      |      ms |
|                                                     error rate |                           term |     0           |       % |
|                                                 Min Throughput |                         phrase |   109.65        |   ops/s |
|                                                Mean Throughput |                         phrase |   109.79        |   ops/s |
|                                              Median Throughput |                         phrase |   109.81        |   ops/s |
|                                                 Max Throughput |                         phrase |   109.86        |   ops/s |
|                                        50th percentile latency |                         phrase |     2.58116     |      ms |
|                                        90th percentile latency |                         phrase |     3.01626     |      ms |
|                                        99th percentile latency |                         phrase |     3.32541     |      ms |
|                                      99.9th percentile latency |                         phrase |    15.9282      |      ms |
|                                       100th percentile latency |                         phrase |    18.9218      |      ms |
|                                   50th percentile service time |                         phrase |     1.80603     |      ms |
|                                   90th percentile service time |                         phrase |     2.01516     |      ms |
|                                   99th percentile service time |                         phrase |     2.26491     |      ms |
|                                 99.9th percentile service time |                         phrase |    10.6534      |      ms |
|                                  100th percentile service time |                         phrase |    18.4407      |      ms |
|                                                     error rate |                         phrase |     0           |       % |
|                                                 Min Throughput |           country_agg_uncached |     3           |   ops/s |
|                                                Mean Throughput |           country_agg_uncached |     3           |   ops/s |
|                                              Median Throughput |           country_agg_uncached |     3           |   ops/s |
|                                                 Max Throughput |           country_agg_uncached |     3           |   ops/s |
|                                        50th percentile latency |           country_agg_uncached |   134.2         |      ms |
|                                        90th percentile latency |           country_agg_uncached |   145.774       |      ms |
|                                        99th percentile latency |           country_agg_uncached |   162.576       |      ms |
|                                       100th percentile latency |           country_agg_uncached |   176.927       |      ms |
|                                   50th percentile service time |           country_agg_uncached |   133.062       |      ms |
|                                   90th percentile service time |           country_agg_uncached |   144.698       |      ms |
|                                   99th percentile service time |           country_agg_uncached |   161.493       |      ms |
|                                  100th percentile service time |           country_agg_uncached |   175.857       |      ms |
|                                                     error rate |           country_agg_uncached |     0           |       % |
|                                                 Min Throughput |             country_agg_cached |    98.62        |   ops/s |
|                                                Mean Throughput |             country_agg_cached |    99.03        |   ops/s |
|                                              Median Throughput |             country_agg_cached |    99.07        |   ops/s |
|                                                 Max Throughput |             country_agg_cached |    99.3         |   ops/s |
|                                        50th percentile latency |             country_agg_cached |     2.28683     |      ms |
|                                        90th percentile latency |             country_agg_cached |     3.45561     |      ms |
|                                        99th percentile latency |             country_agg_cached |     3.83689     |      ms |
|                                      99.9th percentile latency |             country_agg_cached |     4.39798     |      ms |
|                                       100th percentile latency |             country_agg_cached |     4.6001      |      ms |
|                                   50th percentile service time |             country_agg_cached |     1.43025     |      ms |
|                                   90th percentile service time |             country_agg_cached |     1.70903     |      ms |
|                                   99th percentile service time |             country_agg_cached |     1.95273     |      ms |
|                                 99.9th percentile service time |             country_agg_cached |     2.63577     |      ms |
|                                  100th percentile service time |             country_agg_cached |     2.77928     |      ms |
|                                                     error rate |             country_agg_cached |     0           |       % |
|                                                 Min Throughput |                         scroll |    20.05        | pages/s |
|                                                Mean Throughput |                         scroll |    20.06        | pages/s |
|                                              Median Throughput |                         scroll |    20.06        | pages/s |
|                                                 Max Throughput |                         scroll |    20.07        | pages/s |
|                                        50th percentile latency |                         scroll |   123.728       |      ms |
|                                        90th percentile latency |                         scroll |   126.994       |      ms |
|                                        99th percentile latency |                         scroll |   146.606       |      ms |
|                                       100th percentile latency |                         scroll |   161.786       |      ms |
|                                   50th percentile service time |                         scroll |   121.415       |      ms |
|                                   90th percentile service time |                         scroll |   124.792       |      ms |
|                                   99th percentile service time |                         scroll |   144.102       |      ms |
|                                  100th percentile service time |                         scroll |   159.452       |      ms |
|                                                     error rate |                         scroll |     0           |       % |
|                                                 Min Throughput |                     expression |     1.5         |   ops/s |
|                                                Mean Throughput |                     expression |     1.5         |   ops/s |
|                                              Median Throughput |                     expression |     1.5         |   ops/s |
|                                                 Max Throughput |                     expression |     1.5         |   ops/s |
|                                        50th percentile latency |                     expression |   332.406       |      ms |
|                                        90th percentile latency |                     expression |   336.178       |      ms |
|                                        99th percentile latency |                     expression |   349.702       |      ms |
|                                       100th percentile latency |                     expression |   350.252       |      ms |
|                                   50th percentile service time |                     expression |   331.126       |      ms |
|                                   90th percentile service time |                     expression |   334.905       |      ms |
|                                   99th percentile service time |                     expression |   348.412       |      ms |
|                                  100th percentile service time |                     expression |   348.679       |      ms |
|                                                     error rate |                     expression |     0           |       % |
|                                                 Min Throughput |                painless_static |     1.4         |   ops/s |
|                                                Mean Throughput |                painless_static |     1.4         |   ops/s |
|                                              Median Throughput |                painless_static |     1.4         |   ops/s |
|                                                 Max Throughput |                painless_static |     1.4         |   ops/s |
|                                        50th percentile latency |                painless_static |   424.343       |      ms |
|                                        90th percentile latency |                painless_static |   431.837       |      ms |
|                                        99th percentile latency |                painless_static |   441.434       |      ms |
|                                       100th percentile latency |                painless_static |   442.237       |      ms |
|                                   50th percentile service time |                painless_static |   422.939       |      ms |
|                                   90th percentile service time |                painless_static |   430.634       |      ms |
|                                   99th percentile service time |                painless_static |   440.14        |      ms |
|                                  100th percentile service time |                painless_static |   440.874       |      ms |
|                                                     error rate |                painless_static |     0           |       % |
|                                                 Min Throughput |               painless_dynamic |     1.4         |   ops/s |
|                                                Mean Throughput |               painless_dynamic |     1.4         |   ops/s |
|                                              Median Throughput |               painless_dynamic |     1.4         |   ops/s |
|                                                 Max Throughput |               painless_dynamic |     1.4         |   ops/s |
|                                        50th percentile latency |               painless_dynamic |   431.19        |      ms |
|                                        90th percentile latency |               painless_dynamic |   435.314       |      ms |
|                                        99th percentile latency |               painless_dynamic |   447.68        |      ms |
|                                       100th percentile latency |               painless_dynamic |   451.084       |      ms |
|                                   50th percentile service time |               painless_dynamic |   429.399       |      ms |
|                                   90th percentile service time |               painless_dynamic |   433.318       |      ms |
|                                   99th percentile service time |               painless_dynamic |   446.788       |      ms |
|                                  100th percentile service time |               painless_dynamic |   450.219       |      ms |
|                                                     error rate |               painless_dynamic |     0           |       % |
|                                                 Min Throughput | decay_geo_gauss_function_score |     1           |   ops/s |
|                                                Mean Throughput | decay_geo_gauss_function_score |     1           |   ops/s |
|                                              Median Throughput | decay_geo_gauss_function_score |     1           |   ops/s |
|                                                 Max Throughput | decay_geo_gauss_function_score |     1           |   ops/s |
|                                        50th percentile latency | decay_geo_gauss_function_score |   374.378       |      ms |
|                                        90th percentile latency | decay_geo_gauss_function_score |   389.191       |      ms |
|                                        99th percentile latency | decay_geo_gauss_function_score |   391.85        |      ms |
|                                       100th percentile latency | decay_geo_gauss_function_score |   392.543       |      ms |
|                                   50th percentile service time | decay_geo_gauss_function_score |   372.867       |      ms |
|                                   90th percentile service time | decay_geo_gauss_function_score |   387.9         |      ms |
|                                   99th percentile service time | decay_geo_gauss_function_score |   390.2         |      ms |
|                                  100th percentile service time | decay_geo_gauss_function_score |   390.619       |      ms |
|                                                     error rate | decay_geo_gauss_function_score |     0           |       % |
|                                                 Min Throughput |   decay_geo_gauss_script_score |     1           |   ops/s |
|                                                Mean Throughput |   decay_geo_gauss_script_score |     1           |   ops/s |
|                                              Median Throughput |   decay_geo_gauss_script_score |     1           |   ops/s |
|                                                 Max Throughput |   decay_geo_gauss_script_score |     1           |   ops/s |
|                                        50th percentile latency |   decay_geo_gauss_script_score |   401.698       |      ms |
|                                        90th percentile latency |   decay_geo_gauss_script_score |   410.091       |      ms |
|                                        99th percentile latency |   decay_geo_gauss_script_score |   428.357       |      ms |
|                                       100th percentile latency |   decay_geo_gauss_script_score |   431.643       |      ms |
|                                   50th percentile service time |   decay_geo_gauss_script_score |   399.86        |      ms |
|                                   90th percentile service time |   decay_geo_gauss_script_score |   408.528       |      ms |
|                                   99th percentile service time |   decay_geo_gauss_script_score |   426.957       |      ms |
|                                  100th percentile service time |   decay_geo_gauss_script_score |   430.013       |      ms |
|                                                     error rate |   decay_geo_gauss_script_score |     0           |       % |
|                                                 Min Throughput |     field_value_function_score |     1.5         |   ops/s |
|                                                Mean Throughput |     field_value_function_score |     1.5         |   ops/s |
|                                              Median Throughput |     field_value_function_score |     1.5         |   ops/s |
|                                                 Max Throughput |     field_value_function_score |     1.51        |   ops/s |
|                                        50th percentile latency |     field_value_function_score |   136.794       |      ms |
|                                        90th percentile latency |     field_value_function_score |   138.645       |      ms |
|                                        99th percentile latency |     field_value_function_score |   140.362       |      ms |
|                                       100th percentile latency |     field_value_function_score |   141.246       |      ms |
|                                   50th percentile service time |     field_value_function_score |   135.269       |      ms |
|                                   90th percentile service time |     field_value_function_score |   136.876       |      ms |
|                                   99th percentile service time |     field_value_function_score |   138.512       |      ms |
|                                  100th percentile service time |     field_value_function_score |   139.136       |      ms |
|                                                     error rate |     field_value_function_score |     0           |       % |
|                                                 Min Throughput |       field_value_script_score |     1.5         |   ops/s |
|                                                Mean Throughput |       field_value_script_score |     1.5         |   ops/s |
|                                              Median Throughput |       field_value_script_score |     1.5         |   ops/s |
|                                                 Max Throughput |       field_value_script_score |     1.5         |   ops/s |
|                                        50th percentile latency |       field_value_script_score |   197.463       |      ms |
|                                        90th percentile latency |       field_value_script_score |   201.006       |      ms |
|                                        99th percentile latency |       field_value_script_score |   241.363       |      ms |
|                                       100th percentile latency |       field_value_script_score |   269.868       |      ms |
|                                   50th percentile service time |       field_value_script_score |   195.967       |      ms |
|                                   90th percentile service time |       field_value_script_score |   199.248       |      ms |
|                                   99th percentile service time |       field_value_script_score |   239.841       |      ms |
|                                  100th percentile service time |       field_value_script_score |   268.763       |      ms |
|                                                     error rate |       field_value_script_score |     0           |       % |
|                                                 Min Throughput |                    large_terms |     1.1         |   ops/s |
|                                                Mean Throughput |                    large_terms |     1.1         |   ops/s |
|                                              Median Throughput |                    large_terms |     1.1         |   ops/s |
|                                                 Max Throughput |                    large_terms |     1.1         |   ops/s |
|                                        50th percentile latency |                    large_terms |   543.912       |      ms |
|                                        90th percentile latency |                    large_terms |   547.238       |      ms |
|                                        99th percentile latency |                    large_terms |   571.009       |      ms |
|                                       100th percentile latency |                    large_terms |   578.712       |      ms |
|                                   50th percentile service time |                    large_terms |   534.931       |      ms |
|                                   90th percentile service time |                    large_terms |   538.092       |      ms |
|                                   99th percentile service time |                    large_terms |   561.818       |      ms |
|                                  100th percentile service time |                    large_terms |   569.064       |      ms |
|                                                     error rate |                    large_terms |     0           |       % |
|                                                 Min Throughput |           large_filtered_terms |     1.1         |   ops/s |
|                                                Mean Throughput |           large_filtered_terms |     1.1         |   ops/s |
|                                              Median Throughput |           large_filtered_terms |     1.1         |   ops/s |
|                                                 Max Throughput |           large_filtered_terms |     1.1         |   ops/s |
|                                        50th percentile latency |           large_filtered_terms |   548.604       |      ms |
|                                        90th percentile latency |           large_filtered_terms |   555.543       |      ms |
|                                        99th percentile latency |           large_filtered_terms |   578.383       |      ms |
|                                       100th percentile latency |           large_filtered_terms |   586.958       |      ms |
|                                   50th percentile service time |           large_filtered_terms |   539.855       |      ms |
|                                   90th percentile service time |           large_filtered_terms |   546.654       |      ms |
|                                   99th percentile service time |           large_filtered_terms |   569.742       |      ms |
|                                  100th percentile service time |           large_filtered_terms |   578.006       |      ms |
|                                                     error rate |           large_filtered_terms |     0           |       % |
|                                                 Min Throughput |         large_prohibited_terms |     1.1         |   ops/s |
|                                                Mean Throughput |         large_prohibited_terms |     1.1         |   ops/s |
|                                              Median Throughput |         large_prohibited_terms |     1.1         |   ops/s |
|                                                 Max Throughput |         large_prohibited_terms |     1.1         |   ops/s |
|                                        50th percentile latency |         large_prohibited_terms |   527.754       |      ms |
|                                        90th percentile latency |         large_prohibited_terms |   531.253       |      ms |
|                                        99th percentile latency |         large_prohibited_terms |   535.281       |      ms |
|                                       100th percentile latency |         large_prohibited_terms |   535.465       |      ms |
|                                   50th percentile service time |         large_prohibited_terms |   518.843       |      ms |
|                                   90th percentile service time |         large_prohibited_terms |   522.73        |      ms |
|                                   99th percentile service time |         large_prohibited_terms |   526.707       |      ms |
|                                  100th percentile service time |         large_prohibited_terms |   527.02        |      ms |
|                                                     error rate |         large_prohibited_terms |     0           |       % |
|                                                 Min Throughput |           desc_sort_population |     1.5         |   ops/s |
|                                                Mean Throughput |           desc_sort_population |     1.51        |   ops/s |
|                                              Median Throughput |           desc_sort_population |     1.51        |   ops/s |
|                                                 Max Throughput |           desc_sort_population |     1.51        |   ops/s |
|                                        50th percentile latency |           desc_sort_population |     5.5268      |      ms |
|                                        90th percentile latency |           desc_sort_population |     5.91803     |      ms |
|                                        99th percentile latency |           desc_sort_population |     6.12278     |      ms |
|                                       100th percentile latency |           desc_sort_population |     6.14367     |      ms |
|                                   50th percentile service time |           desc_sort_population |     3.78196     |      ms |
|                                   90th percentile service time |           desc_sort_population |     3.97551     |      ms |
|                                   99th percentile service time |           desc_sort_population |     4.14683     |      ms |
|                                  100th percentile service time |           desc_sort_population |     4.19459     |      ms |
|                                                     error rate |           desc_sort_population |     0           |       % |
|                                                 Min Throughput |            asc_sort_population |     1.5         |   ops/s |
|                                                Mean Throughput |            asc_sort_population |     1.51        |   ops/s |
|                                              Median Throughput |            asc_sort_population |     1.51        |   ops/s |
|                                                 Max Throughput |            asc_sort_population |     1.51        |   ops/s |
|                                        50th percentile latency |            asc_sort_population |     4.73636     |      ms |
|                                        90th percentile latency |            asc_sort_population |     5.1548      |      ms |
|                                        99th percentile latency |            asc_sort_population |    41.998       |      ms |
|                                       100th percentile latency |            asc_sort_population |    78.611       |      ms |
|                                   50th percentile service time |            asc_sort_population |     3.0662      |      ms |
|                                   90th percentile service time |            asc_sort_population |     3.18948     |      ms |
|                                   99th percentile service time |            asc_sort_population |    39.9719      |      ms |
|                                  100th percentile service time |            asc_sort_population |    76.6236      |      ms |
|                                                     error rate |            asc_sort_population |     0           |       % |
|                                                 Min Throughput | asc_sort_with_after_population |     1.5         |   ops/s |
|                                                Mean Throughput | asc_sort_with_after_population |     1.51        |   ops/s |
|                                              Median Throughput | asc_sort_with_after_population |     1.51        |   ops/s |
|                                                 Max Throughput | asc_sort_with_after_population |     1.51        |   ops/s |
|                                        50th percentile latency | asc_sort_with_after_population |     6.19328     |      ms |
|                                        90th percentile latency | asc_sort_with_after_population |     6.72275     |      ms |
|                                        99th percentile latency | asc_sort_with_after_population |     6.88792     |      ms |
|                                       100th percentile latency | asc_sort_with_after_population |     6.89045     |      ms |
|                                   50th percentile service time | asc_sort_with_after_population |     4.54978     |      ms |
|                                   90th percentile service time | asc_sort_with_after_population |     4.88187     |      ms |
|                                   99th percentile service time | asc_sort_with_after_population |     5.03946     |      ms |
|                                  100th percentile service time | asc_sort_with_after_population |     5.05401     |      ms |
|                                                     error rate | asc_sort_with_after_population |     0           |       % |
|                                                 Min Throughput |            desc_sort_geonameid |     6.02        |   ops/s |
|                                                Mean Throughput |            desc_sort_geonameid |     6.02        |   ops/s |
|                                              Median Throughput |            desc_sort_geonameid |     6.02        |   ops/s |
|                                                 Max Throughput |            desc_sort_geonameid |     6.03        |   ops/s |
|                                        50th percentile latency |            desc_sort_geonameid |     5.61848     |      ms |
|                                        90th percentile latency |            desc_sort_geonameid |     5.94872     |      ms |
|                                        99th percentile latency |            desc_sort_geonameid |     6.14102     |      ms |
|                                       100th percentile latency |            desc_sort_geonameid |     6.14694     |      ms |
|                                   50th percentile service time |            desc_sort_geonameid |     4.4259      |      ms |
|                                   90th percentile service time |            desc_sort_geonameid |     4.68748     |      ms |
|                                   99th percentile service time |            desc_sort_geonameid |     4.97231     |      ms |
|                                  100th percentile service time |            desc_sort_geonameid |     5.01756     |      ms |
|                                                     error rate |            desc_sort_geonameid |     0           |       % |
|                                                 Min Throughput | desc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                                Mean Throughput | desc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                              Median Throughput | desc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                                 Max Throughput | desc_sort_with_after_geonameid |     6.03        |   ops/s |
|                                        50th percentile latency | desc_sort_with_after_geonameid |    14.0931      |      ms |
|                                        90th percentile latency | desc_sort_with_after_geonameid |    16.115       |      ms |
|                                        99th percentile latency | desc_sort_with_after_geonameid |    18.0218      |      ms |
|                                       100th percentile latency | desc_sort_with_after_geonameid |    18.0224      |      ms |
|                                   50th percentile service time | desc_sort_with_after_geonameid |    12.9568      |      ms |
|                                   90th percentile service time | desc_sort_with_after_geonameid |    15.1084      |      ms |
|                                   99th percentile service time | desc_sort_with_after_geonameid |    16.7593      |      ms |
|                                  100th percentile service time | desc_sort_with_after_geonameid |    16.9827      |      ms |
|                                                     error rate | desc_sort_with_after_geonameid |     0           |       % |
|                                                 Min Throughput |             asc_sort_geonameid |     6.02        |   ops/s |
|                                                Mean Throughput |             asc_sort_geonameid |     6.02        |   ops/s |
|                                              Median Throughput |             asc_sort_geonameid |     6.02        |   ops/s |
|                                                 Max Throughput |             asc_sort_geonameid |     6.03        |   ops/s |
|                                        50th percentile latency |             asc_sort_geonameid |     4.62365     |      ms |
|                                        90th percentile latency |             asc_sort_geonameid |     5.05052     |      ms |
|                                        99th percentile latency |             asc_sort_geonameid |     5.22239     |      ms |
|                                       100th percentile latency |             asc_sort_geonameid |     5.25253     |      ms |
|                                   50th percentile service time |             asc_sort_geonameid |     3.44913     |      ms |
|                                   90th percentile service time |             asc_sort_geonameid |     3.60107     |      ms |
|                                   99th percentile service time |             asc_sort_geonameid |     3.68395     |      ms |
|                                  100th percentile service time |             asc_sort_geonameid |     3.71943     |      ms |
|                                                     error rate |             asc_sort_geonameid |     0           |       % |
|                                                 Min Throughput |  asc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                                Mean Throughput |  asc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                              Median Throughput |  asc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                                 Max Throughput |  asc_sort_with_after_geonameid |     6.03        |   ops/s |
|                                        50th percentile latency |  asc_sort_with_after_geonameid |     5.03019     |      ms |
|                                        90th percentile latency |  asc_sort_with_after_geonameid |     5.4246      |      ms |
|                                        99th percentile latency |  asc_sort_with_after_geonameid |     5.76356     |      ms |
|                                       100th percentile latency |  asc_sort_with_after_geonameid |     5.85535     |      ms |
|                                   50th percentile service time |  asc_sort_with_after_geonameid |     3.75358     |      ms |
|                                   90th percentile service time |  asc_sort_with_after_geonameid |     4.06571     |      ms |
|                                   99th percentile service time |  asc_sort_with_after_geonameid |     4.25657     |      ms |
|                                  100th percentile service time |  asc_sort_with_after_geonameid |     4.2727      |      ms |
|                                                     error rate |  asc_sort_with_after_geonameid |     0           |       % |

Results from my branch, which includes the fix:

|                                                         Metric |                           Task |           Value |    Unit |
|---------------------------------------------------------------:|-------------------------------:|----------------:|--------:|
|                     Cumulative indexing time of primary shards |                                |    14.9185      |     min |
|             Min cumulative indexing time across primary shards |                                |     2.85472     |     min |
|          Median cumulative indexing time across primary shards |                                |     2.9371      |     min |
|             Max cumulative indexing time across primary shards |                                |     3.15958     |     min |
|            Cumulative indexing throttle time of primary shards |                                |     0           |     min |
|    Min cumulative indexing throttle time across primary shards |                                |     0           |     min |
| Median cumulative indexing throttle time across primary shards |                                |     0           |     min |
|    Max cumulative indexing throttle time across primary shards |                                |     0           |     min |
|                        Cumulative merge time of primary shards |                                |     0.159183    |     min |
|                       Cumulative merge count of primary shards |                                |     5           |         |
|                Min cumulative merge time across primary shards |                                |     0.00345     |     min |
|             Median cumulative merge time across primary shards |                                |     0.0157167   |     min |
|                Max cumulative merge time across primary shards |                                |     0.0852167   |     min |
|               Cumulative merge throttle time of primary shards |                                |     0.0234333   |     min |
|       Min cumulative merge throttle time across primary shards |                                |     0           |     min |
|    Median cumulative merge throttle time across primary shards |                                |     0           |     min |
|       Max cumulative merge throttle time across primary shards |                                |     0.0234333   |     min |
|                      Cumulative refresh time of primary shards |                                |     1.67967     |     min |
|                     Cumulative refresh count of primary shards |                                |    46           |         |
|              Min cumulative refresh time across primary shards |                                |     0.27745     |     min |
|           Median cumulative refresh time across primary shards |                                |     0.305833    |     min |
|              Max cumulative refresh time across primary shards |                                |     0.469167    |     min |
|                        Cumulative flush time of primary shards |                                |     1.25548     |     min |
|                       Cumulative flush count of primary shards |                                |    10           |         |
|                Min cumulative flush time across primary shards |                                |     0.21925     |     min |
|             Median cumulative flush time across primary shards |                                |     0.2472      |     min |
|                Max cumulative flush time across primary shards |                                |     0.27925     |     min |
|                                        Total Young Gen GC time |                                |     1.121       |       s |
|                                       Total Young Gen GC count |                                |    45           |         |
|                                          Total Old Gen GC time |                                |     0           |       s |
|                                         Total Old Gen GC count |                                |     0           |         |
|                                                     Store size |                                |     2.94586     |      GB |
|                                                  Translog size |                                |     2.56114e-07 |      GB |
|                                         Heap used for segments |                                |     0           |      MB |
|                                       Heap used for doc values |                                |     0           |      MB |
|                                            Heap used for terms |                                |     0           |      MB |
|                                            Heap used for norms |                                |     0           |      MB |
|                                           Heap used for points |                                |     0           |      MB |
|                                    Heap used for stored fields |                                |     0           |      MB |
|                                                  Segment count |                                |    89           |         |
|                                    Total Ingest Pipeline count |                                |     0           |         |
|                                     Total Ingest Pipeline time |                                |     0           |       s |
|                                   Total Ingest Pipeline failed |                                |     0           |         |
|                                                     error rate |                   index-append |     0           |       % |
|                                       100th percentile latency |            refresh-after-index | 10732.7         |      ms |
|                                  100th percentile service time |            refresh-after-index | 10732.7         |      ms |
|                                                     error rate |            refresh-after-index |   100           |       % |
|                                                 Min Throughput |                    index-stats |    89.98        |   ops/s |
|                                                Mean Throughput |                    index-stats |    89.98        |   ops/s |
|                                              Median Throughput |                    index-stats |    89.98        |   ops/s |
|                                                 Max Throughput |                    index-stats |    89.99        |   ops/s |
|                                        50th percentile latency |                    index-stats |     2.67475     |      ms |
|                                        90th percentile latency |                    index-stats |     3.49978     |      ms |
|                                        99th percentile latency |                    index-stats |     3.88879     |      ms |
|                                      99.9th percentile latency |                    index-stats |     5.07837     |      ms |
|                                       100th percentile latency |                    index-stats |     5.85732     |      ms |
|                                   50th percentile service time |                    index-stats |     1.36093     |      ms |
|                                   90th percentile service time |                    index-stats |     1.57976     |      ms |
|                                   99th percentile service time |                    index-stats |     1.97002     |      ms |
|                                 99.9th percentile service time |                    index-stats |     2.02621     |      ms |
|                                  100th percentile service time |                    index-stats |     2.03324     |      ms |
|                                                     error rate |                    index-stats |     0           |       % |
|                                                 Min Throughput |                     node-stats |    89.75        |   ops/s |
|                                                Mean Throughput |                     node-stats |    89.9         |   ops/s |
|                                              Median Throughput |                     node-stats |    89.92        |   ops/s |
|                                                 Max Throughput |                     node-stats |    89.95        |   ops/s |
|                                        50th percentile latency |                     node-stats |     3.23653     |      ms |
|                                        90th percentile latency |                     node-stats |     4.12945     |      ms |
|                                        99th percentile latency |                     node-stats |     5.11187     |      ms |
|                                      99.9th percentile latency |                     node-stats |     6.08176     |      ms |
|                                       100th percentile latency |                     node-stats |     6.18163     |      ms |
|                                   50th percentile service time |                     node-stats |     2.29206     |      ms |
|                                   90th percentile service time |                     node-stats |     2.76165     |      ms |
|                                   99th percentile service time |                     node-stats |     4.14635     |      ms |
|                                 99.9th percentile service time |                     node-stats |     5.36388     |      ms |
|                                  100th percentile service time |                     node-stats |     5.47781     |      ms |
|                                                     error rate |                     node-stats |     0           |       % |
|                                                 Min Throughput |                        default |    49.94        |   ops/s |
|                                                Mean Throughput |                        default |    49.96        |   ops/s |
|                                              Median Throughput |                        default |    49.97        |   ops/s |
|                                                 Max Throughput |                        default |    49.98        |   ops/s |
|                                        50th percentile latency |                        default |     3.10619     |      ms |
|                                        90th percentile latency |                        default |     4.08708     |      ms |
|                                        99th percentile latency |                        default |     4.64534     |      ms |
|                                      99.9th percentile latency |                        default |     8.88669     |      ms |
|                                       100th percentile latency |                        default |    10.9282      |      ms |
|                                   50th percentile service time |                        default |     1.95389     |      ms |
|                                   90th percentile service time |                        default |     2.36799     |      ms |
|                                   99th percentile service time |                        default |     2.92647     |      ms |
|                                 99.9th percentile service time |                        default |     7.24396     |      ms |
|                                  100th percentile service time |                        default |     9.73177     |      ms |
|                                                     error rate |                        default |     0           |       % |
|                                                 Min Throughput |                           term |    99.73        |   ops/s |
|                                                Mean Throughput |                           term |    99.83        |   ops/s |
|                                              Median Throughput |                           term |    99.85        |   ops/s |
|                                                 Max Throughput |                           term |    99.89        |   ops/s |
|                                        50th percentile latency |                           term |     2.92205     |      ms |
|                                        90th percentile latency |                           term |     3.42724     |      ms |
|                                        99th percentile latency |                           term |     3.82527     |      ms |
|                                      99.9th percentile latency |                           term |     7.63367     |      ms |
|                                       100th percentile latency |                           term |    11.1277      |      ms |
|                                   50th percentile service time |                           term |     2.12522     |      ms |
|                                   90th percentile service time |                           term |     2.43555     |      ms |
|                                   99th percentile service time |                           term |     2.75201     |      ms |
|                                 99.9th percentile service time |                           term |     6.49252     |      ms |
|                                  100th percentile service time |                           term |     9.82611     |      ms |
|                                                     error rate |                           term |     0           |       % |
|                                                 Min Throughput |                         phrase |   109.73        |   ops/s |
|                                                Mean Throughput |                         phrase |   109.83        |   ops/s |
|                                              Median Throughput |                         phrase |   109.85        |   ops/s |
|                                                 Max Throughput |                         phrase |   109.9         |   ops/s |
|                                        50th percentile latency |                         phrase |     2.53479     |      ms |
|                                        90th percentile latency |                         phrase |     3.01195     |      ms |
|                                        99th percentile latency |                         phrase |     3.54029     |      ms |
|                                      99.9th percentile latency |                         phrase |    15.006       |      ms |
|                                       100th percentile latency |                         phrase |    18.005       |      ms |
|                                   50th percentile service time |                         phrase |     1.74877     |      ms |
|                                   90th percentile service time |                         phrase |     2.10537     |      ms |
|                                   99th percentile service time |                         phrase |     2.60654     |      ms |
|                                 99.9th percentile service time |                         phrase |    10.0943      |      ms |
|                                  100th percentile service time |                         phrase |    17.3029      |      ms |
|                                                     error rate |                         phrase |     0           |       % |
|                                                 Min Throughput |           country_agg_uncached |     3           |   ops/s |
|                                                Mean Throughput |           country_agg_uncached |     3           |   ops/s |
|                                              Median Throughput |           country_agg_uncached |     3           |   ops/s |
|                                                 Max Throughput |           country_agg_uncached |     3           |   ops/s |
|                                        50th percentile latency |           country_agg_uncached |   138.293       |      ms |
|                                        90th percentile latency |           country_agg_uncached |   149.663       |      ms |
|                                        99th percentile latency |           country_agg_uncached |   156.159       |      ms |
|                                       100th percentile latency |           country_agg_uncached |   158.189       |      ms |
|                                   50th percentile service time |           country_agg_uncached |   136.876       |      ms |
|                                   90th percentile service time |           country_agg_uncached |   148.403       |      ms |
|                                   99th percentile service time |           country_agg_uncached |   154.805       |      ms |
|                                  100th percentile service time |           country_agg_uncached |   156.452       |      ms |
|                                                     error rate |           country_agg_uncached |     0           |       % |
|                                                 Min Throughput |             country_agg_cached |    98.58        |   ops/s |
|                                                Mean Throughput |             country_agg_cached |    99           |   ops/s |
|                                              Median Throughput |             country_agg_cached |    99.05        |   ops/s |
|                                                 Max Throughput |             country_agg_cached |    99.29        |   ops/s |
|                                        50th percentile latency |             country_agg_cached |     2.24814     |      ms |
|                                        90th percentile latency |             country_agg_cached |     3.45649     |      ms |
|                                        99th percentile latency |             country_agg_cached |     3.76037     |      ms |
|                                      99.9th percentile latency |             country_agg_cached |     3.91796     |      ms |
|                                       100th percentile latency |             country_agg_cached |     3.9418      |      ms |
|                                   50th percentile service time |             country_agg_cached |     1.35517     |      ms |
|                                   90th percentile service time |             country_agg_cached |     1.68261     |      ms |
|                                   99th percentile service time |             country_agg_cached |     2.06296     |      ms |
|                                 99.9th percentile service time |             country_agg_cached |     2.29668     |      ms |
|                                  100th percentile service time |             country_agg_cached |     2.31993     |      ms |
|                                                     error rate |             country_agg_cached |     0           |       % |
|                                                 Min Throughput |                         scroll |    20.05        | pages/s |
|                                                Mean Throughput |                         scroll |    20.05        | pages/s |
|                                              Median Throughput |                         scroll |    20.05        | pages/s |
|                                                 Max Throughput |                         scroll |    20.07        | pages/s |
|                                        50th percentile latency |                         scroll |   136.009       |      ms |
|                                        90th percentile latency |                         scroll |   142.206       |      ms |
|                                        99th percentile latency |                         scroll |   147.463       |      ms |
|                                       100th percentile latency |                         scroll |   149.218       |      ms |
|                                   50th percentile service time |                         scroll |   133.091       |      ms |
|                                   90th percentile service time |                         scroll |   139.803       |      ms |
|                                   99th percentile service time |                         scroll |   145.222       |      ms |
|                                  100th percentile service time |                         scroll |   146.898       |      ms |
|                                                     error rate |                         scroll |     0           |       % |
|                                                 Min Throughput |                     expression |     1.5         |   ops/s |
|                                                Mean Throughput |                     expression |     1.5         |   ops/s |
|                                              Median Throughput |                     expression |     1.5         |   ops/s |
|                                                 Max Throughput |                     expression |     1.5         |   ops/s |
|                                        50th percentile latency |                     expression |   321.258       |      ms |
|                                        90th percentile latency |                     expression |   334.713       |      ms |
|                                        99th percentile latency |                     expression |   342.656       |      ms |
|                                       100th percentile latency |                     expression |   343.252       |      ms |
|                                   50th percentile service time |                     expression |   319.339       |      ms |
|                                   90th percentile service time |                     expression |   332.754       |      ms |
|                                   99th percentile service time |                     expression |   340.651       |      ms |
|                                  100th percentile service time |                     expression |   340.659       |      ms |
|                                                     error rate |                     expression |     0           |       % |
|                                                 Min Throughput |                painless_static |     1.4         |   ops/s |
|                                                Mean Throughput |                painless_static |     1.4         |   ops/s |
|                                              Median Throughput |                painless_static |     1.4         |   ops/s |
|                                                 Max Throughput |                painless_static |     1.4         |   ops/s |
|                                        50th percentile latency |                painless_static |   399.607       |      ms |
|                                        90th percentile latency |                painless_static |   411.235       |      ms |
|                                        99th percentile latency |                painless_static |   445.419       |      ms |
|                                       100th percentile latency |                painless_static |   466.456       |      ms |
|                                   50th percentile service time |                painless_static |   398.064       |      ms |
|                                   90th percentile service time |                painless_static |   409.713       |      ms |
|                                   99th percentile service time |                painless_static |   443.995       |      ms |
|                                  100th percentile service time |                painless_static |   465.206       |      ms |
|                                                     error rate |                painless_static |     0           |       % |
|                                                 Min Throughput |               painless_dynamic |     1.4         |   ops/s |
|                                                Mean Throughput |               painless_dynamic |     1.4         |   ops/s |
|                                              Median Throughput |               painless_dynamic |     1.4         |   ops/s |
|                                                 Max Throughput |               painless_dynamic |     1.4         |   ops/s |
|                                        50th percentile latency |               painless_dynamic |   403.304       |      ms |
|                                        90th percentile latency |               painless_dynamic |   415.35        |      ms |
|                                        99th percentile latency |               painless_dynamic |   420.506       |      ms |
|                                       100th percentile latency |               painless_dynamic |   421.215       |      ms |
|                                   50th percentile service time |               painless_dynamic |   402.141       |      ms |
|                                   90th percentile service time |               painless_dynamic |   413.886       |      ms |
|                                   99th percentile service time |               painless_dynamic |   419.586       |      ms |
|                                  100th percentile service time |               painless_dynamic |   420.293       |      ms |
|                                                     error rate |               painless_dynamic |     0           |       % |
|                                                 Min Throughput | decay_geo_gauss_function_score |     1           |   ops/s |
|                                                Mean Throughput | decay_geo_gauss_function_score |     1           |   ops/s |
|                                              Median Throughput | decay_geo_gauss_function_score |     1           |   ops/s |
|                                                 Max Throughput | decay_geo_gauss_function_score |     1           |   ops/s |
|                                        50th percentile latency | decay_geo_gauss_function_score |   352.713       |      ms |
|                                        90th percentile latency | decay_geo_gauss_function_score |   354.868       |      ms |
|                                        99th percentile latency | decay_geo_gauss_function_score |   362.09        |      ms |
|                                       100th percentile latency | decay_geo_gauss_function_score |   366.608       |      ms |
|                                   50th percentile service time | decay_geo_gauss_function_score |   351.167       |      ms |
|                                   90th percentile service time | decay_geo_gauss_function_score |   353.26        |      ms |
|                                   99th percentile service time | decay_geo_gauss_function_score |   360.291       |      ms |
|                                  100th percentile service time | decay_geo_gauss_function_score |   365.044       |      ms |
|                                                     error rate | decay_geo_gauss_function_score |     0           |       % |
|                                                 Min Throughput |   decay_geo_gauss_script_score |     1           |   ops/s |
|                                                Mean Throughput |   decay_geo_gauss_script_score |     1           |   ops/s |
|                                              Median Throughput |   decay_geo_gauss_script_score |     1           |   ops/s |
|                                                 Max Throughput |   decay_geo_gauss_script_score |     1           |   ops/s |
|                                        50th percentile latency |   decay_geo_gauss_script_score |   384.917       |      ms |
|                                        90th percentile latency |   decay_geo_gauss_script_score |   393.224       |      ms |
|                                        99th percentile latency |   decay_geo_gauss_script_score |   403.572       |      ms |
|                                       100th percentile latency |   decay_geo_gauss_script_score |   404.466       |      ms |
|                                   50th percentile service time |   decay_geo_gauss_script_score |   383.385       |      ms |
|                                   90th percentile service time |   decay_geo_gauss_script_score |   391.183       |      ms |
|                                   99th percentile service time |   decay_geo_gauss_script_score |   402.103       |      ms |
|                                  100th percentile service time |   decay_geo_gauss_script_score |   402.781       |      ms |
|                                                     error rate |   decay_geo_gauss_script_score |     0           |       % |
|                                                 Min Throughput |     field_value_function_score |     1.5         |   ops/s |
|                                                Mean Throughput |     field_value_function_score |     1.5         |   ops/s |
|                                              Median Throughput |     field_value_function_score |     1.5         |   ops/s |
|                                                 Max Throughput |     field_value_function_score |     1.5         |   ops/s |
|                                        50th percentile latency |     field_value_function_score |   129.766       |      ms |
|                                        90th percentile latency |     field_value_function_score |   141.085       |      ms |
|                                        99th percentile latency |     field_value_function_score |   145.17        |      ms |
|                                       100th percentile latency |     field_value_function_score |   145.696       |      ms |
|                                   50th percentile service time |     field_value_function_score |   128.224       |      ms |
|                                   90th percentile service time |     field_value_function_score |   139.273       |      ms |
|                                   99th percentile service time |     field_value_function_score |   143.382       |      ms |
|                                  100th percentile service time |     field_value_function_score |   144.433       |      ms |
|                                                     error rate |     field_value_function_score |     0           |       % |
|                                                 Min Throughput |       field_value_script_score |     1.5         |   ops/s |
|                                                Mean Throughput |       field_value_script_score |     1.5         |   ops/s |
|                                              Median Throughput |       field_value_script_score |     1.5         |   ops/s |
|                                                 Max Throughput |       field_value_script_score |     1.5         |   ops/s |
|                                        50th percentile latency |       field_value_script_score |   201.995       |      ms |
|                                        90th percentile latency |       field_value_script_score |   209.175       |      ms |
|                                        99th percentile latency |       field_value_script_score |   245.659       |      ms |
|                                       100th percentile latency |       field_value_script_score |   277.35        |      ms |
|                                   50th percentile service time |       field_value_script_score |   200.701       |      ms |
|                                   90th percentile service time |       field_value_script_score |   207.558       |      ms |
|                                   99th percentile service time |       field_value_script_score |   243.796       |      ms |
|                                  100th percentile service time |       field_value_script_score |   276.325       |      ms |
|                                                     error rate |       field_value_script_score |     0           |       % |
|                                                 Min Throughput |                    large_terms |     1.1         |   ops/s |
|                                                Mean Throughput |                    large_terms |     1.1         |   ops/s |
|                                              Median Throughput |                    large_terms |     1.1         |   ops/s |
|                                                 Max Throughput |                    large_terms |     1.1         |   ops/s |
|                                        50th percentile latency |                    large_terms |   558.621       |      ms |
|                                        90th percentile latency |                    large_terms |   575.088       |      ms |
|                                        99th percentile latency |                    large_terms |   584.8         |      ms |
|                                       100th percentile latency |                    large_terms |   585.265       |      ms |
|                                   50th percentile service time |                    large_terms |   549.546       |      ms |
|                                   90th percentile service time |                    large_terms |   565.672       |      ms |
|                                   99th percentile service time |                    large_terms |   575.41        |      ms |
|                                  100th percentile service time |                    large_terms |   575.685       |      ms |
|                                                     error rate |                    large_terms |     0           |       % |
|                                                 Min Throughput |           large_filtered_terms |     1.1         |   ops/s |
|                                                Mean Throughput |           large_filtered_terms |     1.1         |   ops/s |
|                                              Median Throughput |           large_filtered_terms |     1.1         |   ops/s |
|                                                 Max Throughput |           large_filtered_terms |     1.1         |   ops/s |
|                                        50th percentile latency |           large_filtered_terms |   562.047       |      ms |
|                                        90th percentile latency |           large_filtered_terms |   579.132       |      ms |
|                                        99th percentile latency |           large_filtered_terms |   587.518       |      ms |
|                                       100th percentile latency |           large_filtered_terms |   589.435       |      ms |
|                                   50th percentile service time |           large_filtered_terms |   553.251       |      ms |
|                                   90th percentile service time |           large_filtered_terms |   570.79        |      ms |
|                                   99th percentile service time |           large_filtered_terms |   579.485       |      ms |
|                                  100th percentile service time |           large_filtered_terms |   581.571       |      ms |
|                                                     error rate |           large_filtered_terms |     0           |       % |
|                                                 Min Throughput |         large_prohibited_terms |     1.1         |   ops/s |
|                                                Mean Throughput |         large_prohibited_terms |     1.1         |   ops/s |
|                                              Median Throughput |         large_prohibited_terms |     1.1         |   ops/s |
|                                                 Max Throughput |         large_prohibited_terms |     1.1         |   ops/s |
|                                        50th percentile latency |         large_prohibited_terms |   544.585       |      ms |
|                                        90th percentile latency |         large_prohibited_terms |   562.376       |      ms |
|                                        99th percentile latency |         large_prohibited_terms |   576.493       |      ms |
|                                       100th percentile latency |         large_prohibited_terms |   576.652       |      ms |
|                                   50th percentile service time |         large_prohibited_terms |   536.069       |      ms |
|                                   90th percentile service time |         large_prohibited_terms |   553.927       |      ms |
|                                   99th percentile service time |         large_prohibited_terms |   567.9         |      ms |
|                                  100th percentile service time |         large_prohibited_terms |   568.545       |      ms |
|                                                     error rate |         large_prohibited_terms |     0           |       % |
|                                                 Min Throughput |           desc_sort_population |     1.5         |   ops/s |
|                                                Mean Throughput |           desc_sort_population |     1.51        |   ops/s |
|                                              Median Throughput |           desc_sort_population |     1.51        |   ops/s |
|                                                 Max Throughput |           desc_sort_population |     1.51        |   ops/s |
|                                        50th percentile latency |           desc_sort_population |     5.83199     |      ms |
|                                        90th percentile latency |           desc_sort_population |     6.40678     |      ms |
|                                        99th percentile latency |           desc_sort_population |     6.56707     |      ms |
|                                       100th percentile latency |           desc_sort_population |     6.62493     |      ms |
|                                   50th percentile service time |           desc_sort_population |     4.17976     |      ms |
|                                   90th percentile service time |           desc_sort_population |     4.36236     |      ms |
|                                   99th percentile service time |           desc_sort_population |     4.56165     |      ms |
|                                  100th percentile service time |           desc_sort_population |     4.56992     |      ms |
|                                                     error rate |           desc_sort_population |     0           |       % |
|                                                 Min Throughput |            asc_sort_population |     1.5         |   ops/s |
|                                                Mean Throughput |            asc_sort_population |     1.51        |   ops/s |
|                                              Median Throughput |            asc_sort_population |     1.51        |   ops/s |
|                                                 Max Throughput |            asc_sort_population |     1.51        |   ops/s |
|                                        50th percentile latency |            asc_sort_population |     5.32994     |      ms |
|                                        90th percentile latency |            asc_sort_population |     5.82029     |      ms |
|                                        99th percentile latency |            asc_sort_population |     6.22858     |      ms |
|                                       100th percentile latency |            asc_sort_population |     6.38707     |      ms |
|                                   50th percentile service time |            asc_sort_population |     3.6178      |      ms |
|                                   90th percentile service time |            asc_sort_population |     3.80185     |      ms |
|                                   99th percentile service time |            asc_sort_population |     3.98445     |      ms |
|                                  100th percentile service time |            asc_sort_population |     4.01286     |      ms |
|                                                     error rate |            asc_sort_population |     0           |       % |
|                                                 Min Throughput | asc_sort_with_after_population |     1.5         |   ops/s |
|                                                Mean Throughput | asc_sort_with_after_population |     1.51        |   ops/s |
|                                              Median Throughput | asc_sort_with_after_population |     1.51        |   ops/s |
|                                                 Max Throughput | asc_sort_with_after_population |     1.51        |   ops/s |
|                                        50th percentile latency | asc_sort_with_after_population |     6.4384      |      ms |
|                                        90th percentile latency | asc_sort_with_after_population |     7.0876      |      ms |
|                                        99th percentile latency | asc_sort_with_after_population |     7.36099     |      ms |
|                                       100th percentile latency | asc_sort_with_after_population |     7.43844     |      ms |
|                                   50th percentile service time | asc_sort_with_after_population |     4.91549     |      ms |
|                                   90th percentile service time | asc_sort_with_after_population |     5.10619     |      ms |
|                                   99th percentile service time | asc_sort_with_after_population |     5.27522     |      ms |
|                                  100th percentile service time | asc_sort_with_after_population |     5.29588     |      ms |
|                                                     error rate | asc_sort_with_after_population |     0           |       % |
|                                                 Min Throughput |            desc_sort_geonameid |     6.02        |   ops/s |
|                                                Mean Throughput |            desc_sort_geonameid |     6.02        |   ops/s |
|                                              Median Throughput |            desc_sort_geonameid |     6.02        |   ops/s |
|                                                 Max Throughput |            desc_sort_geonameid |     6.03        |   ops/s |
|                                        50th percentile latency |            desc_sort_geonameid |     5.48076     |      ms |
|                                        90th percentile latency |            desc_sort_geonameid |     6.00734     |      ms |
|                                        99th percentile latency |            desc_sort_geonameid |     6.26044     |      ms |
|                                       100th percentile latency |            desc_sort_geonameid |     6.29781     |      ms |
|                                   50th percentile service time |            desc_sort_geonameid |     4.34716     |      ms |
|                                   90th percentile service time |            desc_sort_geonameid |     4.5983      |      ms |
|                                   99th percentile service time |            desc_sort_geonameid |     4.80134     |      ms |
|                                  100th percentile service time |            desc_sort_geonameid |     4.82802     |      ms |
|                                                     error rate |            desc_sort_geonameid |     0           |       % |
|                                                 Min Throughput | desc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                                Mean Throughput | desc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                              Median Throughput | desc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                                 Max Throughput | desc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                        50th percentile latency | desc_sort_with_after_geonameid |    18.3231      |      ms |
|                                        90th percentile latency | desc_sort_with_after_geonameid |    20.3584      |      ms |
|                                        99th percentile latency | desc_sort_with_after_geonameid |    27.2554      |      ms |
|                                       100th percentile latency | desc_sort_with_after_geonameid |    27.3688      |      ms |
|                                   50th percentile service time | desc_sort_with_after_geonameid |    17.168       |      ms |
|                                   90th percentile service time | desc_sort_with_after_geonameid |    19.3803      |      ms |
|                                   99th percentile service time | desc_sort_with_after_geonameid |    25.7227      |      ms |
|                                  100th percentile service time | desc_sort_with_after_geonameid |    25.8356      |      ms |
|                                                     error rate | desc_sort_with_after_geonameid |     0           |       % |
|                                                 Min Throughput |             asc_sort_geonameid |     6.02        |   ops/s |
|                                                Mean Throughput |             asc_sort_geonameid |     6.02        |   ops/s |
|                                              Median Throughput |             asc_sort_geonameid |     6.02        |   ops/s |
|                                                 Max Throughput |             asc_sort_geonameid |     6.03        |   ops/s |
|                                        50th percentile latency |             asc_sort_geonameid |     4.92353     |      ms |
|                                        90th percentile latency |             asc_sort_geonameid |     5.24371     |      ms |
|                                        99th percentile latency |             asc_sort_geonameid |     5.54651     |      ms |
|                                       100th percentile latency |             asc_sort_geonameid |     5.55844     |      ms |
|                                   50th percentile service time |             asc_sort_geonameid |     3.74649     |      ms |
|                                   90th percentile service time |             asc_sort_geonameid |     3.89364     |      ms |
|                                   99th percentile service time |             asc_sort_geonameid |     4.0023      |      ms |
|                                  100th percentile service time |             asc_sort_geonameid |     4.00615     |      ms |
|                                                     error rate |             asc_sort_geonameid |     0           |       % |
|                                                 Min Throughput |  asc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                                Mean Throughput |  asc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                              Median Throughput |  asc_sort_with_after_geonameid |     6.02        |   ops/s |
|                                                 Max Throughput |  asc_sort_with_after_geonameid |     6.03        |   ops/s |
|                                        50th percentile latency |  asc_sort_with_after_geonameid |     5.07973     |      ms |
|                                        90th percentile latency |  asc_sort_with_after_geonameid |     5.53334     |      ms |
|                                        99th percentile latency |  asc_sort_with_after_geonameid |     5.83067     |      ms |
|                                       100th percentile latency |  asc_sort_with_after_geonameid |     5.83473     |      ms |
|                                   50th percentile service time |  asc_sort_with_after_geonameid |     4.00845     |      ms |
|                                   90th percentile service time |  asc_sort_with_after_geonameid |     4.19059     |      ms |
|                                   99th percentile service time |  asc_sort_with_after_geonameid |     4.40769     |      ms |
|                                  100th percentile service time |  asc_sort_with_after_geonameid |     4.41827     |      ms |
|                                                     error rate |  asc_sort_with_after_geonameid |     0           |       % |

@jpountz

jpountz commented Feb 2, 2023

Sorry for the lag @javanna and thanks for running benchmarks. If the slowdown to match_all and term queries is not significant, then I feel better about the performance impact of this change. It's a great cleanup/simplification.

@javanna javanna merged commit 283f8ac into elastic:main Feb 6, 2023
@javanna javanna deleted the refactoring/remove_shortcut_total_hit_count branch February 6, 2023 10:24
javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 27, 2023
javanna added a commit that referenced this pull request Feb 27, 2023
This reverts commit 283f8ac in the 8.7 branch (#89047).

We have found a performance regression when executing search requests with size greater than zero whose queries can shortcut their total hit count, like term and match_all. The previous shortcut total hit count optimization in ES was able to shortcut those, while the top score docs collector in Lucene does not support that. This can be improved further on main, but for 8.7 we are taking the safe path of reverting and leaving things as they were.
@javanna javanna removed the v8.7.0 label Feb 28, 2023
@javanna javanna added the v8.8.0 label Mar 10, 2023
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 29, 2023
We have removed shortcut total hit count with elastic#89047 and later noticed a
couple of benchmark regressions. Since then we have moved to skip counting
when possible when not collecting hits (e.g. size=0), which is the case
where Elasticsearch uses TotalHitCountCollector and the shortcutting is
supported natively in Lucene.

For the case where hits are collected, the total hit count is counted as
part of the collection in TopScoreDocCollector and TopFieldCollector,
where Lucene does not support skipping the counting as it is hard to
determine whether more competitive hits need to be collected or not.

The previous change caused a regression specifically when collecting
hits because we ended up removing our manual shortcut in favour of
counting which causes overhead.

With this change we reintroduce the shortcut total hit count method,
and only use it when strictly necessary. When size is 0, we rely
entirely on Lucene to shortcut the total hit counting, while when hits
are collected we do it our way, for now.

While at it, a few more tests are added to cover for situations that
were not covered before.
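
As a rough illustration of the manual shortcut described above (the path used when hits are collected), here is a minimal, self-contained sketch. It does not use the Lucene or Elasticsearch APIs: `Segment`, `QueryKind`, and the statistics maps are hypothetical stand-ins for per-segment index statistics, and the method is an assumption about the general shape of the logic rather than the actual implementation. It returns -1 whenever the count cannot be derived from statistics alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ShortcutTotalHitCountSketch {

    /** Hypothetical stand-in for the three query types the manual shortcut handles. */
    public enum QueryKind { MATCH_ALL, TERM, FIELD_EXISTS, OTHER }

    /** Hypothetical per-segment statistics (not a real Lucene class). */
    public static final class Segment {
        final int numDocs;                        // live documents in the segment
        final boolean hasDeletions;               // true if some documents are marked deleted
        final Map<String, Integer> termDocFreq;   // term -> number of docs containing it
        final Map<String, Integer> fieldDocCount; // field -> number of docs with a value

        public Segment(int numDocs, boolean hasDeletions,
                       Map<String, Integer> termDocFreq,
                       Map<String, Integer> fieldDocCount) {
            this.numDocs = numDocs;
            this.hasDeletions = hasDeletions;
            this.termDocFreq = termDocFreq;
            this.fieldDocCount = fieldDocCount;
        }
    }

    /**
     * Returns the total hit count derived from statistics alone, or -1 when
     * that is not possible and documents would have to be counted one by one.
     */
    public static int shortcutTotalHitCount(List<Segment> segments, QueryKind kind, String key) {
        int count = 0;
        for (Segment segment : segments) {
            switch (kind) {
                case MATCH_ALL:
                    count += segment.numDocs;
                    break;
                case TERM:
                    // Term statistics still include deleted docs, so they are only exact without deletions.
                    if (segment.hasDeletions) return -1;
                    count += segment.termDocFreq.getOrDefault(key, 0);
                    break;
                case FIELD_EXISTS:
                    // Same caveat as for terms: field statistics ignore deletions.
                    if (segment.hasDeletions) return -1;
                    count += segment.fieldDocCount.getOrDefault(key, 0);
                    break;
                default:
                    return -1; // any other query: no shortcut available
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment(10, false, Collections.singletonMap("city", 4), Collections.singletonMap("population", 9)),
            new Segment(5, false, Collections.singletonMap("city", 1), Collections.singletonMap("population", 5)));
        System.out.println(shortcutTotalHitCount(segments, QueryKind.MATCH_ALL, null));            // 15
        System.out.println(shortcutTotalHitCount(segments, QueryKind.TERM, "city"));               // 5
        System.out.println(shortcutTotalHitCount(segments, QueryKind.FIELD_EXISTS, "population")); // 14
        System.out.println(shortcutTotalHitCount(segments, QueryKind.OTHER, null));                // -1
    }
}
```

A caller would treat -1 as "fall back to counting during collection", which is the overhead the regression above refers to.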
javanna added a commit that referenced this pull request Mar 29, 2023
javanna added a commit that referenced this pull request Mar 29, 2023
Reverts #89047

We have removed shortcut total hit count with #89047 and later noticed a couple of benchmark regressions. This PR reverts that change and reinstates the original logic for shortcut total hit count.
@javanna javanna removed the v8.8.0 label Mar 29, 2023
javanna added a commit that referenced this pull request Mar 30, 2023
We have removed shortcut total hit count with #89047 and later noticed a couple of benchmark regressions, which made us restore our shortcut total hit count mechanism.

When not collecting hits (e.g. size=0) we can leverage Lucene's skipping mechanism instead of our handmade shortcut total hit count, as Elasticsearch uses TotalHitCountCollector, which calls Weight#count. The advantage of this is that it supports shortcutting for many more queries than the three that our manual mechanism supports (match_all, term and field exists).

While at it, a few more tests are added to cover for situations that were not covered before.
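
To make the size=0 path more concrete, here is a small, self-contained model of the per-segment contract that Weight#count follows: return the exact count when it can be computed cheaply, or -1 to signal that documents must be counted one by one. This is a sketch only; `SegmentQuery`, its fields, and `totalHits` are hypothetical names introduced for illustration and are not Lucene classes or the TotalHitCountCollector implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class WeightCountSketch {

    /** Hypothetical per-segment view of a query, modelled on the Weight#count contract. */
    public static final class SegmentQuery {
        final int maxDoc;           // number of doc ids in the segment
        final int cheapCount;       // exact count if cheaply known, -1 otherwise
        final IntPredicate matches; // per-document match check, used only on fallback
        int visited = 0;            // how many documents the fallback had to look at

        public SegmentQuery(int maxDoc, int cheapCount, IntPredicate matches) {
            this.maxDoc = maxDoc;
            this.cheapCount = cheapCount;
            this.matches = matches;
        }
    }

    /** Sums cheap counts where available and falls back to a per-document count elsewhere. */
    public static int totalHits(List<SegmentQuery> segments) {
        int total = 0;
        for (SegmentQuery segment : segments) {
            if (segment.cheapCount >= 0) {
                total += segment.cheapCount; // shortcut: no documents visited
                continue;
            }
            for (int doc = 0; doc < segment.maxDoc; doc++) {
                segment.visited++;
                if (segment.matches.test(doc)) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        SegmentQuery fast = new SegmentQuery(1000, 250, doc -> doc % 4 == 0);
        SegmentQuery slow = new SegmentQuery(8, -1, doc -> doc % 2 == 0);
        System.out.println(totalHits(Arrays.asList(fast, slow))); // 254
        System.out.println(fast.visited);                         // 0
        System.out.println(slow.visited);                         // 8
    }
}
```

Because the decision is made per segment, a query only pays the per-document cost on the segments where no cheap count is available.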
Labels
>refactoring :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Development

Successfully merging this pull request may close these issues.

Clean up shortcutTotalHitCount using the new Weight#count API
6 participants