
Search request performance drops significantly when setting size to Integer.MAX_VALUE #13125

Closed
falkorichter opened this issue Aug 26, 2015 · 9 comments
Labels: feedback_needed, :Search/Search

@falkorichter

We are updating some data in our Elasticsearch store. The data has a creation timestamp, so we fetch buckets of one hour in length from the database and therefore need to fetch all items from each date bucket. Since we don't know how many items could be in a bucket, we set the size of the search to Integer.MAX_VALUE.

We're seeing a dramatic drop in performance when setting the hit size that high. When we set the size closer to the actual number of hits, the request is again as fast as expected.

Scroll to my comment for a solution using the scroll API

We also tried to reproduce the problem using the REST API directly:
With Integer.MAX_VALUE (POST {{elastic_search_host}}/resolve_log_v1/entry/_search?pretty)

{
    "size" : 2147483647,
    "query" : {
        "bool" : {
            "must" : [
                { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } }
            ]
        }
    }
}

Result:

{
  "took": 22856,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 21,
    "max_score": 1,
    "hits": [

Same request with size set to 21:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 21,
    "max_score": 1,
    "hits": [
      {
        "_index": "resolve_log_v1",
        "_type": "entry",
        "_id": "4851a61c-3a17-494d-b72e-9bce172d45e6",

When setting the size to 0, we still get a quick response.

The total count of entry objects:

curl -X POST -H "Content-Type: application/json" -d '{    "size" : 0 }' 'http://localhost:9222/resolve_log_v1/entry/_search?pretty':
{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1022824,
    "max_score": 0,
    "hits": []
  }
}

To optimize, we now do two queries: one to get the count, one to get the hits:

BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().must(rangeQuery("trigger.timestamp").gte(fromJoda.toString()).lte(toJoda.toString()));
SearchResponse sizeResponse = client.prepareSearch(SearchProvider.INDEX)
                    .setTypes(TYPE)
                    .setSearchType(SearchType.COUNT)
                    .setQuery(queryBuilder)
                    .execute()
                    .actionGet();

SearchResponse response = client.prepareSearch(SearchProvider.INDEX)
                    .setTypes(TYPE)
                    .setQuery(queryBuilder)
                    .setSize((int) sizeResponse.getHits().getTotalHits())
                    .execute()
                    .actionGet();

This is not ideal, as the bucket could theoretically change between the two queries.
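
For reference, the same two-step workaround over the REST API might look roughly like this (a sketch only; the second request just copies hits.total from the first into size, 21 in the example above):

# step 1: count only, no hits fetched
curl -X POST -H "Content-Type: application/json" -d '{ "size" : 0, "query" : { "bool" : { "must" : [ { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } } ] } } }' '{{elastic_search_host}}/resolve_log_v1/entry/_search?pretty'

# step 2: repeat the search with "size" set to the hits.total from step 1
curl -X POST -H "Content-Type: application/json" -d '{ "size" : 21, "query" : { "bool" : { "must" : [ { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } } ] } } }' '{{elastic_search_host}}/resolve_log_v1/entry/_search?pretty'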

@nik9000
Member

nik9000 commented Aug 26, 2015

I'm pretty sure this is a duplicate. I'll hunt down what this is a duplicate of. OTOH you've left a workaround so thanks for that.

@jasontedor
Member

Have you considered using a scroll? I would advise against setting the result size so large given the data structures etc. that are allocated to handle result sets. The performance impact you're seeing is not a surprise.

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

Note that such a request opens a "scroll context" that will enable you to continue to fetch the results of the initial search request.
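
For illustration, the scroll calls over the REST API with the range query from this issue might look roughly like this (a sketch only; the JSON scroll_id body assumes a 2.x-or-later cluster, and the scroll id value is a placeholder):

# open a scroll context that stays alive for one minute, fetching 100 hits per batch
curl -X POST -H "Content-Type: application/json" -d '{ "size" : 100, "query" : { "bool" : { "must" : [ { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } } ] } } }' '{{elastic_search_host}}/resolve_log_v1/entry/_search?scroll=1m&pretty'

# fetch the next batch; repeat with the most recent _scroll_id until "hits" comes back empty
curl -X POST -H "Content-Type: application/json" -d '{ "scroll" : "1m", "scroll_id" : "<_scroll_id from the previous response>" }' '{{elastic_search_host}}/_search/scroll?pretty'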

@jasontedor added the :Search/Search and feedback_needed labels on Aug 26, 2015
@nik9000
Member

nik9000 commented Aug 26, 2015

Have you considered using a scroll? I would advise against setting the result size so large given the data structures etc. that are allocated to handle result sets. The performance impact you're seeing is not a surprise.

Yeah - this is really much better than your workaround. I'd love for this to be less of a thing - for elasticsearch to reject requests that are unreasonably large with a helpful message pointing you to scroll. And for some requests to be streamable. But neither of those are implemented yet. I thought there were issues opened for them but I've misplaced them.

@clintongormley
Contributor

See #9311 and #11511

Definitely a duplicate, and scrolling is the right answer

@nik9000
Member

nik9000 commented Aug 26, 2015

Definitely a duplicate, and scrolling is the right answer

There we go. Thanks.

@falkorichter
Author

I think I've got the solution: use scroll to get a snapshot of the request that I can work with, then scroll through the list and collect all the items.

I'll post my working solution when I find time for testing and finishing it. Thanks for the good hints and answers. 🙇

@jasontedor
Member

That's exactly right.

@falkorichter
Author

Here is my solution; hopefully it helps people who come across the same problem:

SearchRequestBuilder sizeQuery = client.prepareSearch(SearchProvider.INDEX)
        .setTypes(TYPE)
        .setSearchType(SearchType.SCAN)
        .setSize(10)
        .setScroll(TimeValue.timeValueSeconds(60))
        .setQuery(queryBuilder);

SearchResponse scrollResponse = sizeQuery.execute().actionGet();
long totalHits = scrollResponse.getHits().getTotalHits();

// might be an optimization: clear the scroll when totalHits is 0; it's cleared anyway when the first fetch comes back empty...
//if (totalHits == 0) {
//    ClearScrollRequest clear = new ClearScrollRequest();
//    clear.addScrollId(scrollResponse.getScrollId());
//    client.clearScroll(clear);
//    return result;
//}

String scrollID = scrollResponse.getScrollId();
SearchResponse response;
do {
    response = client.prepareSearchScroll(scrollID)
            .setScroll(TimeValue.timeValueSeconds(60))
            .get();
    scrollID = response.getScrollId();
    for (SearchHit searchHit : response.getHits()) {
        EsEntry esEntry = new EsEntry();
        esEntry.setId(searchHit.getId());
        esEntry.setRawData(searchHit.getSource());
        result.getEntryList().add(esEntry);
    }
} while (response.getHits().hits().length > 0);
return result;

@iDIKSHA

iDIKSHA commented Sep 4, 2018

Well, the SCAN and COUNT search types are now deprecated (#1745), so how would this solution be implemented now?
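
For anyone landing here on a cluster where SCAN and COUNT are gone, a rough sketch of the same loop using a plain scroll follows. Sorting on _doc is the documented replacement for the old scan optimisation, and hits.total of the first scroll response replaces the separate COUNT query; the index, type, and range query are the ones from this issue.

# open the scroll with the most efficient sort order (_doc); 100 hits per batch
curl -X POST -H "Content-Type: application/json" -d '{ "size" : 100, "sort" : [ "_doc" ], "query" : { "bool" : { "must" : [ { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } } ] } } }' '{{elastic_search_host}}/resolve_log_v1/entry/_search?scroll=1m&pretty'

# then keep POSTing { "scroll" : "1m", "scroll_id" : "<latest _scroll_id>" } to {{elastic_search_host}}/_search/scroll until "hits" comes back empty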
