
Search request performance drops significantly when setting size to Integer.MAX_VALUE #13125

Closed
falkorichter opened this issue Aug 26, 2015 · 9 comments
Labels: feedback_needed, :Search/Search

@falkorichter

We are updating some data in our Elasticsearch store. The data has a creation timestamp, so we fetch buckets of one hour in length from the database and therefore need to fetch all items from each date bucket. Since we don't know how many items could be in a bucket, we set the size of the search to Integer.MAX_VALUE.

We're seeing a dramatic drop in performance when setting the hit size that high. When we set the size closer to the actual number of hits, the request is again as fast as expected.

Scroll to my comment for a solution using the scroll API

We also tried to reproduce the problem using the REST API directly:
With Integer.MAX_VALUE (POST {{elastic_search_host}}/resolve_log_v1/entry/_search?pretty)

{
    "size" : 2147483647,
    "query" : {
        "bool" : {
            "must" : [
                { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } }
            ]
        }
    }
}

Result:

{
  "took": 22856,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 21,
    "max_score": 1,
    "hits": [

Same request with size set to 21:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 21,
    "max_score": 1,
    "hits": [
      {
        "_index": "resolve_log_v1",
        "_type": "entry",
        "_id": "4851a61c-3a17-494d-b72e-9bce172d45e6",

When setting the size to 0, we still get a quick response.

The total count of entry objects:

curl -X POST -H "Content-Type: application/json" -d '{    "size" : 0 }' 'http://localhost:9222/resolve_log_v1/entry/_search?pretty':
{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1022824,
    "max_score": 0,
    "hits": []
  }
}

To optimize, we now do two queries: one to get the count, one to get the hits:

BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().must(rangeQuery("trigger.timestamp").gte(fromJoda.toString()).lte(toJoda.toString()));
SearchResponse sizeResponse = client.prepareSearch(SearchProvider.INDEX)
                    .setTypes(TYPE)
                    .setSearchType(SearchType.COUNT)
                    .setQuery(queryBuilder)
                    .execute()
                    .actionGet();

SearchResponse response = client.prepareSearch(SearchProvider.INDEX)
                    .setTypes(TYPE)
                    .setQuery(queryBuilder)
                    .setSize((int) sizeResponse.getHits().getTotalHits())
                    .execute()
                    .actionGet();

This is not ideal, as the bucket could theoretically change between the two queries.
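
For reference, the same two-step workaround over the REST API might look roughly like this (a sketch only; the second request just copies hits.total from the first into size, 21 in the example above):

# step 1: count only, no hits fetched
curl -X POST -H "Content-Type: application/json" -d '{ "size" : 0, "query" : { "bool" : { "must" : [ { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } } ] } } }' '{{elastic_search_host}}/resolve_log_v1/entry/_search?pretty'

# step 2: repeat the search with "size" set to the hits.total from step 1
curl -X POST -H "Content-Type: application/json" -d '{ "size" : 21, "query" : { "bool" : { "must" : [ { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } } ] } } }' '{{elastic_search_host}}/resolve_log_v1/entry/_search?pretty'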

@nik9000
Member

nik9000 commented Aug 26, 2015

I'm pretty sure this is a duplicate. I'll hunt down what this is a duplicate of. OTOH you've left a workaround so thanks for that.

@jasontedor
Member

Have you considered using a scroll? I would advise against setting the result size so large given the data structures etc. that are allocated to handle result sets. The performance impact you're seeing is not a surprise.

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

Note that such a request opens a "scroll context" that will enable you to continue to fetch the results of the initial search request.
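
For illustration, the scroll calls over the REST API with the range query from this issue might look roughly like this (a sketch only; the JSON scroll_id body assumes a 2.x-or-later cluster, and the scroll id value is a placeholder):

# open a scroll context that stays alive for one minute, fetching 100 hits per batch
curl -X POST -H "Content-Type: application/json" -d '{ "size" : 100, "query" : { "bool" : { "must" : [ { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } } ] } } }' '{{elastic_search_host}}/resolve_log_v1/entry/_search?scroll=1m&pretty'

# fetch the next batch; repeat with the most recent _scroll_id until "hits" comes back empty
curl -X POST -H "Content-Type: application/json" -d '{ "scroll" : "1m", "scroll_id" : "<_scroll_id from the previous response>" }' '{{elastic_search_host}}/_search/scroll?pretty'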

@jasontedor added the :Search/Search and feedback_needed labels on Aug 26, 2015
@nik9000
Member

nik9000 commented Aug 26, 2015

Have you considered using a scroll? I would advise against setting the result size so large given the data structures etc. that are allocated to handle result sets. The performance impact you're seeing is not a surprise.

Yeah - this is really much better than your workaround. I'd love for this to be less of a thing - for elasticsearch to reject requests that are unreasonably large with a helpful message pointing you to scroll. And for some requests to be streamable. But neither of those are implemented yet. I thought there were issues opened for them but I've misplaced them.

@clintongormley
Contributor

See #9311 and #11511

Definitely a duplicate, and scrolling is the right answer

@nik9000
Member

nik9000 commented Aug 26, 2015

Definitely a duplicate, and scrolling is the right answer

There we go. Thanks.

@falkorichter
Author

I think I've got the solution: use scroll to get a snapshot of the request that I can work with, then scroll through the list and collect all the items.

I'll post my working solution when I find time for testing and finishing it. Thanks for the good hints and answers. 🙇

@jasontedor
Member

That's exactly right.

@falkorichter
Author

Here is my solution; hopefully it helps people who come across the same problem:

SearchRequestBuilder sizeQuery = client.prepareSearch(SearchProvider.INDEX)
        .setTypes(TYPE)
        .setSearchType(SearchType.SCAN)
        .setSize(10)
        .setScroll(TimeValue.timeValueSeconds(60))
        .setQuery(queryBuilder);

SearchResponse scrollResponse = sizeQuery.execute().actionGet();
long totalHits = scrollResponse.getHits().getTotalHits();

// might be an optimization: clear the scroll when totalHits is 0; it's cleared anyway when the first fetch comes back empty...
//if (totalHits == 0) {
//    ClearScrollRequest clear = new ClearScrollRequest();
//    clear.addScrollId(scrollResponse.getScrollId());
//    client.clearScroll(clear);
//    return result;
//}

String scrollID = scrollResponse.getScrollId();
SearchResponse response;
do {
    response = client.prepareSearchScroll(scrollID)
            .setScroll(TimeValue.timeValueSeconds(60))
            .get();
    scrollID = response.getScrollId();
    for (SearchHit searchHit : response.getHits()) {
        EsEntry esEntry = new EsEntry();
        esEntry.setId(searchHit.getId());
        esEntry.setRawData(searchHit.getSource());
        result.getEntryList().add(esEntry);
    }
} while (response.getHits().hits().length > 0);
return result;

@iDIKSHA

iDIKSHA commented Sep 4, 2018

Well, the SCAN and COUNT search types are now deprecated (#1745), so how would this solution be implemented now?
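
For anyone landing here on a cluster where SCAN and COUNT are gone, a rough sketch of the same loop using a plain scroll follows. Sorting on _doc is the documented replacement for the old scan optimisation, and hits.total of the first scroll response replaces the separate COUNT query; the index, type, and range query are the ones from this issue.

# open the scroll with the most efficient sort order (_doc); 100 hits per batch
curl -X POST -H "Content-Type: application/json" -d '{ "size" : 100, "sort" : [ "_doc" ], "query" : { "bool" : { "must" : [ { "range" : { "trigger.timestamp" : { "gte": "2015-07-26T16:00:00.000Z", "lte" : "2015-07-26T17:00:00.000Z" } } } ] } } }' '{{elastic_search_host}}/resolve_log_v1/entry/_search?scroll=1m&pretty'

# then keep POSTing { "scroll" : "1m", "scroll_id" : "<latest _scroll_id>" } to {{elastic_search_host}}/_search/scroll until "hits" comes back empty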
