Scrolling in ES 5.4 #26238

Closed
GaryTurpin opened this issue Aug 16, 2017 · 6 comments

Comments

@GaryTurpin

GaryTurpin commented Aug 16, 2017

Elasticsearch version (bin/elasticsearch --version):
5.4
Nest 5.4

Description of the problem including expected versus actual behavior:
We recently upgraded from ES 2.4 to ES 5.4 and everything went well, except that scrolling now takes much longer. We use scan/scroll to pull back about 3 million docs from our index. In ES 2.4, this took roughly 40 seconds to get all of the docs. On ES 5.4, we are averaging about 240 seconds. Any ideas why this would be dramatically slower with nearly the same JSON? The only change was switching the initial query from scan to search.

@jpountz
Contributor

jpountz commented Aug 16, 2017

Is my understanding correct that the time went up after you both upgraded and switched from _scan to _search? If so, this is likely because _scan pulls num_shards * size hits per page, while _search pulls only size hits per page. You should get better results by increasing your size parameter so that it is on par with what it effectively was with _scan, i.e. multiply it by the number of shards that you are querying.

Please reopen this issue if the recommendation does not fix the problem.
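
To make the arithmetic behind this recommendation concrete, here is a minimal sketch. The shard count (4) and the scan-era size (50) are illustrative assumptions, the document count is the "about 3 million" from the report, and both function names are made up for this example.

```python
import math


def equivalent_search_size(scan_size: int, num_shards: int) -> int:
    """Return the _search page size that matches what scan returned per page."""
    return scan_size * num_shards


def round_trips(total_docs: int, page_size: int) -> int:
    """Number of scroll pages needed to walk through total_docs hits."""
    return math.ceil(total_docs / page_size)


# Illustrative values only: 4 shards, size 50, about 3 million matching docs.
size = equivalent_search_size(scan_size=50, num_shards=4)
print(size)                          # 200
print(round_trips(3_000_000, 50))    # 60000
print(round_trips(3_000_000, size))  # 15000
```

Under these assumptions, leaving size unchanged after the switch would mean four times as many round trips for the same result set.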

jpountz closed this as completed Aug 16, 2017

@GaryTurpin
Author

Your understanding is correct: we changed from _scan to _search when we upgraded to ES 5.4. We currently have 4 shards and we had the size set to 50. Last night, I changed the size from 50 to 1000 and saw no difference at all in the average time to scroll through the whole set of docs. Do you have any other suggestions?

@jpountz
Contributor

jpountz commented Aug 17, 2017

Can you share the first request that you are issuing that includes the query and sort order?

If you can get the nodes hot threads while you are paginating through matches, it might also give us information about the bottleneck.
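
One possible way to capture hot threads while the scroll is running is to call the `_nodes/hot_threads` endpoint from a second process. This is only a sketch: the cluster address is an assumption, and the helper names are invented for this example.

```python
import urllib.parse
import urllib.request


def hot_threads_url(base_url: str, threads: int = 3, interval: str = "500ms") -> str:
    """Build the URL of the nodes hot threads API for the given cluster."""
    query = urllib.parse.urlencode({"threads": threads, "interval": interval})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{query}"


def fetch_hot_threads(base_url: str, timeout: float = 10.0) -> str:
    """Return the plain-text hot threads report (requires a reachable cluster)."""
    with urllib.request.urlopen(hot_threads_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example, assuming a node listens on localhost:9200:
# print(fetch_hot_threads("http://localhost:9200"))
```

Taking a few samples during the scroll, rather than one, gives a better picture of where the nodes spend their time.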

jpountz reopened this Aug 17, 2017

@GaryTurpin
Author

Here it is.
/_search?scroll=2m
{
  "from": 0,
  "size": 1000,
  "sort": [{"_doc": {}}],
  "_source": {
    "includes": [
      "_id", "_score", "title", "posted", "provider", "city", "state", "zip",
      "location", "description.html", "ispremium", "company", "jobproviderid",
      "featuredstartdate", "featuredexpirationdate", "careertitleid", "geoarea",
      "geoareatype", "source", "trackingpixelurl", "employerid", "tagsinternal",
      "industryid", "urlhostname", "url"
    ]
  },
  "query": {
    "bool": {
      "must": [
        {"term": {"ispremium": {"value": "true"}}},
        {"terms": {"jobproviderid": ["1", "2", "3", "10", "11", "12", "18"]}},
        {"terms": {"industryid": ["1"]}}
      ]
    }
  }
}
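
For readers who want to reproduce the pagination outside of Nest, the request above can be driven by a loop like the following sketch. The `send(method, path, body)` callable is a hypothetical transport hook that sends a JSON request and returns the parsed response; it is not part of any Elasticsearch client. The paths and the `_scroll_id` and `hits.hits` fields follow the scroll API.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport hook: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def scroll_hits(send: Send, body: Dict[str, Any],
                keep_alive: str = "2m") -> Iterator[Dict[str, Any]]:
    """Yield every hit of a scrolled search, then release the scroll context."""
    page = send("POST", f"/_search?scroll={keep_alive}", body)
    scroll_id = page.get("_scroll_id")
    try:
        # An empty page of hits marks the end of the scroll.
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Free server-side resources even if the caller stops early.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

Passing the transport in as a parameter keeps the loop independent of any particular HTTP library, which makes it easy to time or replace each piece separately.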

@GaryTurpin
Author

After looking at this a lot more, it looks like the took value is roughly what I would expect. I think the issue may be either in Nest or in the way we use Nest. Should I create a new issue with Nest and refer them to this issue? I think it has something to do with deserialization.
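
A rough way to check this kind of suspicion is to compare the `took` value in each response (server-side milliseconds) with the wall-clock time the client spends on the same request. The sketch below assumes a hypothetical `fetch_page` callable that sends one request and returns the parsed response, so that transfer and deserialization are included in the measured time.

```python
import time
from typing import Any, Callable, Dict, Tuple


def split_latency(fetch_page: Callable[[], Dict[str, Any]]) -> Tuple[float, float]:
    """Return (server_ms, client_overhead_ms) for a single request."""
    start = time.perf_counter()
    response = fetch_page()  # hypothetical: request + transfer + deserialization
    wall_ms = (time.perf_counter() - start) * 1000.0
    server_ms = float(response["took"])  # search time reported by Elasticsearch
    return server_ms, wall_ms - server_ms
```

If the second number dominates across pages, the time is going to the network or the client rather than to the search itself.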

@nik9000
Member

nik9000 commented Aug 17, 2017 via email
