
Invalid indexes getting created when master crashes due to OOM #9129

Closed
mosiddi opened this issue Jan 3, 2015 · 11 comments

mosiddi commented Jan 3, 2015

Recently, in one of our ES clusters, we ran into an interesting issue. Below is the sequence of events:

ES Cluster: 3 query, 3 master, 3 data nodes
Azure VMs
Masters are A2 machines with heap size set to 2 GB

  1. Around 1000 index-create requests were sent to master 1 within a span of 15 minutes. The master successfully created a set of indexes (~800) and failed with a timeout exception for the rest.
    a. We were not waiting for the create request to really complete from the ES side; the ack was all we took a dependency on (see the sketch after this list). This is something we will fix at our end.
    b. Above, when I say 'successfully created', I mean we got a proper ack and not a timeout exception.
  2. Master 1 crashed in the middle of processing # 1 above. Two reasons:
    a. The heap size grew and there was an OOM.
    b. In the call stack it was the Marvel exporter.
    There were a couple of GC collector calls for the initial create-request timeouts we saw before the master crashed.
  3. A new master (master 3) took over the master role and started looking into index shard rebalancing. It kept failing for a sizeable number of indexes (~500) with the exception below:

[Failed to start shard, message [IndexShardGatewayRecoveryException[[][2] failed to fetch index version after copying it over]; nested: IndexShardGatewayRecoveryException[[*************][2] shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: IndexNotFoundException[no segments file found in store(least_used[rate_limited(mmapfs(F:\data*\nodes\0\indices*********\2\index), type=MERGE, rate=20.0)]): files: []]; ]]

  4. This kept happening.
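
To make point 1a concrete, here is a minimal sketch of the difference between taking a dependency on the ack alone and actually waiting for the index to be ready. It assumes the official elasticsearch-py client and uses hypothetical host and index names:

```python
# Minimal sketch, assuming the official elasticsearch-py client.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # hypothetical host

for name in ("logs-0001", "logs-0002"):        # hypothetical index names
    resp = es.indices.create(index=name)
    # resp["acknowledged"] only means the master accepted the cluster-state
    # change; it does not mean the shards have been allocated anywhere yet.
    print(name, "acknowledged:", resp.get("acknowledged"))

    # Blocking until the index reaches at least yellow health (primaries
    # allocated) before sending the next request avoids piling up pending
    # cluster-state tasks on the master.
    es.cluster.health(index=name, wait_for_status="yellow", timeout="30s")
```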

When we looked into the indexes for which the master was failing shard initialization, we noticed that either the index (folder and files on the data nodes) didn't exist, or the folder existed with no files on any of the data nodes.

A couple of questions:

  1. How does the master maintain the admin write consistency quorum? When it adds a new index, does it update the other masters and sync with them so the quorum is maintained?
  2. Does ES maintain the create-index sub-states transactionally (checkpoints), so that whatever state index creation was in when one master crashes, the new master can pick it up and maintain idempotency?
  3. Is this something you have seen before?
@clintongormley

@bleskes could you comment on this?

@clintongormley

Related to #9130


bleskes commented Jan 5, 2015

@mosiddi thx for the detailed report. I want to do some research first. I'll get back to you asap.


mosiddi commented Jan 5, 2015

Thanks @bleskes! I'll wait for your analysis.


bleskes commented Jan 13, 2015

@mosiddi sorry for taking long to get back to you. I did some research but I cannot see how this can happen given the current information, which means I'm missing something. Any chance you saved the logs and can share them? A reproduction would also be great.

Regarding your questions:

  1. How does the master maintain the admin write consistency quorum? When it adds a new index, does it update the other masters and sync with them so the quorum is maintained?

When the master updates the cluster state, it publishes it to all the nodes and waits for their responses (up to 30 seconds).
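
As an illustration (a minimal sketch with the elasticsearch-py client and hypothetical node addresses), one way to observe this from the outside is to ask each node for its local copy of the cluster state and compare the versions:

```python
# Minimal sketch: compare each node's local cluster-state version.
from elasticsearch import Elasticsearch

# Hypothetical node addresses.
for host in ("http://node1:9200", "http://node2:9200", "http://node3:9200"):
    es = Elasticsearch([host])
    # local=True returns this node's own copy of the state instead of asking the master.
    state = es.cluster.state(metric="version", local=True)
    print(host, "cluster state version:", state.get("version"))
```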

  2. Does ES maintain the create-index sub-states transactionally (checkpoints), so that whatever state index creation was in when one master crashes, the new master can pick it up and maintain idempotency?

Not sure exactly what you mean, but a master crash shouldn't break the indices - another master should just pick things up where the old master left off.

  3. Is this something you have seen before?

No :)


mosiddi commented Jan 13, 2015

I do have the logs and I can share them. Can you tell me how to share the logs with you? They will be a bit huge (I will filter out unneeded content, though).


bleskes commented Jan 13, 2015

You can mail them to me using first name dot last name at elasticsearch.com. Compress them, and depending on the size you can use something like wetransfer.com. If possible please don't remove anything - you never know what might be relevant...


mosiddi commented Jan 13, 2015

I mailed you the logs from master 01 and master 03.


mosiddi commented Jan 13, 2015

Hi @bleskes - can you also take a look at #9192 and comment? :)


bleskes commented Jan 13, 2015

Thx for the logs. I think it clarifies things. Your first master got overloaded by the create-index requests (which were fired without waiting for an answer) to the point that it hit an OutOfMemory error. Master 3 took over, but because of your very high timeout settings this took a long time (45s ping timeout, 10 retries -> in the worst case it takes 7.5 minutes for a master failure to be discovered). In that time the old master, which was not completely gone but rather dysfunctional due to the OOM, created some havoc by sending conflicting information to the nodes - most notably in this case, it was publishing a cluster state which didn't contain indices that the new master had created. That misled the nodes into thinking the indices were deleted, so they removed them. The errors you see are the result of the nodes responding to the new master, telling it that the indices it's talking about do not exist on disk.

A lot of work went into 1.4 to make it more resilient to these issues (and others; see http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/ ). I suggest you upgrade and check again whether this still happens to you.

I also strongly suggest removing your custom fault-detection ping timeout settings and keeping the defaults.
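
For reference, here is the worst-case arithmetic above as a minimal sketch; the values are the overridden zen fault-detection settings mentioned in this thread (discovery.zen.fd.ping_timeout and discovery.zen.fd.ping_retries; the shipped defaults are 30s and 3):

```python
# Minimal sketch of the worst-case master-failure detection time with the
# overridden zen fault-detection settings mentioned above.
ping_timeout_s = 45  # discovery.zen.fd.ping_timeout in this cluster (default: 30s)
ping_retries = 10    # discovery.zen.fd.ping_retries in this cluster (default: 3)

worst_case_minutes = ping_timeout_s * ping_retries / 60.0
print(worst_case_minutes)  # -> 7.5 minutes before the old master is declared dead
```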


mosiddi commented Jan 13, 2015

Thanks @bleskes! This is good data. We will plan to move to 1.4 then. I'm closing the issue since I have the answer now.

mosiddi closed this as completed Jan 13, 2015