
Invalid indexes getting created when master crashes due to OOM #9129

Closed
mosiddi opened this issue Jan 3, 2015 · 11 comments

mosiddi commented Jan 3, 2015

Recently, in one of our ES clusters, we ran into an interesting issue. Below is the sequence of events:

ES Cluster: 3 query, 3 master, 3 data nodes
Azure VMs
Masters are A2 machines with heap size set to 2 GB

  1. Around 1000 index-create requests were sent to master 1 within a span of 15 minutes. The master successfully created a set of indexes (~800) and failed with a timeout exception for the rest.
    a. We were not waiting for the create request to really complete from the ES side; the ack was all we took a dependency on (see the sketch after this list). This is something we will fix at our end.
    b. Above, when I say 'successfully created', I mean we got a proper ack and not a timeout exception.
  2. Master 1 crashed in the middle of processing # 1 above. Two reasons:
    a. The heap size grew and there was an OOM.
    b. In the call stack it was the Marvel exporter.
    There were a couple of GC collector calls for the initial create-request timeouts we saw before the master crashed.
  3. A new master (master 3) took over the master role and started looking into index shard rebalancing. It kept failing for a sizeable number of indexes (~500) with the exception below:

[Failed to start shard, message [IndexShardGatewayRecoveryException[[][2] failed to fetch index version after copying it over]; nested: IndexShardGatewayRecoveryException[[*************][2] shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: IndexNotFoundException[no segments file found in store(least_used[rate_limited(mmapfs(F:\data*\nodes\0\indices*********\2\index), type=MERGE, rate=20.0)]): files: []]; ]]

  4. This kept happening.
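
To make point 1a concrete, here is a minimal sketch of the difference between taking a dependency on the ack alone and actually waiting for the index to be ready. It assumes the official elasticsearch-py client and uses hypothetical host and index names:

```python
# Minimal sketch, assuming the official elasticsearch-py client.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # hypothetical host

for name in ("logs-0001", "logs-0002"):        # hypothetical index names
    resp = es.indices.create(index=name)
    # resp["acknowledged"] only means the master accepted the cluster-state
    # change; it does not mean the shards have been allocated anywhere yet.
    print(name, "acknowledged:", resp.get("acknowledged"))

    # Blocking until the index reaches at least yellow health (primaries
    # allocated) before sending the next request avoids piling up pending
    # cluster-state tasks on the master.
    es.cluster.health(index=name, wait_for_status="yellow", timeout="30s")
```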

When we looked into the indexes for which the master was failing shard initialization, we noticed that either the index (folder and files on the data nodes) didn't exist, or the folder existed with no files on any of the data nodes.

A couple of questions:

  1. How does the master maintain the admin write consistency quorum? When it adds a new index, does it update the other masters and sync with them so the quorum is maintained?
  2. Does ES maintain the create-index sub-states transactionally (checkpoints), so that whatever state index creation was in when one master crashes, the new master can pick it up and maintain idempotency?
  3. Is this something you have seen before?
@clintongormley

@bleskes could you comment on this?

@clintongormley

Related to #9130


bleskes commented Jan 5, 2015

@mosiddi thx for the detailed report. I want to do some research first. I'll get back to you asap.


mosiddi commented Jan 5, 2015

Thanks @bleskes! I'll wait for your analysis.


bleskes commented Jan 13, 2015

@mosiddi sorry for taking long to get back to you. I did some research but I cannot see how this can happen given the current information, which means I'm missing something. Any chance you saved the logs and can share them? A reproduction would also be great.

Regarding your questions:

  1. How does the master maintain the admin write consistency quorum? When it adds a new index, does it update the other masters and sync with them so the quorum is maintained?

When the master updates the cluster state, it publishes it to all the nodes and waits for their responses (up to 30 seconds).
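
As an illustration (a minimal sketch with the elasticsearch-py client and hypothetical node addresses), one way to observe this from the outside is to ask each node for its local copy of the cluster state and compare the versions:

```python
# Minimal sketch: compare each node's local cluster-state version.
from elasticsearch import Elasticsearch

# Hypothetical node addresses.
for host in ("http://node1:9200", "http://node2:9200", "http://node3:9200"):
    es = Elasticsearch([host])
    # local=True returns this node's own copy of the state instead of asking the master.
    state = es.cluster.state(metric="version", local=True)
    print(host, "cluster state version:", state.get("version"))
```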

  2. Does ES maintain the create-index sub-states transactionally (checkpoints), so that whatever state index creation was in when one master crashes, the new master can pick it up and maintain idempotency?

Not sure exactly what you mean, but a master crash shouldn't break the indices - another master should just pick things up where the old master left off.

  3. Is this something you have seen before?

No :)


mosiddi commented Jan 13, 2015

I do have the logs and I can share them. Can you tell me how to share the logs with you? They will be a bit huge (I will filter out unneeded content, though).


bleskes commented Jan 13, 2015

You can mail them to me using first name dot last name at elasticsearch.com. Compress them, and depending on the size you can use something like wetransfer.com. If possible please don't remove anything - you never know what might be relevant...


mosiddi commented Jan 13, 2015

I mailed you the logs from master 01 and master 03.


mosiddi commented Jan 13, 2015

Hi @bleskes - can you also take a look at #9192 and comment? :)


bleskes commented Jan 13, 2015

Thx for the logs. I think it clarifies things. Your first master got overloaded by the create-index requests (which were fired without waiting for an answer) to the point that it hit an OutOfMemory error. Master 3 took over, but because of your very high timeout settings this took a long time (45s ping timeout, 10 retries -> in the worst case it takes 7.5 minutes for a master failure to be discovered). In that time the old master, which was not completely gone but rather dysfunctional due to the OOM, created some havoc by sending conflicting information to the nodes - most notably in this case, it was publishing a cluster state which didn't contain indices that the new master had created. That misled the nodes into thinking the indices were deleted, so they removed them. The errors you see are the result of the nodes responding to the new master, telling it that the indices it's talking about do not exist on disk.

A lot of work went into 1.4 to make it more resilient to these issues (and others; see http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/ ). I suggest you upgrade and check again whether this still happens to you.

I also strongly suggest removing your custom fault-detection ping timeout settings and keeping the defaults.
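
For reference, here is the worst-case arithmetic above as a minimal sketch; the values are the overridden zen fault-detection settings mentioned in this thread (discovery.zen.fd.ping_timeout and discovery.zen.fd.ping_retries; the shipped defaults are 30s and 3):

```python
# Minimal sketch of the worst-case master-failure detection time with the
# overridden zen fault-detection settings mentioned above.
ping_timeout_s = 45  # discovery.zen.fd.ping_timeout in this cluster (default: 30s)
ping_retries = 10    # discovery.zen.fd.ping_retries in this cluster (default: 3)

worst_case_minutes = ping_timeout_s * ping_retries / 60.0
print(worst_case_minutes)  # -> 7.5 minutes before the old master is declared dead
```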


mosiddi commented Jan 13, 2015

Thanks @bleskes! This is good data. We will plan to move to 1.4 then. I'm closing the issue since I have the answer now.

mosiddi closed this as completed Jan 13, 2015