Avoid loading shard metadata while closing #29140
Conversation
If `ShardStateMetaData.FORMAT.loadLatestState` is called while a shard is closing, the shard metadata directory may be deleted after its existence has been checked but before the Lucene `Directory` has been created. When the `Directory` is created, the just-deleted directory is brought back into existence. There are three places where `loadLatestState` is called in a manner that leaves it open to this race. This change ensures that these calls occur either under a `ShardLock` or else while holding a reference to the existing `Store`. In either case, this protects the shard metadata directory from concurrent deletion. Cf elastic#19338, elastic#21463, elastic#25335 and https://issues.apache.org/jira/browse/LUCENE-7375
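To make the fix concrete, here is a minimal sketch of the locking idiom this change applies, using the shardLock, availableShardPaths and loadLatestState calls that appear in the diffs below; the helper name and its failure handling are invented for illustration only:

import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
import org.elasticsearch.env.NodeEnvironment;
import org.elasticsearch.env.ShardLock;
import org.elasticsearch.env.ShardLockObtainFailedException;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.index.shard.ShardStateMetaData;

// Hypothetical helper: holding the ShardLock for the whole load means the shard's
// data directories cannot be deleted between the existence check inside
// loadLatestState and the creation of the Lucene Directory.
static ShardStateMetaData loadShardStateUnderLock(Logger logger, NodeEnvironment nodeEnv, ShardId shardId,
                                                  NamedXContentRegistry registry) throws IOException {
    try (ShardLock ignored = nodeEnv.shardLock(shardId, TimeUnit.SECONDS.toMillis(5))) {
        final Path[] paths = nodeEnv.availableShardPaths(shardId);
        return ShardStateMetaData.FORMAT.loadLatestState(logger, registry, paths);
    } catch (ShardLockObtainFailedException e) {
        // the shard is in use elsewhere (e.g. open, closing, or being deleted);
        // each call site decides how to handle this
        throw new IOException("failed to lock " + shardId, e);
    }
}

Where an IndexShard already exists, the conversation below takes the other route: read the metadata from memory rather than from disk, which avoids the race entirely.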
Pinging @elastic/es-distributed
Note to reviewers: I have assumed a certain amount of consistency between […]. I also don't have a good plan for testing this. Pointers appreciated.
@bleskes, any thoughts here?
Maybe it's a naive solution, but isn't it enough to just make sure all access in the […]?
We discussed this on Zoom, and decided that it'd be more appropriate to ask the […]. NB the […]. Within […]
I tried this. I don't particularly like having the call to […].
Why don't you like it? IndexShard is already the one that writes it. Alternatively we can keep an in-memory copy of it, though I personally don't feel it's needed.
Really, just that it involved importing things that weren't already there, which hinted that something was wrong. If you're good with it then that's enough. Next up is to try and get a failing test for this.
I think I'm missing something - […]
I added a test that fails occasionally on […]
        throw new AlreadyClosedException(shardId + " can't load shard state metadata - shard is closed");
    }

    return ShardStateMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry, dataLocations);
Very useful, thanks. This makes things much simpler. I pushed 3eff6c9.
public ShardStateMetaData loadShardStateMetaDataIfOpen(NamedXContentRegistry namedXContentRegistry, Path[] dataLocations)
        throws IOException {
    synchronized (mutex) {
        if (state == IndexShardState.CLOSED) {
This check is not needed if making our own ShardStateMetaData, so I will remove it.
@@ -2059,6 +2061,17 @@ public void startRecovery(RecoveryState recoveryState, PeerRecoveryTargetService
    }
}

public ShardStateMetaData loadShardStateMetaDataIfOpen(NamedXContentRegistry namedXContentRegistry, Path[] dataLocations) |
As per comment below this is not needed since we can make our own ShardStateMetaData.
@@ -2059,6 +2061,17 @@ public void startRecovery(RecoveryState recoveryState, PeerRecoveryTargetService
    }
}

public ShardStateMetaData loadShardStateMetaDataIfOpen(NamedXContentRegistry namedXContentRegistry, Path[] dataLocations) |
It was, I think, because otherwise it was possible we'd get hold of an IndexShard while it was closing and then fail to load the metadata since it'd already been deleted. However, as per comment below we don't need to touch the disk here.
@@ -139,7 +140,10 @@ private StoreFilesMetaData listStoreMetaData(ShardId shardId) throws IOException
         return new StoreFilesMetaData(shardId, Store.MetadataSnapshot.EMPTY);
     }
     final IndexSettings indexSettings = indexService != null ? indexService.getIndexSettings() : new IndexSettings(metaData, settings);
-    final ShardPath shardPath = ShardPath.loadShardPath(logger, nodeEnv, shardId, indexSettings);
+    final ShardPath shardPath;
+    try (ShardLock ignored = nodeEnv.shardLock(shardId, TimeUnit.SECONDS.toMillis(5))) {
I looked at how we could be in a situation in which the shard lock is unavailable for a long time. This'd be the case if the shard was open, but that means there's an IndexShard, so we don't get here. More precisely, there are some circumstances in which we could get here and then fail to get the shard lock because the shard is now open, but retrying is the thing to do here.

All the other usages of the shard lock seem short-lived. They protect some IO (e.g. deleting the shards, etc.) so may take some time, but not infinitely long.

Also, we obtain the same shard lock a few lines down, in Store.readMetadataSnapshot, unless ShardPath.loadShardPath returns null.

Could you clarify, @ywelsch?
-    ShardStateMetaData shardStateMetaData = ShardStateMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY,
-        nodeEnv.availableShardPaths(request.shardId));
+    ShardStateMetaData shardStateMetaData = safelyLoadLatestState(shardId);
Ok, I moved this code around in 7f835cc. I'm not 100% comfortable with the changes made since I'm unfamiliar with all the invariants that may or may not hold here - please tread carefully.
@@ -138,7 +159,9 @@ protected NodeGatewayStartedShards nodeOperation(NodeRequest request) {
     ShardPath shardPath = null;
     try {
         IndexSettings indexSettings = new IndexSettings(metaData, settings);
-        shardPath = ShardPath.loadShardPath(logger, nodeEnv, shardId, indexSettings);
+        try (ShardLock ignored = nodeEnv.shardLock(shardId, TimeUnit.SECONDS.toMillis(5))) {
We obtain the same shard lock a few lines down, in Store.tryOpenIndex(...), unless ShardPath.loadShardPath returns null, in which case we throw a different exception.
    listingThread.start();
}

// Deleting an index asserts that it really is gone from disk, so no other assertions are necessary here. |
Good point, I pushed 48f6d46
LGTM. I would like to wait for @ywelsch's blessing as well.
if (indexShard != null) {
    final ShardStateMetaData shardStateMetaData = indexShard.getShardStateMetaData();
    final String allocationId = shardStateMetaData.allocationId != null ?
        shardStateMetaData.allocationId.getId() : null;
    logger.debug("{} shard state info found: [{}]", shardId, shardStateMetaData);
this can be chatty. Can we move back to trace?
Ok I pushed 7e58bc6
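That change is presumably just the log-level swap on the line above, i.e.:

logger.trace("{} shard state info found: [{}]", shardId, shardStateMetaData);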
final IndexShard indexShard = indicesService.getShardOrNull(shardId);
if (indexShard != null) {
    final ShardStateMetaData shardStateMetaData = indexShard.getShardStateMetaData();
    final String allocationId = shardStateMetaData.allocationId != null ?
allocationIds have been around since I don't know how long. When can this be null?
Its declaration says this:

elasticsearch/server/src/main/java/org/elasticsearch/index/shard/ShardStateMetaData.java, lines 44 to 45 in 6538542:

@Nullable
public final AllocationId allocationId; // can be null if we read from legacy format (see fromXContent and MultiDataPathUpgrader)

There are lots of other null checks too. Maybe worth addressing separately?
I'm good with doing this in a different PR.
I've left a few more asks and comments.
@@ -2065,6 +2065,12 @@ public void startRecovery(RecoveryState recoveryState, PeerRecoveryTargetService
    }
}

public ShardStateMetaData getShardStateMetaData() {
    synchronized (mutex) {
We can avoid the mutex here: just do a one-time volatile read of shardRouting (which is an immutable object); indexSettings is a final field and its UUID is immutable.
Good point, I pushed 1d4e044
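For reference, a sketch of what the mutex-free accessor might look like (the exact ShardStateMetaData constructor shape is assumed here):

public ShardStateMetaData getShardStateMetaData() {
    // shardRouting is a volatile field holding an immutable object, and the index
    // UUID is immutable, so one volatile read yields a consistent snapshot without the mutex
    final ShardRouting routing = this.shardRouting;
    return new ShardStateMetaData(routing.primary(), indexSettings.getUUID(), routing.allocationId());
}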
@@ -139,7 +140,10 @@ private StoreFilesMetaData listStoreMetaData(ShardId shardId) throws IOException
         return new StoreFilesMetaData(shardId, Store.MetadataSnapshot.EMPTY);
     }
     final IndexSettings indexSettings = indexService != null ? indexService.getIndexSettings() : new IndexSettings(metaData, settings);
-    final ShardPath shardPath = ShardPath.loadShardPath(logger, nodeEnv, shardId, indexSettings);
+    final ShardPath shardPath;
+    try (ShardLock ignored = nodeEnv.shardLock(shardId, TimeUnit.SECONDS.toMillis(5))) {
In TransportNodesListGatewayStartedShards and in Store.readMetadataSnapshot, which we call below, we catch the ShardLockObtainFailedException and treat it either as an empty store (in the case of TransportNodesListShardStoreMetaData) or as an ok target for primary allocation (see TransportNodesListGatewayStartedShards and PrimaryShardAllocator.buildNodeShardsResult), but we've made sure not to end up in a situation where the master goes into a potentially long retry loop (which causes a reroute storm on the master). I don't want to open this Pandora's box here, so my suggestion is to add
} catch (ShardLockObtainFailedException ex) {
    logger.info(() -> new ParameterizedMessage("{}: failed to obtain shard lock", shardId), ex);
    return new StoreFilesMetaData(shardId, Store.MetadataSnapshot.EMPTY);
}
here so as not to mess with existing behavior.
if (shardPath == null) {
    throw new IllegalStateException(shardId + " no shard path found");
}
Store.tryOpenIndex(shardPath.resolveIndex(), shardId, nodeEnv::shardLock, logger);
Instead of acquiring the shard lock for a second time, I would prefer if we would do it once, and move this call under that lock and just rename tryOpenIndex to tryOpenIndexUnderLock, removing the locking mechanism from it. Same thing for TransportNodesListShardStoreMetaData. You can then also remove the ShardLocker interface, which irked me for a while.
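Sketched, that refactoring might look like the following, with tryOpenIndexUnderLock as the hypothetical post-rename method:

try (ShardLock ignored = nodeEnv.shardLock(shardId, TimeUnit.SECONDS.toMillis(5))) {
    final ShardPath shardPath = ShardPath.loadShardPath(logger, nodeEnv, shardId, indexSettings);
    if (shardPath == null) {
        throw new IllegalStateException(shardId + " no shard path found");
    }
    // the caller now holds the lock, so Store no longer needs a ShardLocker callback
    Store.tryOpenIndexUnderLock(shardPath.resolveIndex(), shardId, logger);
}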
}

final ShardStateMetaData shardStateMetaData;
try (ShardLock ignored = nodeEnv.shardLock(shardId, TimeUnit.SECONDS.toMillis(5))) {
Hmm, I just spotted this - there are still two calls to nodeEnv.shardLock here. TBH I don't know what we should be doing on failure of this one.

Thanks to @ywelsch for further guidance about failure cases. Thinking further about elasticsearch/server/src/main/java/org/elasticsearch/gateway/PrimaryShardAllocator.java, lines 259 to 265 in 0ff2c60 […]
This PR represents an actual issue, and all the other issues that point to it were closed in its favour, but the consequences of […]. I would like to explore the idea of loading the metadata of every on-disk index much earlier in the lifecycle of a node, avoiding these concurrency issues (of course introducing different ones in their place, but perhaps the new ones will be less tricky).
I think it makes sense to explore alternative ways of coordinating the loading of shard state metadata. We have fixed the current test failures by weakening the assertions on the existence of a shard folder after clean-up. As there is no immediate plan to work on this, I'm closing this one out.