-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure metadata folder is not resurrected when loading latest state file #19338
Ensure metadata folder is not resurrected when loading latest state file #19338
Conversation
* | ||
* Only supports the {@link Directory#openInput(String,IOContext)} method. | ||
*/ | ||
public class SimpleReadOnlyFSDirectory extends BaseDirectory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@s1monw I found this to be the simplest solution, especially as MetaDataStateFormat
exposes the Directory
for the tests (to wrap it in MockDirectoryWrapper).
It really is quite scary to have to fork Lucene's entire |
I opened https://issues.apache.org/jira/browse/LUCENE-7375 for this |
@ywelsch is this stalled pending the linked Lucene change, or is it waiting on other review? |
no progress on the Lucene end. Should we proceed with the current solution here? |
Maybe another route here is to use |
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
Just want to mention that this test failed 5 times so far in March. The latest failure is here https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.x+multijob-unix-compatibility/os=ubuntu/619/console |
@bleskes The issue is that |
I think the easiest solution here would be to have Lucene expose SimpleFSIndexInput as public. We have been having test failures for more than a year now and it seems impossible to get this PR done, so I will just close it. |
@s1monw can you please take another look at this and see whether there's a good way to proceed? |
If `ShardStateMetaData.FORMAT.loadLatestState` is called while a shard is closing, the shard metadata directory may be deleted after its existence has been checked but before the Lucene `Directory` has been created. When the `Directory` is created, the just-deleted directory is brought back into existence. There are three places where `loadLatestState` is called in a manner that leaves it open to this race. This change ensures that these calls occur either under a `ShardLock` or else while holding a reference to the existing `Store`. In either case, this protects the shard metadata directory from concurrent deletion. Cf elastic#19338, elastic#21463, elastic#25335 and https://issues.apache.org/jira/browse/LUCENE-7375
If
MetaDataStateFormat.loadLatestState()
is called while the folder on which it operates is being deleted, it is possible that this method resurrects the folder. Possibly affects folders for shard state metadata, index metadata or global metadata.Test failure showing the issue:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/729/console
Log snippets showing issue:
and
The failing assertion checks after successful shard deletion that the shard folder has gone. It fails as the folder reappears. The reason for this is that there is still a concurrent operation on the node loading shard state metadata (triggered by async fetching).