Enhance BufferedChecksumIndexInput error when a state file is empty #29358
Comments
Pinging @elastic/es-core-infra
I wonder whether this might have improved thanks to LUCENE-7831, which is in Elasticsearch 5.6+ and 6.0+. This issue improved the error message (it would contain
@jpountz This is on startup where we read the index metadata (to possibly upgrade the index metadata). While a
How will this help? Are we taking different actions based on whether the file is empty or corrupted otherwise?
In the above message it was not the shard that got corrupted, but the index metadata. If that index metadata is not available on any other node, the whole index will effectively be gone. I'm not sure what kind of generic recommendation we could give there. It's probably more on a case-by-case basis (restore from snapshot, reindex from primary source, ...).
File system corruptions are serious, and the system should not try to automatically compensate for them, as this will hurt users even more further down the line.
@ywelsch Thanks for clarifying! I see we are already doing the right thing at write time by writing to a temp file before renaming, so if I read you correctly there is nothing more to do since the error message was already improved via LUCENE-7831?
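For readers unfamiliar with the write-to-temp-then-rename pattern mentioned above, here is a minimal sketch using only `java.nio`. It is not the Elasticsearch implementation: the class name `AtomicStateWriter`, the method `writeAtomically`, and the `.tmp` suffix are all hypothetical and chosen for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicStateWriter {

    /** Writes content to target so that readers see either the old file or the complete new one. */
    public static void writeAtomically(Path target, byte[] content) throws IOException {
        // Hypothetical naming scheme: the temp file sits next to the target so the
        // rename stays within one file system.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try {
            Files.write(tmp, content);
            // Flush the bytes to disk before the rename exposes them under the final name.
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                channel.force(true);
            }
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up if the write or the move failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

This pattern protects against a crash in the middle of a write. It cannot protect against the file system later losing or zeroing the data, which is the scenario this issue is about.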
@jpountz yes, maybe worth adding a test case in ES?
OK, making it a test adoptme.
@ywelsch Sorry, what is the test? That the exception message is that which comes from upstream? And the test breaks if upstream changes the message and we merely change the assertion? I am not following what the proposal is here.
@jasontedor That
FAO @andrershov
Pinging @elastic/es-distributed (Team:Distributed)
The metadata index is small and important and only read at startup. Today we rely on Lucene to spot if any of its components is corrupt, but Lucene does not necessarily verify all checksums in order to catch a corruption. With this commit we run `CheckIndex` on the metadata index first, and fail on startup if a corruption is detected. Closes elastic#29358
We now keep metadata on disk in a Lucene index rather than in files whose checksum we verify ourselves. This mostly gives us more useful error messages on corruptions than the one in the OP, but I believe it isn't completely watertight: Lucene doesn't guarantee to verify every checksum, because that would be desperately inefficient on large indices. It should be fine on small things like the metadata index, though, so I opened #73239 to explicitly check the checksums on this index first, which I think clears up this issue. At least, we now just expose the messages that Lucene reports, so any remaining lack of clarity would need to be addressed in Lucene rather than in Elasticsearch.
Elasticsearch version (`bin/elasticsearch --version`): Any version
Plugins installed: [none]
JVM version (`java -version`): 1.8
OS version (`uname -a` if on a Unix-like system): Any
Provide logs (if relevant):
If a state file ends up empty after some issue (hardware failure, NFS problem, or similar), the node fails to start and the logs show something like the following:
Discussing it with @jpountz, `(pos=-16 getFilePointer()=0)` seems to indicate that the file is empty, but so far we return an `IOException` without explaining that a state file is empty.
So my question here is:
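To make the "empty versus otherwise corrupted" distinction from the issue description concrete, here is a rough sketch of a length check that could run before any checksum verification. It assumes that `pos=-16` arises from seeking to the file length minus Lucene's 16-byte codec footer on a zero-length file. The class `StateFileCheck` and the method `describeLengthProblem` are hypothetical and not part of Elasticsearch or Lucene.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StateFileCheck {

    // Assumption: the codec footer is 16 bytes (magic, algorithm ID, checksum),
    // so a 0-byte file yields a seek position of 0 - 16 = -16.
    static final long FOOTER_LENGTH = 16;

    /** Returns a human-readable problem description, or null if the length looks plausible. */
    public static String describeLengthProblem(Path stateFile) throws IOException {
        long length = Files.size(stateFile);
        if (length == 0) {
            return "state file [" + stateFile + "] is empty";
        }
        if (length < FOOTER_LENGTH) {
            return "state file [" + stateFile + "] is truncated: " + length
                + " bytes, expected at least " + FOOTER_LENGTH;
        }
        // Length alone cannot prove the file is intact; checksum verification is still needed.
        return null;
    }
}
```

A caller could include the returned description in the exception it throws, so that the operator sees "is empty" rather than only a seek position.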