You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
OS version : Linux 4.14.51-60.38.amzn1.x86_64 #1 SMP Tue Jun 26 23:06:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Occasionally cluster snapshot operation fails with Failed to finalize snapshot creation exception caused by AmazonS3Exception: The specified key does not exist. (see full stacktrace at the bottom). Resulted snapshot marked as partial. Exception is thrown in writeAtomic when moving a newly created temporary index file.
Temporary file itself is created in the same method by writeBlob method, which in turn calls writeBlob on S3BlobContainer. Note that before writing the blob blobExists is called to check whether it already exists.
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
That's why move operation fails with The specified key does not exist – before writing the blob its existence is verified, which makes subsequent write operation to be eventually consistent and hence newly written index file is not immediately available.
As a potential solution BlobContainer interface could be extended with an additional method which writes the blob w/o verifying its existence. ChecksumBlobStoreFormat could use that method instead of writeBlob to make write operation immediately consistent. To avoid any potential name conflicts ChecksumBlobStoreFormat#tempBlobName could have some random part in it.
Provide logs (if relevant):
Full stacktrace
[2018-09-14T04:00:27,071][WARN ][o.e.s.SnapshotShardsService] [data.localdomain] [[index_name][6]][bucket_name:snapshot_2018-09-14t04:00:10z/vm0BgXHgSRuZad49bQ_OIA]
failed to snapshot shard
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to finalize snapshot creation [snapshot_2018-09-14t04:00:10z/vm0BgXHgSRuZad49bQ_OIA] with shard index [index-54]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1015) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1255) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:813) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:410) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:97) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:354) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 599340CA5D293CCD; S3 Extended Request ID: BJID/TmaSF3GyAt8PYxin+3MQ2F6si83vFqfi3xXlZkNYotCWPBnc9UYsU8PYMKXQ8ffzK8Yluw=), S3 Extended Request ID: BJID/TmaSF3GyAt8PYxin+3MQ2F6si83vFqfi3xXlZkNYotCWPBnc9UYsU8PYMKXQ8ffzK8Yluw=
at org.elasticsearch.repositories.s3.S3BlobContainer.move(S3BlobContainer.java:177) ~[?:?]
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:986) ~[elasticsearch-6.3.0.jar:6.3.0]
... 10 more
Suppressed: java.nio.file.NoSuchFileException: Blob [pending-index-54] does not exist
at org.elasticsearch.repositories.s3.S3BlobContainer.deleteBlob(S3BlobContainer.java:116) ~[?:?]
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:142) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:986) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1255) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:813) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:410) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:97) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:354) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 599340CA5D293CCD; S3 Extended Request ID: BJID/TmaSF3GyAt8PYxin+3MQ2F6si83vFqfi3xXlZkNYotCWPBnc9UYsU8PYMKXQ8ffzK8Yluw=)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639) ~[?:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304) ~[?:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056) ~[?:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[?:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[?:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[?:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[?:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[?:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[?:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4247) ~[?:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194) ~[?:?]
at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1870) ~[?:?]
at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$move$6(S3BlobContainer.java:172) ~[?:?]
at org.elasticsearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:57) ~[?:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_171]
at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:56) ~[?:?]
at org.elasticsearch.repositories.s3.S3BlobContainer.move(S3BlobContainer.java:171) ~[?:?]
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
The text was updated successfully, but these errors were encountered:
@gregolsen there's no reason to feel embarrassed. I prefer closing an issue that's already fixed to missing out on a bug report. Thanks for sharing your findings and analysis with us.
Elasticsearch version:
6.3.0
Plugins installed:
[discovery-ec2, repository-s3]
JVM version:
java version "1.8.0_171"
OS version :
Linux 4.14.51-60.38.amzn1.x86_64 #1 SMP Tue Jun 26 23:06:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Occasionally cluster snapshot operation fails with
Failed to finalize snapshot creation
exception caused byAmazonS3Exception: The specified key does not exist.
(see full stacktrace at the bottom). Resulted snapshot marked as partial. Exception is thrown inwriteAtomic
when moving a newly created temporary index file.elasticsearch/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
Line 138 in d879cf4
Temporary file itself is created in the same method by
writeBlob
method, which in turn callswriteBlob
onS3BlobContainer
. Note that before writing the blobblobExists
is called to check whether it already exists.elasticsearch/plugins/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3BlobContainer.java
Lines 98 to 101 in 2b4e7a1
Most likely the problem is that S3 only guarantees eventual consistency for read-after-write in case "exists" request was issued before write.
According to S3 documentation (see "Amazon S3 Data Consistency Model
" section):
That's why
move
operation fails withThe specified key does not exist
– before writing the blob its existence is verified, which makes subsequent write operation to be eventually consistent and hence newly written index file is not immediately available.As a potential solution
BlobContainer
interface could be extended with an additional method which writes the blob w/o verifying its existence.ChecksumBlobStoreFormat
could use that method instead ofwriteBlob
to make write operation immediately consistent. To avoid any potential name conflictsChecksumBlobStoreFormat#tempBlobName
could have some random part in it.Provide logs (if relevant):
Full stacktrace
The text was updated successfully, but these errors were encountered: