[TEST] org.elasticsearch.index.store.CorruptedFileIT failure on 6.0 #26773

s1monw · 2017-09-25T12:05:19Z

Build URL: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.0+aggressive-opts/251/console

Reproduce commands (they do not seem to reproduce locally):

gradle :core:integTest -Dtests.seed=34AF58CE756B3308 -Dtests.class=org.elasticsearch.index.store.CorruptedFileIT -Dtests.method="testCorruptFileAndRecover" -Dtests.security.manager=true -Dtests.jvm.argline="-XX:+AggressiveOpts" -Dtests.locale=ar-TN -Dtests.timezone=Europe/Stockholm

gradle :core:integTest -Dtests.seed=34AF58CE756B3308 -Dtests.class=org.elasticsearch.index.store.CorruptedFileIT -Dtests.method="testCorruptFileThenSnapshotAndRestore" -Dtests.security.manager=true -Dtests.jvm.argline="-XX:+AggressiveOpts" -Dtests.locale=ar-TN -Dtests.timezone=Europe/Stockholm

It looks like the checksums pass even though the test deliberately corrupts a file.

Failure:

09:25:13   1> [2017-09-25T11:19:40,543][WARN ][o.e.c.a.s.ShardStateAction] [node_s0] [test][0] received shard failed for shard id [[test][0]], allocation id [Hp8g1Jg-SC6fRgd93cSZPw], primary term [0], message [failed recovery], failure [RecoveryFailedException[[test][0]: Recovery failed from {node_s0}{FLWMrOl2Q3ua6Mz5hdHeSA}{cRqSkkr3SfezpNCXfQz2bg}{127.0.0.1}{127.0.0.1:9521} into {node_s2}{u0g2d44DTaWzJ050pgiPuQ}{tf2OgPERT6-AX8jENIACyw}{127.0.0.1}{127.0.0.1:9520}]; nested: RemoteTransportException[[node_s0][127.0.0.1:38724][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [5] files with total size of [18.5kb]]; nested: RemoteTransportException[[File corruption occurred on recovery but checksums are ok]]; ]
09:25:13   1> org.elasticsearch.indices.recovery.RecoveryFailedException: [test][0]: Recovery failed from {node_s0}{FLWMrOl2Q3ua6Mz5hdHeSA}{cRqSkkr3SfezpNCXfQz2bg}{127.0.0.1}{127.0.0.1:9521} into {node_s2}{u0g2d44DTaWzJ050pgiPuQ}{tf2OgPERT6-AX8jENIACyw}{127.0.0.1}{127.0.0.1:9520}
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:75) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:617) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
09:25:13   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
09:25:13   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
09:25:13   1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
09:25:13   1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s0][127.0.0.1:38724][internal:index/shard/recovery/start_recovery]
09:25:13   1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
09:25:13   1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:171) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1526) ~[main/:?]
09:25:13   1> 	... 5 more
09:25:13   1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [5] files with total size of [18.5kb]
09:25:13   1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:409) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:169) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
09:25:13   1> 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1526) ~[main/:?]
09:25:13   1> 	... 5 more
09:25:13   1> Caused by: org.elasticsearch.transport.RemoteTransportException: [File corruption occurred on recovery but checksums are ok]
09:25:13   1> 	Suppressed: org.elasticsearch.transport.RemoteTransportException: [node_s2][127.0.0.1:49996][internal:index/shard/recovery/file_chunk]
09:25:13   1> 	Caused by: org.apache.lucene.index.CorruptIndexException: verification failed (hardware problem?) : expected=125bti5 actual=125bti5 footer=null writtenLength=106 expectedLength=107 (resource=name [_0_1.liv], length [107], checksum [125bti5], writtenBy [7.0.0]) (resource=VerifyingIndexOutput(_0_1.liv))
09:25:13   1> 		at org.elasticsearch.index.store.Store$LuceneVerifyingIndexOutput.verify(Store.java:1170) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.index.store.Store.verify(Store.java:496) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.indices.recovery.RecoveryTarget.writeFileChunk(RecoveryTarget.java:484) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FileChunkTransportRequestHandler.messageReceived(PeerRecoveryTargetService.java:580) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FileChunkTransportRequestHandler.messageReceived(PeerRecoveryTargetService.java:553) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1526) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[main/:?]
09:25:13   1> 		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
09:25:13   1> 		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
09:25:13   1> 		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
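
For readers puzzling over how `expected=125bti5 actual=125bti5` can still end in "verification failed", note that the same message also reports `writtenLength=106 expectedLength=107`. Below is a minimal sketch of one way such a combination can arise, assuming a verifier that checks the body checksum and the total length independently. It is not the Elasticsearch `Store` code: the class `LengthAwareVerifier`, its methods, and the 8-byte footer size are hypothetical and for illustration only, and the footer contents are not compared here.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical illustration only; not the Elasticsearch Store implementation. */
public class LengthAwareVerifier {
    // Assumption: the last 8 bytes of a file hold the stored checksum (the "footer").
    static final int FOOTER_BYTES = 8;

    /** CRC32 over the first {@code count} bytes of {@code data}. */
    static long crcOf(byte[] data, int count) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, count);
        return crc.getValue();
    }

    /** Returns null when verification passes, otherwise a description of the mismatch. */
    static String verify(byte[] written, long expectedChecksum, int expectedLength) {
        int bodyLength = expectedLength - FOOTER_BYTES;
        // Only the body (everything before the footer) is covered by the checksum.
        int covered = Math.min(written.length, bodyLength);
        long actual = crcOf(written, covered);
        if (actual == expectedChecksum && written.length == expectedLength) {
            return null;
        }
        // Checksums are printed in base 36, similar in spirit to the log line above.
        return "verification failed: expected=" + Long.toString(expectedChecksum, 36)
                + " actual=" + Long.toString(actual, 36)
                + " writtenLength=" + written.length
                + " expectedLength=" + expectedLength;
    }

    public static void main(String[] args) {
        byte[] file = new byte[107];
        Arrays.fill(file, (byte) 7);
        long expected = crcOf(file, file.length - FOOTER_BYTES);

        // Intact file: checksum and length both match, so verify returns null.
        System.out.println(verify(file, expected, 107));

        // One trailing byte missing: the body checksum still matches,
        // but the length check fails.
        byte[] truncated = Arrays.copyOf(file, 106);
        System.out.println(verify(truncated, expected, 107));
    }
}
```

In this sketch the second call reports identical expected and actual checksums together with a length mismatch, which is the same shape as the message in the trace. Whether that is what actually happened in this build is not established by the log alone.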

This ultimately trips an assertion error in the global checkpoint service (GlobalCheckpointTracker.getReplicationGroup):

09:25:14    > 	at __randomizedtesting.SeedInfo.seed([34AF58CE756B3308]:0)
09:25:14    > 	at org.elasticsearch.index.seqno.GlobalCheckpointTracker.getReplicationGroup(GlobalCheckpointTracker.java:357)
09:25:14    > 	at org.elasticsearch.index.seqno.SequenceNumbersService.getReplicationGroup(SequenceNumbersService.java:186)
09:25:14    > 	at org.elasticsearch.index.shard.IndexShard.getReplicationGroup(IndexShard.java:1814)
09:25:14    > 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$recoverToTarget$0(RecoverySourceHandler.java:136)
09:25:14    > 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$runUnderPrimaryPermit$2(RecoverySourceHandler.java:219)
09:25:14    > 	at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:105)
09:25:14    > 	at org.elasticsearch.common.util.CancellableThreads.execute(CancellableThreads.java:86)
09:25:14    > 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.runUnderPrimaryPermit(RecoverySourceHandler.java:210)
09:25:14    > 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
09:25:14    > 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98)
09:25:14    > 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50)
09:25:14    > 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107)
09:25:14    > 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104)
09:25:14    > 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
09:25:14    > 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
09:25:14    > 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1526)
09:25:14    > 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
09:25:14    > 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
09:25:14    > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
09:25:14    > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
09:25:14    > 	at java.lang.Thread.run(Thread.java:748)


jasontedor · 2017-09-25T18:01:20Z

Closed by #26776

s1monw added blocker >test-failure Triaged test failures from CI v6.0.0-rc1 labels Sep 25, 2017

s1monw assigned bleskes and jasontedor Sep 25, 2017

jasontedor closed this as completed Sep 25, 2017

colings86 added :Core/Infra/Core Core issues without another label >test Issues or PRs that are addressing/adding tests labels Sep 26, 2017

jakelandis mentioned this issue Apr 15, 2019

[CI] CorruptedFileIT.testCorruptFileThenSnapshotAndRestore test failure #41201

Closed
