[CI] DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex fails #38256

cbuescher · 2019-02-02T12:26:58Z

Build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/349/console

When I run this reproduce line in a loop locally, the test usually fails within a few tries, though not on every run:

./gradlew :server:integTest \
  -Dtests.seed=B2F3F6260A9D92DB \
  -Dtests.class=org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT \
  -Dtests.method="testRestoreShrinkIndex" \
  -Dtests.security.manager=true \
  -Dtests.locale=sr-Latn-ME \
  -Dtests.timezone=America/Guadeloupe \
  -Dcompiler.java=11 \
  -Druntime.java=8

The logs show several ClassCastExceptions like this:

12:58:03   1> [2019-02-02T07:55:51,306][WARN ][o.e.s.SnapshotShardsService] [node_t1] [test-repo:test-snap-1/wuZ3MOnpQw6EbCxig7umvw] [ShardSnapshotStatus[state=SUCCESS, nodeId=GC6yybZDQsyUP9FCZGZGkw, reason=null]] failed to update snapshot state
12:58:03   1> org.elasticsearch.transport.RemoteTransportException: [node_t1][127.0.0.1:42747][internal:cluster/snapshot/update_snapshot_status]
12:58:03   1> Caused by: org.elasticsearch.transport.ResponseHandlerFailureTransportException: java.lang.ClassCastException: org.elasticsearch.snapshots.SnapshotShardsService$UpdateIndexShardSnapshotStatusResponse cannot be cast to org.elasticsearch.transport.TransportResponse$Empty
12:58:03   1> Caused by: java.lang.ClassCastException: org.elasticsearch.snapshots.SnapshotShardsService$UpdateIndexShardSnapshotStatusResponse cannot be cast to org.elasticsearch.transport.TransportResponse$Empty
12:58:03   1> 	at org.elasticsearch.snapshots.SnapshotShardsService$3.handleResponse(SnapshotShardsService.java:511) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1108) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1189) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1169) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:47) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:30) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$1.onResponse(TransportMasterNodeAction.java:191) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$1.onResponse(TransportMasterNodeAction.java:188) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.snapshots.SnapshotShardsService$4.clusterStateProcessed(SnapshotShardsService.java:546) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService$SafeClusterStateTaskListener.clusterStateProcessed(MasterService.java:520) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService$TaskOutputs.lambda$processedDifferentClusterState$1(MasterService.java:407) ~[main/:?]
12:58:03   1> 	at java.util.ArrayList.forEach(ArrayList.java:1257) [?:1.8.0_202]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService$TaskOutputs.processedDifferentClusterState(MasterService.java:407) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService.onPublicationSuccess(MasterService.java:264) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:257) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:238) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [main/:?]
12:58:03   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [main/:?]
12:58:03   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [main/:?]
12:58:03   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [main/:?]
12:58:03   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
12:58:03   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
12:58:03   1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]

These are followed by a MasterNotDiscoveredException, possibly as a consequence of the ClassCastExceptions.

> Throwable #1: MasterNotDiscoveredException[null]
   >    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259)
   >    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322)
   >    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249)
   >    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:549)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
   >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >    at java.lang.Thread.run(Thread.java:748)
> Throwable #2: MasterNotDiscoveredException[null]
   >    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259)
   >    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322)
   >    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249)
   >    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:549)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
   >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >    at java.lang.Thread.run(Thread.java:748)

elasticmachine · 2019-02-02T12:26:59Z

Pinging @elastic/es-distributed

cbuescher · 2019-02-02T12:46:51Z

Muted with 50cdc61

dnhatn · 2019-02-02T14:17:44Z

Relates to #38256

original-brownbear · 2019-02-02T18:59:43Z

Fix incoming in #38264

Fix Incorrect Transport Response Handler Type

* The response type here is not empty and was always wrong, but this only became visible once 0a604e3 was introduced.
* As a result of 0a604e3 we started actually handling the response of this request and logging/handling exceptions; before that, we simply dropped the ClassCastException quietly by using the empty response handler.
* Fix busy assert not handling `Exception`.
* Closes #38226
* Closes #38256
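To illustrate the failure mode described in this comment, here is a minimal, self-contained sketch of how a response handler declared for the wrong response type can compile cleanly and only fail with a ClassCastException when the response is delivered. All class and method names below (`Response`, `EmptyResponse`, `StatusUpdateResponse`, `ResponseHandler`, `deliver`) are hypothetical stand-ins invented for this example. They are not the actual Elasticsearch transport classes, and this is not the code changed in #38264.

```java
public class HandlerTypeMismatchSketch {

    // Hypothetical stand-ins for transport response types (not real Elasticsearch classes).
    static class Response {}
    static final class EmptyResponse extends Response {}
    static final class StatusUpdateResponse extends Response {}

    // A handler that is generic in the response type it expects to receive.
    interface ResponseHandler<T extends Response> {
        void handleResponse(T response);
    }

    // Handler wrongly declared for the "empty" response type.
    static final class EmptyHandler implements ResponseHandler<EmptyResponse> {
        @Override
        public void handleResponse(EmptyResponse response) {
            // nothing to do
        }
    }

    // Handler declared for the response type that is actually sent back.
    static final class StatusHandler implements ResponseHandler<StatusUpdateResponse> {
        @Override
        public void handleResponse(StatusUpdateResponse response) {
            // nothing to do
        }
    }

    // The delivering side only knows it holds some handler. Because generics are
    // erased, the unchecked cast compiles; a type mismatch surfaces only when the
    // compiler-generated bridge method casts the argument at call time.
    @SuppressWarnings("unchecked")
    static String deliver(ResponseHandler<? extends Response> handler, Response response) {
        try {
            ((ResponseHandler<Response>) handler).handleResponse(response);
            return "handled";
        } catch (ClassCastException e) {
            // If this exception were swallowed silently, the mismatch would go unnoticed.
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        Response actual = new StatusUpdateResponse();
        System.out.println(deliver(new EmptyHandler(), actual));
        System.out.println(deliver(new StatusHandler(), actual));
    }
}
```

In this sketch the mismatched handler fails only at delivery time, which is consistent with the comment's point that the wrong handler type went unnoticed for as long as the resulting exception was dropped.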

cbuescher added the :Distributed Coordination/Snapshot/Restore, >test-failure, and v7.0.0 labels Feb 2, 2019

cbuescher mentioned this issue Feb 2, 2019

Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex #38257

Merged

dnhatn assigned original-brownbear Feb 2, 2019

original-brownbear mentioned this issue Feb 2, 2019

Fix Incorrect Transport Response Handler Type #38264

Merged

original-brownbear closed this as completed in #38264 Feb 3, 2019

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

DaveCTurner mentioned this issue Feb 13, 2019

DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex fails #38845

Closed
