[CI] DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex fails #38256

Closed
cbuescher opened this issue Feb 2, 2019 · 4 comments
Labels: :Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs), >test-failure (Triaged test failures from CI), v7.0.0-beta1

Comments

@cbuescher
Member

Build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/349/console

Running this reproduce line repeatedly, the test usually fails locally after a few tries (not every time, but fairly quickly when run in a loop):

./gradlew :server:integTest \
  -Dtests.seed=B2F3F6260A9D92DB \
  -Dtests.class=org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT \
  -Dtests.method="testRestoreShrinkIndex" \
  -Dtests.security.manager=true \
  -Dtests.locale=sr-Latn-ME \
  -Dtests.timezone=America/Guadeloupe \
  -Dcompiler.java=11 \
  -Druntime.java=8

The logs show several ClassCastExceptions like this:

12:58:03   1> [2019-02-02T07:55:51,306][WARN ][o.e.s.SnapshotShardsService] [node_t1] [test-repo:test-snap-1/wuZ3MOnpQw6EbCxig7umvw] [ShardSnapshotStatus[state=SUCCESS, nodeId=GC6yybZDQsyUP9FCZGZGkw, reason=null]] failed to update snapshot state
12:58:03   1> org.elasticsearch.transport.RemoteTransportException: [node_t1][127.0.0.1:42747][internal:cluster/snapshot/update_snapshot_status]
12:58:03   1> Caused by: org.elasticsearch.transport.ResponseHandlerFailureTransportException: java.lang.ClassCastException: org.elasticsearch.snapshots.SnapshotShardsService$UpdateIndexShardSnapshotStatusResponse cannot be cast to org.elasticsearch.transport.TransportResponse$Empty
12:58:03   1> Caused by: java.lang.ClassCastException: org.elasticsearch.snapshots.SnapshotShardsService$UpdateIndexShardSnapshotStatusResponse cannot be cast to org.elasticsearch.transport.TransportResponse$Empty
12:58:03   1> 	at org.elasticsearch.snapshots.SnapshotShardsService$3.handleResponse(SnapshotShardsService.java:511) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1108) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1189) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1169) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:47) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:30) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$1.onResponse(TransportMasterNodeAction.java:191) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$1.onResponse(TransportMasterNodeAction.java:188) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.snapshots.SnapshotShardsService$4.clusterStateProcessed(SnapshotShardsService.java:546) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService$SafeClusterStateTaskListener.clusterStateProcessed(MasterService.java:520) ~[main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService$TaskOutputs.lambda$processedDifferentClusterState$1(MasterService.java:407) ~[main/:?]
12:58:03   1> 	at java.util.ArrayList.forEach(ArrayList.java:1257) [?:1.8.0_202]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService$TaskOutputs.processedDifferentClusterState(MasterService.java:407) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService.onPublicationSuccess(MasterService.java:264) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:257) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:238) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [main/:?]
12:58:03   1> 	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [main/:?]
12:58:03   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [main/:?]
12:58:03   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [main/:?]
12:58:03   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [main/:?]
12:58:03   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
12:58:03   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
12:58:03   1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]

Followed by a MasterNotDiscoveredException, possibly as a consequence of the former.

> Throwable #1: MasterNotDiscoveredException[null]
   >    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259)
   >    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322)
   >    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249)
   >    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:549)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
   >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >    at java.lang.Thread.run(Thread.java:748)
   > Throwable #2: MasterNotDiscoveredException[null]
   >    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259)
   >    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322)
   >    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249)
   >    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:549)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
   >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >    at java.lang.Thread.run(Thread.java:748)
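
The ClassCastException in the first trace is thrown inside `SnapshotShardsService$3.handleResponse`, and the trace passes through `TransportService$DirectResponseChannel`, which suggests the response object was handed to the handler directly rather than being deserialized. A minimal sketch of how a handler declared with the wrong response type can compile cleanly and still fail this way at runtime is shown below. All class and method names in it are hypothetical stand-ins, not the real Elasticsearch types.

```java
// Hypothetical stand-ins for illustration only; NOT the real Elasticsearch classes.
class HandlerTypeMismatchSketch {
    static class Response {}
    static final class EmptyResponse extends Response {}
    static final class UpdateStatusResponse extends Response {}

    interface ResponseHandler<T extends Response> {
        void handleResponse(T response);
    }

    // Mimics a same-JVM delivery path: the response object is passed to the
    // handler as-is. Because of type erasure, the mismatch is only detected by
    // the cast in the handler's compiler-generated bridge method.
    @SuppressWarnings("unchecked")
    static String deliver(Response actual, ResponseHandler<? extends Response> handler) {
        try {
            ((ResponseHandler<Response>) handler).handleResponse(actual);
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // Handler typed for an empty response, but the sender replies with a non-empty one.
    static String wrongHandler() {
        ResponseHandler<EmptyResponse> handler = new ResponseHandler<EmptyResponse>() {
            @Override
            public void handleResponse(EmptyResponse response) {
                // never reached: the bridge method's cast fails first
            }
        };
        return deliver(new UpdateStatusResponse(), handler);
    }

    // Handler typed for the response that is actually sent.
    static String rightHandler() {
        ResponseHandler<UpdateStatusResponse> handler = new ResponseHandler<UpdateStatusResponse>() {
            @Override
            public void handleResponse(UpdateStatusResponse response) {
                // response handled normally
            }
        };
        return deliver(new UpdateStatusResponse(), handler);
    }

    public static void main(String[] args) {
        System.out.println(wrongHandler()); // prints "ClassCastException"
        System.out.println(rightHandler()); // prints "ok"
    }
}
```

The anonymous class is deliberate: it mirrors the `$3` in the trace, and the failing cast lives in the bridge method the compiler generates for it.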
@cbuescher cbuescher added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI v7.0.0 labels Feb 2, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@cbuescher
Member Author

Muted with 50cdc61

@dnhatn
Member

dnhatn commented Feb 2, 2019

Relates to #38256

@original-brownbear
Member

Fix incoming in #38264

original-brownbear added a commit that referenced this issue Feb 3, 2019
* Fix Incorrect Transport Response Handler Type
* The response type here is not empty, so the handler type was always wrong, but this only became visible once 0a604e3 was introduced
   * As a result of 0a604e3 we started actually handling the response of this request and logging/handling exceptions; before that, we simply dropped the ClassCastException quietly by using the empty response handler
* fix busy assert not handling `Exception`
* Closes #38226
* Closes #38256
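
The commit message also mentions a busy assert that did not handle `Exception`. The actual test change is not shown in this issue, so the following is only a sketch of the general idea: a retry loop that treats a thrown `Exception` the same as a failed assertion and tries again. The `assertBusy` helper, its parameters, and `attemptsUntilSuccess` are hypothetical and written for this example; this is not the Elasticsearch test framework's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not the Elasticsearch test framework's code.
class BusyAssertSketch {
    @FunctionalInterface
    interface CheckedRunnable {
        void run() throws Exception;
    }

    // Retries the check until it passes or maxAttempts is exhausted.
    // Both AssertionError and Exception count as "not ready yet, retry".
    static void assertBusy(CheckedRunnable check, int maxAttempts, long sleepMillis) {
        Throwable last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                check.run();
                return; // condition satisfied
            } catch (AssertionError | Exception e) {
                last = e; // remember the most recent failure and retry
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
        }
        throw new AssertionError("condition not met after " + maxAttempts + " attempts", last);
    }

    // Demo: the check throws an Exception a fixed number of times, then passes.
    // Returns how many times the check was invoked.
    static int attemptsUntilSuccess(int failuresBeforeSuccess) {
        AtomicInteger calls = new AtomicInteger();
        assertBusy(() -> {
            if (calls.incrementAndGet() <= failuresBeforeSuccess) {
                throw new IllegalStateException("not ready yet");
            }
        }, failuresBeforeSuccess + 1, 1L);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(attemptsUntilSuccess(2)); // prints 3
    }
}
```

If the loop caught only `AssertionError`, the first `IllegalStateException` would propagate out of the loop and fail the test immediately instead of being retried.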