Cluster hangs after setting "cluster.routing.allocation.node_concurrent_recoveries" to 100 #36195
Comments
As a workaround, we have updated the _cluster/settings REST-level API to reject values of "cluster.routing.allocation.node_concurrent_recoveries" above 50.
Pinging @elastic/es-distributed
shall we use …
@howardhuanghua thanks for reporting this. This is indeed an issue if the number of concurrent recoveries from a node is higher than the max size of the GENERIC thread pool (which is some value >= 128, depending on the number of processors). That said, typically you should not have so many shards per node, and allowing such a high number of …
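For reference, each node's actual ceiling for the generic pool can be checked with the _cat thread pool API; a quick sketch (the max column is the scaling pool's upper bound):
curl "localhost:9200/_cat/thread_pool/generic?v&h=node_name,name,active,queue,max"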
@ywelsch, thanks for your comment. Currently, we limit the node_concurrent_recoveries setting to <= 50 in our production environment (based on 6.4.3) by validating the value in the _cluster/settings REST layer.
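As a rough sketch of the effect (not our actual validation code), 50 is now the highest value the settings API accepts on our build:
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 50
  }
}'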
Please give us some suggestions if you have any, thanks a lot.
Today we block a generic thread-pool thread on the target side until the source side has fully executed the recovery. We still execute the recovery in a blocking fashion on the source side, but there is no reason to block on the target side. This releases generic threads early when many concurrent recoveries are in flight. Relates to #36195
Today a peer-recovery may run into a deadlock if the value of node_concurrent_recoveries is too high. This happens because the peer-recovery is executed in a blocking fashion. This commit attempts to make the recovery source partially non-blocking. I will make three follow-ups to make it fully non-blocking: (1) send translog operations, (2) primary relocation, (3) send commit files. Relates #36195
Peer recovery is now non-blocking on both sides (except the relocation handoff step). I am closing this issue as making the handoff step async is optional. @howardhuanghua Thank you for reporting this.
Elasticsearch version: 6.4.3/6.5.1
JVM version: 1.8.0_181
OS version: CentOS 7.4
Description of the problem including expected versus actual behavior:
Production environment: 15 nodes, 2700+ indices, 15000+ shards.
The cluster hangs after setting "cluster.routing.allocation.node_concurrent_recoveries" to 100.
Steps to reproduce:
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.routing.allocation.node_concurrent_recoveries": 100,
"indices.recovery.max_bytes_per_sec": "400mb"
}
}'
We could see that each node's generic thread pool had 128 active threads, i.e. the pool was completely full.
[c_log@VM_128_27_centos ~/elasticsearch-6.4.3/bin]$ curl localhost:9200/_cat/thread_pool/generic?v
node_name name active queue rejected
node-3 generic 128 949 0
node-2 generic 128 1093 0
node-1 generic 128 1076 0
Lots of peer recoveries are waiting.
jstack output for a hung node; all generic threads are waiting on txGet:
"elasticsearch[node-3][generic][T#128]" #179 daemon prio=5 os_prio=0 tid=0x00007fa8980c8800 nid=0x3cb9 waiting on condition [0x00007fa86ca0a000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000fbef56f0> (a org.elasticsearch.common.util.concurrent.BaseFuture$Sync) at java.util.concurrent.locks.LockSupport.park(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source) at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:251) at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:94) at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:44) at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:32) at org.elasticsearch.indices.recovery.RemoteRecoveryTargetHandler.receiveFileInfo(RemoteRecoveryTargetHandler.java:133) at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$phase1$6(RecoverySourceHandler.java:387) at org.elasticsearch.indices.recovery.RecoverySourceHandler$$Lambda$3071/1370938617.run(Unknown Source) at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:105) at org.elasticsearch.common.util.CancellableThreads.execute(CancellableThreads.java:86) at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:386) at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:251) at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:309) at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source)
So the cluster appears to be stuck in a distributed deadlock: the source-side generic threads are blocked in txGet waiting for responses from recovery targets, while the target-side generic threads are blocked waiting for the sources to finish, so once both pools are exhausted no thread is left to handle the messages that would unblock them.
Thanks,
Howard