ES_VERSION: 7.3.1
JVM version: JDK 1.8.0_112
OS version: Linux
Description of the problem including expected versus actual behavior:
At first, I need to pre-create 320 indices on the cluster (8 data nodes) to reduce the pressure of creating shards. That makes 3200 shards in total (1600 primary shards and 1600 replica shards). I want the primary shards to be allocated first, so I set the cluster settings accordingly; the key setting is node_concurrent_recoveries, which I raised to 1000. Of course, the 1600 primary shards are allocated successfully, the 1600 replica shards are unallocated, and there is no data in those indices.
The reason I set the value so high is that I want the replica shards to be allocated as soon as possible; since there is no data in the shards, this puts no pressure on the cluster.
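The exact settings body is not preserved above, but given the value quoted later in this report, a sketch of the cluster-settings update might look like the following. The setting name `cluster.routing.allocation.node_concurrent_recoveries` and the `_cluster/settings` endpoint are the standard Elasticsearch ones; the surrounding Python is purely illustrative:

```python
import json

# Sketch of the cluster settings update described above; the value 1000
# matches the node_concurrent_recoveries value quoted in this report.
settings_body = {
    "transient": {
        "cluster.routing.allocation.node_concurrent_recoveries": 1000
    }
}

# This body would be sent as, e.g.:
#   PUT /_cluster/settings
payload = json.dumps(settings_body, indent=2)
print(payload)
```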
What I expect is that the 1600 replica shards will be allocated successfully, but in fact they stay in the initializing state forever: no matter how long I wait, nothing changes. It seems that the initialization of the 1600 replica shards is blocked.
Explain:
elasticsearch/core/src/main/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetService.java (line 140 in 688ecce)
elasticsearch/core/src/main/java/org/elasticsearch/threadpool/ThreadPool.java (line 169 in 688ecce)
A replica shard is allocated by recovering from its peer (primary) shard, and it uses the generic threadpool to run that work: the recovery thread initializes local variables and sends a request to the node holding the primary shard.
The node holding the primary shard receives the request and also uses the generic threadpool to transfer the data. But the generic threadpool can create at most 128 threads in my case. This is the problem.
When I set node_concurrent_recoveries to 1000, every data node uses the generic threadpool to create 128 threads that initialize replica shards at the same time and send requests to the nodes holding the primary shards. When the node holding a peer primary shard receives such a request, it attempts to start a new thread in its own generic threadpool to transfer the data, but that threadpool has no capacity left to create a new thread, so the task is put into the threadpool's queue instead. This forms a deadlock: every recovery thread is blocked waiting for data from the primary's node, while the tasks that would send that data are stuck in the queue behind them.
"elasticsearch[node1][generic][T#140]" #276 daemon prio=5 os_prio=0 tid=0x00007f99dc041000 nid=0x2d7f waiting on condition [0x00007f981d3d2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000180944970> (a org.elasticsearch.common.util.concurrent.BaseFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:248)
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:91)
at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:45)
at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:33)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$1(PeerRecoveryTargetService.java:229)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$$Lambda$1622/173120083.run(Unknown Source)
at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:105)
at org.elasticsearch.common.util.CancellableThreads.execute(CancellableThreads.java:86)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:222)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:73)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:556)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Solution:
In my opinion, should we split this recovery work across two threadpools? The replica shard would use one threadpool for the thread that initializes local variables and sends the request to the peer primary shard, and the primary shard would use another threadpool, of a dedicated "primary" type, for the thread that transfers data to the replica shard.
We could also add a timeout to the data-transfer request to solve the problem.
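The two-pool proposal can be sketched the same way: give the primary-side "transfer" tasks their own executor, and the replica-side threads can block without starving the tasks they are waiting on. Again a plain Python sketch, not Elasticsearch's actual threadpool code:

```python
import concurrent.futures

POOL_SIZE = 4
# Separate executors for the two halves of a recovery, as proposed above.
replica_pool = concurrent.futures.ThreadPoolExecutor(max_workers=POOL_SIZE)
primary_pool = concurrent.futures.ThreadPoolExecutor(max_workers=POOL_SIZE)

def transfer_data():
    # Primary-side work: runs on its own pool, so it can never be queued
    # behind a blocked replica-side thread.
    return "data"

def recover_replica():
    # Replica-side work: may block, but on a *different* pool's future.
    return primary_pool.submit(transfer_data).result(timeout=2)

# Even with far more recoveries than replica-side threads, everything
# completes instead of deadlocking.
futures = [replica_pool.submit(recover_replica) for _ in range(POOL_SIZE * 4)]
results = [f.result() for f in futures]
print(len(results))

replica_pool.shutdown()
primary_pool.shutdown()
```

The `timeout=2` also illustrates the second suggestion above: with a timeout on the transfer request, a starved recovery fails and can be retried instead of blocking a pool thread forever.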