You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on May 10, 2022. It is now read-only.
2019/9/11 17:00. Our SRE stopped one instance of replica-server in our staging environment
trying to simulate the problem java-client can't recover.
2019/9/11 17:00. Some of our clients recovered right away while replica-server restarted, but some couldn't reconnect and kept retrying with ERR_TIMEOUT error.
2019-09-11 17:12:48,659 ERROR [nioEventLoopGroup-4-1] [com.xiaomi.message.mixin.fsm.ruleset.SaveAckRuleSet$3$1.operationComplete(SaveAckRuleSet.java:169)] - SaveB2CAckInfo fail, appId=2882303761517479657 msgId=sms-1c33f956-7dad-4327-8986-aa400cb4a8b2, ex:
com.xiaomi.infra.pegasus.client.PException: com.xiaomi.infra.pegasus.rpc.ReplicationException: ERR_TIMEOUT
at com.xiaomi.infra.pegasus.client.PegasusTable$8.onCompletion(PegasusTable.java:376)
at com.xiaomi.infra.pegasus.rpc.async.ClientRequestRound.thisRoundCompletion(ClientRequestRound.java:51)
at com.xiaomi.infra.pegasus.rpc.async.TableHandler.tryDelayCall(TableHandler.java:314)
at com.xiaomi.infra.pegasus.rpc.async.TableHandler.onRpcReply(TableHandler.java:295)
at com.xiaomi.infra.pegasus.rpc.async.TableHandler$3.run(TableHandler.java:326)
at com.xiaomi.infra.pegasus.rpc.async.ReplicaSession.tryNotifyWithSequenceID(ReplicaSession.java:226)
at com.xiaomi.infra.pegasus.rpc.async.ReplicaSession.access$300(ReplicaSession.java:32)
at com.xiaomi.infra.pegasus.rpc.async.ReplicaSession$4.run(ReplicaSession.java:270)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:123)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.xiaomi.infra.pegasus.rpc.ReplicationException: ERR_TIMEOUT
... 15 more
2019-09-11 17:12:48, the errors remained, we stopped the test.
Client Version
1.11.5-thrift-0.11.0-inlined-release
The text was updated successfully, but these errors were encountered:
2019-09-11 17:01:18,424 WARN ReplicaSession.tryNotifyWithSequenceID: actively close the session because it's not responding for 10 seconds
...
2019-09-11 17:12:39,607 WARN ReplicaSession.tryNotifyWithSequenceID: actively close the session because it's not responding for 10 seconds
Bugfix #32 was working in the situation, but the session didn't close. The last ERR_SESSION_RESET is at 17:01.
2019-09-11 17:01:06,910 WARN TableHandler.onRpcReply: replica server(rpc_address(10.38.161.207:32801)) doesn't serve gpid(gpid(13.3)), operator(com.xiaomi.infra.pegasus.operator.rrdb_multi_put_operator@41e12d6f), try(3), error_code(ERR_SESSION_RESET), need query meta
The root cause is that the connection was not even established when it tried to close the session.
2019-09-11 17:12:19,575 WARN ReplicaSession.tryNotifyWithSequenceID: actively close the session because it's not responding for 10 seconds
2019-09-11 17:12:19,575 INFO ReplicaSession.closeSession: channel rpc_address(10.38.161.207:32801) not connected, skip the close
The client repeatedly tried to reconnect to this server, but it didn't succeed.
2019-09-11 17:01:05,830 WARN ReplicaSession$2.operationComplete(ReplicaSession.java:153) - rpc_address(10.38.161.207:32801): try to connect to target failed
...
2019-09-11 17:01:06,909 WARN ReplicaSession$2.operationComplete(ReplicaSession.java:153) - rpc_address(10.38.161.207:32801): try to connect to target failed
2019/9/11 17:00. Our SRE stopped one instance of replica-server in our staging environment
trying to simulate the problem java-client can't recover.
2019/9/11 17:00. Some of our clients recovered right away while replica-server restarted, but some couldn't reconnect and kept retrying with ERR_TIMEOUT error.
Client Version
1.11.5-thrift-0.11.0-inlined-release
The text was updated successfully, but these errors were encountered: