I have encountered a problem on cloud virtualized storage: the `Raft.Stats` RPC called from Consul may periodically stall. As a consequence, the Consul leader emits log messages about unhealthy followers, even though the cluster itself remains healthy.
The initial investigation revealed that `Raft.Stats` stalls while getting the raft node's configuration (the `ConfigurationFuture` wrapper), which is serviced by the follower loop (through the `configurationsCh` channel inside the `runFollower` routine). As far as I can tell, every request to a follower, including heartbeats and raft RPCs, is handled sequentially. The RPC timing metrics showed that the `appendEntries` RPC, or more precisely its `storeLogs` stage, has a wide spread of latency. This happens because the latency of syncing logs to persistent storage (the `fdatasync` syscall in the BoltDB storage backend) is unstable, and a separate measurement of `fdatasync` latency confirmed this hypothesis.
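For reference, here is a minimal sketch of the kind of standalone probe that can confirm unstable sync latency (Linux-only, since it calls `fdatasync` directly; the directory, sample count, and write size are illustrative placeholders, not the exact methodology):

```go
// fsyncprobe measures fdatasync latency on a target volume,
// mimicking the syscall the BoltDB backend issues in storeLogs.
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

func main() {
	// Place the probe file on the same volume that backs the raft
	// data directory; the path here is only an example.
	f, err := os.CreateTemp("/var/lib/consul", "fsync-probe-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	buf := make([]byte, 4096) // one page, roughly a small log batch
	var max, total time.Duration
	const samples = 1000
	for i := 0; i < samples; i++ {
		if _, err := f.WriteAt(buf, 0); err != nil {
			panic(err)
		}
		start := time.Now()
		// The same syscall BoltDB relies on to make writes durable.
		if err := syscall.Fdatasync(int(f.Fd())); err != nil {
			panic(err)
		}
		d := time.Since(start)
		total += d
		if d > max {
			max = d
		}
	}
	fmt.Printf("fdatasync over %d calls: avg=%v max=%v\n",
		samples, total/time.Duration(samples), max)
}
```

A large gap between the average and the maximum on the volume backing BoltDB would be consistent with the latency spread observed in the `storeLogs` stage.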
This is ultimately a cloud provider problem, but it exposes a weakness in the architecture: lightweight read requests to a follower have to wait behind blocking ones. What about non-blocking reads served from a snapshot taken before the logs are committed (as in an MVCC scheme)? Is that possible, and could it be implemented?
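To make the idea concrete, here is a minimal sketch of one way such a non-blocking read could work: the follower loop publishes an immutable copy of the configuration after each change, and readers load the latest copy without going through `configurationsCh`. The `configStore` type and its methods are hypothetical, not part of hashicorp/raft:

```go
// Package configread sketches MVCC-style reads: the writer publishes
// immutable snapshots, readers load the latest one lock-free, so a
// Stats call never queues behind the follower loop.
// Hypothetical names; not the hashicorp/raft API.
package configread

import "sync/atomic"

// Configuration stands in for raft.Configuration.
type Configuration struct {
	Servers []string
}

type configStore struct {
	latest atomic.Value // holds *Configuration, never mutated in place
}

// Publish would be called from the follower loop whenever the
// configuration changes; it installs a fresh immutable snapshot.
func (s *configStore) Publish(c Configuration) {
	s.latest.Store(&c) // store a copy so readers never see a partial update
}

// Load would be called by Stats (or any read-only request) and never
// blocks, even while storeLogs is waiting on fdatasync.
func (s *configStore) Load() *Configuration {
	c, _ := s.latest.Load().(*Configuration) // nil until first Publish
	return c
}
```

A read served this way can be slightly stale relative to in-flight configuration changes, which seems acceptable for `Stats`-style introspection; the key property is that it never waits behind an `appendEntries` batch stuck on `fdatasync`.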