Apply deadlocks in spite of timeout #145

hasKeef · 2016-07-28T19:31:27Z

I admittedly didn't check closed issues to see whether this has already been reported, but at:

https://github.com/hashicorp/raft/blob/master/raft.go#L306

The timeout is only honored while enqueuing the request to write a log. In my case, I wrote a custom transport that seems to have issues, since there are a lot of undocumented details that transports need to handle. Luckily, it's open source, so I could figure it out.

But in real life, I can reproduce this by taking down half of the cluster just as Apply is being called. The log entry then cannot reach quorum, but rather than timing out, the future simply deadlocks. The timeout isn't passed through to the Error() call. And since the future can never reach quorum (for example, it was constructed when there were 8 nodes but now there are 3), the Error() call will just hang forever. Even if the nodes are restored and we're back up to 8 nodes, it will still be deadlocked, because it won't retry those failed applies.

So it'd be good if the deferError also took a timeout and honored that while waiting on the channel, returning an appropriate timeout error if it expires.

ongardie · 2016-08-12T21:13:28Z

Hi @hasKeef,

I think there are two issues here:

1. The timeouts as defined are sort of lacking and misleading.
2. The scenario you describe, where the future never returns.

For this second one, does the issue still apply in the issue-84-integration branch (see issue #84)? We rewrote all the membership change stuff there, so maybe it's already fixed.

stale · 2019-06-06T16:04:43Z

Hey there,
We wanted to check in on this request since it has been inactive for at least 90 days.
Have you reviewed the latest godocs?
If you think this is still an important issue in the latest version of the Raft library or its documentation, please let us know and we'll keep it open for investigation.
If there is still no activity on this request in 30 days, we will go ahead and close it.
Thank you!

stale · 2019-07-06T17:02:38Z

Hey there, This issue has been automatically closed because there hasn't been any activity for a while. If you are still experiencing problems, or still have questions, feel free to open a new one :+1

stale bot added the waiting-reply label Jun 6, 2019

stale bot closed this as completed Jul 6, 2019

tylertreat mentioned this issue Mar 20, 2022 in: Apply future deadlocks if quorum cannot be reached #498 (Open)
