Failed async BookKeeper writes should cause peer to restart #390

lbradstreet · 2015-11-15T09:50:37Z

See https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/state/log/bookkeeper.clj#L67

If a given write has failed, the task's local state will no longer be in sync with the played-back log, and now new log entries will be written. The peer should either roll back to the old state and create a new ledger, or shut itself down so that a new peer replays the state and starts writing to a new ledger. The unacked messages will then be replayed.

I suggest we do the second first, then create a new issue to implement the first at some point. I believe implementing the first is worthwhile because, in the case of a partition, we may not want all the grouping peers to restart at the same time and would rather have them attempt to recover. It may be tricky to do so, however.
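To make the second option (restart the peer) concrete, here is a minimal sketch. It is not Onyx's implementation and does not use the BookKeeper client; `AsyncWriteTracker`, `WriteFailed`, and the `OK` code are hypothetical names invented for illustration. The idea is that the async write callback only records the first failure, and the task thread checks for it before doing more work, raising an exception that a supervisor could treat as "restart this peer".

```python
import threading

OK = 0  # hypothetical success code for a completed write


class WriteFailed(Exception):
    """Raised on the task thread so a supervisor can restart the peer."""


class AsyncWriteTracker:
    """Records the first failed async log write for later inspection."""

    def __init__(self):
        self._lock = threading.Lock()
        self._failure = None  # (return code, context) of the first failed write

    def on_write_complete(self, rc, ctx):
        # Assumed to be called from the log client's callback thread.
        if rc != OK:
            with self._lock:
                if self._failure is None:
                    self._failure = (rc, ctx)

    def check(self):
        # Assumed to be called from the task thread before the next batch.
        with self._lock:
            failure = self._failure
        if failure is not None:
            rc, ctx = failure
            raise WriteFailed(f"async log write failed (rc={rc}, ctx={ctx!r})")
```

Keeping the callback cheap and raising from the task thread means the failure surfaces where the peer's lifecycle is managed, rather than being lost on a client callback thread.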

lbradstreet · 2015-11-27T12:07:48Z

Given #410, maybe we should only restart if the write failed and we're still writing to the same ledger as the original write.
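A rough sketch of that condition, under the assumption that each write remembers the ledger it was issued against. The function name, the `OK` code, and the ledger-id arguments are hypothetical and not taken from Onyx or BookKeeper.

```python
OK = 0  # hypothetical success code for a completed write


def should_restart(rc, write_ledger_id, current_ledger_id):
    """Return True only when a failed write targeted the ledger still in use."""
    if rc == OK:
        return False
    # A failure against a ledger that has since been replaced (for example
    # after compaction, as discussed in #410) is assumed here not to affect
    # the log currently being written, so it does not trigger a restart.
    return write_ledger_id == current_ledger_id
```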

lbradstreet · 2016-01-20T19:47:59Z

Confirmed to be an issue by Jepsen.


lbradstreet · 2016-01-25T15:39:59Z

Fixed in 4d3684e.


lbradstreet added bug state/windowing labels Nov 15, 2015

lbradstreet modified the milestones: 0.8.1, 0.8.2 Nov 15, 2015

lbradstreet mentioned this issue Nov 27, 2015

BookKeeper log compaction may cause async writes to fail #410

Closed

lbradstreet modified the milestones: 0.8.4, 0.8.3 Dec 7, 2015

lbradstreet removed this from the 0.8.4 milestone Jan 14, 2016

lbradstreet added the jepsen label Jan 20, 2016

lbradstreet added a commit that referenced this issue Jan 25, 2016

Failed BookKeeper writes now reboot the peer #390

4d3684e

Also closes #500 by improving performance of write-take-batch

lbradstreet closed this as completed Jan 25, 2016

lbradstreet added a commit that referenced this issue Jan 26, 2016

Failed BookKeeper writes now reboot the peer #390

6694e2a

Also closes #500 by improving performance of write-take-batch
