handle-exception lifecycle doesn't appear to apply to write-batch #491

lbradstreet · 2016-01-20T13:14:44Z

Found by Jepsen. I switched out :onyx/restart-pred-fn for a restart lifecycle, and the job still appears to get killed in certain scenarios. The main one I noticed is in write-batch. I'll look into this.
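For context, a restart lifecycle of the kind described above might look roughly like the sketch below. It assumes the documented shape of :lifecycle/handle-exception (a function of the event map, the lifecycle map, the phase keyword, and the exception, returning :restart or :kill). The namespace, the task name :out, and the var names are hypothetical and used only for illustration.

```clojure
(ns example.restart-lifecycle)

;; Hypothetical handler: ask for a restart on any exception, in any phase.
;; Arguments: event map, lifecycle map, phase keyword, exception.
(defn handle-exception [event lifecycle lifecycle-phase e]
  :restart)

;; Calls map that the lifecycle entry below points at.
(def restart-calls
  {:lifecycle/handle-exception handle-exception})

;; Lifecycle entry attaching the handler to a hypothetical task named :out.
(def lifecycles
  [{:lifecycle/task :out
    :lifecycle/calls :example.restart-lifecycle/restart-calls}])
```

The issue reported here is that an exception thrown in write-batch does not appear to reach a handler like this one.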

lbradstreet · 2016-01-20T13:20:15Z

Yup, write-batch isn't handled like read-batch or the others. https://github.com/onyx-platform/onyx/blob/develop/src/onyx/peer/task_lifecycle.clj#L284

lbradstreet · 2016-01-20T13:23:44Z

Hmm, I guess these should already be covered by after-batch which was placed in the task-lifecycle.

lbradstreet · 2016-01-20T13:25:26Z

Ah, it isn't covered by that because it doesn't invoke those task lifecycle calls via restartable-invocation. So that's the overall issue.
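To illustrate why the wrapper matters, here is a minimal sketch of the general pattern. It is not Onyx's actual restartable-invocation; the function name invoke-with-handler, its argument order, and the three-argument handler are assumptions made for this example. Only calls routed through such a wrapper get a chance to consult the handler, so a stage that is invoked directly bypasses it.

```clojure
;; Hypothetical wrapper, not Onyx's implementation.
(defn invoke-with-handler
  "Calls (f event). If f throws, asks handler which action to take and
  rethrows with that decision attached as ex-data."
  [handler phase f event]
  (try
    (f event)
    (catch Exception e
      (let [action (handler event phase e)]
        (throw (ex-info "Lifecycle phase threw"
                        {:phase phase :action action}
                        e))))))

;; A stage called directly, e.g. (write-batch-fn event), never reaches
;; the handler; the same stage called through the wrapper does:
;; (invoke-with-handler handler :lifecycle/write-batch write-batch-fn event)
```

The phase keyword above is illustrative; a caller further up would inspect the :action value to decide whether to restart the task or kill the job.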

lbradstreet · 2016-01-25T12:31:18Z

@MichaelDrogalis this would be a good one to take if you have some time.

Basically, handle-exception doesn't seem to be able to handle an exception thrown in write-batch.

lbradstreet · 2016-01-27T15:13:54Z

PR #505 successfully passes Jepsen tests without :onyx/restart-pred-fn.

I think there might be additional task-lifecycle stages where we should allow restarts, though, such as build-new-segments and flow-retry-segments. build-new-segments isn't particularly risky, but when a user really wants a job to stay up no matter what, we should cover all stages of the lifecycle.

MichaelDrogalis · 2016-01-27T15:25:07Z

I remember that we discussed this a while back and decided not to restart on exceptions thrown by internal Onyx code, i.e. anything a user could not have written. If we have a bug in Onyx, it should crash hard, because a restart is unlikely to recover and do the right thing. I think assign-windows falls under this category.

lbradstreet · 2016-01-27T15:34:54Z

I think the distinction should be whether the function is pure or not. If we're doing stateful things, the failures might be transient and go away if the peer is restarted.

I think assign-windows and possibly flow-retry-segments fall in this category (especially because the latter calls emit-latency, which uses monitoring).

MichaelDrogalis · 2016-01-27T15:51:17Z

Monitoring calls, even when they fail, should never crash the job. We need to tolerate an occasional failure to emit metrics.

I still think that even if it's a transient, stateful issue, and it's Onyx's fault, restarting gives the user a false sense of security when something wrong is lurking.
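The point about monitoring calls could be sketched as a small guard around the emit call. This is an illustration only: safe-emit and emit-fn are hypothetical names, and nothing here reflects how Onyx's monitoring is actually wired.

```clojure
;; Hypothetical guard: a failed metrics emit must not propagate.
(defn safe-emit
  "Calls (emit-fn metric). Swallows any Exception and returns nil so that
  a monitoring failure cannot crash the caller."
  [emit-fn metric]
  (try
    (emit-fn metric)
    (catch Exception _
      nil)))
```

With a guard like this, a stage such as flow-retry-segments would not fail just because a metric could not be emitted, which leaves the restart question to cover only non-monitoring failures.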

MichaelDrogalis · 2016-01-28T15:58:12Z

Merged.

lbradstreet added the bug label Jan 20, 2016

lbradstreet self-assigned this Jan 20, 2016

lbradstreet added this to the 0.8.5 milestone Jan 20, 2016

lbradstreet mentioned this issue Jan 20, 2016

Deprecate :onyx/restart-pred-fn #485

Closed

lbradstreet assigned MichaelDrogalis and unassigned lbradstreet Jan 25, 2016

lbradstreet modified the milestones: 0.8.6, 0.8.5 Jan 25, 2016

MichaelDrogalis mentioned this issue Jan 26, 2016

Lifecycle exceptions for read/write batch #505

Merged

MichaelDrogalis closed this as completed Jan 28, 2016
