This repository has been archived by the owner on Jan 6, 2023. It is now read-only.
Found by Jepsen partitioning during a long-running test.
I think what is happening is that a peer fails, starts back up, and throws an exception on startup (see exception below) while starting the log component. Note that the exception at the end is an error printing a future, which may be an aviso/pretty issue, but I don't think it is relevant.
Because of the exception during startup, peer-lifecycle falls through to the catch block that logs the fatal error, and the peer drops off the cluster.
(defn ^{:no-doc true} peer-lifecycle [started-peer config shutdown-ch ack-ch]
  (try
    (loop [live @started-peer]
      (let [restart-ch (:restart-ch (:virtual-peer live))
            [v ch] (alts!! [shutdown-ch restart-ch] :priority true)]
        (cond (= ch shutdown-ch)
              (do (component/stop live)
                  (reset! started-peer nil)
                  (>!! ack-ch true))
              (= ch restart-ch)
              (do (component/stop live)
                  (Thread/sleep (or (:onyx.peer/retry-start-interval config) 2000))
                  (let [live (component/start live)]
                    (reset! started-peer live)
                    (recur live)))
              :else (throw (ex-info "Read from a channel with no response implementation" {})))))
    (catch Throwable e
      (fatal "Peer lifecycle threw an exception")
      (fatal e))))
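One possible way to keep a failed restart from ending the loop is to retry the start call instead of letting its exception reach the outer catch. The following is only a minimal sketch of that idea, not a proposed patch: `start-with-retry`, `flaky-start`, and the `max-attempts` parameter are hypothetical names that do not exist in Onyx, and the sketch assumes that calling the start function again after a failure is safe, which may not hold if a component is left partially started.

```clojure
(defn start-with-retry
  "Calls (start-fn) until it returns without throwing, sleeping interval-ms
  between attempts. Rethrows the last exception after max-attempts failures.
  Hypothetical helper, not part of Onyx."
  [start-fn interval-ms max-attempts]
  (loop [attempt 1]
    ;; recur is not allowed inside catch, so capture the outcome as data first.
    (let [result (try
                   {:ok (start-fn)}
                   (catch Exception e
                     (if (< attempt max-attempts)
                       {:error e}
                       (throw e))))]
      (if (contains? result :ok)
        (:ok result)
        (do (Thread/sleep interval-ms)
            (recur (inc attempt)))))))

;; Simulated start function that fails twice before succeeding.
(def attempts (atom 0))

(defn flaky-start []
  (if (< (swap! attempts inc) 3)
    (throw (ex-info "Log component failed to start" {}))
    :started))

(start-with-retry flaky-start 10 5)
```

In peer-lifecycle, the restart branch could in principle wrap its start call the same way, for example `(start-with-retry #(component/start live) interval 5)`, so that only repeated failures fall through to the fatal handler.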