This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

ConductorServiceTimeoutException #681

Closed
neverfox opened this issue Nov 21, 2016 · 6 comments

Comments

@neverfox

We have a 24/7 job that listens to Datomic logs for new segments. It sometimes goes for a while without any work. Twice now, after long periods of idle time (on the order of days), one of our peer groups started logging the following continuously as soon as work resumed:

16-11-21 18:17:31 onyx-datomic-902846474-4bh91 WARN [onyx.messaging.aeron:95] -
                                  java.lang.Thread.run           Thread.java: 745
    uk.co.real_logic.agrona.concurrent.AgentRunner.run      AgentRunner.java: 105
         uk.co.real_logic.aeron.ClientConductor.doWork  ClientConductor.java: 113
         uk.co.real_logic.aeron.ClientConductor.doWork  ClientConductor.java: 293
uk.co.real_logic.aeron.ClientConductor.onCheckTimeouts  ClientConductor.java: 338
uk.co.real_logic.aeron.exceptions.ConductorServiceTimeoutException: Timeout between service calls over 5000000000ns
@lbradstreet
Member

@neverfox this should be fixed by 0.9.14. Please let me know if you see anything similar when you're on 0.9.14.

@lbradstreet
Member

Closing until we hear of this recurring.

@neverfox
Author

neverfox commented Jan 3, 2017

This is still happening with 0.9.15. From what I can tell, it occurs when an always-running job receives new work after a long hiatus (days or weeks) of doing nothing.

@lbradstreet
Member

lbradstreet commented Jan 3, 2017 via email

@mariusz-jachimowicz-83
Contributor

mariusz-jachimowicz-83 commented Jan 4, 2017

After reading real-logic/aeron#256, it seems that Aeron uses this timeout to mark a publication as dead after 5 s of inactivity. One solution might be to increase this timeout; see "Publication Connection Timeout" at https://github.com/real-logic/Aeron/wiki/Configuration-Options.
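Both stack traces in this thread report that the gap between service calls exceeded 5000000000 ns (5 s). The following is a minimal, hypothetical Java sketch of that kind of check, written only for illustration and not taken from Aeron's source. The class name `InterServiceTimeoutCheck` and its methods are invented here, and it assumes the client conductor records the time of its last service call and compares each new call against a fixed limit. It shows why a thread that stalls, or is not scheduled, for longer than the limit fails on its next duty cycle instead of carrying on.

```java
/**
 * Hypothetical sketch of an inter-service timeout check (NOT Aeron's code).
 * A conductor-style agent remembers when it was last serviced; if the next
 * service call arrives more than timeoutNs later, it fails fast.
 */
final class InterServiceTimeoutCheck {
    private final long timeoutNs;
    private long lastServiceNs;

    InterServiceTimeoutCheck(long timeoutNs, long startNs) {
        this.timeoutNs = timeoutNs;
        this.lastServiceNs = startNs;
    }

    /** Called on every duty cycle with the current time in nanoseconds. */
    void onServiceCall(long nowNs) {
        // A long stall (blocked thread, GC pause, suspended host) makes this gap large.
        if (nowNs - lastServiceNs > timeoutNs) {
            // lastServiceNs is left unchanged, so every later call fails too.
            throw new IllegalStateException("Exceeded (ns): " + timeoutNs);
        }
        lastServiceNs = nowNs;
    }

    long lastServiceNs() {
        return lastServiceNs;
    }

    public static void main(String[] args) {
        // In real use the timestamps would come from System.nanoTime().
        InterServiceTimeoutCheck check = new InterServiceTimeoutCheck(5_000_000_000L, 0L);
        check.onServiceCall(1_000_000_000L); // 1 s gap: within the limit
        try {
            check.onServiceCall(7_000_000_000L); // 6 s gap: exceeds the 5 s limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Exceeded (ns): 5000000000"
        }
    }
}
```

In this sketch the check is purely about the time between consecutive calls on one thread, so raising the limit only tolerates longer stalls; it does not remove whatever caused the thread to stop being serviced.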

@thenonameguy
Contributor

thenonameguy commented Jan 17, 2019

This happens in our code as well. Our job runs fine for 8-15 minutes, then all of the tasks fail with this:

io.aeron.exceptions.ConductorServiceTimeoutException: Exceeded (ns): 5000000000
                          clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task lifecycle :lifecycle/write-batch. Killing the job. -> Exception type: io.aeron.exceptions.ConductorServiceTimeoutException. Exception message: Exceeded (ns): 5000000000
