Timeout between service calls over 5000000000ns and publisher stops working #256

Closed
jiri-dev opened this issue Jul 6, 2016 · 2 comments

@jiri-dev

jiri-dev commented Jul 6, 2016

Hi,
first of all, thank you for providing Aeron. We occasionally run into an issue where our publisher app is suspended for more than 5 seconds by the garbage collector, which causes Aeron 0.9.5 to report the following error around 5 million times:

io.aeron.exceptions.ConductorServiceTimeoutException: Timeout between service calls over 5000000000ns
at io.aeron.ClientConductor.onCheckTimeouts(ClientConductor.java:356)
at io.aeron.ClientConductor.doWork(ClientConductor.java:312)
at io.aeron.ClientConductor.doWorkUntil(ClientConductor.java:329)
at io.aeron.ClientConductor.releasePublication(ClientConductor.java:157)
at io.aeron.Publication.release(Publication.java:212)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at io.aeron.ActivePublications.close(ActivePublications.java:75)
at io.aeron.ClientConductor.onClose(ClientConductor.java:108)
at io.aeron.ClientConductor.onCheckTimeouts(ClientConductor.java:353)
at io.aeron.ClientConductor.doWork(ClientConductor.java:312)
at io.aeron.ClientConductor.doWork(ClientConductor.java:118)
at org.agrona.concurrent.AgentRunner.run(AgentRunner.java:106)
at java.lang.Thread.run(Thread.java:745)

and then we get this error:

io.aeron.exceptions.DriverTimeoutException: No response from driver within timeout
at io.aeron.ClientConductor.doWorkUntil(ClientConductor.java:343)
at io.aeron.ClientConductor.releasePublication(ClientConductor.java:157)
at io.aeron.Publication.release(Publication.java:212)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at io.aeron.ActivePublications.close(ActivePublications.java:75)
at io.aeron.ClientConductor.onClose(ClientConductor.java:108)
at io.aeron.ClientConductor.onCheckTimeouts(ClientConductor.java:353)
at io.aeron.ClientConductor.doWork(ClientConductor.java:312)
at io.aeron.ClientConductor.doWork(ClientConductor.java:118)
at org.agrona.concurrent.AgentRunner.run(AgentRunner.java:106)
at java.lang.Thread.run(Thread.java:745)

We print those from the error handler that we set on the Aeron.Context. When these errors occur, the publisher never starts working again and we need to restart our app. We use the embedded media driver.
Could you please advise how to proceed in such a case? (E.g. whether to somehow relaunch the media driver so that the app could work again without a restart.)
We also tried Aeron 0.9.9 but it behaves the same.
Thank you.
Jiri

@jiri-dev
Author

jiri-dev commented Jul 6, 2016

Forgot to mention that we run our app on Windows, and to reproduce this we used Process Explorer (from Sysinternals) to suspend the publisher app for around 7 seconds and then resumed it, at which point we got the issue. We also run it on Linux, where we experienced the issue with GC causing the pause.

@mjpt777
Contributor

mjpt777 commented Jul 6, 2016

The timeouts are used to detect dead clients. If your application is experiencing such delays then you may want to better configure it for low latency. While you are improving your application and GC settings you can increase the timeouts.

https://github.com/real-logic/Aeron/wiki/Configuration-Options
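
For illustration, here is a minimal sketch of raising the timeouts through JVM system properties before the embedded media driver and the Aeron client are started. The property names (aeron.client.liveness.timeout in nanoseconds, aeron.driver.timeout in milliseconds) are assumptions recalled from the configuration page linked above and should be checked against the Aeron version in use. The class name, the helper method and the 30-second values are hypothetical.

```java
import java.util.concurrent.TimeUnit;

final class AeronTimeoutSettings
{
    // Assumed property names; verify them on the Configuration Options wiki page.
    static final String CLIENT_LIVENESS_TIMEOUT_PROP = "aeron.client.liveness.timeout"; // nanoseconds
    static final String DRIVER_TIMEOUT_PROP = "aeron.driver.timeout";                   // milliseconds

    // Sets both timeouts as system properties, converting from seconds to the assumed units.
    static void apply(final long clientLivenessTimeoutSec, final long driverTimeoutSec)
    {
        System.setProperty(
            CLIENT_LIVENESS_TIMEOUT_PROP,
            Long.toString(TimeUnit.SECONDS.toNanos(clientLivenessTimeoutSec)));
        System.setProperty(
            DRIVER_TIMEOUT_PROP,
            Long.toString(TimeUnit.SECONDS.toMillis(driverTimeoutSec)));
    }

    public static void main(final String[] args)
    {
        // Arbitrary example values, chosen only to exceed the 5 s pause from the report.
        apply(30, 30);

        System.out.println(
            CLIENT_LIVENESS_TIMEOUT_PROP + "=" + System.getProperty(CLIENT_LIVENESS_TIMEOUT_PROP));
        System.out.println(
            DRIVER_TIMEOUT_PROP + "=" + System.getProperty(DRIVER_TIMEOUT_PROP));

        // The embedded media driver and the Aeron client would be launched after this point,
        // so that they read the properties when their contexts are created.
    }
}
```

The same properties can be passed on the command line with -D instead. Longer timeouts only hide the pauses, and a client that really has died will take correspondingly longer to be detected.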
