Reconnect #1135
Conversation
Attempt to solve problems related to disconnect/reconnect for timeouts on responses or no-work. On PoolManager the io_service is now persistent and never exits.

On Linux:
* Disconnect/reconnect works without issues for plain TCP connections
* SSL connections get shut down properly, but the first reconnect attempt times out; after that, reconnection is immediate

On Windows:
* Feedback needed
Reconnects ok.
* Made boost's io_service global
* The io_service is no longer stopped, since stopping it prevented the resolver and sockets from restarting
* On disconnect this behavior is implemented: mining is stopped only when switching pools (failovers); otherwise it keeps mining while attempting a fast reconnect to the same pool
* SSL and plain TCP reconnect properly on Linux

Feedback needed on Windows.
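Roughly, the disconnect handling described above amounts to something like the following condensed sketch. `m_reconnectTry`, `m_reconnectTries` and `m_connections` mirror members that appear in this PR's diff; the method name and `m_activeConnectionIdx` are hypothetical, not the PR's actual code:

```cpp
// Condensed sketch of the disconnect decision in PoolManager's workloop.
void PoolManager::handleDisconnect()
{
    if (m_reconnectTry < m_reconnectTries) {
        // Same pool: keep the farm mining and retry quickly.
        m_reconnectTry++;
        p_client->connect();
    } else {
        // Retries exhausted: stop mining and switch to the next failover.
        m_farm.stop();
        m_reconnectTry = 0;
        m_activeConnectionIdx = (m_activeConnectionIdx + 1) % m_connections.size();
        p_client->setConnection(m_connections[m_activeConnectionIdx]);
        p_client->connect();
    }
}
```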
Does not work on Windows :(
Ok. This log makes sense to me.
Works now, but consider adding some kind of delay for reconnecting.
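For instance, a minimal sketch of such a delay using a boost::asio deadline timer — the timer member and the connect call are hypothetical names, not this PR's actual code:

```cpp
// Hypothetical sketch: back off a few seconds before retrying instead
// of reconnecting immediately after a disconnect.
m_reconnect_timer.expires_from_now(boost::posix_time::seconds(3));
m_reconnect_timer.async_wait([this](const boost::system::error_code& ec) {
    if (!ec)  // not cancelled
        p_client->connect();
});
```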
Ok ... need some adjustments.
TCP and SSL WORK
Finally!!! I think this solves a good bunch of problems.
Seems to improve resiliency for me across a modem reboot. +1
Tested on Linux and Windows, resiliency of the process is plainly better, reiterate +1
{
// Post first deadline timer to give io_service
// initial work
m_io_work_timer.expires_from_now(boost::posix_time::seconds(60));
Why is this fake job needed?
Every time io_service starts, it allocates its own thread pool.
When there are no more jobs in the queue it destroys all its threads, and recreating them is very time-consuming. Also, on Windows it apparently does not restart properly.
This fake job gives the queue something to do at all times, so the service never stops.
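The mechanism is the deadline timer from the diff above, re-armed from its own completion handler. A sketch — the enclosing class and handler name are assumed:

```cpp
// Self-rearming "fake job": the completion handler re-arms the timer,
// so the io_service queue is never empty and its internal thread pool
// is never torn down between real jobs.
void MinerCLI::io_work_timer_handler(const boost::system::error_code& ec)
{
    if (ec != boost::asio::error::operation_aborted) {
        m_io_work_timer.expires_from_now(boost::posix_time::seconds(60));
        m_io_work_timer.async_wait(boost::bind(&MinerCLI::io_work_timer_handler,
                                               this, boost::asio::placeholders::error));
    }
}
```

An alternative worth noting is `boost::asio::io_service::work`, whose mere existence keeps `run()` from returning when the queue drains.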
@@ -542,6 +584,7 @@ class MinerCLI
if (m_minerType == MinerType::CUDA || m_minerType == MinerType::Mixed)
CUDAMiner::listDevices();
#endif
stop_io_service();
Can you try returning from the function instead of calling exit()?
I did not change the logic of functions not related to the scope of this PR.
The exit() was there before: I only added the stop_io_service() before exiting.
m_noEval,
m_exit
)) {
stop_io_service();
This stop_io_service() seems to be a bad pattern. Is there a way to start the service only when needed?
Long term, I have a better way of handling the list of devices.
Actually I start the io_service in the MinerCLI constructor, thus I need to stop it whenever MinerCLI exits.
Also ... aside from list-devices, the asio service is always needed.
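For reference, the pattern under discussion looks roughly like this hypothetical skeleton (not the PR's exact code; shown with `io_service::work` rather than the keep-alive timer, for brevity). An RAII destructor is one way to avoid explicit stop_io_service() calls on every exit path:

```cpp
#include <thread>
#include <boost/asio.hpp>

// Hypothetical skeleton: the io_service is started in the constructor
// and runs for the whole life of MinerCLI; the destructor stops it,
// so early returns no longer need an explicit stop_io_service().
class MinerCLI
{
public:
    MinerCLI()
        : m_io_work(m_io_service)  // keeps run() from returning when idle
        , m_io_thread([this] { m_io_service.run(); })
    {}

    ~MinerCLI() { stop_io_service(); }

    void stop_io_service()
    {
        m_io_service.stop();
        if (m_io_thread.joinable())
            m_io_thread.join();
    }

private:
    boost::asio::io_service m_io_service;
    boost::asio::io_service::work m_io_work;
    std::thread m_io_thread;
};
```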
ethminer/MinerAux.h
@@ -783,10 +833,12 @@ class MinerCLI
sealers["cuda"] = Farm::SealerDescriptor{&CUDAMiner::instances, [](FarmFace& _farm, unsigned _index){ return new CUDAMiner(_farm, _index); }};
#endif

//EthStratumClient::pointer client = EthStratumClient::create(m_worktimeout, m_responsetimeout, m_email, m_report_stratum_hashrate);
Delete old code.
Wilco
libethcore/Farm.h
m_io_service.reset();
m_serviceThread.join();
}
//if (m_serviceThread.joinable()) {
Delete old code
libethcore/Farm.h
std::thread m_serviceThread; ///< The IO service thread.
boost::asio::io_service m_io_service;

// std::thread m_serviceThread; ///< The IO service thread.
Delete old code.
libpoolprotocols/PoolManager.cpp
// Workloop will determine if we're trying a fast reconnect to same pool
// or if we're switching to failover(s)

//if (m_farm.isMining()) {
Delete old code.
unsigned m_reconnectTries = 3;
unsigned m_reconnectTry = 0;
std::vector <URI> m_connections;
std::atomic<bool> m_running = { false };
There is also a g_running flag. Can this be combined?
Combining the global running semaphore (which is only affected by signal capturing) with m_running on PoolManager would require a sensible redesign of the running pattern and, at the very least, removing the start and stop methods on PoolManager, as they would no longer make sense.
Nevertheless, the actual separation of the running ethminer process from the PoolManager service may (or may not) be helpful for implementing further features: for example, binding to the recently added --tstart/--tstop feature, where the pool connection might be suspended if all GPUs exceed the temperature threshold.
Anyway, I would leave this task outside the scope of this specific PR.
libpoolprotocols/PoolManager.cpp
// Otherwise do nothing and wait until connection state is NOT pending
if (!p_client->isPendingState()) {

dev::setThreadName("main");
You don't want to do this in a loop.
What about this?
Why not? It's conditional and happens quite infrequently.
Anyway, I can't see any other place to put it that would get the thread properly labelled in the output logs.
Can't you just do it once at the beginning of the function?
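i.e., hoist it out of the loop, along these lines (a sketch; the function name is assumed):

```cpp
void PoolManager::workLoop()
{
    dev::setThreadName("main");  // label the thread once, up front

    while (m_running) {
        // ... connection-state checks; no per-iteration setThreadName() ...
    }
}
```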
The whole point of this PR is to have an io_service globally and persistently running. Otherwise reconnect would not happen.
Please also add an entry to CHANGELOG.md.
Hey @AndreaLanfranchi, two questions have not been addressed here.