-
Notifications
You must be signed in to change notification settings - Fork 61
Conversation
91c7bea
to
70e2d67
Compare
Codecov Report
@@ Coverage Diff @@
## master #1384 +/- ##
==========================================
+ Coverage 80.19% 80.35% +0.15%
==========================================
Files 178 180 +2
Lines 10585 10624 +39
==========================================
+ Hits 8489 8537 +48
+ Misses 2096 2087 -9
Continue to review full report at Codecov.
|
766c0dc
to
c1c1c68
Compare
bool exiting = false; | ||
std::mutex exit_m; | ||
std::condition_variable exit_cv; | ||
SigHandler::get().start([&exit_m, &exit_cv, &exiting]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, a single handler should 'live' at an application level not at a library one since an application might incorporate not just libaktualizr but also another libraries and functionality that require triggering specific teardown actions on a system signal. Specifically, I would put a signal handler registration&instantion at aktualizr app (aktualizr-primary) level and triggers teardown functionality via Aktualizr API, it will allow an application developer to extend or override the signal handling functionality.
return; | ||
} | ||
|
||
boost::this_thread::sleep_for(boost::chrono::seconds(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just sleep for masked_secs_
and then run on_signal()
? Or this is for case when another signal arrives during masking/delay ? In this case it can be just ignored, not sure if I fully understand a value in the delayed signal handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial idea was that some users could mask signals to protect some critical operation, but only for a limited time so that there could never be any infinite blocking. But the aktualizr code base has changed quite a lot since then.
I'll probably remove this functionality entirely.
bool signal = signal_marker_.exchange(false); | ||
|
||
if (signal) { | ||
LOG_INFO << "received KILL request"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just KILL
signal handler
} | ||
}); | ||
|
||
signal(SIGHUP, signal_handler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it configurable for an application level, i.e. the one who defines a signal handler is supposed to specify what is/are signals this handler is intended for. Ideally, I as an application/executable developer should have possibility to define different signal handlers for different signals.
I agree with @mike-sul the signal handler should not be a part of the libaktualizr... |
Yes, you're both right, this PR is not ready yet. |
7a0ecc6
to
470cc63
Compare
It should be simpler and more inter-operable now. |
36a4d85
to
f1231c6
Compare
} | ||
uptane_client_->completeInstall(); | ||
}); | ||
return future; | ||
} | ||
|
||
void Aktualizr::Shutdown() { | ||
Abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abort() stops the command queue threads which is stopped in CommandQueue() dtor which itself is triggered during Aktualizr instance destruction, so calling Abort() in case of signal handling from the app/exe that destroys Aktualizr instance anyway looks like redundant. Having said that, I think it, might be useful if there is a need in calling RunForever() and Shutdown() multiple times within a single Aktualizr instance lifetime. Perhaps, break it into two methods ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abort()
docstring says that it aborts the currently running command if it can be aborted, that's why it sounded appropriate to call it from the signal handling.
So you're suggesting moving away Abort()
from this method to the signal handler which will call Abort();Shutdown();
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or given the docstring I should maybe merge Shutdown() into Abort(): Abort() would also apply to RunForever()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, it depends on what Shutdown() is intended for ? If we would like to stop Aktualizr during an app exit then this method is useless at all because teardown/shutdown happens at Aktualizr dtor.
If Shutdown() is intended for the case when an user/app would like to stop aktualizr's UptaneCycle and then run it again within a single Aktualizr instance lifetime then the user will need to Initialize() if abort is part of Shutdown. BTW, abort doesn't stop the report queue thread.
Whether abort() should be part of the signal handler routine depends on a need in aborting ongoing operation during restart. I don't have strong opinion here, taking into account that the most timing operation is download and we support download resuming I think, I am fine with abort, but it might have negative impact when some write to DB or communication with secondaries is interrupted. I think, I would go with abort() at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you exactly mean in your first point. You mean that Shutdown() wouldn't be useful as an handler to a gui event for example? And you're saying that calling Abort() there is not useful because it's also called later? But the point is that calling Abort() early can speed up the tear down and trigger the ~Aktualizr()
call in a shorter delay. Or am I missing something
But your second point sounds right, calling Initialize()
again after Shutdown()
sounds quite wrong. I'll move Abort()
to the signal handler.
if (exit_cond_.cv.wait_for(l, std::chrono::seconds(config_.uptane.polling_sec), | ||
[this] { return exit_cond_.flag; })) { | ||
break; | ||
} | ||
} | ||
uptane_client_->completeInstall(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If completeInstall() is not called will it be properly handled during the subsequent Aktualizr/UptaneCycle() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now this method is used for rebooting. So:
- if the system reboots and aktualizr is killed before this method, nothing is lost
- if aktualizr receives SIGKILL or crashes before this method (rare),
isInstallCompletionRequired()
should still return true when systemd would restart aktualizr and it should reboot then. But I don't think we cover that with any unit test.
SigHandler::get().start([&aktualizr]() { aktualizr.Shutdown(); }); | ||
SigHandler::signal(SIGHUP); | ||
SigHandler::signal(SIGINT); | ||
SigHandler::signal(SIGTERM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope UptaneCycle() can end its work within systemd's DefaultTimeoutStopSec= default to 90s
otherwise it will send SIGKILL and causes non-graceful shutdown of Aktualizr. Perhaps, we might consider setting higher value of TimeoutStopSec=
in https://github.com/advancedtelematic/meta-updater/blob/master/recipes-sota/aktualizr/files/aktualizr-secondary.service
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured that 90s is actually reasonable. How long would you suggest it should wait?
In any way, we cannot avoid all the cases of abrupt shutdown and if aktualizr doesn't recover after one, we consider that a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long would you suggest it should wait?
I suggest to consider changing the default value (90s) if we face any issue during CI/test phase. No need to do anything here at the moment, just keep it in mind.
f1231c6
to
98ba6e4
Compare
Perhaps, it makes sense to add a test that test it from aktualizr client standpoint of view, e.g spawn aktualizr process and send SIGTERM signal to it and wait/join the aktualizr process ? |
98ba6e4
to
352c714
Compare
@mike-sul ok, I've added a test using the python fixtures |
352c714
to
143e63a
Compare
Catch the signal and transmit a shutdown message Signal are difficult to handle safely: use an atomic boolean and poll for changes Signed-off-by: Laurent Bonnans <[email protected]>
Interrupts the main loop cleanly when a signal is caught Signed-off-by: Laurent Bonnans <[email protected]>
- calls to signal(2) should be out of the library - remove the temporary mask feature Signed-off-by: Laurent Bonnans <[email protected]>
Signed-off-by: Laurent Bonnans <[email protected]>
143e63a
to
f88722a
Compare
Rebased after the merge of #1395 |
Signed-off-by: Laurent Bonnans <[email protected]>
@mike-sul Here is a try for handling signals more cleanly. I resurrected a very old branch of mine. Is it close to what you had in mind?
Right now, it's only enabled in aktualizr's
RunForever
loop. Adding it to other parts of the code might be possible but would need more thinking.