updates: new strategy based on local filesystem #245
This case feels like it overlaps at least somewhat with systemd inhibitors - any process that doesn't want the system to reboot can use those today.
Bunch of self-notes:

- Removing a file to signal the end of the allowed finalization window is a critical step which may not be feasible at all times (e.g. because network/SSH/whatever is temporarily down). For this reason, there should be a way to encode an optional "not-after" timestamp so that windows can safely auto-expire.
- Other strategies like
- While the idea partially overlaps with https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html, it diverges enough in its semantics that I think it's worth designing a separate flow. To that extent, it is possibly more similar to a "persisted time-bound flock".
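For illustration, the "not-after" auto-expiry could look like the sketch below. The flag path, the file format (a bare UNIX timestamp), and the helper name are all hypothetical assumptions, not part of any agreed design:

```shell
#!/bin/sh
# Hypothetical sketch: the finalization flag file carries an optional
# "not-after" UNIX timestamp, so a window auto-expires even if the
# controller cannot remove the file in time (network/SSH down).
FLAG=/tmp/demo-allow-finalization   # illustrative path only

# Controller side: open a window that expires in one hour.
echo "$(( $(date +%s) + 3600 ))" > "$FLAG"

# Updater side: the window is open only if the file exists and,
# when it contains a timestamp, that timestamp has not yet passed.
window_open() {
  [ -f "$FLAG" ] || return 1
  not_after=$(cat "$FLAG")
  [ -z "$not_after" ] && return 0          # empty file: no expiry
  [ "$(date +%s)" -le "$not_after" ]
}

if window_open; then
  echo "finalization allowed"
else
  echo "finalization blocked"
fi
```

With this shape, a stale flag file stops granting finalization on its own once the timestamp passes, without anyone having to reach the node to delete it.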
**Rationale**

After some discussion with @lucab, we agreed that there is demand for more manual control over Zincati updates. This could be achieved by giving users more fine-grained control over when update finalizations (reboots) are performed, and would thus be a compromise between fully manual control over updates (#498) and fully automatic updates. Giving users the ability to more directly control when finalizations (reboots) are allowed should ideally divert most of the demand for fully manual updates (in most scenarios, if there is an update available, it should already be staged fairly quickly by Zincati). The idea here is to discourage users from ever needing to SSH into individual nodes to perform upgrades.

A low-level approach, such as checking for a file on the filesystem, should be flexible enough to address the need for manually controlling reboot windows. The advantages are that files can always be written even when Zincati is not running, and files are generally easier to manage from different environments, since the only thing required is access to the specific filesystem directory (e.g. from a container bind-mount, scp, etc.); this is also easily scriptable. This is similar in many ways to the

Note: this is not a replacement for a "proper" alternative to

**Proposal**

Mechanism

Location

Contents

**Future considerations / Alternative**

Introduce a slightly higher-level mechanism that wraps the above around a CLI or D-Bus method, e.g.
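As a rough illustration of the mechanism, a controller (cron job, containerized agent with a bind-mount, etc.) could open and close the window on a schedule. Everything here is a made-up example: the flag path and the window hours are assumptions, since the actual location and contents were still to be decided.

```shell
#!/bin/sh
# Illustrative controller: allow reboots only between 02:00 and 04:00.
# The flag path is hypothetical; the real location was left undecided.
FLAG=/tmp/demo-allow-finalization

maintain_window() {
  hour=$1
  if [ "$hour" -ge 2 ] && [ "$hour" -lt 4 ]; then
    touch "$FLAG"      # open the finalization window
  else
    rm -f "$FLAG"      # close it; the strategy would stop finalizing
  fi
}

maintain_window "$(date +%H)"
```

Zincati's strategy side would then only need to check for the file's existence on each refresh, which is what makes the approach scriptable from almost any environment.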
Can you elaborate on this? We should try to enumerate some use cases, but I think many if not most of them will also want to inhibit reboots for other reasons.
We had a realtime chat on this, and I think my core argument is: zincati should monitor systemd for "block" locks (not "delay") and not even try to finalize if one is active, because what will happen is we'll finalize but be blocked on reboot, which is exactly what we don't want.
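A minimal sketch of that pre-finalization check, written as a filter over `systemd-inhibit --list`-style output (the column layout assumed here is an approximation; a real implementation would more likely query logind's `BlockInhibited` property over D-Bus):

```shell
#!/bin/sh
# Sketch: decide whether to even attempt finalization, based on whether
# any active inhibitor blocks "shutdown" in "block" mode. Reads
# `systemd-inhibit --list`-style output on stdin; the exact output
# format is an assumption.
has_shutdown_block() {
  grep 'shutdown' | grep -q 'block'
}

# Real usage would be roughly:
#   systemd-inhibit --list | has_shutdown_block && echo "skip finalization"
```

The point of the check is the ordering: skipping finalization up front avoids staging a deployment that then sits blocked behind someone else's inhibitor.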
Arguably if we had this, we could try to train people doing interactive
@cgwalters should we implement the monitoring on the rpm-ostree side instead? This would seem more natural to me. Perhaps this would also make it slightly less racy (but still racy nonetheless) since rpm-ostree is the one that actually calls
What would happen though when

Actually, either way we choose, zincati should probably know not to try to finalize+update - which would then mean we'd need an rpm-ostree API to proxy the state, or for zincati to monitor it too... It seems actually simpler to have this logic in zincati.

The way I'm thinking of this now, for example, I think we should do a similar thing in the MCO: openshift/machine-config-operator#2163 (comment)
I was thinking
In a fleet-lock scenario (much like the MCO), what I think we want here is for the updater to avoid held nodes, not to pick one and keep trying to finalize until it unblocks, right? There's also a power/CPU efficiency argument here for edge-triggering on when a block is lifted versus effectively polling. OTOH, I understand retries may fit better into the zincati state machine.
That said, I agree with this; particularly since only Zincati uses it right now, and making that change doesn't conflict with having zincati do the monitoring either.
Ahh I see, admittedly I hadn't thought of this. But yes, this makes total sense. I agree that "end components" like Zincati/MCO should have their own monitoring; in Zincati's case, it'd want to communicate that to fleet_lock. But additionally, I'm assuming we still want a quick check logic in rpm-ostree, almost mirroring the functionality of systemd's
Agreed!
This came up in a discussion today with the podman team. For users of podman machine in a desktop environment, they would like a way to notify users that an update exists and is staged, but not allow the update to continue until the user clicks "OK, do update". One way to implement that would be to place a file telling the machine not to continue the update. Or maybe something like #498 would be better here, but podman-machine would still need to know an update was ready.
@dustymabe et al., the podman-specific use case is tracked at #539. It is currently missing the actual requirements/constraints needed to design an effective solution. See my initial reply there.
Thank you @lucab for pointing me in the right direction.
One interesting idea that came out of #204 (comment) is:

- Have some kind of logic on each node touch a file when finalization is allowed, and remove it when it is not allowed.
- The controller can be a containerized agent, some central task manager able to manipulate files on machines, or even a human via SSH (not recommended).
- We won't provide the file creation, only the updates strategy in Zincati. The strategy name is still to be decided.
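The strategy's decision point would then be a trivial existence check, polled on each refresh interval. A minimal sketch, with an illustrative path (the real location was still undecided):

```shell
#!/bin/sh
# Minimal sketch of the strategy's per-refresh decision: finalization is
# allowed exactly while the flag file exists. Path is illustrative only.
FLAG=/tmp/demo-allow-finalization

can_finalize() { [ -f "$FLAG" ]; }

touch "$FLAG"
can_finalize && echo "reboot window open"    # prints "reboot window open"
rm -f "$FLAG"
can_finalize || echo "reboot window closed"  # prints "reboot window closed"
```

Because the check is nothing more than a stat, any controller that can create and delete a file (bind-mounted container, scp, cron job) can drive the window without talking to Zincati directly.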