auto-update: simple rollback #11074
@edsantiago, I'd love your eyes on the system tests.
FYI: @fatherlinux @mrguitar @nullr0ute @containers/podman-maintainers PTAL
Restarted. Had to make the CI image
What happens if the application in the container crashes (exits with a non-zero exit code)? Will the rollback be performed? Or does it only work when
Good call. I tested only with an empty image. Let me add a test for when the command inside the container fails. I guess we need some massaging for that. /hold
OK. It only fails when I think we can merge as is. The remainder may require healthchecks.
Moving to WIP. I had a chat with @giuseppe to discuss a solution. We gravitated toward letting the container send sdnotify messages so that a restart waits until the container has sent the ready message. To do that, we need to massage the generate-systemd code a bit so that it does not always enforce --sdnotify=conmon. Will tackle that tomorrow.
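For anyone following along, here is a minimal sketch of what "the container sends the ready message" could look like from the application's side. It is an illustration under assumptions, not code from this PR: `notifyReady` and `demo` are hypothetical names, and the temporary datagram socket in `demo` only stands in for the socket whose path systemd passes via the `NOTIFY_SOCKET` environment variable.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// notifyReady sends the sd_notify "READY=1" datagram to the socket at
// socketPath. In a real container the path would be read from the
// NOTIFY_SOCKET environment variable (hypothetical helper, not Podman code).
func notifyReady(socketPath string) error {
	if socketPath == "" {
		return errors.New("no notify socket set")
	}
	addr := &net.UnixAddr{Name: socketPath, Net: "unixgram"}
	conn, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte("READY=1"))
	return err
}

// demo plays the role of the service manager: it listens on a temporary
// datagram socket, lets the "container" report readiness, and returns
// the message it received.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "notify")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "notify.sock")
	addr := &net.UnixAddr{Name: path, Net: "unixgram"}
	listener, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		return "", err
	}
	defer listener.Close()

	if err := notifyReady(path); err != nil {
		return "", err
	}

	// Stop waiting after a timeout instead of blocking forever.
	if err := listener.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}
	buf := make([]byte, 64)
	n, _, err := listener.ReadFromUnix(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	msg, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("received:", msg)
}
```

The point of the sketch is only the ordering: the application reports readiness itself, so a restart that waits for the message can tell a container that came up from one that crashed before becoming ready.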
@vrothberg can you make sure the
b5f087a to e8467a1
It's now a manifest list with one image for amd64 👍
@edsantiago, do I need to do the "socat dance" here as in the sdnotify tests?
We need the actual images (or, the multiarch-testing group does). See https://github.com/containers/podman/blob/cbad5616961520831b1f169f03da2a9f81203f71/test/system/build-testimage for an example of how to produce those images.
@vrothberg I'm sorry; I've looked at this one off-and-on today, but been unable to understand what's going wrong. My intuition is telling me "please do not use alpine", because that is docker.io/alpine, which is rate-limited, which is going to randomly screw up CI runs. I don't think that's the cause of the problem, but it's all I can recommend for now.
Can you elaborate on "actual images"?
I'll add images for arm64, ppc64le and s390x to the manifest list.
Done ✔️
e8467a1 to a041246
Thanks a lot for checking, @edsantiago! I restarted one of the two failed jobs earlier and it passed. It smells like a bug/flake that I don't want to introduce with this PR. I repushed with the updated tests to see how it's going now.
a041246 to b313ff5
Add support for simple rollbacks during `podman auto-update`. Rollbacks are enabled by default. If a systemd unit cannot be restarted after an update, the previous image will be retagged and the unit will be restarted a second time.

Add system tests for rollbacks. Also fix a bug in the restart sequence; we have to use the channel to actually know whether the restart was successful or not.

NOTE: To make rollbacks really useful, users must run their containers with `--sdnotify=container` such that the containers send the ready message over the (mounted) socket. This way, restarting the systemd units during auto update will block until the message has been received (or a timeout kicked in).

Signed-off-by: Valentin Rothberg <[email protected]>
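The sequence described in the commit message can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: the `unit` and `tagger` interfaces, the fake implementations, and the image names are all hypothetical, and the assumption that a successful restart job reports the string "done" on the channel is modeled loosely on how systemd job results are commonly delivered to callers.

```go
package main

import "fmt"

// unit is a hypothetical stand-in for a systemd unit running a container.
type unit interface {
	// Restart queues a restart job and reports the job result on ch
	// (assumed here: "done" on success, anything else on failure).
	Restart(ch chan<- string) error
}

// tagger is a hypothetical stand-in for the local image store.
type tagger interface {
	Tag(imageID, name string) error
}

// restartAndWait restarts the unit and waits for the job result.
// A nil error from Restart only means the job was queued; the channel
// carries the actual outcome.
func restartAndWait(u unit) error {
	ch := make(chan string, 1)
	if err := u.Restart(ch); err != nil {
		return err
	}
	if result := <-ch; result != "done" {
		return fmt.Errorf("restart job finished with result %q", result)
	}
	return nil
}

// updateWithRollback restarts the unit on the freshly pulled image. If that
// fails, it retags the previous image under the same name and restarts a
// second time. It reports whether a rollback was attempted.
func updateWithRollback(u unit, images tagger, name, previousID string) (bool, error) {
	updateErr := restartAndWait(u)
	if updateErr == nil {
		return false, nil
	}
	if err := images.Tag(previousID, name); err != nil {
		return false, fmt.Errorf("retagging previous image: %w (after: %v)", err, updateErr)
	}
	if err := restartAndWait(u); err != nil {
		return true, fmt.Errorf("restart after rollback: %w (after: %v)", err, updateErr)
	}
	return true, nil
}

// fakeUnit replays a fixed list of job results, one per restart.
type fakeUnit struct{ results []string }

func (f *fakeUnit) Restart(ch chan<- string) error {
	result := f.results[0]
	f.results = f.results[1:]
	ch <- result
	return nil
}

// fakeStore records which image ID each name points to.
type fakeStore struct{ tags map[string]string }

func (s *fakeStore) Tag(imageID, name string) error {
	s.tags[name] = imageID
	return nil
}

func main() {
	// First restart (new image) fails, second restart (old image) succeeds.
	u := &fakeUnit{results: []string{"failed", "done"}}
	store := &fakeStore{tags: map[string]string{"localhost/app:latest": "new-id"}}

	rolledBack, err := updateWithRollback(u, store, "localhost/app:latest", "old-id")
	fmt.Println("rolled back:", rolledBack, "error:", err)
	fmt.Println("tag now points to:", store.tags["localhost/app:latest"])
}
```

Waiting on the channel is what makes the rollback decision meaningful: without it, a queued but ultimately failing restart would look like a success and the previous image would never be retagged.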
b313ff5 to 30df551
First run green. Let's see what the second one does.
The empirical law: once is never, twice is always :) PTAL
LGTM
This sounds like a great feature. I assume it just tags the penultimate image layer as latest and restarts? It obviously can't go modify the image on the registry, right?
That's right. If we have an image
In some cases it could. Are you thinking about some kind of self-healing mechanism to prevent other nodes from pulling a potentially broken image?
@containers/podman-maintainers PTAL
LGTM
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: giuseppe, vrothberg
ready to merge :)
/lgtm
/hold cancel