Help preserve release-branch CI VM Images #157

Merged: 1 commit, Jul 26, 2022

cevich
Member

@cevich cevich commented Jul 21, 2022

For release-branches, CI VM images must be retained long-term since they
are difficult/impossible to rebuild. A number of times, often due to
human error, these images have been accidentally lost.

Update automation tooling such that these images are specially marked
by the timestamp-updating container that runs with every CI build.
Later, when another container runs to check for disused images, ensure
the specially marked images are never deprecated or removed. Finally
when the deletion container runs, if a deprecated image is found
specially marked, issue a loud error that will be delivered to the
podman-monitor list.

Update documentation to reflect these changes.

@cevich cevich requested review from lsm5 and edsantiago July 21, 2022 19:56
Member

@edsantiago edsantiago left a comment


I really have no clue about any of this, sorry. Minor comments, hope it helps.

imgts/entrypoint.sh (resolved)
imgts/entrypoint.sh (outdated, resolved)
imgts/entrypoint.sh (outdated, resolved)
@edsantiago
Member

 crun : Depends: criu (>= 3.17.1-2) but 3.17.1-1 is to be installed

The log doesn't indicate where it's getting crun, but I see a kubic build yesterday. Unfortunately that log has no hits for "criu-", so I can't tell if that's where the dependency requirement is coming from.

And I don't know where the -2 is coming from, because the one built two days ago is -1.

Giving up.

@cevich
Member Author

cevich commented Jul 22, 2022

And I don't know where the -2 is coming from,

It's probably a new package and/or the various repos aren't yet in sync with each other. In any case, the error is not related to my changes here, so I'll simply push forward with manual testing and hope for the best 😁

@cevich
Member Author

cevich commented Jul 22, 2022

I really have no clue about any of this, sorry. Minor comments, hope it helps.

Did help. Thanks. I know this is a lot, however it's pretty important.

Background: There is no built-in way in GCE or AWS to prune disused VM images. New ones are added several times over the course of a new PR here, so we can't let them pile up to infinity. Here are some more details on the overall workflows and what this is trying to address:

  • The imgts container runs as part of the 'meta' task in (hopefully) every Cirrus-CI PR, Branch, Tag, Cron, etc. Bloody everywhere. Its primary job is to update the "last used on & by" metadata on GCE VM images.
  • The imgobsolete container runs daily from this repo's cirrus-cron. It simply examines all VM images in GCE, looking at the metadata. If it finds images w/o metadata or with a timestamp older than 30 days, it marks them in GCE for future deprecation in 30 more days. Any attempted use of a deprecated image will fail, but the image will still be manually recoverable.
  • The imgprune container runs daily from this repo's cirrus-cron. It examines all GCE VM images that have been deprecated (i.e. 61-days past last-use). It randomly selects 10 of them and permanently deletes them.
  • So overall, this PR is intended to address a very long-standing problem where release-branches have their CI broken because VM images "go away" (for some reason). It's happened on a handful of occasions and is very detrimental to operations - potentially blocking or hindering critical security backport efforts. CI images should never ever ever be removed on a release branch.
  • There's no way in GCE to flag an image for deletion-protection. After merge, my intention is to go through CI on every release branch in every repo. On each, I will ensure the newly built imgts container is used for meta and that it successfully runs and labels the images "permanent".
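
To make the workflow above concrete, here is a minimal sketch of the decision logic only. It is not the actual imgobsolete/imgprune code: the `Image` class and the function names are hypothetical, nothing here talks to GCE, and raising an exception is just one possible reading of "issue a loud error". The 30-day threshold, the batch of 10, and the "permanent" label come from this thread.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

STALE_AFTER = timedelta(days=30)  # "timestamp older than 30 days"
PRUNE_BATCH = 10                  # "randomly selects 10 of them"


@dataclass
class Image:
    """Hypothetical stand-in for a VM image record and its metadata."""
    name: str
    last_used: Optional[datetime] = None  # None models "no metadata"
    deprecated: bool = False
    labels: Dict[str, str] = field(default_factory=dict)


def is_permanent(img: Image) -> bool:
    # Assumption: the protection marker is a label permanent=true.
    return img.labels.get("permanent") == "true"


def should_mark_obsolete(img: Image, now: datetime) -> bool:
    """Obsolescence check: skip permanent images, flag stale or unlabeled ones."""
    if is_permanent(img) or img.deprecated:
        return False
    if img.last_used is None:
        return True
    return now - img.last_used > STALE_AFTER


def select_for_deletion(images: List[Image], rng=random) -> List[Image]:
    """Prune check: pick up to PRUNE_BATCH deprecated images at random.

    Raises if a deprecated image carries the permanent marker, since that
    combination should never occur and needs human attention.
    """
    candidates = [img for img in images if img.deprecated]
    protected = [img.name for img in candidates if is_permanent(img)]
    if protected:
        raise RuntimeError(f"deprecated images marked permanent: {protected}")
    return rng.sample(candidates, min(PRUNE_BATCH, len(candidates)))
```

The point of the sketch is that the permanent marker is checked in two places: once so a release-branch image never becomes a deprecation candidate, and again at deletion time as a safety net in case an image was deprecated by some other route.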

So, I'm slightly nervous about the additions to imgts given how widely it's used. It's possible I could back out that specific change and simply go through all our repos' release branches manually and tag images permanent. But with 3-5 images per branch, that's a LOT of work to update them all by hand (even with a small script).

HTH

@cevich
Member Author

cevich commented Jul 22, 2022

Built container images:

  • quay.io/libpod/imgts:c5406681067683840
  • quay.io/libpod/imgobsolete:c5406681067683840
  • quay.io/libpod/imgprune:c5406681067683840
  • quay.io/libpod/gcsupld:c5406681067683840
  • quay.io/libpod/get_ci_vm:c5406681067683840
  • quay.io/libpod/orphanvms:c5406681067683840
  • quay.io/libpod/ccia:c5406681067683840

@cevich cevich force-pushed the allow_permanent_images branch from 9de1955 to 39becc6 Compare July 22, 2022 18:02
@cevich
Member Author

cevich commented Jul 22, 2022

Force-push: Updates based on Ed's suggestions and some manual testing.

@cevich
Member Author

cevich commented Jul 25, 2022

For ubuntu 2204, the error is:

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
 crun : Depends: criu (>= 3.17.1-2) but 3.17.1-1 is to be installed
E: Unable to correct problems, you have held broken packages.
    exit(100)
    Failure! Sleeping 5.670000 seconds

@lsm5 ping - does this imply that we need a newer crun built in OBS or a dependency fix of some sort?

@cevich
Member Author

cevich commented Jul 25, 2022

Built container images:

  • quay.io/libpod/imgts:c6749274993065984
  • quay.io/libpod/imgobsolete:c6749274993065984
  • quay.io/libpod/imgprune:c6749274993065984
  • quay.io/libpod/gcsupld:c6749274993065984
  • quay.io/libpod/get_ci_vm:c6749274993065984
  • quay.io/libpod/orphanvms:c6749274993065984
  • quay.io/libpod/ccia:c6749274993065984

@cevich cevich force-pushed the allow_permanent_images branch from 39becc6 to d7c8598 Compare July 25, 2022 18:43
@lsm5
Member

lsm5 commented Jul 25, 2022

For ubuntu 2204, the error is:

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
 crun : Depends: criu (>= 3.17.1-2) but 3.17.1-1 is to be installed
E: Unable to correct problems, you have held broken packages.
    exit(100)
    Failure! Sleeping 5.670000 seconds

@lsm5 ping - does this imply that we need a newer crun built in OBS or a dependency fix of some sort?

looking ...

@lsm5
Member

lsm5 commented Jul 25, 2022

crun rebuilt, please try again

@cevich
Member Author

cevich commented Jul 25, 2022

Note: I'm doing the full image builds here in order to allow the Test XYZ (tooling image) tasks to run. Also to make sure the recent bot changes work, and just in case the images are actually useful.

@github-actions

Cirrus CI build successful. Found built image names and IDs:

Name                       ID
build-push                 build-push-c5495735033528320
fedora                     fedora-c5495735033528320
fedora-aws                 ami-0df5df528071f1052
fedora-netavark            fedora-netavark-c5495735033528320
fedora-netavark-aws-arm64  ami-00c26c58f6eaba850
fedora-podman-aws-arm64    ami-02ee8b3a782a78791
fedora-podman-py           fedora-podman-py-c5495735033528320
prior-fedora               prior-fedora-c5495735033528320
ubuntu                     ubuntu-c5495735033528320

edsantiago added a commit to edsantiago/libpod that referenced this pull request Jul 25, 2022
Source: containers/automation_images#157

Reason: see if new Ubuntu images have fixed runc

Fixes: containers#15025

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago
Member

Submitted containers/podman#15065 to see if new ubuntu images include a runc that fixes containers/podman#15025

@cevich
Member Author

cevich commented Jul 26, 2022

@edsantiago Tip: We build 99% identical container images as well. So it's easy to check simple things like runc, for example:

$ podman run -it --rm quay.io/libpod/ubuntu_podman:c5495735033528320 dpkg -l runc
...cut...
dpkg-query: warning: parsing file '/var/lib/dpkg/status' near line 875 package 'containers-common':
 missing 'Maintainer' field
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version        Architecture Description
+++-==============-==============-============-=================================
ii  runc           1.1.0-0ubuntu1 amd64        Open Container Project - runtime

(beware: These images are rather large, 600+ MB compressed)

@cevich cevich marked this pull request as ready for review July 26, 2022 17:35
@cevich
Member Author

cevich commented Jul 26, 2022

I think this is ready to get merged. The built-in tests are doing what they should:

I did some manual testing locally of the new imgts container, while manually fiddling with an image on the podman v1.0 branch - it properly applied the permanent=true label. I also tried it with several fake and real branch names, and it behaved as expected in all cases.
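
For readers unfamiliar with the labeling step, the following is a rough sketch of the kind of branch check described above. It is not the logic in imgts/entrypoint.sh: the regular expression and the function name are assumptions made for illustration, and the real tool may recognize release branches differently. Only the `permanent=true` label and the `v1.0` branch name come from this thread.

```python
import re
from typing import Dict, Optional

# Assumption: release branches are named v<major>.<minor>, like "v1.0".
# The real tooling may use a different or broader pattern.
RELEASE_BRANCH_RE = re.compile(r"v\d+\.\d+")


def labels_for_branch(branch: str,
                      existing: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return image labels, adding permanent=true for release branches.

    Existing labels are copied and preserved; non-release branches get
    no extra label.
    """
    labels = dict(existing or {})
    if RELEASE_BRANCH_RE.fullmatch(branch):
        labels["permanent"] = "true"
    return labels


if __name__ == "__main__":
    # Try a mix of real-looking and fake branch names.
    for name in ("v1.0", "main", "v1.0-fake-feature", "not-a-release"):
        print(name, labels_for_branch(name))
```

Using `fullmatch` rather than `match` keeps look-alike names such as `v1.0-fake-feature` from being treated as release branches, which is the kind of edge case the fake-branch testing mentioned above would catch.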

@edsantiago
Member

/lgtm

@cevich
Member Author

cevich commented Jul 26, 2022

@lsm5 @edsantiago Mind taking one last peek, to see if I made any obvious typos or gaffes?

@cevich
Member Author

cevich commented Jul 26, 2022

beat me to it, thanks @edsantiago

@cevich cevich changed the title [WIP] Help preserve release-branch CI VM Images Help preserve release-branch CI VM Images Jul 26, 2022
@cevich cevich merged commit 41b0125 into containers:main Jul 26, 2022
cevich added a commit to cevich/podman that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/podman that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/podman that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/podman-py that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/skopeo that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/skopeo that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/skopeo that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/skopeo that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/storage that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/storage that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/skopeo that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/storage that referenced this pull request Jul 26, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
cevich added a commit to cevich/image that referenced this pull request Jul 27, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>
TomSweeneyRedHat pushed a commit to TomSweeneyRedHat/common that referenced this pull request Aug 19, 2022
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Note: Will only affect future release-branches on this repo.

Signed-off-by: Chris Evich <[email protected]>
lsm5 pushed a commit to lsm5/skopeo that referenced this pull request Jan 5, 2023
Contains important updates re: preserving release-branch CI VM images.
Ref: containers/automation_images#157

Signed-off-by: Chris Evich <[email protected]>