Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI - various fixes #7070

Merged
merged 1 commit into from
Jul 27, 2020
Merged

Conversation

edsantiago
Copy link
Member

@edsantiago edsantiago commented Jul 23, 2020

Primary purpose: upgrade crun to 0.14 on f31, in hopes of
eliminating the 'cgroups.freeze' flake that is plaguing CI.

While I'm at it:

  • remove a no-longer-needed dnf upgrade that was running in CI
    itself (not image building, in each actual CI run). The purpose
    was to upgrade conmon, but that was added a long time ago and
    the required conmon is now in stable. The effect of this
    dnf upgrade today was simply to cause flakes when fedora
    repos were offline.

  • remove a no-longer-needed check for varlink.

  • networking.sh : add a timeout! 'openssl s_client' will happily
    hang forever if a host is unreachable, which means we waste
    two hours waiting for Cirrus to time out.

  • timestamp.awk : include date (not just time) in START/END msgs.
    There are times when I'm looking at a CI log and it is ultra
    important to know if it is from yesterday or today.

  • add progress messages in some places where I've previously
    struggled to understand context in logs; and improve some
    unlikely error messages to include script name.

...then, after all that, wrote a new README about how to to
all this. Hope it helps someone.

Signed-off-by: Ed Santiago [email protected]

@mheon
Copy link
Member

mheon commented Jul 23, 2020

LGTM

@rhatdan
Copy link
Member

rhatdan commented Jul 23, 2020

/lgtm
/hold

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jul 23, 2020
@rhatdan
Copy link
Member

rhatdan commented Jul 23, 2020

/approve

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2020
@edsantiago
Copy link
Member Author

I don't think this is working. Cirrus logs show no progress. I think it's going to time out real soon.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 23, 2020
@edsantiago
Copy link
Member Author

Yep, timeout in networking.sh. Added a fix for that so future devs don't have to waste two hours like this.

@edsantiago
Copy link
Member Author

Step 1 complete. Step 2 just beginning now. At the time of my push, all tests except tide were green.

Notable changes since first revision: addition of timeout in networking.sh; removal of ad-hoc dnf step; addition of new README file which I hope will help the next non-Chris person needing to do this.

@edsantiago edsantiago changed the title [CI:IMG] CI - various fixes CI - various fixes Jul 24, 2020
@edsantiago
Copy link
Member Author

@cevich PTAL. This has lots of little changes, mostly independent of each other, just a hodgepodge of problems I've found over the last few months. Plus a new README because this process is super duper confusing. I know you've documented some of it but my README is intended purely as a simple recipe which any monkey -- i.e. me -- can follow step by step. It is not meant to cover all the complicated actions that take place in Cirrus.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the new doc, thanks @edsantiago!

Let's wait for @cevich's head nod before merging.

@cevich
Copy link
Member

cevich commented Jul 24, 2020

Yeah, thanks a bunch Ed, I'll be the first one to admit I can't do it all, well, and readable. I greatly appreciate the assistance (and documentation to boot!). Peeking now...

Copy link
Member

@cevich cevich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Ed, and again, very much appreciated. Only a few items I found for you to consider, and maybe one line to drop. My motivation behind recommending contrib/cirrus/README.md is also to get this wonderful work more up-front in developers faces. It should not be much of an obscure, back-end process. Rolling these VM images is a somewhat frequent occurrence as the workflow (and sometimes images) are used by all containers/* repos.

@@ -39,7 +39,7 @@ env:
UBUNTU_NAME: "ubuntu-20"
PRIOR_UBUNTU_NAME: "ubuntu-19"

_BUILT_IMAGE_SUFFIX: "podman-5869602141896704"
_BUILT_IMAGE_SUFFIX: "podman-6439450735542272"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 you made it through the "magic VM building workflow", yay 👏

(I found this out because the number in the name is a $CIRRUS_BUILD_ID...which maps back to this PR)

@@ -10,7 +10,7 @@ while read host port
do
if [[ "$port" -eq "443" ]]
then
item_test "SSL/TLS to $host:$port" "$(echo -n '' | openssl s_client -quiet -no_ign_eof -connect $host:$port &> /dev/null; echo $?)" -eq "0"
item_test "SSL/TLS to $host:$port" "$(echo -n '' | timeout 60 openssl s_client -quiet -no_ign_eof -connect $host:$port &> /dev/null; echo $?)" -eq "0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goods idea

contrib/cirrus/packer/README.how-to-update-cirrus-vms Outdated Show resolved Hide resolved
# That way this will be dead code in future CI:IMG PRs but will
# serve as an example for anyone in a similar future situation.
# $BIGTO ooe.sh $SUDO dnf --enablerepo=updates-testing -y upgrade crun

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comment, I hope that will be helpful to someone.

contrib/cirrus/rootless_test.sh Show resolved Hide resolved
@@ -57,9 +57,6 @@ case "${OS_RELEASE_ID}" in

workaround_bfq_bug

# HACK: Need Conmon 2.0.17, currently in updates-testing on F31.
dnf update -y --enablerepo=updates-testing conmon

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good, yes, thank you for confirming this can be removed.

@@ -7,7 +7,7 @@
BEGIN {
STARTTIME=systime()
printf "[%s] START", strftime("%T")
printf " - All [+xxxx] lines that follow are relative to right now.\n"
printf " - All [+xxxx] lines that follow are relative to %s.\n", strftime("%FT%T")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh yes...much more helpful-ing this way. Thanks.

Primary purpose: upgrade crun to 0.14 on f31, in hopes of
eliminating the 'cgroups.freeze' flake that is plaguing CI.

While I'm at it:
- remove a no-longer-needed dnf upgrade that was running in CI
  itself (not image building, in each actual CI run). The purpose
  was to upgrade conmon, but that was added a long time ago and
  the required conmon is now in stable. The effect of this
  dnf upgrade today was simply to cause flakes when fedora
  repos were offline.

- remove a no-longer-needed check for varlink.

- networking.sh : add a timeout! 'openssl s_client' will happily
  hang forever if a host is unreachable, which means we waste
  two hours waiting for Cirrus to time out.

- timestamp.awk : include date (not just time) in START/END msgs.
  There are times when I'm looking at a CI log and it is ultra
  important to know if it is from yesterday or today.

- add progress messages in some places where I've previously
  struggled to understand context in logs; and improve some
  unlikely error messages to include script name.

...then, after all that, wrote a new README about how to to
all this. Hope it helps someone.

Signed-off-by: Ed Santiago <[email protected]>
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@mheon @rhatdan PTAL

@mheon
Copy link
Member

mheon commented Jul 27, 2020

LGTM - but would like a final signoff from @cevich before merge

@rhatdan
Copy link
Member

rhatdan commented Jul 27, 2020

/lgtm
/hold cancel

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 27, 2020
@openshift-merge-robot openshift-merge-robot merged commit 956caf3 into containers:master Jul 27, 2020
@edsantiago edsantiago deleted the ci_fixes branch July 27, 2020 19:14
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants