Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cirrus: Implement containerized system testing #8727

Closed
wants to merge 3 commits into from

Conversation

cevich
Copy link
Member

@cevich cevich commented Dec 15, 2020

@cevich cevich requested a review from edsantiago December 15, 2020 14:22
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cevich
To complete the pull request process, please assign umohnani8 after the PR has been reviewed.
You can assign the PR to them by writing /assign @umohnani8 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Copy link
Member

rhatdan commented Dec 15, 2020

LGTM

@edsantiago
Copy link
Member

This is going to be a tricky one. The .cirrus.yml changes LGTM, with one probably-trivial question. The hard part is going to be updating the container image to un-comment-out the mount_program.*fuse-overlayfs line in /etc/containers/storage.conf and then forcing --cgroup-manager=cgroupfs (to avoid "Error: cannot open sd-bus: No such file or directory: OCI runtime command not found error"). Then figuring out how to deal with "write unixgram @aefbb->/run/systemd/journal/socket: sendmsg: no such file or directory". And then, of course, running tests and seeing what fails. This is going to need to wait until next year.

.cirrus.yml Outdated
container_system_test_task:
<<: *local_system_test_task
alias: container_system_test
skip: *branch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial low-priority question: is this *branch deliberate? In looking at the Cirrus graph, I notice that this is the only one of the right-hand boxes (system tests) that has it set; all the others -- rootless, local, remote -- skip on CI:DOCS but not on BRANCH. I don't see a strong reason to go either way, I'm just asking because it's a curious inconsistency.
cirrus-map-pr8727

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just me, or is the graph beginning to look like a... um... cetacean?

@cevich
Copy link
Member Author

cevich commented Dec 15, 2020

......This is going to need to wait until next year.

Or I COULD just close this PR 😁

TBH, this doesn't sound too awful. The containerized tests are executed through a re-run of setup_environment.sh and runner.sh inside the container. The hardest part will be me writing all the comments and not spelling something wrong 😉

is this *branch deliberate

Yes, I stuck it there because the containerized integration tests have it. All it does is bypass containerized testing when a PR merges or a branch is (otherwise) pushed. Long ago this was "okay-ed", since these tests are low-risk/low-value at the branch-level (assuming they passed PR testing). TBH, we probably could skip a lot more testing on branches, that would improve the cron-based fail-rate as well as make Matt's life easier come release-time.

Is it just me, or is the graph beginning to look like a... um... cetacean?

Are you're ocular orbs spherical today? Clearly it's a Elephant standing in line at the McDonalds drive-through, during a McRib BOGO sale*

* By one get one only valid when meal-deal is super-sized, and includes a super-extra-large-gulp milkshake

@cevich cevich force-pushed the restore_cntr_sys_test branch from 6c5d233 to f1bc72b Compare December 16, 2020 14:15
@cevich
Copy link
Member Author

cevich commented Dec 16, 2020

Might as well get the ball rolling while waiting for other test results...

@cevich cevich changed the title Cirrus: Implement containerized system testing [CI:SYS] Cirrus: Implement containerized system testing Dec 16, 2020
@cevich cevich force-pushed the restore_cntr_sys_test branch from 3806a8b to a633b76 Compare December 16, 2020 14:30
@cevich
Copy link
Member Author

cevich commented Dec 16, 2020

...hmmm, well that's no good: gopath_cache is failing to restore /var/tmp/go, but the keys are the correct values from the corresponding build jobs 🤔

@cevich cevich force-pushed the restore_cntr_sys_test branch 3 times, most recently from e9e9690 to 30248e0 Compare December 16, 2020 17:40
@cevich cevich changed the title [CI:SYS] Cirrus: Implement containerized system testing Cirrus: Implement containerized system testing Dec 16, 2020
@cevich cevich force-pushed the restore_cntr_sys_test branch 2 times, most recently from b821729 to a5c6041 Compare December 16, 2020 19:24
@cevich cevich marked this pull request as draft December 16, 2020 22:07
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 16, 2020
@cevich cevich force-pushed the restore_cntr_sys_test branch 2 times, most recently from d10b5e5 to 0db5531 Compare December 17, 2020 15:15
@cevich
Copy link
Member Author

cevich commented Dec 17, 2020

level=error msg="unable to write pod event: \"write unixgram @002ca->/run/systemd/journal/socket: sendmsg: no such file or directory\"

@edsantiago this seems to be the main item left failing. Could this be fixed by a bind-mount of that socket from the host, conditional on containerized execution of the system-tests??

@cevich cevich force-pushed the restore_cntr_sys_test branch 4 times, most recently from 4622216 to 71a43b6 Compare December 17, 2020 17:08
@cevich
Copy link
Member Author

cevich commented Dec 17, 2020

now Failed to add conmon to systemd sandbox cgroup: dial unix /run/systemd/private: connect: no such file or directory

Hmmm...welp, I dislike sharing something called 'private', but let's try the same medicine there...

@cevich
Copy link
Member Author

cevich commented Dec 17, 2020

...hmmm, that doesn't appear to work. With JOURNALD_ARGS="-v /dev/log:/dev/log -v /var/run/systemd/journal/socket:/var/run/systemd/journal/socket -v /run/systemd/private:/run/systemd/private" now we get:

level=warning msg="Failed to add conmon to systemd sandbox cgroup: Process with ID 65992 does not exist."
[+0205s] # Error: cannot open sd-bus: No such file or directory: OCI not found

@rhatdan
Copy link
Member

rhatdan commented Dec 17, 2020

I think you could change the events-backend from journald to file. I believe the issue is attempts to connect to systemd socket from inside of a container.

@vrothberg
Copy link
Member

Is there a way to execute system tests without waiting for others to complete before?

@edsantiago
Copy link
Member

No. @cevich started on a mechanism, #8754, which I believe adds undesirable complexity to the CI yaml for too little return.

@cevich cevich force-pushed the restore_cntr_sys_test branch from c3c5cd3 to 88f16e1 Compare December 18, 2020 13:47
@cevich
Copy link
Member Author

cevich commented Dec 18, 2020

I think you could change the events-backend from journald to file.

Thanks @rhatdan trying it with none to see if there is any affect...

(I'm not sure where file goes, might be something we want to grab as an artifact for debugging).

@cevich
Copy link
Member Author

cevich commented Dec 18, 2020

Is there a way to execute system tests without waiting for others to complete before?

The logic is way to complex since it also needs to not break [CI:DOCS]. For future ref. @vrothberg: Temporarily adding skip: $CI == $CI to all/most of the tasks is the easiest way. Like in the 'DO NOT MERGE' commit I made here.

@vrothberg
Copy link
Member

Thanks!

@cevich
Copy link
Member Author

cevich commented Dec 18, 2020

@edsantiago would it be a PITA to modify the system-tests such that:

When $container is defined non-empty, always use podman --events-backend=file for every command?

I'm pretty sure the setting of $container is podman (maybe docker) built-in behavior, so it should be a reliable indicator.

@cevich cevich force-pushed the restore_cntr_sys_test branch from 88f16e1 to f4c89a6 Compare December 18, 2020 14:13
@cevich
Copy link
Member Author

cevich commented Dec 18, 2020

(f4c89a6 is my naive attempt)

@edsantiago
Copy link
Member

That's not going to work with podman-remote. Don't even bother.

@cevich
Copy link
Member Author

cevich commented Dec 18, 2020

🤷‍♂️ CI is already "bothering"...but it told me it doesn't mind 😄

@cevich
Copy link
Member Author

cevich commented Dec 18, 2020

@edsantiago hmmm, maybe was worth. Even with that option I still see errors like:

/var/tmp/go/src/github.com/containers/podman/bin/podman --events-backend=file --events-backend=file run --name running-u8m0JU3ooEKMsupVpGj6vjbdi7zgjI -d quay.io/libpod/testimage:20200929 top
[+0202s] # time="2020-12-18T14:21:25Z" level=warning msg="Failed to add conmon to systemd sandbox cgroup: dial unix /run/systemd/private: connect: no such file or directory"
[+0202s] # Error: cannot open sd-bus: No such file or directory: OCI not found

So it seems we're going to need a much more complicated solution 😓

@cevich cevich force-pushed the restore_cntr_sys_test branch from f4c89a6 to 353430f Compare January 6, 2021 15:07
@cevich cevich force-pushed the restore_cntr_sys_test branch from 353430f to db2f235 Compare January 11, 2021 18:51
@cevich
Copy link
Member Author

cevich commented Jan 13, 2021

Giving up on this, it's too complicated to be properly implemented anytime soon.

@cevich cevich closed this Jan 13, 2021
@cevich cevich deleted the restore_cntr_sys_test branch June 30, 2021 18:00
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants