Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cirrus: Add podman-machine integration test #14569

Merged
merged 2 commits into from
Jul 8, 2022

Conversation

cevich
Copy link
Member

@cevich cevich commented Jun 10, 2022

The podman-machine integration tests are designed to execute on
bare-metal, since they perform significant work with virtual-machines.
This test is costly to run at scale, so it is limited to being manually
triggered by developers (for now). A 'trigger' button will appear in the
task status page of the Github WebUI once all test dependencies are met.
In the Cirrus-CI WebUI, there is also a 'pre-trigger' button that may be
pressed if a developer doesn't wish to wait. Also:

  • Add a localmachine target in the Makefile on the off-chance
    developers wish to execute locally. Update the ginkgo-run target
    to accommodate re-use by the new localmachine target.
  • Exclude podman_machine task from success dependency verification.
    This also involves adding an exception to cirrus_yaml_test.py
    otherwise it will complain loudly.
  • NOTE: Inclusion of ec2_instance in any task will cause
    hack/get_ci_vm.sh to barf and be non-functional. Future updates will
    be made to restore functionality. Before then, simply comment out
    the ec2_instance section as a temporarily workaround.

Example error:

...
  File "/usr/share/automation/bin/cirrus-ci_env.py", line 213, in init_task_type_image
    raise ValueError(f"Invalid instance type "
ValueError: Invalid instance type (None) or image (None) for task ({'alias': 'podman_machine', 'env': {'TEST_FLAVOR': 'machine', 'PRIV_NAME': 'rootless', 'DISTRO_NV': 'fedora-36', 'VM_IMAGE_NAME': 'ami-06a41d8a81ab56afa'}})

# Finalizing get_ci_vm

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none labels Jun 10, 2022
@cevich cevich force-pushed the podman_machine_poc branch 14 times, most recently from 00c7339 to 6c32502 Compare June 13, 2022 20:47
@cevich cevich requested review from edsantiago and baude June 14, 2022 20:36
@cevich
Copy link
Member Author

cevich commented Jun 14, 2022

@baude PTAL at the machine podman fedora-36 root host results.. Note: The annotated results link at the bottom is wrong, since it knows nothing about AWS.

@edsantiago I need help updating your script to work across clouds. For example, this comes from AWS using the API. Getting that URL (for any task) in bash would be something like: https://api.cirrus-ci.com/v1/artifact/build/${CIRRUS_BUILD_ID}/$(tr ' ' '%20' <<<"$CIRRUS_TASK_NAME")/html/<output_file_name.log.html>. Maybe it would help make the URL shorter/simpler if the formatting script wrote to a fixed filename? e.g. "runner_output.log.html". I don't think there's a danger (anymore) of file-name clashes, since all runner.sh executions happen in separate tasks (unlike two years ago). WDYT?

@cevich cevich force-pushed the podman_machine_poc branch from 6c32502 to a502814 Compare June 14, 2022 23:29
@cevich
Copy link
Member Author

cevich commented Jun 14, 2022

Force-push: New image with podman-gvproxy available: podman machine task logs

@edsantiago
Copy link
Member

@cevich sorry for the late response. I think you're about ten steps ahead of me (as usual) and it has taken me a long time to catch up. If you'll forgive my baby-talk, this is my understanding after several hours of looking at this:

  • You are changing the artifact home from storage.googleapis.com to api.cirrus-ci.com.
    • this is a change on all logs, not just the ones for this new podman-machine test
  • new URL is keyed on Cirrus BUILD_ID (instead of TASK_ID, which is how it is now)
  • new URL includes spaces and duplication ("foo bar test/foo-bar-test.html")

Are those all true? Please correct me if I've misunderstood anything.

If those are all true, then:

  • is there any way to change the URL path to avoid spaces? Not a huge deal, but I might as well ask early
  • when will these changes take effect? If I update logformatter right now (today), will the new api.cirrus-ci.com links work, or do I need to wait for this PR (or some other PR) to merge?

@cevich
Copy link
Member Author

cevich commented Jun 15, 2022

@cevich sorry for the late response. I think you're about ten steps ahead of me (as usual) and it has taken me a long time to catch up.

You're not late, you're early! This is a proof-of-concept to see if all the parts even CAN be wired together. I'm expecting in the next few weeks I'll close this PR, and start work to refine things for the actual implementation much later. So we have plenty of time, and everything here (and in support of this PR) is very rough, early development work.

If you'll forgive my baby-talk, this is my understanding after several hours of looking at this:

Thanks for taking the time to look things over, as usual your analysis is tight and questions are on-point. Just keep in mind this is all very fluid at this point, so there is certainly room to mold the overall setup to make future maintenance easier.

  • You are changing the artifact home from storage.googleapis.com to api.cirrus-ci.com.

Nope not yet, this is just a proposal. The html file would still be preserved in "cloud storage", but with AWS coming into the scene, we'll have TWO storage locations instead of one.

  • this is a change on all logs, not just the ones for this new podman-machine test

Yes, in fact it will affect all artifacts for every task. With two clouds, we will no-longer reliably be able to go straight to the google-cloud-storage for retrieval. Using the cirrus REST API will provide a common base-URL endpoint which will work irrespective of cloud.

  • new URL is keyed on Cirrus BUILD_ID (instead of TASK_ID, which is how it is now)

Yes, this is the "more complex" part we were dealing with cirrus-support on for the main failures the other week. Their API takes care of looking up the task based on the build ID. This is needed to give a uniform interface for artifact retrieval across clouds, tags, branches, and PRs. It also accounts for possible future task re-runs (i.e. the task ID will change, but the build ID will remain constant).

  • new URL includes spaces and duplication ("foo bar test/foo-bar-test.html")

Yes, you'll see in the documentation it's based on the task name or alias. Maybe the task number can be used, but it's not documented, so that seems dangerous. The filename part is under our control. So something like annotated_runner_log.html would be perfectly fine. Since html_artifacts is per-task, there won't be filename clashes. Except possibly if someone were to try to aggregate them all in one directory.

Are those all true? Please correct me if I've misunderstood anything.

If those are all true, then:

Yes, mostly, you've got the gist of it for sure 😀

  • is there any way to change the URL path to avoid spaces? Not a huge deal, but I might as well ask early

It's possible to use the task alias, but currently this value is referenced in the success task as a way to aggregate normal and matrix task status together. So we're probably stuck with url-encoding the task name in order to reference individual matrix-items (really, they're generated tasks).

Currently the task names are formatted with spaces for human readability in github (as best as we can do, given the length limits). I think if we change these names to make them more URL-friendly, it would negatively impact human readability. I'm not opposed to changing the names if we can figure out a way that works. In all cases, I'm sure perl has an easy way to url-encode strings 😁

  • when will these changes take effect? If I update logformatter right now (today), will the new api.cirrus-ci.com links work, or do I need to wait for this PR (or some other PR) to merge?

Nope no need to wait for this PR (and it will probably be closed soon anyway). If you change your script to use the REST API endpoint it will work today, and in the future, and irrespective of which cloud the tasks run on. Speaking of which, I expect it will likely be several more weeks/months before the AWS-based podman-machine task is ready, so I think there's plenty of time to tinker with the logformatter, names, URLs, and whatnot.

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial comments merely on a small part of my first pass.

contrib/cirrus/setup_environment.sh Outdated Show resolved Hide resolved
contrib/cirrus/setup_environment.sh Outdated Show resolved Hide resolved
contrib/cirrus/setup_environment.sh Outdated Show resolved Hide resolved
contrib/cirrus/setup_environment.sh Outdated Show resolved Hide resolved
@cevich
Copy link
Member Author

cevich commented Jun 15, 2022

Okay @ashley-cui I made the switch to rootless here for running the podman-machine tests, let see what happens...

.cirrus.yml Outdated Show resolved Hide resolved
@cevich
Copy link
Member Author

cevich commented Jun 15, 2022

@ashley-cui Progress! Great call on running rootless:

[+0605s] FAIL! -- 20 Passed | 4 Failed | 0 Flaked | 0 Pending | 0 Skipped

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2022
@cevich cevich marked this pull request as draft June 30, 2022 16:33
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2022
@cevich cevich force-pushed the podman_machine_poc branch 2 times, most recently from bfed075 to 6387e2d Compare June 30, 2022 18:16
@cevich
Copy link
Member Author

cevich commented Jun 30, 2022

I'm calling this "ready as it's gonna get" - configuration/execution wise. Obviously a few tests are failing and so there'll probably be another final rebase or three to get those fixes incorporated.

@cevich cevich marked this pull request as ready for review June 30, 2022 20:42
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2022
@cevich cevich force-pushed the podman_machine_poc branch from 6387e2d to d52865a Compare July 1, 2022 15:03
@cevich
Copy link
Member Author

cevich commented Jul 1, 2022

Force-pushed: Rebased on main. I'm not going to hit the trigger button on the new podman-machine task. It's "working" but some tests are failing. This will allow maintainers to review/merge this if necessary w/o the podman-machine tests passing. Or trigger the job if fixes have been merged.

cevich added 2 commits July 1, 2022 11:25
In order to support execution on various non-GCP cloud environments, the
BFQ scheduler workaround needs updating.  Previously it assumed the root
disk was always `/dev/sda`.  With the addition of new clouds (AWS) and
different environment types, the assumption is not always valid.  Update
the workaround to take care in looking up the block device where '/'
comes from.

Also update the scheduler to 'none', as all modern clouds already have
highly optimized underlying storage configurations.  There's no reason
to complicate I/O paths further by hard-coding specific scheduler(s) for
all environment types.

Signed-off-by: Chris Evich <[email protected]>
The podman-machine integration tests are designed to execute on
bare-metal, since they perform significant work with virtual-machines.
This test is costly to run at scale, so it is limited to being manually
triggered by developers (for now).  A 'trigger' button will appear in the
task status page of the Github WebUI once all test dependencies are met.
In the Cirrus-CI WebUI, there is also a 'pre-trigger' button that may be
pressed if a developer doesn't wish to wait. Also:

* Add a `localmachine` target in the `Makefile` on the off-chance
  developers wish to execute locally.  Update the `ginkgo-run` target
  to accommodate re-use by the new `localmachine` target.
* Exclude `podman_machine` task from `success` dependency verification.
  This also involves adding an exception to `cirrus_yaml_test.py`
  otherwise it will complain loudly.
* ***NOTE*** Inclusion of `ec2_instance` in *any* task will cause
  `hack/get_ci_vm.sh` to barf and be non-functional.  Future updates will
  be made to restore functionality.  Before then, simply comment out
  the `ec2_instance` section as a temporarily workaround.

Signed-off-by: Chris Evich <[email protected]>
@cevich cevich force-pushed the podman_machine_poc branch from d52865a to 8cff1c2 Compare July 1, 2022 15:25
@cevich
Copy link
Member Author

cevich commented Jul 1, 2022

Force-push: Rebased.

@cevich
Copy link
Member Author

cevich commented Jul 1, 2022

Re-ran a flake.

@baude
Copy link
Member

baude commented Jul 5, 2022

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 5, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: baude, cevich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 5, 2022
@edsantiago
Copy link
Member

I have a few small nits, but @cevich is on PTO this week so I'm OK merging.
/lgtm
/hold

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jul 5, 2022
@edsantiago
Copy link
Member

cirrus-map-pr14569

@rhatdan
Copy link
Member

rhatdan commented Jul 8, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 8, 2022
@openshift-ci openshift-ci bot merged commit 862cc42 into containers:main Jul 8, 2022
@cevich cevich deleted the podman_machine_poc branch April 18, 2023 14:48
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 29, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants