CI: make systemd tests parallel-safe (*) #24048

Merged
merged 1 commit into from
Sep 24, 2024

edsantiago
Member

Mostly just switch to safename. Rewrite setup() to guarantee
unique service file names, atomically created.

Signed-off-by: Ed Santiago [email protected]

None

@openshift-ci openshift-ci bot added release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 23, 2024
@rhatdan
Member

rhatdan commented Sep 24, 2024

LGTM

Comment on lines 54 to 55
mv -Z "container-$cname.service" $UNIT_FILE.tmp.$$ && \
mv -Z $UNIT_FILE.tmp.$$ $UNIT_FILE
Member

I am confused: why do we move to a tmp file just to move to the real location directly after? I do not see the point of doing this instead of a single mv command.

Member Author

Ultraparanoia. mv is not atomic across filesystems, so this is theoretically possible:

  1. mv starts writing foo.service
  2. another parallel job runs systemctl daemon-reload
  3. systemd sees (incomplete) foo.service and does whatever systemd does with it and bad things happen
  4. mv completes

Even with all this paranoia, #24010 still triggers (on my laptop), so I'm pretty sure that the fragment-file flake has nothing to do with corrupt systemd service files ... but I just want to eliminate every last possibility of test errors.
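
The two-step pattern described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the function name `install_atomically` is hypothetical, and the `-Z` flag used in the reviewed lines is left out so the sketch runs on systems without SELinux. It assumes the temporary name sits in the same directory as the destination, so the final step is a same-filesystem rename, which replaces the target in one step.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): install a file so that anything
# scanning the destination directory never sees a half-written target.
install_atomically() {
    local src="$1" dest="$2"
    # Temporary name next to $dest, hence on the same filesystem as $dest.
    local tmp="$dest.tmp.$$"

    # Step 1: if $src is on another filesystem, mv falls back to
    # copy-then-delete, which is slow and not atomic. That is acceptable
    # here because only the temporary name is being written.
    mv "$src" "$tmp" || return 1

    # Step 2: same-directory rename. $dest appears fully written or not
    # at all, never partially.
    mv "$tmp" "$dest"
}
```

A caller would use it as `install_atomically "container-$cname.service" "$UNIT_FILE"`. Note that the temporary file is still briefly visible in the destination directory under its `.tmp.$$` name; the sketch assumes, as the PR's lines appear to, that a name with that suffix is not picked up as a unit file.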

Member

OK, got it, but then please add a comment explaining the reason here, because it is not obvious otherwise.

Comment on lines 247 to 248
mv -Z "container-$cname.service" $TEMPLATE_FILE.tmp.$$ && \
mv -Z $TEMPLATE_FILE.tmp.$$ $TEMPLATE_FILE
Member

same here

Mostly just switch to safename. Rewrite setup() to guarantee
unique service file names, atomically created.

* IMPORTANT NOTE: enabling parallelization on these tests
  triggers containers#24010 ("fragment file" flake), but only on my
  f40 laptop. I have never seen the flake in Cirrus despite
  many many runs in containers#23275. I am submitting this for review
  and merging because even though _something_ is broken,
  this breakage is unlikely to affect our CI.

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago edsantiago marked this pull request as draft September 24, 2024 12:18
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 24, 2024
@edsantiago edsantiago marked this pull request as ready for review September 24, 2024 12:21
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 24, 2024
Member

@Luap99 Luap99 left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 24, 2024
Contributor

openshift-ci bot commented Sep 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

@openshift-merge-bot openshift-merge-bot bot merged commit f9f72f5 into containers:main Sep 24, 2024
55 of 56 checks passed
@edsantiago edsantiago deleted the safename-250 branch September 24, 2024 13:46
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Dec 24, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Dec 24, 2024