
fix container startup for empty pidfile #10288

Merged 1 commit into containers:master on May 10, 2021

Conversation

@Luap99 (Member) commented May 10, 2021

Commit 728b73d introduced a regression. Containers created with a
previous version no longer start successfully. The problem is that
the PidFile in the container config is empty for those containers. If
the PidFile is empty, we have to set it to the previous default.

[NO TESTS NEEDED] We should investigate why the system upgrade test did
not catch this.

Fixes #10274

Signed-off-by: Paul Holzinger <[email protected]>
@openshift-ci bot (Contributor) commented May 10, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2021
@Luap99 (Member Author) commented May 10, 2021

@edsantiago I see quite a few system upgrade tests are skipped. This one could have caught this regression:

@test "start" {
skip "FIXME: this leaves a mount behind: root/overlay/sha/merged"
run_podman --cgroup-manager=cgroupfs start -a mydonecontainer
is "$output" "++$RANDOM_STRING_1++" "start on already-run container"
}

@rhatdan (Member) commented May 10, 2021

LGTM
@containers/podman-maintainers PTAL

@vrothberg (Member) left a comment

LGTM, but I'd prefer to wait until @edsantiago has had a chance to look at it.

@mheon (Member) commented May 10, 2021 via email

@edsantiago (Member)

@Luap99 can you get the upgrade test to pass? If so, please feel free to un-skip it. All the skips are because of problems too deep for me to fix.

@Luap99 (Member Author) commented May 10, 2021

@edsantiago I looked into this and found the root cause of the failures in the upgrade tests. Inside the container, fuse-overlayfs is used; outside, the native overlayfs is used. This causes trouble with the mount points. I need more time to look into this for all test cases.

@Luap99 (Member Author) commented May 10, 2021

Let's get this merged for now.

@vrothberg (Member)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2021
@openshift-merge-robot openshift-merge-robot merged commit 5a8b3cf into containers:master May 10, 2021
@Luap99 Luap99 deleted the fix-10274 branch May 10, 2021 14:12
@edsantiago (Member)

Trying to follow up on this, I just checked out and built 5a8b3cf (master, with this PR merged), removed the skip from the upgrade "start" test, then ran:

$ sudo env PODMAN_UPGRADE_FROM=v2.1.1 bats ./test/upgrade/test-upgrade.bats
...
 - pods (skipped: TBI)
 ✗ start
   (from function `die' in file test/upgrade/../system/helpers.bash, line 412,
    from function `run_podman' in file test/upgrade/../system/helpers.bash, line 220,
    in test file test/upgrade/test-upgrade.bats, line 237)
     `run_podman --cgroup-manager=cgroupfs start -a mydonecontainer' failed with status 125
   # podman --cgroup-manager=cgroupfs start -a mydonecontainer
   Error: unable to start container f7c814b606bf06c3f2d0202b4b70cdd23462add3fa92af25c9a192d7d27649cb: opening file `` for writing: No such file or directory: OCI not found
   [ rc=125 (** EXPECTED 0 **) ]
   #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   #| FAIL: exit code is 125; expected 0
   #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I would have expected that test to pass (possibly with the leftover-mount problem)?

@Luap99 (Member Author) commented May 10, 2021

@edsantiago It worked for me locally. Did you forget to set PODMAN=bin/podman?

@edsantiago (Member)

Blush. That was it, thank you. I now see the leftover-mount bug and will see if there's some way I can work around it.

@Luap99 (Member Author) commented May 10, 2021

@edsantiago I have working tests locally; I will push a PR soon.

jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request May 10, 2021
There is a regression in the latest 3.2.0 RC in stable:
- containers/podman#10274
- containers/podman#10288

Also hit that bug myself locally on FSB.
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request May 10, 2021
There is a regression in the latest 3.2.0 RC in stable:
- containers/podman#10274
- containers/podman#10288

No Bodhi updates yet with the fix.

Also hit that bug myself locally on FSB.
@jlebon (Contributor) commented May 10, 2021

Can we get this into Bodhi ASAP to replace the broken one already pushed to the repos?

@mheon (Member) commented May 10, 2021

We need to back the RC out entirely; replacing it with another RC is not a good idea, as we're likely to hit even more bugs. We have to go back to 3.1.2 until we have things stable and ready to release (at least another week). Unfortunately @lsm5 is out today, so our main packager is gone.

@edsantiago (Member)

This update has been pushed to stable.

I don't think there's any possible way to undo that. I think the only option is to urgently push out something with a version (or epoch) that overrides the bad release.

@mheon (Member) commented May 10, 2021

Epoch bump is probably our only option now. I am uncomfortable pushing another 3.2.0 build at present.

@TomSweeneyRedHat (Member)

FWIW, I'm fine bumping to 3.1.3 in the near term, but I'm with @mheon on not going to 3.2.0 right now.

@mheon (Member) commented May 10, 2021 via email

dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this pull request May 11, 2021
There is a regression in the latest 3.2.0 RC in stable:
- containers/podman#10274
- containers/podman#10288

No Bodhi updates yet with the fix.

Also hit that bug myself locally on FSB.
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request May 12, 2021
There is a regression in the latest 3.2.0 RC in stable:
- containers/podman#10274
- containers/podman#10288

No Bodhi updates yet with the fix.

Also hit that bug myself locally on FSB.

(cherry picked from commit b72844c)
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023

Successfully merging this pull request may close these issues.

Unable to start container after update **podman 3.1.2-1.fc34 -> 3.2.0-0.1.rc1.fc34**