podman info fails with "open /tmp/run-1000/libpod/pause.pid: no such file or directory" on Travis Ubuntu arm64/ppc64le/s390x #4570

Closed
junaruga opened this issue Nov 26, 2019 · 17 comments · Fixed by #4637

@junaruga
Contributor

junaruga commented Nov 26, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

podman info and podman version show the following error in the Travis Ubuntu xenial environments on native arm64, ppc64le and s390x CPU architectures. This does not happen in the Travis Ubuntu amd64 (x86_64) environment. This issue is related to #3679 .

$ podman info
Error: could not get runtime: error setting up the process: open /tmp/run-1000/libpod/pause.pid: no such file or directory

Steps to reproduce the issue:

  1. Fork https://github.com/junaruga/ci-multi-arch-native-container-test , enable Travis CI on the fork, and run a build on the fork's master branch.
    Or just look at the aarch64-fedora, ppc64le-fedora or s390x-fedora job in https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/builds/617312487 .

Describe the results you received:

For job 46.6 (aarch64-fedora):
https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/617259570#L126

$ $PODMAN info
Error: could not get runtime: error setting up the process: open /tmp/run-1000/libpod/pause.pid: no such file or directory

Describe the results you expected:

$ $PODMAN info
host:
...

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

$ podman version
Error: could not get runtime: error setting up the process: open /tmp/run-1000/libpod/pause.pid: no such file or directory

podman --version does work and shows the version.

$ podman --version
podman version 1.6.2
time="2019-11-26T16:24:31Z" level=warning msg="unable to find /home/travis/.config/containers/registries.conf. some podman (image shortnames) commands may be limited"

Output of podman info --debug:

Error: could not get runtime: error setting up the process: open /tmp/run-1000/libpod/pause.pid: no such file or directory

Package info (e.g. output of rpm -q podman or apt list podman):

$ apt list podman
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Listing...
podman/xenial,now 1.6.2-1~ubuntu16.04~ppa1 arm64 [installed]

See https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/617303586#L323

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 26, 2019
@mheon
Member

mheon commented Nov 26, 2019

Haven't we addressed this recently? I feel like we have code to handle it being missing... @giuseppe

@giuseppe
Member

I think this is a new one, it should never fail if the pause.pid file is missing.

@junaruga
Contributor Author

Is there a temporary workaround to run podman info in these environments, such as running mkdir -p /tmp/run-1000/libpod before podman info?

@giuseppe
Member

Can you please share the output of env?

@junaruga
Contributor Author

Hi Giuseppe, thank you for checking this issue.
Sure.

For the aarch64-fedora case, here it is.
https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/617709779#L330

$ env
TRAVIS_ARCH=aarch64
rvm_bin_path=/home/travis/.rvm/bin
MYSQL_UNIX_PORT=/var/run/mysqld/mysqld.sock
HAS_JOSH_K_SEAL_OF_APPROVAL=true
GEM_HOME=/home/travis/.rvm/gems/ruby-2.6.5
NVM_CD_FLAGS=
TRAVIS_TEST_RESULT=
SHELL=/bin/bash
TERM=xterm
PODMAN=podman
IRBRC=/home/travis/.rvm/rubies/ruby-2.6.5/.irbrc
TRAVIS_COMMIT=290d1f791d43a24dc36c96ca740c41bdf2dc5327
TRAVIS_OS_NAME=linux
TRAVIS_APT_PROXY=http://apt.cache.travis-ci.com
TRAVIS_JOB_NAME=aarch64-fedora
TRAVIS_INTERNAL_RUBY_REGEX=^ruby-(2\.[0-4]\.[0-9]|1\.9\.3)
OLDPWD=/home/travis/build
MY_RUBY_HOME=/home/travis/.rvm/rubies/ruby-2.6.5
TRAVIS_ROOT=/
TRAVIS_TIMER_ID=024812a0
ANSI_GREEN=\033[32;1m
NVM_DIR=/home/travis/.nvm
USER=travis
SUDO_USER=root
TRAVIS_LANGUAGE=shell
TRAVIS_INFRA=
SUDO_UID=0
ANSI_RESET=\033[0m
rvm_path=/home/travis/.rvm
TRAVIS_DIST=xenial
TRAVIS=true
TRAVIS_REPO_SLUG=junaruga/ci-multi-arch-native-container-test
ANSI_YELLOW=\033[33;1m
USERNAME=travis
TRAVIS_BUILD_STAGE_NAME=
TRAVIS_COMMIT_MESSAGE=Add commands to debug an environment. (#11)
TRAVIS_PULL_REQUEST=false
PAGER=cat
TRAVIS_CMD=env
TRAVIS_CPU_ARCH=arm64
rvm_prefix=/home/travis
PATH=/home/travis/bin:/home/travis/.local/bin:/home/travis/.rvm/gems/ruby-2.6.5/bin:/home/travis/.rvm/gems/ruby-2.6.5@global/bin:/home/travis/.rvm/rubies/ruby-2.6.5/bin:/home/travis/.phpenv/shims:/home/travis/.nvm/versions/node/v8.12.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/home/travis/.phpenv/bin:/home/travis/.rvm/bin
TRAVIS_PULL_REQUEST_SHA=
TRAVIS_OSX_IMAGE=
TRAVIS_JOB_WEB_URL=https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/617709779
TRAVIS_TMPDIR=/tmp/tmp.saDK01bWCN
TRAVIS_BUILD_WEB_URL=https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/builds/617709774
APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=1
PWD=/home/travis/build/junaruga/ci-multi-arch-native-container-test
CONTINUOUS_INTEGRATION=true
LANG=C.UTF-8
SETARCH=
TRAVIS_ENABLE_INFRA_DETECTION=true
TRAVIS_SUDO=true
TRAVIS_TAG=
TRAVIS_ALLOW_FAILURE=true
TRAVIS_HOME=/home/travis
TRAVIS_INIT=systemd
rvm_version=1.29.9 (latest)
TRAVIS_JOB_NUMBER=52.5
TRAVIS_EVENT_TYPE=push
SHLVL=1
PS4=+
SUDO_COMMAND=/bin/bash /home/travis/build.sh
HOME=/home/travis
ANSI_CLEAR=\033[0K
DIST=fedora
CI=true
TRAVIS_TIMER_START_TIME=1574859663084274950
BASE_IMAGE=fedora:31
TRAVIS_BUILD_ID=617709774
LOGNAME=travis
TRAVIS_PULL_REQUEST_SLUG=
GEM_PATH=/home/travis/.rvm/gems/ruby-2.6.5:/home/travis/.rvm/gems/ruby-2.6.5@global
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
TRAVIS_SECURE_ENV_VARS=false
DEBIAN_FRONTEND=noninteractive
NVM_BIN=/home/travis/.nvm/versions/node/v8.12.0/bin
TRAVIS_APP_HOST=build.travis-ci.org
GIT_ASKPASS=echo
CC=
TRAVIS_BRANCH=master
SUDO_GID=0
TRAVIS_COMMIT_RANGE=6dae37c00dbf...290d1f791d43
TRAVIS_PULL_REQUEST_BRANCH=
TRAVIS_JOB_ID=617709779
ANSI_RED=\033[31;1m
RUBY_VERSION=ruby-2.6.5
container=lxc
TRAVIS_BUILD_NUMBER=52
TRAVIS_BUILD_DIR=/home/travis/build/junaruga/ci-multi-arch-native-container-test
_=/usr/bin/env

And in case of ppc64le and s390x,

@rhatdan
Member

rhatdan commented Dec 3, 2019

@giuseppe Any progress?

@giuseppe
Member

giuseppe commented Dec 3, 2019

It might be caused by differences in the kernel on those architectures. The renameat2 syscall could be behaving differently.

The kernel is quite old though, and I don't think you can use fuse-overlayfs, so in general support for rootless containers is very limited.

Could you use a newer kernel?

@junaruga
Contributor Author

junaruga commented Dec 3, 2019

Could you use a newer kernel?

Yes, I can use the newer kernel.
But the kernel in the Travis arm64/ppc64le/s390x environments is already newer than the one in the Travis x86_64 environment.

The kernel versions (uname -r) are:

  • x86_64-fedora (ok) case: 4.15.0-1028-gcp
  • arm64/ppc64le/s390x (error) cases: 5.3.0-NN-generic

See this summary page for detail.

I also captured the output of strace -f podman info for the "x86_64-fedora" and "aarch64-xenial-fedora" cases and put the log files here.

https://github.com/junaruga/ci-multi-arch-native-container-test/tree/master/issues/13_podman_arch/logs

You can compare the files locally like this.

$ vimdiff issues/13_podman_arch/logs/aarch64-xenial-fedora/strace-podman-info.log issues/13_podman_arch/logs/x86_64-fedora/strace-podman-info.log

It might be caused by differences in the kernel on those architectures. The renameat2 syscall could be behaving differently.

You can see that renameat2 is called in issues/13_podman_arch/logs/aarch64-xenial-fedora/strace-podman-info.log (= error case), but is not called in issues/13_podman_arch/logs/x86_64-fedora/strace-podman-info.log (= ok case).

I hope the logs give a clue for fixing this issue.

Thank you.

@junaruga
Contributor Author

junaruga commented Dec 3, 2019

When I searched for pause.pid in both log files, I found the following.

In issues/13_podman_arch/logs/x86_64-fedora/strace-podman-info.log (ok case, user_id: 2000)

...
open("/run/user/2000/libpod/pause.pid", O_RDONLY) = 3
...

In issues/13_podman_arch/logs/aarch64-xenial-fedora/strace-podman-info.log (error case, user_id: 1000)

...
[pid  2784] openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", O_RDWR|O_CREAT|O_EXCL, 0600 <unfinished ...>
...
[pid  2784] renameat2(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", AT_FDCWD, "/tmp/run-1000/libpod/pause.pid", RENAME_NOREPLACE <unfinished ...>
...
[pid  2784] unlinkat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", 0 <unfinished ...>
...
[pid  2762] openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid", O_RDONLY|O_CLOEXEC <unfinished ...>
...
[pid  2762] write(2, "Error: could not get runtime: er"..., 123Error: could not get runtime: error setting up the process: open /tmp/run-1000/libpod/pause.pid: no such file or directory
...

@giuseppe
Member

giuseppe commented Dec 4, 2019

thanks for such detailed info.

Yes, indeed the renameat2 syscall fails:

[pid 3123] <... renameat2 resumed> ) = -1 EINVAL (Invalid argument)

I'll prepare a patch and open a PR

giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 4, 2019
the renameat2 syscall might be defined in the C library but lacking
support in the kernel.

In such case, let it fallback to open(O_CREAT)+rename as it does on
systems lacking the definition for renameat2.

Closes: containers#4570

Signed-off-by: Giuseppe Scrivano <[email protected]>
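
For readers following along, the fallback described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the function name rename_noreplace_sketch is hypothetical, and treating ENOSYS as a fallback trigger (in addition to the EINVAL seen in the strace log) is an assumption.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef RENAME_NOREPLACE
# define RENAME_NOREPLACE (1 << 0)
#endif

/* Hypothetical helper: move oldpath to newpath, refusing to clobber an
   existing newpath.  Returns 0 on success, -1 with errno set on failure.  */
static int
rename_noreplace_sketch (const char *oldpath, const char *newpath)
{
  int fd;

#ifdef SYS_renameat2
  /* The syscall number is known at build time; try the atomic path first.  */
  if (syscall (SYS_renameat2, AT_FDCWD, oldpath, AT_FDCWD, newpath,
               RENAME_NOREPLACE) == 0)
    return 0;

  /* Errors such as EEXIST are real answers.  Only fall through when the
     running kernel (or sandbox) rejects the call itself.  Including ENOSYS
     here is an assumption of this sketch.  */
  if (errno != EINVAL && errno != ENOSYS)
    return -1;
#endif

  /* Fallback: claim newpath exclusively, then overwrite it with rename().
     O_EXCL makes the open fail with EEXIST if newpath already exists.  */
  fd = open (newpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0)
    return -1;
  close (fd);

  return rename (oldpath, newpath);
}
```

With this shape, an environment that rejects renameat2 still ends up with the file renamed. The price is that the fallback is not atomic: newpath briefly exists as an empty file before rename() overwrites it.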
@giuseppe
Member

giuseppe commented Dec 4, 2019

opened a PR here: #4637

@junaruga
Contributor Author

junaruga commented Dec 4, 2019

You are welcome. Okay. I see the renameat2 syscall fails.
Out of curiosity, why is the pause.pid file handled differently in the Travis x86_64 and aarch64 (/ppc64le/s390x) environments?

In x86_64 open("/run/user/2000/libpod/pause.pid", O_RDONLY) = 3 is executed first.
In aarch64, openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", O_RDWR|O_CREAT|O_EXCL, 0600 <unfinished ...> is executed first.

@giuseppe
Member

giuseppe commented Dec 4, 2019

In aarch64, openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", O_RDWR|O_CREAT|O_EXCL, 0600 <unfinished ...> is executed first.

As far as I can see, the file is also opened first on aarch64:

[pid  2762] openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid  2763] pselect6(0, NULL, NULL, NULL, {0, 20000}, NULL <unfinished ...>
[pid  2762] <... openat resumed> )      = -1 ENOENT (No such file or directory)

@junaruga
Contributor Author

junaruga commented Dec 5, 2019

I see.

What I meant was that I wondered why the renameat2 syscall was not executed in the Travis x86_64 case (ok case).

Now seeing the file before your pull-request,
https://github.com/containers/libpod/blob/10f733497f37c6ed85756ba95f6e75f3443a90af/pkg/rootless/rootless_linux.c#L27-L37

As I understand it, renameat2 was not executed on Travis x86_64 because the SYS_renameat2 macro was not defined there.

And you changed the logic so that when SYS_renameat2 is defined but the syscall actually fails, it falls back to the other rename logic, rename (oldpath, newpath);, right?

Great! Thank you.
Please let me know once a new version of the podman deb package is released, if you remember. I would like to test it again then.
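
The distinction discussed above, between the syscall number being known at build time and the kernel actually accepting the call at run time, can be checked with a small probe. This is only a sketch: probe_renameat2 and the probe file names are hypothetical, and returning ENOSYS when SYS_renameat2 is missing at build time is a convention chosen here.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef RENAME_NOREPLACE
# define RENAME_NOREPLACE (1 << 0)
#endif

/* Hypothetical probe: returns 0 if renameat2(RENAME_NOREPLACE) works for a
   scratch file in dir, otherwise the errno of the first failing step.  */
static int
probe_renameat2 (const char *dir)
{
#ifndef SYS_renameat2
  /* Build-time case: the C library headers do not know the syscall.  */
  (void) dir;
  return ENOSYS;
#else
  char src[4096], dst[4096];
  int fd, saved = 0;

  snprintf (src, sizeof src, "%s/probe.%ld.src", dir, (long) getpid ());
  snprintf (dst, sizeof dst, "%s/probe.%ld.dst", dir, (long) getpid ());

  fd = open (src, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0)
    return errno;
  close (fd);

  /* Run-time case: the number is defined, but the running kernel or a
     syscall filter may still reject the call.  */
  if (syscall (SYS_renameat2, AT_FDCWD, src, AT_FDCWD, dst,
               RENAME_NOREPLACE) < 0)
    saved = errno;

  /* Clean up whichever of the two names exists.  */
  unlink (src);
  unlink (dst);
  return saved;
#endif
}
```

A result of EINVAL from probe_renameat2 ("/tmp") would match the failure shown in the aarch64 strace log, while 0 would mean the atomic path is usable in that directory.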

@junaruga
Contributor Author

junaruga commented Dec 5, 2019

I found the following article about the Travis ppc64le and s390x environments, though I am not sure that it is directly related to this issue.

https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z
Syscall interception support - only system calls considered as safe. We will be working on overcoming these limitations in the coming months.

@junaruga
Contributor Author

It seems that Ubuntu's podman installed from "ppa:projectatomic/ppa" is still 1.6.2, not the latest version, 1.7.0.

$ apt list podman
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Listing...
podman/xenial,now 1.6.2-1~ubuntu16.04~ppa1 arm64 [installed]

And the error still happens.
https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/636119447#L438

@vrothberg
Member

vrothberg commented Jan 13, 2020

Cross-distro packaging for Podman is now happening in the Open Build Service: https://build.opensuse.org/project/show/devel:kubic:libcontainers:stable

@lsm5 will soon give updates

@github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023