

Occasionally failing to start container over varlink with systemd #4076

Closed
johanbrandhorst opened this issue Sep 21, 2019 · 7 comments

Labels
kind/bug (Categorizes issue or PR as related to a bug.)
locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments.)

Comments

johanbrandhorst commented Sep 21, 2019

Description

I've been playing around with the varlink interface to automate spinning up containers for testing. Occasionally I run into an issue where the StartContainer call will fail with the following error:

failed to read from slirp4netns sync pipe: read |0: bad file descriptor

Sometimes another, somewhat similar error happens:

error saving container 425746739fb2cdeac0abb3366c3133b1e0cff78d3783f21cab30b8dc496c6fae state: write /home/REDACTED/.local/share/containers/storage/libpod/bolt_state.db: bad file descriptor

This happens about 1 in 10 times.

Steps to reproduce the issue:

  1. Enable the podman socket interface
  2. Create a container via the varlink interface
    varlink call unix:/run/user/1000/podman/io.podman/io.podman.CreateContainer '{"create":{"args":["cockroachdb/cockroach:v19.1.3","start","--insecure","--listen-addr=0.0.0.0:6257"],"name": "cockroach-test-db","publish":["6257"],"pull":"missing","ulimit":["nofile=1956:1956"]}}'
    
  3. Run the varlink StartContainer command:
    varlink call unix:/run/user/1000/podman/io.podman/io.podman.StartContainer '{"name":"cockroach-test-db"}'
    
  4. If it doesn't error, tear the container down and repeat until it errors.
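
The steps above can be looped in a small shell sketch. This is my addition, not from the original report: it assumes the io.podman user socket is active and the container from step 2 already exists, and it uses io.podman.StopContainer to tear the container down between attempts (the method name and its name/timeout parameters are assumptions about the io.podman interface of this podman version):

```shell
#!/bin/sh
# Repro sketch: repeatedly start/stop the container from step 2 until
# StartContainer fails, then report the attempt number.
SOCKET=unix:/run/user/1000/podman/io.podman

attempt=0
while [ "$attempt" -lt 30 ]; do
  attempt=$((attempt + 1))
  # Start the container created in step 2; break on the first failure.
  if ! varlink call "$SOCKET/io.podman.StartContainer" '{"name":"cockroach-test-db"}'; then
    echo "StartContainer failed on attempt $attempt"
    break
  fi
  # Tear the container down again so the next iteration can start it.
  varlink call "$SOCKET/io.podman.StopContainer" '{"name":"cockroach-test-db","timeout":10}'
done
```

With the failure rate described below, this loop would typically trip within the first handful of iterations.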

Describe the results you received:
I received an unexpected error:

failed to read from slirp4netns sync pipe: read |0: bad file descriptor

Sometimes another, somewhat similar error happens:

error saving container 425746739fb2cdeac0abb3366c3133b1e0cff78d3783f21cab30b8dc496c6fae state: write /home/REDACTED/.local/share/containers/storage/libpod/bolt_state.db: bad file descriptor

Describe the results you expected:
I expected the container to start every time.

Additional information you deem important (e.g. issue happens only occasionally):

$ slirp4netns --version
slirp4netns version 0.4.1
commit: 4d38845e2e311b684fc8d1c775c725bfcd5ddc27

Output of podman version:

$ podman version
Version:            1.5.1
RemoteAPI Version:  1
Go Version:         go1.12.8
OS/Arch:            linux/amd64

Output of varlink info:

$ varlink info unix:/run/user/1000/podman/io.podman
Vendor: Atomic
Product: podman
Version: 1.5.1
URL: https://github.com/containers/libpod
Interfaces:
  org.varlink.service
  io.podman

Output of podman info --debug:

$ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.12.8
  podman version: 1.5.1
host:
  BuildahVersion: 1.10.1
  Conmon:
    package: Unknown
    path: /usr/bin/conmon
    version: 'conmon version 2.0.0, commit: e217fdff82e0b1a6184a28c43043a4065083407f'
  Distribution:
    distribution: manjaro
    version: unknown
  MemFree: 743174144
  MemTotal: 16569856000
  OCIRuntime:
    package: Unknown
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8
      commit: 425e105d5a03fabd737a126ad93d62a9eeede87f
      spec: 1.0.1-dev
  SwapFree: 17846607872
  SwapTotal: 18223570944
  arch: amd64
  cpus: 8
  eventlogger: file
  hostname: REDACTED
  kernel: 4.19.69-1-MANJARO
  os: linux
  rootless: true
  uptime: 173h 49m 3.52s (Approximately 7.21 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/REDACTED/.config/containers/storage.conf
  ContainerStore:
    number: 1
  GraphDriverName: vfs
  GraphOptions: null
  GraphRoot: /home/REDACTED/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 118
  RunRoot: /run/user/1000
  VolumePath: /home/REDACTED/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

$ pacman -Qs podman
local/podman 1.5.1-1
    Tool and library for running OCI-based containers in pods
local/podman-docker 1.5.1-1
    Emulate Docker CLI using podman

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on laptop in systemd user mode.

$ systemctl --user enable io.podman.socket
$ systemctl --user start io.podman.socket
@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 21, 2019
@johanbrandhorst (Author)

This is worse than 1 in 10 times: in my recent testing it fails more than half the time. Often it will work fine for a bit and then fail 5-6 times in a row before working again. It is pretty readily reproducible.

@vrothberg (Member)

Hi @johanbrandhorst, thanks for opening the issue. I can confirm that varlink has issues when started via systemd. Similar (maybe identical?) to #4005.

@baude @jwhonce FYI

@johanbrandhorst (Author)

Thanks for the information @vrothberg, unfortunately I'm only interested in running podman varlink in user mode, so that's very disappointing. Is there a way to enable the socket for rootless accounts that doesn't go via systemd?

@vrothberg (Member)

@johanbrandhorst, you could run a varlink endpoint outside of the systemd context via podman varlink unix:$path and then connect to $path. I just did as follows:

$ podman varlink unix:/home/$(whoami)/podman.socket -t0
$ PODMAN_VARLINK_ADDRESS=unix:/home/$(whoami)/podman.socket podman-remote info
client:
...
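
A wrapper sketch of that workaround follows. The socket path and the wait-for-socket loop are my additions, not part of @vrothberg's comment; only `podman varlink unix:$path -t0` and the `PODMAN_VARLINK_ADDRESS` variable come from it:

```shell
#!/bin/sh
# Run a user-owned varlink endpoint outside the systemd context,
# then point podman-remote at it. -t0 disables the idle timeout.
SOCK="$HOME/podman.socket"
podman varlink "unix:$SOCK" -t0 &

# Wait briefly for the socket file to appear before connecting.
tries=0
while [ ! -S "$SOCK" ] && [ "$tries" -lt 20 ]; do
  tries=$((tries + 1))
  sleep 0.1
done

if [ -S "$SOCK" ]; then
  PODMAN_VARLINK_ADDRESS="unix:$SOCK" podman-remote info
else
  echo "socket did not appear at $SOCK"
fi
```

Since the endpoint is a plain child process here rather than a socket-activated unit, it avoids the systemd socket-passing path that this issue implicates.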

@johanbrandhorst (Author)

Let me see if that removes the errors, thanks!

@johanbrandhorst (Author)

The error does seem to be related to systemd; I've been running with a local socket for a while and haven't seen the errors.

@johanbrandhorst johanbrandhorst changed the title Occasionally failing to start container over varlink interface Occasionally failing to start container over varlink with systemd Sep 23, 2019
@vrothberg (Member)

Thanks a lot for reporting, @johanbrandhorst! We're looking into the issue.

I will close this one as duplicate of #4005. Feel free to join the discussion there.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023