/etc/hosts becomes empty after checkpoint and restore #22901

Closed
Tianyang-Zhang opened this issue Jun 4, 2024 · 1 comment · Fixed by #23083

Issue Description

Checkpointing and then restoring a container causes the /etc/hosts file to become empty, which breaks localhost networking inside the container. This issue seems to happen only in podman v5; I never hit it with v4.

The /etc/hosts before checkpoint:

[root@ip-172-31-24-203 ec2-user]# podman exec -it 1f9a7b1bf56f bash

root@1f9a7b1bf56f:/# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1     localhost localhost.localdomain localhost6 localhost6.localdomain6
10.88.0.1       host.containers.internal host.docker.internal
10.88.0.6       1f9a7b1bf56f hopeful_babbage

After restore:

[root@ip-172-31-24-203 ec2-user]# podman exec -it 1f9a7b1bf56f bash
root@1f9a7b1bf56f:/# cat /etc/hosts
root@1f9a7b1bf56f:/#

I have tried both the netavark and CNI backends, and both have this issue. It can be easily reproduced using a simple Python program (see below).

Host OS: CentOS Stream 9, Rocky Linux 9 (tried on both, and both have this issue)
Kernel: 5.14.0-391.el9.x86_64
EC2 instance: t2.2xlarge
conmon version: 2.1.0, commit: 8ef5de138efb6f0aad657082cdea22cf037792cb
crun version: 1.15 (also tried runc v1.1.12)
netavark version: 1.10.3

Steps to reproduce the issue

  1. Create the Python script test.py:
import time

counter = 0

while True:
    print(counter)
    counter += 1
    time.sleep(1)
  2. Start the container:
[root@ip-172-31-24-203 ec2-user]# podman run -d --volume $(pwd):$(pwd) python:3.9 python $(pwd)/test.py
65cc9268e2f2631a72d9be503c681d24840725c6b661132ee730d752fe358029
  3. Attach to the container and confirm the content of /etc/hosts:
[root@ip-172-31-24-203 ec2-user]# podman ps
CONTAINER ID  IMAGE                         COMMAND               CREATED        STATUS        PORTS       NAMES
65cc9268e2f2  docker.io/library/python:3.9  python /home/ec2-...  3 seconds ago  Up 4 seconds              silly_darwin

[root@ip-172-31-24-203 ec2-user]# podman exec -it 65cc9268e2f2 bash

root@65cc9268e2f2:/# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1     localhost localhost.localdomain localhost6 localhost6.localdomain6
10.88.0.1       host.containers.internal host.docker.internal
10.88.0.12      65cc9268e2f2 silly_darwin
  4. Checkpoint:
[root@ip-172-31-24-203 ec2-user]# podman container checkpoint 65cc9268e2f2
65cc9268e2f2
  5. Restore:
[root@ip-172-31-24-203 ec2-user]# podman container restore 65cc9268e2f2
65cc9268e2f2
  6. Attach to the container and check the content of /etc/hosts; it is now empty:
[root@ip-172-31-24-203 ec2-user]# podman exec -it 65cc9268e2f2 bash
root@65cc9268e2f2:/# cat /etc/hosts
root@65cc9268e2f2:/#

Describe the results you received

/etc/hosts file becomes empty after restore.

Describe the results you expected

/etc/hosts should remain the same as before the checkpoint.
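
The expectation above could be checked automatically with a small script. This is only a sketch: it assumes podman is on the PATH and that the container is already running, and the helper names (run_podman, read_hosts, hosts_before_and_after) are made up for illustration. It uses only the podman subcommands shown in the steps above.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# A runner takes podman arguments and returns stdout as text.
# It is injectable so the logic can be exercised without podman installed.
Runner = Callable[[Sequence[str]], str]


def run_podman(args: Sequence[str]) -> str:
    """Run `podman <args>` and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(
        ["podman", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def read_hosts(container_id: str, run: Runner = run_podman) -> str:
    """Return the content of /etc/hosts as seen inside the container."""
    return run(["exec", container_id, "cat", "/etc/hosts"])


def hosts_before_and_after(
    container_id: str, run: Runner = run_podman
) -> Tuple[str, str]:
    """Checkpoint and restore the container, returning /etc/hosts
    as read before the checkpoint and after the restore."""
    before = read_hosts(container_id, run)
    run(["container", "checkpoint", container_id])
    run(["container", "restore", container_id])
    after = read_hosts(container_id, run)
    return before, after


# Example use against a running container (ID taken from `podman run -d ...`):
#   before, after = hosts_before_and_after("65cc9268e2f2")
#   assert after.strip(), "/etc/hosts is empty after restore"
```

On an affected version the second value comes back empty, which matches the manual reproduction.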

podman info output

[root@ip-172-31-24-203 ec2-user]# podman info
host:
  arch: amd64
  buildahVersion: 1.35.3
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: 8ef5de138efb6f0aad657082cdea22cf037792cb'
  cpuUtilization:
    idlePercent: 98.98
    systemPercent: 0.26
    userPercent: 0.77
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: centos
    version: "9"
  eventLogger: journald
  freeLocks: 2048
  hostname: ip-172-31-24-203.us-east-2.compute.internal
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-391.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 31245967360
  memTotal: 33385242624
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.10.3-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.15-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /bin/pasta
    package: passt-0^20231204.gb86afe3-1.el9.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.2.0-2.el9.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 0h 40m 35.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - quay.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /mnt/float-image/containers/storage
  graphRootAllocated: 42937069568
  graphRootUsed: 12786778112
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /mnt/float-image/run/storage
  transientStore: false
  volumePath: /mnt/float-image/containers/storage/volumes
version:
  APIVersion: 5.0.2
  Built: 1715074917
  BuiltTime: Tue May  7 09:41:57 2024
  GitCommit: ""
  GoVersion: go1.22.2 (Red Hat 1.22.2-1.el9)
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.2

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

I'm using AWS EC2 instances (t2.2xlarge and g4dn.4xlarge).

Tianyang-Zhang added the kind/bug label Jun 4, 2024

Tianyang-Zhang commented Jun 4, 2024

The closest issue I found is this old one: #12003, not sure if it helps. Please let me know if anything else is needed, thanks!

Luap99 self-assigned this Jun 24, 2024
Luap99 added the In Progress label Jun 24, 2024
mheon pushed a commit to mheon/libpod that referenced this issue Jul 10, 2024
The restore code path never called completeNetworkSetup(), which means
that the hosts/resolv.conf files were not populated. The fix is simply to
call this function. There is a big catch here: technically it is
supposed to be called after the container is created but before it is
started. There is no such point for restore; the container runs right
away. This means that if we make the call afterwards, there is a short
interval where the file is still empty. Thus I decided to call it
before, which means it does not work with PostConfigureNetNS (userns), but
as that does not work anyway today, I don't see it as a problem.

Fixes containers#22901

Signed-off-by: Paul Holzinger <[email protected]>
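
The ordering trade-off described in the commit message can be illustrated with a toy model. This is not libpod code: the class, its fields, and the three restore functions are hypothetical, and only the name of the setup step mirrors completeNetworkSetup(). The model just records what the restored process would see at the moment it resumes.

```python
class ToyContainer:
    """Toy stand-in for a container; not libpod's real data structure."""

    def __init__(self) -> None:
        self.hosts = ""            # content of the container's /etc/hosts
        self.seen_at_start = None  # what the workload sees when it resumes

    def complete_network_setup(self) -> None:
        # Stands in for populating the hosts/resolv.conf files.
        self.hosts = "127.0.0.1\tlocalhost\n"

    def start_from_checkpoint(self) -> None:
        # On restore the process runs right away, so it observes
        # whatever is in the file at this instant.
        self.seen_at_start = self.hosts


def restore_without_setup(c: ToyContainer) -> None:
    # Behaviour reported in the issue: the setup step is never called.
    c.start_from_checkpoint()


def restore_setup_after(c: ToyContainer) -> None:
    # File ends up populated, but there is a window where it is empty.
    c.start_from_checkpoint()
    c.complete_network_setup()


def restore_setup_before(c: ToyContainer) -> None:
    # Ordering chosen in the commit: populate first, then run.
    c.complete_network_setup()
    c.start_from_checkpoint()
```

Only the last ordering leaves both seen_at_start and hosts non-empty, which is the reason the commit message gives for calling the function before the container runs.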
stale-locking-app bot added the locked - please file new issue/PR label Sep 24, 2024
stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 24, 2024