
API sometimes can't find networks created via CLI while API is running #11828

Closed
arctic-alpaca opened this issue Oct 1, 2021 · 10 comments · Fixed by #11846
Assignees
Labels
HTTP API Bug is in RESTful API In Progress This issue is actively being worked by the assignee, please do not work on this at this time. kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@arctic-alpaca
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When running the API service, creating a network via CLI sometimes isn't reflected in the API.

Steps to reproduce the issue:

  1. podman system service unix:///home/`whoami`/testing.sock --log-level=debug --time=500

  2. podman network create testns && sleep 5 && curl -v -X DELETE --unix-socket /home/`whoami`/testing.sock http://d/v3.0.0/libpod/networks/testns

Describe the results you received:
{"cause":"network not found","message":"unable to find network with name or ID testns: network not found","response":404}
The network can't be found via API. The list networks endpoint also doesn't return the testns network:

alpaca@DESKTOP:~$ curl --unix-socket /home/`whoami`/testing.sock http://d/v3.0.0/libpod/networks/json
[{"name":"podman","id":"2f259bab93aaaaa2542ba43ef33eb990d0999ee1b9924b557b7be53c0b7a1bb9","driver":"bridge","network_interface":"cni-podman0","created":"2021-10-01T15:53:27.340335748+02:00","subnets":[{"subnet":"10.88.0.0/16","gateway":"10.88.0.1"}],"ipv6_enabled":false,"internal":false,"dns_enabled":false,"ipam_options":{"driver":"host-local"}}]

Describe the results you expected:
The testns network gets deleted.

Additional information you deem important (e.g. issue happens only occasionally):
I'm writing a library wrapping the API; when running tests against the networks endpoints, most fail. Running each test separately works for every test. When testing the issue with curl, I could reproduce it every time. Restarting the API service makes the network visible via the list endpoint and deletable via the API.

Output of podman version:

Version:      4.0.0-dev
API Version:  4.0.0-dev
Go Version:   go1.15.9
Git Commit:   fedd9cc120cae6cc2df19d49ed916f35fc5f5d71
Built:        Fri Oct  1 15:33:42 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.23.0
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/local/libexec/podman/conmon
    version: 'conmon version 2.0.31-dev, commit: 8d4355071e51ba4b597fc7ccb410808da73e54d5'
  cpus: 12
  distribution:
    codename: bullseye
    distribution: debian
    version: "11"
  eventLogger: file
  hostname: DESKTOP
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.10.16.3-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 6488829952
  memTotal: 13343838208
  ociRuntime:
    name: runc
    package: 'runc: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0~rc93+ds1
      commit: 1.0.0~rc93+ds1-5+b2
      spec: 1.0.2-dev
      go: go1.15.9
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /tmp/podman-run-1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: ""
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 4294967296
  swapTotal: 4294967296
  uptime: 7h 13m 52.39s (Approximately 0.29 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/alpaca/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /home/alpaca/.local/share/containers/storage
  graphStatus: {}
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /tmp/podman-run-1000/containers
  volumePath: /home/alpaca/.local/share/containers/storage/volumes
version:
  APIVersion: 4.0.0-dev
  Built: 1633095222
  BuiltTime: Fri Oct  1 15:33:42 2021
  GitCommit: fedd9cc120cae6cc2df19d49ed916f35fc5f5d71
  GoVersion: go1.15.9
  OsArch: linux/amd64
  Version: 4.0.0-dev

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes, using a self-built latest GitHub dev version.

Additional environment details (AWS, VirtualBox, physical, etc.):
Windows 10 WSL2, Debian Bullseye

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 1, 2021
@Luap99
Member

Luap99 commented Oct 1, 2021

This is expected in the current implementation. I personally think this is a rare use case.

To fix this we would have to either use inotify, which causes problems for rootless users, or read the config files every time you do a network create/ls/rm, etc.
To get the best performance I decided against that, but I am open to suggestions.

@mheon @baude @rhatdan WDYT?

@mheon
Member

mheon commented Oct 1, 2021

How are we deciding when to reload the list of available networks now - every time the API endpoints are hit, or less frequently? It's probably inexpensive enough to do it frequently if that will resolve this without inotify.

@Luap99
Member

Luap99 commented Oct 1, 2021

We are never reloading once they are loaded. Only networks that are created and removed via the API are updated.

@arctic-alpaca
Contributor Author

I'm using the CLI to test my API wrapper implementation so I don't have to rely on parts of my code to test other parts. If this could be made to work, perhaps via a flag or something similar, that would be awesome.

@Luap99
Member

Luap99 commented Oct 1, 2021

Idea: stat the config dir and store the mtime; if it changed, we reload. I think that should work.
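The idea above can be sketched roughly as follows. This is a hypothetical illustration, not Podman's actual code; `networkCache`, `loadIfChanged`, and the `.conflist` filter are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// networkCache illustrates the proposed approach: keep the loaded networks
// plus the mtime of the config directory, and reload only when the
// directory's mtime has advanced.
type networkCache struct {
	configDir string
	lastMtime time.Time
	networks  []string // stand-in for the parsed network configs
}

// loadIfChanged stats the config dir and re-reads the network configs only
// when the directory was modified since the last load.
func (c *networkCache) loadIfChanged() (bool, error) {
	info, err := os.Stat(c.configDir)
	if err != nil {
		return false, err
	}
	if !info.ModTime().After(c.lastMtime) {
		return false, nil // nothing changed, keep the cached list
	}
	entries, err := os.ReadDir(c.configDir)
	if err != nil {
		return false, err
	}
	c.networks = c.networks[:0]
	for _, e := range entries {
		if filepath.Ext(e.Name()) == ".conflist" { // CNI network configs
			c.networks = append(c.networks, e.Name())
		}
	}
	c.lastMtime = info.ModTime()
	return true, nil
}

// demo simulates another process (e.g. the CLI) dropping a new config
// into the directory between two API calls.
func demo() (bool, bool, bool, int) {
	dir, _ := os.MkdirTemp("", "cni")
	defer os.RemoveAll(dir)
	cache := &networkCache{configDir: dir}

	r1, _ := cache.loadIfChanged() // first call: loads
	r2, _ := cache.loadIfChanged() // unchanged: cache hit

	time.Sleep(10 * time.Millisecond) // let the dir mtime advance
	os.WriteFile(filepath.Join(dir, "testns.conflist"), []byte("{}"), 0o644)
	r3, _ := cache.loadIfChanged() // change detected: reloads

	return r1, r2, r3, len(cache.networks)
}

func main() {
	r1, r2, r3, n := demo()
	fmt.Println(r1, r2, r3, n)
}
```

One caveat with this approach: directory mtime granularity varies by filesystem, so two changes within the same timestamp tick could in principle be missed.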

@jwhonce
Member

jwhonce commented Oct 1, 2021

Implementation Detail: We could add a middleware handler to the necessary API endpoints to reload the list. This could work well with #11828 (comment)

The middleware handlers we currently use log trace data, recover from panics, and support X-Reference-Id headers.

@jwhonce jwhonce added the HTTP API Bug is in RESTful API label Oct 1, 2021
jwhonce added a commit to jwhonce/podman that referenced this issue Oct 1, 2021
API middleware handler to update memory list of networks if
Config.Network.NetworkConfigDir is newer.

Fixes containers#11828

Signed-off-by: Jhon Honce <[email protected]>
@jwhonce
Member

jwhonce commented Oct 1, 2021

@Luap99 9f9ec93 is a POC. Assuming the new networking stack can be extended for that code to call a NetworkListRefresh() function.

@Luap99
Member

Luap99 commented Oct 2, 2021

@jwhonce I would prefer to handle this inside the network interface. There is already logic which loads the networks on the first call. This way we only need to check if the network interface is actually used.

@mheon
Member

mheon commented Oct 2, 2021

I'm still wondering how costly a full refresh is; if it's cheap, there's not much reason not to do it every time.

@Luap99
Member

Luap99 commented Oct 4, 2021

Well, we need to read and unmarshal every config file in the directory, so it is definitely costly. Keep in mind that this operation holds a lock.

@Luap99 Luap99 self-assigned this Oct 4, 2021
@Luap99 Luap99 added the In Progress This issue is actively being worked by the assignee, please do not work on this at this time. label Oct 4, 2021
Luap99 added a commit to Luap99/libpod that referenced this issue Oct 4, 2021
The current implementation of the CNI network interface only loads the
networks on the first call and saves them in a map. This is done to save
on performance and avoid reloading all configs every time, which would be
costly with many networks.

The problem with this approach is that if a network is created by
another process it will not be picked up by the already running podman
process. This is not a problem for the short lived podman commands but
it is problematic for the podman service.

To make sure we always have the current networks, store the mtime of the
config directory. If it changed since the last read, we have to read
again.

Fixes containers#11828

Signed-off-by: Paul Holzinger <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023