Port binding silently failing #7067

Closed

andrewgdunn opened this issue Jul 23, 2020 · 3 comments

Labels: kind/bug, locked - please file new issue/PR


andrewgdunn commented Jul 23, 2020

/kind bug
(maybe?)

I'm trying to figure out whether this is a failure within the container, or a podman/slirp4netns failure that is very hard to pin down. I'm running gitlab in a container with podman right now, via this systemd unit:

[root@citadel gitlab]# cat /etc/systemd/system/podman-gitlab.service 
[Unit]
Description=Podman running gitlab
Wants=network.target
After=network-online.target

[Service]
WorkingDirectory=/app/gitlab
User=gitlab
Group=gitlab
Restart=no
ExecStartPre=/usr/bin/rm -f %T/%N.pid %T/%N.cid
ExecStartPre=/usr/bin/podman rm --ignore -f gitlab
ExecStart=/usr/bin/podman run --conmon-pidfile %T/%N.pid --cidfile %T/%N.cid --cgroups=no-conmon \
  --replace --detach \
  --log-driver=journald \
  --log-opt tag=gitlab \
  --security-opt label=disable \
  --health-cmd=none \
  --health-interval=disable \
  --name=gitlab \
  --publish 8880:80 \
  --publish 2222:22 \
  --shm-size=8g \
  --volume /app/gitlab/config:/etc/gitlab \
  --volume /app/gitlab/logs:/var/log/gitlab \
  --volume /app/gitlab/data:/var/opt/gitlab \
  docker.io/gitlab/gitlab-ee:latest
ExecStop=/usr/bin/podman stop --ignore gitlab -t 10
ExecStopPost=/usr/bin/podman rm --ignore -f gitlab
ExecStopPost=/usr/bin/rm -f %T/%N.pid %T/%N.cid
KillMode=none
Type=forking

[Install]
WantedBy=multi-user.target
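
(For reference, podman can generate a very similar unit itself; a sketch, assuming podman >= 2.0 and the container name used above:

/usr/bin/podman generate systemd --new --files --name gitlab

which writes a container-gitlab.service using the same conmon-pidfile / Type=forking pattern.)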

This comes up fine: http is listening/available on 8880 and ssh is listening/available on 2222. It works for... sometimes minutes, hours, days... and then I am no longer able to reach http on 8880, for example from the host metal:

[root@citadel gitlab]# nc 127.0.0.1 2222
SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.10
^C
[root@citadel gitlab]# nc 127.0.0.1 8880
Ncat: Connection refused.

Getting inside the container I can restart processes and whatnot, but I have not been able to figure out how to get this state to recover. It seems like the only thing that "fixes" it is to restart the systemd unit, effectively destroying and re-instancing the container (as well as all the assistive subsystems like slirp4netns).

I'm really trying to figure out whether I'm having a podman/slirp4netns issue or a gitlab issue. I've got several instances of gitlab and I'm only seeing this consistently on one host/instance (using the same host OS (Fedora 32) and the same version of podman (2.0.2) on all instances).

I've got a support license from gitlab to dig into this, but I really want to know whether I should be leaning on them hard or... whether I'm inducing this error by being off the "golden docker" path of operation.
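
For anyone trying to narrow this down, here is a rough sketch of checks to tell the forwarding path apart from the container (assumptions: the container is named gitlab as above, the podman commands run as the user that owns the container, and ss/curl are available inside the gitlab-ee image):

# does anything on the host still hold the published ports?
ss -ltnp | grep -E ':(8880|2222)'
# are slirp4netns and the rootless port forwarder still running for this container?
pgrep -af slirp4netns
pgrep -af rootlessport
# does podman still report the port mappings?
podman port gitlab
# is nginx inside the container still listening on :80?
podman exec gitlab ss -ltn
podman exec gitlab curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:80/

If the in-container checks pass while the host side refuses connections, the problem is in the podman/slirp4netns forwarding path rather than in gitlab itself.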

openshift-ci-robot added the kind/bug label Jul 23, 2020
openshift-ci-robot (Collaborator) commented Jul 23, 2020

@storrgie: The label(s) kind/(maybe?) cannot be applied, because the repository doesn't have them

In response to this:

/kind bug (maybe?)

[…]
Andrew G. Dunn Today at 10:23

Looking at the three log systems:

  • nginx/gitlab_error.log
  • gitlab-workhorse/current
  • puma/puma_stdout.log
  • puma/puma_stderr.log

Not seeing anything that is particularly enlightening.

I can still see ssh logging (where people are attempting random users), and I can see that the container is still listening on the ssh port but the http port is refusing connections:

[root@citadel gitlab]# nc 127.0.0.1 2222
SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.10
^C
[root@citadel gitlab]# nc 127.0.0.1 8880
Ncat: Connection refused.

[…]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mheon (Member) commented Jul 23, 2020

@giuseppe @AkihiroSuda Any way we can get debug info out of slirp4netns here?
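
A generic way to pull some runtime information out of the forwarder while it is wedged (a sketch using standard tools rather than a slirp4netns feature; <forwarder-pid> is hypothetical, i.e. whatever pgrep -af slirp4netns / pgrep -af rootlessport reports):

# watch what the forwarder does while a connection attempt is made
strace -f -e trace=network -p <forwarder-pid>
# and reproduce with podman's own debug logging enabled
podman --log-level=debug run ...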

AkihiroSuda (Collaborator) commented:

Seems #7016

github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023