Building with --mount=type=ssh erroring with error serving agent: read unix /tmp/blah/ssh_auth_sock->@: use of closed network connection #3587

Closed
jspc opened this issue Oct 13, 2021 · 13 comments · Fixed by #3631

Comments

@jspc

jspc commented Oct 13, 2021

Description

Hello.

When building a container image as per:

$ podman build --ssh default .

From the Dockerfile:

FROM python:3.8.1

RUN mkdir -m 700 ${HOME}/.ssh && \
  touch -m 600 ${HOME}/.ssh/known_hosts &&  \
  ssh-keyscan github.com > ${HOME}/.ssh/known_hosts

RUN --mount=type=ssh,required=true ssh git@github.com ; sleep 11m

Results in the log line:

ERRO[0002] error serving agent: read unix /var/tmp/.buildah-ssh-sock1848960062/ssh_auth_sock->@: use of closed network connection 

(Or, of course, whichever directory ioutil.TempDir("", ".buildah-ssh-sock") creates).

I've put the sleep in there to try and diagnose further:

# The file exists, and is a socket which my user can access
/var/tmp $ ls -alrt .buildah-ssh-sock1848960062/ssh_auth_sock 
srw--w---- 1 jspc jspc 0 Oct 13 09:52 .buildah-ssh-sock1848960062/ssh_auth_sock

# The file is opened by the expected process
/var/tmp $ sudo lsof !$
COMMAND  PID USER   FD   TYPE             DEVICE SIZE/OFF  NODE NAME
conmon  1419 root   16u  unix 0xffff8ced428fcc80      0t0 28866 /proc/self/fd/15/attach type=SEQPACKET (LISTEN)

# Nothing appears to be wrong with the file on disk /shrug
/var/tmp $ stat !$
stat .buildah-ssh-sock1848960062/ssh_auth_sock
  File: .buildah-ssh-sock1848960062/ssh_auth_sock
  Size: 0               Blocks: 0          IO Block: 4096   socket
Device: 259,2   Inode: 272112      Links: 1
Access: (0620/srw--w----)  Uid: ( 1000/    jspc)   Gid: ( 1000/    jspc)
Access: 2021-10-13 09:52:00.665138756 +0100
Modify: 2021-10-13 09:52:00.665138756 +0100
Change: 2021-10-13 09:52:00.665138756 +0100
 Birth: 2021-10-13 09:52:00.665138756 +0100

# Similarly, trying to authenticate with this socket locally (not via buildah) is broken
/var/tmp $ SSH_AUTH_SOCK=!$ ssh git@github.com
SSH_AUTH_SOCK=.buildah-ssh-sock1848960062/ssh_auth_sock ssh git@github.com
sign_and_send_pubkey: signing failed for RSA "/home/jspc/.ssh/id_rsa" from agent: communication with agent failed
git@github.com: Permission denied (publickey).

When I run the bottommost command (SSH_AUTH_SOCK=.buildah-ssh-sock1848960062/ssh_auth_sock ssh git@github.com), I get the same error message in my podman build window:

ERRO[0191] error serving agent: read unix /var/tmp/.buildah-ssh-sock1848960062/ssh_auth_sock->@: use of closed network connection 

Steps to reproduce the issue:

  1. Copy the above Dockerfile to disk
  2. Run podman build --ssh default .

Describe the results you received:

The error message use of closed network connection

Describe the results you expected:

PTY allocation request failed on channel 0
Hi jspc! You've successfully authenticated, but GitHub does not provide shell access.
Connection to github.com closed.

Output of rpm -q buildah or apt list buildah:

# pacman -Qs buildah
local/buildah 1.23.1-1
    A tool which facilitates building OCI images

Output of buildah version:

Version:         1.23.1
Go Version:      go1.17.1
Image Spec:      1.0.1-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        0.4.0
libcni Version:  v0.8.1
image Version:   5.16.0
Git Commit:      d9a41b85
Built:           Tue Sep 28 21:28:39 2021
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64

Output of podman version if reporting a podman build issue:

Version:      3.4.0
API Version:  3.4.0
Go Version:   go1.17.1
Git Commit:   6e8de00bb224f9931d7402648f0177e7357ed079
Built:        Fri Oct  1 11:14:18 2021
OS/Arch:      linux/amd64

Output of cat /etc/*release:

NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux

Output of uname -a:

Linux hostname 5.14.8-arch1-1 #1 SMP PREEMPT Sun, 26 Sep 2021 19:36:15 +0000 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"

[storage.options]
additionalimagestores = [
]

[storage.options.overlay]
mountopt = "nodev"

[storage.options.thinpool]
@rhatdan
Member

rhatdan commented Oct 13, 2021

@ashley-cui @flouthoc PTAL

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@vulturm

vulturm commented Nov 15, 2021

Hi, I'm encountering the same issue with podman 3.4.2.

@rhatdan
Member

rhatdan commented Nov 16, 2021

@flouthoc @ashley-cui any progress?

@flouthoc
Collaborator

@rhatdan Acked. Will check this.

@flouthoc
Collaborator

@jspc @vulturm The PR above should be able to relay the close message from GitHub.

@flouthoc
Collaborator

flouthoc commented Nov 17, 2021

@jspc @vulturm
Note: ssh git@github.com will still exit with error code 1. That is expected behavior, and the same happens with docker.

You can use RUN --mount=type=ssh,required=true ssh git@github.com || true to suppress the error code and still get the output.

@jspc
Author

jspc commented Nov 17, 2021

@flouthoc fair point, but that's not the problem we're seeing, and is why I put the semicolon after the ssh git.. command in the Dockerfile.

Does your PR mean that agents work as expected now? Because if you look at the original issue body, the connection never even made it as far as github; the agent connection fails, which means the ssh command is unable to load the correct key from the agent, and as such the command fails. It's only evident by using ssh with a very verbose output.

In fact, when I discovered this issue it was trying to pip install from a private git repo which doesn't try to ssh into github at all.

@flouthoc
Collaborator

@jspc Yes, the PR above fixes that.

@jspc
Author

jspc commented Nov 17, 2021

Fair enough @flouthoc. I don't get how increasing the sleep duration stops agent connections failing, but I trust your work, and I just tried it myself.

Good work!

@flouthoc
Collaborator

@jspc Yeah, it might look confusing. I added this to the description of the PR, but I'll explain again.

The call to AgentServe makes the connection and performs I/O, but only returns on completion or error. In order to relay output, we close the connection here: https://github.com/containers/buildah/blob/main/pkg/sshagent/sshagent.go#L114. That is fine in most cases, but not when there is no output from the original connection. In such cases our other goroutine closes the connection too early, which causes the error, so increasing the timeout helps here.
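
To make the race concrete, here is a minimal standalone sketch using only the Go standard library. It is not the buildah code: `serve`, `serveWithTimedClose`, and `isClosedConnErr` are hypothetical names, and `io.Copy` merely stands in for the blocking agent-serving call. One goroutine closes the server side of a unix socket after a fixed delay while the serving call is still blocked on a silent client.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
	"time"
)

// serve is a stand-in for a blocking agent-serving call: it reads from conn
// until EOF or an error, and only then returns.
func serve(conn net.Conn) error {
	_, err := io.Copy(io.Discard, conn)
	return err
}

// isClosedConnErr reports whether err wraps net.ErrClosed, whose message is
// "use of closed network connection".
func isClosedConnErr(err error) bool {
	return errors.Is(err, net.ErrClosed)
}

// serveWithTimedClose (hypothetical helper) accepts one client on a temporary
// unix socket. The client sends nothing and hangs up after clientIdle, while
// a timer goroutine closes the server side after closeAfter.
func serveWithTimedClose(closeAfter, clientIdle time.Duration) error {
	dir, err := os.MkdirTemp("", "demo-ssh-sock")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	ln, err := net.Listen("unix", filepath.Join(dir, "sock"))
	if err != nil {
		return err
	}
	defer ln.Close()

	// Silent client: connects, sends nothing, closes after clientIdle.
	go func() {
		c, err := net.Dial("unix", ln.Addr().String())
		if err != nil {
			return
		}
		time.Sleep(clientIdle)
		c.Close()
	}()

	conn, err := ln.Accept()
	if err != nil {
		return err
	}
	defer conn.Close()

	// The "other goroutine": closes the connection after a fixed delay,
	// whether or not serve has finished.
	timer := time.AfterFunc(closeAfter, func() { conn.Close() })
	defer timer.Stop()

	return serve(conn)
}

func main() {
	early := serveWithTimedClose(50*time.Millisecond, 500*time.Millisecond)
	fmt.Println("close before client is done:", early, "| closed-connection error:", isClosedConnErr(early))

	late := serveWithTimedClose(2*time.Second, 50*time.Millisecond)
	fmt.Println("close after client is done:", late)
}
```

When the delay is shorter than the client's idle time, the blocked read is interrupted and `serve` returns a closed-connection error. When the delay is longer, the client's hang-up ends the read cleanly, which is why a longer timeout hides the problem without removing the race.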

Ideally we should use a context with a deadline and some sort of forwarding stream to relay output, but that seems too big a change for this issue.
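
As a rough illustration of the deadline idea only (the forwarding stream is not covered), the sketch below waits for the serving call to finish and force-closes the connection only if a context deadline expires first. Everything here is an assumption made for illustration: `serveUntil`, `demo`, and `timedOut` are hypothetical names, `net.Pipe` replaces the real socket, and `io.Copy` replaces the agent-serving call. It is not what the PR implements.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// serveUntil (hypothetical) runs a blocking serve function on conn and
// returns its result. The connection is closed by force only when ctx
// expires before serve has returned.
func serveUntil(ctx context.Context, conn net.Conn, serve func(net.Conn) error) error {
	done := make(chan error, 1)
	go func() { done <- serve(conn) }()

	select {
	case err := <-done:
		// serve finished on its own; nothing was closed underneath it.
		return err
	case <-ctx.Done():
		conn.Close() // unblocks the pending read inside serve
		<-done       // wait for the serving goroutine to exit
		return ctx.Err()
	}
}

// timedOut reports whether err is the context deadline error.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// demo wires a silent client that hangs up after clientIdle to a server
// that is allowed to run for at most deadline.
func demo(clientIdle, deadline time.Duration) error {
	server, client := net.Pipe()
	defer server.Close()

	go func() {
		time.Sleep(clientIdle)
		client.Close()
	}()

	ctx, cancel := context.WithTimeout(context.Background(), deadline)
	defer cancel()

	return serveUntil(ctx, server, func(c net.Conn) error {
		_, err := io.Copy(io.Discard, c)
		return err
	})
}

func main() {
	fmt.Println("client finishes first:", demo(50*time.Millisecond, 2*time.Second))
	fmt.Println("deadline fires first: ", demo(500*time.Millisecond, 50*time.Millisecond))
}
```

The design choice is that the close happens in exactly one place and only on timeout, so a slow or silent client is reported as a deadline error rather than as an unexplained closed connection.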

@rhatdan
Member

rhatdan commented Nov 17, 2021

OK. It seems a little hacky, but moves us forward.

@szedani

szedani commented Oct 30, 2022

I'm seeing the same error when building an image with podman that uses an ssh mount:
ERRO[0188] error serving agent: read unix /var/tmp/.buildah-ssh-sock531862322/ssh_auth_sock->@: use of closed network connection
I'm using podman v4.2.1, which I think should include this fix.

From the Dockerfile:

RUN mkdir -p /usr/local/src/
RUN mkdir -p ~/.ssh/ && ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts
RUN --mount=type=ssh git clone my_private_repo # fails

I'm running Fedora 36 on WSL, and it works if I use docker to build.
Is there any solution? Thanks!

host:
  arch: amd64
  buildahVersion: 1.27.0
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.4-3.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: '
  cpuUtilization:
    idlePercent: 98.11
    systemPercent: 0.48
    userPercent: 1.41
  cpus: 8
  distribution:
    distribution: fedora
    variant: container
    version: "36"
  eventLogger: file
  hostname: szedani
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.15.68.1-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 7937961984
  memTotal: 16642961408
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.6-2.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.6
      commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /mnt/wslg/runtime-dir/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 4294692864
  swapTotal: 4294967296
  uptime: 0h 47m 55.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/szedani/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/szedani/.local/share/containers/storage
  graphRootAllocated: 1081101176832
  graphRootUsed: 6801805312
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 11
  runRoot: /mnt/wslg/runtime-dir/containers
  volumePath: /home/szedani/.local/share/containers/storage/volumes
version:
  APIVersion: 4.2.1
  Built: 1662580699
  BuiltTime: Wed Sep  7 22:58:19 2022
  GitCommit: ""
  GoVersion: go1.18.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.1
