crun does not wait for tty to receive error message for short lived process #1524

Closed
saschagrunert opened this issue Aug 14, 2024 · 6 comments · Fixed by #1548

Comments

@saschagrunert
Member

saschagrunert commented Aug 14, 2024

This CRI-O test fails: https://github.com/cri-o/cri-o/blob/73b6f0dbc6d/test/ctr.bats#L790-L799

@test "ctr create with non-existent command [tty]" {
	start_crio
	pod_id=$(crictl runp "$TESTDATA"/sandbox_config.json)
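
	# Temporary file for the modified container config (assumed location)
	newconfig="$TESTDIR/config.json"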
	jq '	  .command = ["nonexistent"]
		| .tty = true' \
		"$TESTDATA"/container_config.json > "$newconfig"
	run ! crictl create "$pod_id" "$newconfig" "$TESTDATA"/sandbox_config.json
	[[ "$output" == *"not found"* ]]
}

With conmon, it fails with the error:

running container: creating container failed: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF

The error is returned here: https://github.com/cri-o/cri-o/blob/73b6f0dbc6da5c349aacb8393f0b33143325a694/internal/oci/runtime_oci.go#L298

journal:

Aug 14 11:03:01 nixos conmon[581872]: conmon fc2da7da74434dd90b18 <ndebug>: addr{sun_family=AF_UNIX, sun_path=/tmp/conmon-term.ATULS2}
Aug 14 11:03:01 nixos conmon[581872]: conmon fc2da7da74434dd90b18 <ndebug>: addr{sun_family=AF_UNIX, sun_path=/proc/self/fd/12/attach}
Aug 14 11:03:01 nixos conmon[581872]: conmon fc2da7da74434dd90b18 <ndebug>: terminal_ctrl_fd: 12
Aug 14 11:03:01 nixos conmon[581872]: conmon fc2da7da74434dd90b18 <ndebug>: winsz read side: 15, winsz write side: 16
Aug 14 11:03:01 nixos conmon[581872]: conmon fc2da7da74434dd90b18 <ndebug>: about to accept from console_socket_fd: 8
Aug 14 11:03:01 nixos conmon[581872]: conmon fc2da7da74434dd90b18 <ndebug>: about to recvfd from connfd: 11
Aug 14 11:03:01 nixos systemd[1]: Started libcrun container.
Aug 14 11:03:01 nixos systemd[1]: crio-fc2da7da74434dd90b184a55259911b5c78c4cda486b5e6cadedc7397433405c.scope: Deactivated successfully.
Aug 14 11:03:01 nixos systemd[1]: Stopped libcrun container.
Aug 14 11:03:01 nixos conmon[581872]: conmon fc2da7da74434dd90b18 <error>: Failed to receive console file descriptor Communication error on send
Aug 14 11:03:01 nixos systemd[1]: crio-conmon-fc2da7da74434dd90b184a55259911b5c78c4cda486b5e6cadedc7397433405c.scope: Deactivated successfully.
Aug 14 11:03:01 nixos systemd[1]: var-lib-containers-storage-overlay-88c1ee85d930e65a2959ae6d050ce96cc3050ab88cf7049178a9063de6ca036f-merged.mount: Deactivated successfully.

Something similar happens when using conmon-rs:

running container: creating container failed: rpc error: code = Unknown desc = create container: create result: internal/proto/conmon.capnp:Conmon.createContainer: Failed: read_all_with_timeout called before message_rx was registered

journal:

Aug 14 11:01:43 nixos conmonrs[576417]: DEBUG backend:create_container{container_id="564e2075a0f1287c854c5a076fdd924cf2b93898a2712a4e9b0854883b4f49e9" uuid="214dbed2-cc01-44cd-8365-e5045200b552"}:promise: conmonrs::terminal: 105: Waiting for terminal socket connection
Aug 14 11:01:43 nixos conmonrs[576417]: DEBUG backend:create_container{container_id="564e2075a0f1287c854c5a076fdd924cf2b93898a2712a4e9b0854883b4f49e9" uuid="214dbed2-cc01-44cd-8365-e5045200b552"}:listen: conmonrs::terminal: 196: Got terminal socket stream: PollEvented { io: Some(UnixStream { fd: FileDesc(OwnedFd { fd: 17 }), local: "/proc/self/fd/12/conmon-term-yOzHBVi.sock" (pathname), peer: (unnamed) }) }
Aug 14 11:01:43 nixos systemd[1]: Started libcrun container.
Aug 14 11:01:43 nixos systemd[1]: crio-564e2075a0f1287c854c5a076fdd924cf2b93898a2712a4e9b0854883b4f49e9.scope: Deactivated successfully.
Aug 14 11:01:43 nixos systemd[1]: Stopped libcrun container.
Aug 14 11:01:43 nixos conmonrs[576417]: DEBUG backend:create_container{container_id="564e2075a0f1287c854c5a076fdd924cf2b93898a2712a4e9b0854883b4f49e9" uuid="214dbed2-cc01-44cd-8365-e5045200b552"}:listen: conmonrs::terminal: 217: Removing socket path /tmp/conmon-term-yOzHBVi.sock
Aug 14 11:01:43 nixos conmonrs[576417]: DEBUG backend:create_container{container_id="564e2075a0f1287c854c5a076fdd924cf2b93898a2712a4e9b0854883b4f49e9" uuid="214dbed2-cc01-44cd-8365-e5045200b552"}:listen: conmonrs::terminal: 220: Shutting down receiver stream
Aug 14 11:01:43 nixos conmonrs[576417]: ERROR backend:create_container{container_id="564e2075a0f1287c854c5a076fdd924cf2b93898a2712a4e9b0854883b4f49e9" uuid="214dbed2-cc01-44cd-8365-e5045200b552"}:listen: conmonrs::terminal: 224: No file descriptor received
Aug 14 11:01:43 nixos conmonrs[576417]: ERROR backend:create_container{container_id="564e2075a0f1287c854c5a076fdd924cf2b93898a2712a4e9b0854883b4f49e9" uuid="214dbed2-cc01-44cd-8365-e5045200b552"}:listen: conmonrs::terminal: 86: Unable to listen on terminal: got no file descriptor

saschagrunert changed the title from "crun does not wait for tty to receive error for short lived process" to "crun does not wait for tty to receive error message for short lived process" on Aug 14, 2024
@saschagrunert
Member Author

cc @giuseppe @kolyshkin

@giuseppe
Member

giuseppe commented Sep 3, 2024

Does it happen on exec or create/start?

I am trying to understand where a race could happen in either case, but I don't see where it could be.

@saschagrunert
Member Author

> Does it happen on exec or create/start?

Only on create, as far as this issue is concerned.

@giuseppe
Member

giuseppe commented Sep 3, 2024

Hm, create by itself does not start the container. The container runtime is expected to attach to the TTY before calling start:

  1. create the socket
  2. $RUNTIME create ...
  3. wait to receive the terminal fd on the socket
  4. $RUNTIME start ...

With these steps there is no room for a race, since the container payload is not running yet. Is conmon-rs doing anything differently?
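The four-step handoff above can be sketched in Python. This is a minimal sketch assuming Linux and Python 3.9+ (for socket.send_fds and socket.recv_fds); a socketpair stands in for the console socket the caller would normally listen on, and runtime_send_console and manager_receive_console are hypothetical helpers, not code from conmon, conmon-rs or crun. The runtime side allocates a pseudo-terminal and passes the master end over the socket as SCM_RIGHTS ancillary data; the payload bytes are an arbitrary placeholder.

```python
import os
import pty
import socket


def runtime_send_console(sock: socket.socket) -> None:
    """Runtime side (hypothetical): allocate a pty and pass its master fd."""
    master_fd, slave_fd = pty.openpty()
    try:
        # unix(7): at least one byte of real data must accompany ancillary
        # data on a stream socket, so send a small placeholder payload.
        socket.send_fds(sock, [b"console"], [master_fd])
    finally:
        # The in-flight message keeps the master open until it is received.
        # This sketch has no container process, so drop the slave as well.
        os.close(master_fd)
        os.close(slave_fd)


def manager_receive_console(sock: socket.socket) -> int:
    """Manager side (hypothetical): step 3, block until the console fd arrives."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    if not fds:
        # Peer hung up (or sent data only) without passing a descriptor.
        raise RuntimeError("no file descriptor received")
    return fds[0]


if __name__ == "__main__":
    # Step 1: the caller owns one end of the console socket.
    manager_end, runtime_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Step 2: during `$RUNTIME create`, the runtime hands over the terminal.
    runtime_send_console(runtime_end)
    # Step 3: the caller receives it; only then would it run `$RUNTIME start`.
    console_fd = manager_receive_console(manager_end)
    print("received console fd", console_fd)
    os.close(console_fd)
```

Closing the runtime's copies right after send_fds is safe because the queued message holds its own reference to the master until the receiver picks it up.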

@saschagrunert
Member Author

@giuseppe the issue is that it works with neither conmon nor conmon-rs.

We would expect to get an error from the runtime like:

runc create failed: unable to start container process: exec: "nonexistent": executable file not found in $PATH

runc sends the terminal file descriptor first and then writes the error message to the terminal. crun does not send the file descriptor at all and reports the error through its CLI output instead.
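The difference between the runc and crun behaviour shows up on the receiving side. Below is a minimal sketch, again assuming Python 3.9+ on Linux, of how a caller could tell the two cases apart: when the runtime closes the console socket without sending a descriptor, recvmsg returns EOF with no SCM_RIGHTS data, and the only remaining source for the failure reason is the runtime's own CLI output. recv_console_fd, failure_reason and the stand-in error text are hypothetical; this is not how conmon or conmon-rs are implemented.

```python
import socket
import subprocess
import sys
from typing import Optional


def recv_console_fd(sock: socket.socket) -> Optional[int]:
    """Wait for the console fd; return None if the peer hangs up without one."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    # An empty read with no fds means EOF: the runtime closed the socket
    # without ever sending the terminal (the crun behaviour in this issue).
    return fds[0] if fds else None


def failure_reason(console_fd: Optional[int], runtime_stderr: str) -> str:
    """Hypothetical policy: fall back to the runtime's CLI error if no console arrived."""
    if console_fd is None:
        return runtime_stderr.strip() or "no file descriptor received"
    return "console received: read the error from the terminal (runc behaviour)"


if __name__ == "__main__":
    manager_end, runtime_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Stand-in for a runtime that fails during create: it closes the console
    # socket without sending an fd, prints an error on stderr and exits non-zero.
    runtime_end.close()
    proc = subprocess.run(
        [sys.executable, "-c",
         "import sys; sys.stderr.write('example: command not found\\n'); sys.exit(1)"],
        capture_output=True,
        text=True,
    )
    fd = recv_console_fd(manager_end)
    print(failure_reason(fd, proc.stderr))  # example: command not found
```

A caller that only waits on the console socket, as in the step list above, sees nothing but EOF in the crun case, which matches the "No file descriptor received" lines in the conmon-rs journal.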

@saschagrunert
Member Author

Working on a fix now.
