Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Podman can crash during standard stream I/O #685

Closed
apyrgio opened this issue Jan 23, 2024 · 13 comments
Closed

Podman can crash during standard stream I/O #685

apyrgio opened this issue Jan 23, 2024 · 13 comments
Labels
bug Something isn't working container P:linux

Comments

@apyrgio
Copy link
Contributor

apyrgio commented Jan 23, 2024

While working on issue #443 and PR #627, we have seen the following weird behavior (#627 (comment) and #627 (comment)):

  1. The client (host) may get stuck while reading page data from the container's stdout, and time out at some point.
  2. The client (host) may return immediately with a successful error code, without even getting all the pixels back.

The platforms where this behavior is consistent are:

  • Ubuntu Focal (we see mostly timeouts there)
  • Ubuntu Jammy (we see mostly abrupt returns)
  • Debian Bullseye (we see mostly abrupt returns)

Specifically for the abrupt return bug, it seems that on the client side read() gets interrupted abruptly, and thus we raise ConverterProcException(). Then we try to find the reason for the exception, and the exit code is actually 0! In that case, the Podman process does not exist, and the container is deleted as well.

Interestingly, with ps we see that the underlying conversion process still runs. With journalctl -f, we see that a conmon component attempts to print binary data to the journal.

@apyrgio apyrgio added bug Something isn't working P:linux container labels Jan 23, 2024
@apyrgio
Copy link
Contributor Author

apyrgio commented Jan 23, 2024

Looking more into it, we are now sure that when a container sends large amounts of data through its standard streams, faster than the client can consume, conmon may receive EAGAIN / EWOULDBLOCK. Due to a buggy implementation introduced in v2.0.23 and fixed in v2.0.26, this would make conmon drop the connection (see containers/conmon#236).

The affected platforms and respective conmon versions are:

  • Ubuntu Focal (conmon v2.0.9, released in 2020-02-11)
  • Ubuntu Jammy / Debian Bullseye (conmon v2.0.25, released in 2021-01-21)

For Ubuntu Focal, things are a bit more complicated. While conmon is available through the Focal repo, Podman is not. Instead, we ask our uses to install Podman via a repo that OpenSUSE maintains (see https://github.com/freedomofpress/dangerzone/blob/main/INSTALL.md#ubuntu-debian), which installs conmon v2.1.2. However, this particular combination is bitten by the timeout bug we alluded to at the start of the issue. Basically, if that conmon version attempts to log data into the system's journal (see --log-driver option), it will hang.

@apyrgio
Copy link
Contributor Author

apyrgio commented Jan 23, 2024

The timeout issue that affects Ubuntu Focal is easily bypassed. We can just use --log-driver none when running Podman. This option exists in Docker as well, so it's worth using it in Windows and macOS too. After all, our standard streams will no longer have anything log-worthy.

The abrupt return issue though cannot be bypassed, to the best of my knowledge. The code path that erroneously sets the conmon sockets to non-blocking is not affected by any meaningful option (see call path below):

  1. https://github.com/containers/conmon/blob/05ce716ac6d1cfeeb27b9280832abd2e9d1a085f/src/conmon.c#L189
  2. https://github.com/containers/conmon/blob/05ce716ac6d1cfeeb27b9280832abd2e9d1a085f/src/conn_sock.c#L120
  3. https://github.com/containers/conmon/blob/05ce716ac6d1cfeeb27b9280832abd2e9d1a085f/src/conn_sock.c#L283

The opt_bundle_path that the code refers to is basically a container layer in the filesystem, so it's a critical option.

Therefore, we have two options:

  1. Add a conmon > 2.0.25 dependency in our .deb package. This will break existing Ubuntu Jammy / Debian Bullseye users, but we can ask them to install Podman through the OpenSUSE repo, same as we do for Ubuntu Focal.
  2. Backport conmon, either from Debian Bookworm, or from the OpenSUSE repo, into our FPF repo. This way, users will install a more recent conmon version implicitly, without any interaction. On the flip side, we have to monitor the source repo for updates.

@deeplow
Copy link
Contributor

deeplow commented Jan 23, 2024

Thanks a lot for this investigation and findings!

Backport conmon, either from Debian Bookworm, or from the OpenSUSE repo, into our FPF repo. This way, users will install a more recent conmon version implicitly, without any interaction. On the flip side, we have to monitor the source repo for updates.

This option sounds interesting. However, won't the podman be dependent on a specific conmon version already provided by the same repo? Because if this is the case we'd need to provide all the other dependencies in our repo such that our podman "take over" the user's distro one. However if the user uses podman for something else, this could lead to unintended consequences. And we'd also need to re-sign / rebuild these repos, which takes some maintenance work.

@apyrgio
Copy link
Contributor Author

apyrgio commented Jan 23, 2024

No, it's actually dependent on conmon (>= 2.0.18~) as you can see from apt-cache show podman. This means that a newer version tramps the one offered by the repos.

But yes, you're right about a) possibly unintended consequences, and b) resigning the packages. I don't like that as well.

@apyrgio
Copy link
Contributor Author

apyrgio commented Jan 24, 2024

There's another option that we can consider: Package conmon v2.0.26 and include it in our Debian Bullseye / Ubuntu Jammy repo. This has the following benefits:

  1. It's the exact next version after the one that is shipped in Debian Bullseye / Ubuntu Jammy.
  2. It offers a one-line fix for the faulty code, so there's no compatibility risk for users who depend on v2.0.25.
  3. There's no need to maintain this version forever:
    • If the upstream decides to release a newer conmon version, they will offer v2.0.26+, which covers our use case.
    • If the upstream decides to add a patch to the existing v2.0.25 version, things are a bit trickier. We can possibly craft a package version that sits between the one that is currently offered and the next patch level, to make sure that the package will get upgraded.
  4. The users don't need to setup and trust yet another repo.
  5. Existing users won't have their installation break unexpectedly.

@apyrgio
Copy link
Contributor Author

apyrgio commented Jan 24, 2024

Sigh 😮‍💨

While building the package, I just realized that this is a known bug in Debian Bullseye. The maintainers of the package are doing the same thing we are describing in this issue (#685 (comment)). They have backported the fix from v2.0.26 into v2.0.25, and have created a new package called v2.0.25+ds1-1.1+deb11u1. This package has been accepted as a "proposed update" for Debian Bullseye in Sun, 12 Nov 2023.

The proposed package has not landed yet in Debian Bullseye, nor Ubuntu Jammy. We can build a Debian package out of it though and name it 2.0.25+ds1-1.1+deb11u1~fpf1. This way, I believe that the version order will be:

2.0.25+ds1-1.1 < 2.0.25+ds1-1.1+deb11u1~fpf1 < 2.0.25+ds1-1.1+deb11u1

where:

  • Current package version: 2.0.25+ds1-1.1 (Debian Bullseye / Ubuntu Jammy)
  • Proposed package version by FPF: 2.0.25+ds1-1.1+deb11u1~fpf1 (tildes lose from empty strings in Debian versions, see here)
  • Proposed package version by Debian maintainers: 2.0.25+ds1-1.1+deb11u1

Useful Links


Edit: The original suggestion was to use Debian version 1.1+deb11u1 but this sorts higher than 1.1ubuntu<N>, which is customary for Ubuntu. Since Debian Bullseye will have a patched conmon package soon, we can focus on just the version for Ubuntu Jammy. We prefer 2.0.25+ds1-1.1a~fpf1, which sorts higher than the existing package, but lower than anything else:

2.0.25+ds1-1.1 < 2.0.25+ds1-1.1a~fpf1 < 2.0.25+ds1-1.1ubuntu<N> < 2.0.25+ds1-1.1+deb11u1

where:

  • Current package version: 2.0.25+ds1-1.1
  • Proposed package version by FPF: 2.0.25+ds1-1.1a~fpf1 (tildes lose from empty strings in Debian versions, see here)
  • Potential package version by Ubuntu maintainers: 2.0.25+ds1-1.1ubuntu
  • Proposed package version by Debian maintainers: 2.0.25+ds1-1.1+deb11u1

@deeplow
Copy link
Contributor

deeplow commented Jan 24, 2024

Oh wow. What a timing... and this is an issue that has existed for more than a year. How convenient!
I think you had a good hunch about that curiously named package in our call earlier today.

So this is actually good for us, no? We can use debian's package and just rename it (not sure if it uses the name inside the package or not) and re-sign it.

@apyrgio
Copy link
Contributor Author

apyrgio commented Jan 24, 2024

It adds a bit more confidence in what we're doing, so yeah, it's a good thing. I'll grab their source package, bump the changelog accordingly (the Debian version is taken from the changelog), and build it in an Ubuntu Jammy and Debian Bullseye container.

Edit: they don't offer a .deb yet, if that's what you're asking @deeplow. Ouch, I stand corrected, see below.

@apyrgio
Copy link
Contributor Author

apyrgio commented Jan 29, 2024

I stand corrected: Debian has some repos called proposed-updates. For instance, there is an oldstable-proposed-updates repo, which users can enable to install just conmon:

/etc/apt/sources.list.d/oldstable-pu.list

deb http://deb.debian.org/debian/ oldstable-proposed-updates main

/etc/apt/preferences.d/oldstable-pu.pref

Package: *
Pin: release a=proposed-updates
Pin-Priority: 100

Package: conmon
Pin: release a=proposed-updates
Pin-Priority: 500

And if they run apt-cache policy conmon, they will see that the candidate package to be installed is the new one (2.0.25+ds1-1.1+deb11u1):

  Installed: (none)
  Candidate: 2.0.25+ds1-1.1+deb11u1
  Version table:
     2.0.25+ds1-1.1+deb11u1 500
        100 http://deb.debian.org/debian oldstable-proposed-updates/main amd64 Packages
     2.0.25+ds1-1.1 500
        500 http://deb.debian.org/debian bullseye/main amd64 Packages

whereas for Podman, it's the old one, because we have lowered the priority of the proposed updates repo:

  Installed: (none)
  Candidate: 3.0.1+dfsg1-3+deb11u4
  Version table:
     3.0.1+dfsg1-3+deb11u5 100
        100 http://deb.debian.org/debian oldstable-proposed-updates/main amd64 Packages
     3.0.1+dfsg1-3+deb11u4 500
        500 http://deb.debian.org/debian bullseye/main amd64 Packages

@deeplow
Copy link
Contributor

deeplow commented Jan 29, 2024

Nice. That means we can now see if our CI is fixed on #627.

@apyrgio
Copy link
Contributor Author

apyrgio commented Jan 29, 2024

I've downloaded conmon from the oldstable-proposed-updates repo, and tested it in Ubuntu Jammy and Debian Bullseye. It seems to work!

I have a reservation regarding how this package was built. It was built in Debian Bullseye (proposed updates) which has:

  • libc6=2.31-13+deb11u7
  • libglib2.0-0=2.66.8-1+deb11u1
  • libsystemd0=247.3-7+deb11u4

We plan to install it in Ubuntu Jammy and Debian Bullseye though, which have:

Package Jammy Bullseye
libc6 2.35-0ubuntu3.6 2.31-13+deb11u7
libglib2.0-0 2.72.4-0ubuntu2.2 2.66.8-1
libsystemd0 249.11-0ubuntu3.12 247.3-7+deb11u4

As one can see, for Debian Bullseye, aside for a missing patch in libc6 / libglib2.0-0, there doesn't seem to be a compatibility issue. For Ubuntu Jammy, the library versions are higher than the ones the package was build for. For reference, this is the minimum version of the libraries that conmon depends on:

Depends: libc6 (>= 2.17), libglib2.0-0 (>= 2.35.8), libsystemd0

Since libc6 and libglib2.0-0 are libraries that take backwards compatibility seriously, I believe that using a conmon package that was linked with a previous version of these libraries should work. However, to be 100% safe, we can build a separate package, just for Ubuntu Jammy, and tag it as 2.0.25+ds1-1.1a~fpf1, for reasons I mentioned above.

@apyrgio
Copy link
Contributor Author

apyrgio commented Feb 5, 2024

@legoktm pointed out that the Debian 11.9 release will happen on February 10th, so it may not be strictly necessary to include the .deb in our Debian Bullseye repo.

apyrgio added a commit that referenced this issue Feb 7, 2024
Switching from mounting files to writing to stdout has introduced some
Podman crashes in specific environments (Ubuntu Jammy / Debian Bullseye)
due to a conmon bug that affects version 2.0.25.

Fixing it for various permutations of the environments we support
requires the following:

1. CI tests: Install conmon from the oldstable-proposed-updates in
   our Debian Bullseye / Ubuntu Jammy dev/end-user environments.
2. Developers: Add a line in BUILD.md that suggests users to install
   conmon from the oldstable-proposed-updates repo, or some other repo
   they prefer.
3. End-user installations: We will build conmon for Ubuntu Jammy, and
   wait until the proposed updates repo gets merged in Debian Bullseye.

Fixes #685
@deeplow deeplow closed this as completed in d1afe4c Feb 7, 2024
apyrgio added a commit to freedomofpress/maint-dangerzone-conmon that referenced this issue Feb 12, 2024
Update the Debian changelog with version 2.0.25+ds1-1.1a~fpf1, which is
greater than the existing Ubuntu Jammy version (2.0.25+ds1-1.1) but
sorts lower than anything else.

Refs freedomofpress/dangerzone#685
apyrgio added a commit to freedomofpress/maint-dangerzone-conmon that referenced this issue Feb 12, 2024
Add instructions on how to build an Ubuntu Jammy package for `conmon`.
Also, add an explanation of why does FPF needs to create an Ubuntu Jammy
package for this project

Refs freedomofpress/dangerzone#685
apyrgio added a commit to apyrgio/apt-tools-prod that referenced this issue Feb 12, 2024
Add a conmon package in Ubuntu Jammy, that fixes an issue that is
present only in that version.

Refs freedomofpress/dangerzone#685
@apyrgio
Copy link
Contributor Author

apyrgio commented Feb 13, 2024

Note that the patched conmon package is now available in Debian Bullseye from the official repos. We have also asked upstream maintainers of Ubuntu Jammy to include it before the next point release. In the meantime, we offer it to our Ubuntu Jammy users via our apt-tools-prod repo.

apyrgio added a commit to freedomofpress/maint-dangerzone-conmon that referenced this issue Feb 13, 2024
Add instructions on how to build an Ubuntu Jammy package for `conmon`.
Also, add an explanation of why does FPF needs to create an Ubuntu Jammy
package for this project

Refs freedomofpress/dangerzone#685
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working container P:linux
Projects
None yet
Development

No branches or pull requests

2 participants