Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUPS segfault when adding many ipp-printers fast #466

Closed
SwissGipfel opened this issue Aug 29, 2022 · 10 comments
Closed

CUPS segfault when adding many ipp-printers fast #466

SwissGipfel opened this issue Aug 29, 2022 · 10 comments
Labels
bug Something isn't working priority-medium

Comments

@SwissGipfel
Copy link

SwissGipfel commented Aug 29, 2022

I'm running cups in a Kubernetes cluster with a with a sidecar that adds and removes printers based on a database.
This worked very well until I've started to add printers via ipp.

The sidecar adds printer roughly:

  • ~470 printer with lpadmin -p <printer-name> -E -v socket://<printer-ip>:9100
  • ~250 printers with lpadmin -p <printer-name> -E -v ipp://<printer-ip> -m everywhere

These printers will be added at the start of my cups container and the container crashes with roughly 500 printers added - each time with a segfault like this one:

[ 5738.759503] cupsd[11876]: segfault at 7fbb7b47a0e5 ip 00007fbea4cd3ae2 sp 00007fbe41ff7018 error 4 in libc.so.6[7fbea4b63000+195000]
[ 5738.759511] Code: ff ff e9 41 bd f2 ff 90 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 c5 fe 6f 0f <c5> f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9 74 7a f3 0f
[ 5738.759514] potentially unexpected fatal signal 11.
[...]
[ 5738.759521] RIP: 0033:0x7fbea4cd3ae2
[ 5738.759522] Code: ff ff e9 41 bd f2 ff 90 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 c5 fe 6f 0f <c5> f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9 74 7a f3 0f
[ 5738.759525] RSP: 002b:00007fbe41ff7018 EFLAGS: 00010287
[ 5738.759527] RAX: 00000000000003fd RBX: 0000557450719f60 RCX: 000055744fdee8a0
[ 5738.759528] RDX: 0000000000000000 RSI: 00007fbb7b47a0e5 RDI: 000055745037133c
[ 5738.759530] RBP: 00007fbe41ff70e0 R08: 0000000000000000 R09: 00007fbe2c03843c
[ 5738.759531] R10: 0000000000000007 R11: 00007fbe2c03843c R12: 0000557450371330
[ 5738.759532] R13: 00007fbb7b47a0d9 R14: 0000557450366db0 R15: 00007fbe41ff70e0
[ 5738.759534] FS:  00007fbe41ffb640 GS:  0000000000000000
[ 5738.759699] cupsd[11850]: segfault at 7fbb7b47a0e5 ip 00007fbea4cd3ae2 sp 00007fbe797f6158 error 4 in libc.so.6[7fbea4b63000+195000]
[ 5738.759704] Code: ff ff e9 41 bd f2 ff 90 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 c5 fe 6f 0f <c5> f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9 74 7a f3 0f

I can reproduce the same behavior, if I only add my ~250 ipp printers - again with a segfault like:

[ 6125.316852] cupsd[15545]: segfault at c ip 00007ffa2f082450 sp 00007ff98e7d81a8 error 4 in libcups.so.2[7ffa2f05c000+59000]
[ 6125.316868] Code: c7 02 48 83 c6 02 88 47 fe 0f b6 46 fe 88 47 ff 48 39 fa 75 e5 c3 0f 1f 40 00 f3 0f 1e fa eb 3b 66 2e 0f 1f 84 00 00 00 00 00 <0f> be 16 84 d2 74 39 44 8d 40 bf 8d 48 20 41 83 f8 1a 44 8d 42 bf
[ 6125.316874] potentially unexpected fatal signal 11.
[...]
[ 6125.316891] RIP: 0033:0x7ffa2f082450
[ 6125.316897] Code: c7 02 48 83 c6 02 88 47 fe 0f b6 46 fe 88 47 ff 48 39 fa 75 e5 c3 0f 1f 40 00 f3 0f 1e fa eb 3b 66 2e 0f 1f 84 00 00 00 00 00 <0f> be 16 84 d2 74 39 44 8d 40 bf 8d 48 20 41 83 f8 1a 44 8d 42 bf
[ 6125.316905] RSP: 002b:00007ff98e7d81a8 EFLAGS: 00010202
[ 6125.316911] RAX: 0000000000000070 RBX: 00007ff98e7d8710 RCX: 000055aaf3241d30
[ 6125.316916] RDX: 0000000000000000 RSI: 000000000000000c RDI: 00007ff98e7d871c
[ 6125.316920] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ff98e7d8710
[ 6125.316925] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000037e
[ 6125.316929] R13: 000000000000037e R14: 000055aaf47db7f0 R15: 00007ff98e7d8710
[ 6125.316934] FS:  00007ff98e7dc640 GS:  0000000000000000

My current workaround is to wait 2 seconds each time i generate the command lpadmin -p <printer-name> -E -v ipp://<printer-ip> -m everywhere. With this sleep, cups doesn't crash consistently but the container also needs about 8.5 minutes to start up.

I'm using cups with an Ubuntu:22.04 base image. The whole container looks like this:

FROM ubuntu:22.04
# Install basic tools, cups and a minimal set of drivers
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update >/dev/null \
&& apt-get install -y \
    sudo \
    whois \
    libcap2-bin=1:2.44-1build3 >/dev/null \
    cups=2.4.1op1-1ubuntu4.1 \
    cups-client=2.4.1op1-1ubuntu4.1 \
    cups-bsd=2.4.1op1-1ubuntu4.1 \
    cups-filters=1.28.15-0ubuntu1 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

EXPOSE 631

# Add user and disable sudo password checking
RUN useradd \
    --groups=sudo,lp,lpadmin \
    --create-home \
    --home-dir=/home/print \
    --shell=/bin/bash \
    --password=$(mkpasswd print) \
    print

# A user with this UID also exists in the cups-configurator container. Adding it here to the lpadmin group is required for authorization to cups
RUN adduser --system -uid 1005 --shell /bin/bash --home /home/networkmaster --disabled-password networkmaster && usermod networkmaster -aG lpadmin

# Change owner of some directories to run as non-root
RUN chown -R print:lp /var/log/cups \
&& chown -R print:lp /var/spool/cups \
&& chown -R print:lp /var/cache/cups \
&& chown -R print:lp /run/ \
&& chown -R print:lp /etc/cups/ \
&& setcap 'cap_net_bind_service=+ep' /usr/sbin/cupsd

COPY --chown=print:lp cupsd.conf /etc/cups/cupsd.conf
COPY --chown=print:lp cups-files.conf /etc/cups/cups-files.conf

USER print

CMD ["/usr/sbin/cupsd", "-f"]

(I also tried it without the version pinning, with the same result)
The whole cups container has a limit of 500Mi memory and 500m CPU - but it never peaks as far as I can see.

Is this a bug or an expected behavior?
Do you have an idea how I can debug this further?

@SwissGipfel
Copy link
Author

I can give you a little more information about my environment:

The 250 printers I try to connect via IPP are a mix of different models:

  • HP LaserJet M506
  • HP LaserJet M507
  • HP Color LaserJet Pro MFP M479fdw
  • HP LaserJet Pro M501dn
  • HP Color LaserJet MFP M477fdw
  • HP ColorLaserJet MFP M282-M285
  • HP Color LaserJet Pro M478f-9f
  • HP LaserJet MFP M430
  • HP Color LaserJet FlowMFP M776
  • HP Color LaserJet MFP M480
  • HP Color LaserJet M652
  • HP Color LaserJet MFP M277n

I'd have some OKI and Kyocera printers as well. But for the sake of debugging, I excluded them from IPP for the moment.
Unfortunately, I don't have any information about the firmware versions of each printer model, at the moment.

In the meantime, I also noticed, that the container does not only crash when adding this many printers at the start. It also crashes with the "2 Seconds" workaround, it just does so later.

At this point I'm quite at a loss. I don't know cups cannot handle this many IPP connections or if I made a mistake while setting cups up with IPP.

@michaelrsweet
Copy link
Member

@SwissGipfel There are no limitations on the number of queues built into CUPS. But since every queue you are adding is using the IPP Everywhere model you might be running into a thread or stack limit.

What we need is a debug CUPS error_log ("cupsctl --debug-logging" before you start adding printers) as well as a backtrace from the core dump.

@zdohnal zdohnal added the waiting for reporter There are data requested from the reporter label Aug 31, 2022
@SwissGipfel
Copy link
Author

Thank you for your response.
You'll find the debug-logs of my latest 2 tries attached. One with log level "debug" and one with log level "debug2".

logs-level-debug.zip
logs-level-debug2.zip

In my last two tries, i switched my base image to https://github.com/olbat/dockerfiles/blob/master/cupsd/Dockerfile based on debian:testing to get the newest CUPS release.

I'm still figuring out how I get the core dump done on my local development environment. As soon as I got it, I'll post it here.

@SwissGipfel
Copy link
Author

I'm not sure if these are the correct files you were looking for.

Here is the core dump together with a full backtrace (generated with thread apply all bt full)
core-dump-backtrace.zip

and the debug log file of the same run
logs-debug.zip

For this run, i used the following Dockerfile

FROM ubuntu:22.04
# Install basic tools, cups and a minimal set of drivers
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update >/dev/null && apt-get install -y lsb-release && apt install ubuntu-dbgsym-keyring
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN apt-get update >/dev/null \
&& apt-get install -y \
    sudo \
    whois \
    gdb \
    libcap2-bin=1:2.44-1build3 >/dev/null \
    cups=2.4.1op1-1ubuntu4.1 \
    cups-client=2.4.1op1-1ubuntu4.1 \
    cups-bsd=2.4.1op1-1ubuntu4.1 \
    cups-filters=1.28.15-0ubuntu1 \
    cups-dbgsym \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

EXPOSE 631

# Add user and disable sudo password checking
RUN useradd \
    --groups=sudo,lp,lpadmin \
    --create-home \
    --home-dir=/home/print \
    --shell=/bin/bash \
    --password=$(mkpasswd print) \
    print

# A user with this UID also exists in the cups-configurator container. Adding it here to the lpadmin group is required for authorization to cups
RUN adduser --system -uid 1005 --shell /bin/bash --home /home/networkmaster --disabled-password networkmaster && usermod networkmaster -aG lpadmin

# Change owner of some directories to run as non-root
RUN chown -R print:lp /var/log/cups \
&& chown -R print:lp /var/spool/cups \
&& chown -R print:lp /var/cache/cups \
&& chown -R print:lp /run/ \
&& chown -R print:lp /etc/cups/ \
&& setcap 'cap_net_bind_service=+ep' /usr/sbin/cupsd

COPY cupsd.conf /etc/cups/cupsd.conf
COPY cups-files.conf /etc/cups/cups-files.conf

#USER print

CMD ["/usr/sbin/cupsd", "-f"]

@zdohnal
Copy link
Member

zdohnal commented Aug 31, 2022

@SwissGipfel thank you for the files! Unfortunately the debuginfo packages were missing, so the backtrace is too generic and I don't see where the crash happened either. Can you get debuginfo packages (at least for glibc, cupsd and libcups) into the container, reproduce the issue and get a new backtrace?

But it probably will be some internal thread limit we hit - the last thread in the backtrace doesn't seem to go into libcups at all (unless there is some libcups function under ?? unknown functions...), but the full backtrace for all threads with debuginfo package should say more.

@SwissGipfel
Copy link
Author

Here is the backtrace with the additional debuginfo packages and the corresponding logs
core-dump-backtrace.zip
logs.zip

I used this modified Dockerfile to generate this batch:

FROM ubuntu:22.04
# Install basic tools, cups and a minimal set of drivers
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update >/dev/null && apt-get install -y lsb-release && apt install ubuntu-dbgsym-keyring
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN apt-get update >/dev/null
RUN apt-get install --allow-unauthenticated -y libglib2.0-0-dbgsym=2.72.1-1 libglib2.0-0=2.72.1-1 \
&& apt-get install -y \
    sudo \
    whois \
    gdb \
    libcap2-bin=1:2.44-1build3 >/dev/null \
    cups=2.4.1op1-1ubuntu4.1 \
    cups-client=2.4.1op1-1ubuntu4.1 \
    cups-bsd=2.4.1op1-1ubuntu4.1 \
    cups-filters=1.28.15-0ubuntu1 \
    cups-dbgsym \
    libcups2-dbgsym \
    libcupsfilters1-dbgsym \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

EXPOSE 631

# Add user and disable sudo password checking
RUN useradd \
    --groups=sudo,lp,lpadmin \
    --create-home \
    --home-dir=/home/print \
    --shell=/bin/bash \
    --password=$(mkpasswd print) \
    print

# A user with this UID also exists in the cups-configurator container. Adding it here to the lpadmin group is required for authorization to cups
RUN adduser --system -uid 1005 --shell /bin/bash --home /home/networkmaster --disabled-password networkmaster && usermod networkmaster -aG lpadmin

# Change owner of some directories to run as non-root
RUN chown -R print:lp /var/log/cups \
&& chown -R print:lp /var/spool/cups \
&& chown -R print:lp /var/cache/cups \
&& chown -R print:lp /run/ \
&& chown -R print:lp /etc/cups/ \
&& setcap 'cap_net_bind_service=+ep' /usr/sbin/cupsd

COPY cupsd.conf /etc/cups/cupsd.conf
COPY cups-files.conf /etc/cups/cups-files.conf

#USER print

CMD ["/usr/sbin/cupsd", "-f"]

Comparing the two gdb logs there are less ?? unknown functions. But few still remain. Could you tell me, if additional packages are needed? (And if possible, which ones?)

Thanks

@zdohnal zdohnal added bug Something isn't working priority-medium and removed waiting for reporter There are data requested from the reporter labels Sep 1, 2022
@zdohnal
Copy link
Member

zdohnal commented Sep 1, 2022

@SwissGipfel thank you for the data! In the end I've brought up the container and found out that Ubuntu packages CUPS with more subpackages, so the missing symbols were from cups-daemon-dgbsym.

The sigsegv happens in cups_array_find() because num_elements is high negative number, and it is used as array index after some computation - but I'm not sure why it is negative - the array is created for every new thread, so it shouldn't overflow...

Can you run cupsd within valgrind in the container? This seems like a memory problem. The command looks like:

valgrind -v --leak-check=full --track-origins=yes --log-file=cupsd.txt /usr/sbin/cupsd -f

@zdohnal zdohnal added the waiting for reporter There are data requested from the reporter label Sep 1, 2022
@SwissGipfel
Copy link
Author

I had some issues with valgrind in my dev environment, but finally it produced some logs:
valgrind.txt

However, I had to remove the setcap 'cap_net_bind_service=+ep' /usr/sbin/cupsd command in the Docker image, so valgrind won't get a permission denied error. The new Dockerfile looks like this (if needed):

FROM ubuntu:22.04
# Install basic tools, cups and a minimal set of drivers
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
&& apt-get install -y \
    lsb-release \
&& apt-get install -y ubuntu-dbgsym-keyring
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN echo "deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" >> /etc/apt/sources.list.d/ddebs.list
RUN apt-get update \
&& apt-get install -y \
    sudo \
    whois \
    gdb \
    libcap2-bin=1:2.44-1build3 >/dev/null \
    cups=2.4.1op1-1ubuntu4.1 \
    cups-client=2.4.1op1-1ubuntu4.1 \
    cups-bsd=2.4.1op1-1ubuntu4.1 \
    cups-filters=1.28.15-0ubuntu1 \
    cups-dbgsym \
    cups-daemon-dbgsym \
    libcups2-dbgsym \
    libcupsfilters1-dbgsym \
    libglib2.0-0=2.72.1-1 \
    libglib2.0-0-dbgsym=2.72.1-1 \
    valgrind \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

EXPOSE 631

# Add user and disable sudo password checking
RUN useradd \
    --groups=sudo,lp,lpadmin \
    --create-home \
    --home-dir=/home/print \
    --shell=/bin/bash \
    --password=$(mkpasswd print) \
    print

# A user with this UID also exists in the cups-configurator container. Adding it here to the lpadmin group is required for authorization to cups
RUN adduser --system -uid 1005 --shell /bin/bash --home /home/networkmaster --disabled-password networkmaster && usermod networkmaster -aG lpadmin

# Change owner of some directories to run as non-root
RUN chown -R print:lp /var/log/cups \
&& chown -R print:lp /var/spool/cups \
&& chown -R print:lp /var/cache/cups \
&& chown -R print:lp /run/ \
&& chown -R print:lp /etc/cups/
# && setcap 'cap_net_bind_service=+ep' /usr/sbin/cupsd

COPY cupsd.conf /etc/cups/cupsd.conf
COPY cups-files.conf /etc/cups/cups-files.conf

#USER print

CMD ["/usr/bin/valgrind","-v","--leak-check=full","--track-origins=yes","--log-file=/var/log/cups/valgrind.txt","/usr/sbin/cupsd","-f"]

@SwissGipfel
Copy link
Author

Let me know if the image needs some additional debug packages to analyze this problem further. I saw some lines in the valgrind log with the warning object doesn't have a symbol table.

@zdohnal zdohnal removed the waiting for reporter There are data requested from the reporter label Sep 5, 2022
@zdohnal
Copy link
Member

zdohnal commented Sep 5, 2022

@SwissGipfel Great! The log helps a lot - now I can see that we don't protect MimeDatabase global pointer, so one thread allocates the array by cupsArrayNew() , but an other new thread deletes it by delete_printer_filters()- we need to protect the global pointer from simultaneous access.

zdohnal added a commit that referenced this issue Sep 20, 2022
scheduler: Protect MimeDatabase from parallel access (fixes #466)
zdohnal added a commit that referenced this issue Sep 21, 2022
scheduler/ipp.c: Fix memory leaks found during #466
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority-medium
Projects
None yet
Development

No branches or pull requests

3 participants