zfs receive deadlocks when zstdcat piped to it #13571

jgoerzen · 2022-06-19T12:35:00Z

System information

Type	Version/Name
Distribution Name	Debian
Distribution Version	bullseye
Kernel Version	5.10.0-12-amd64 #1 SMP Debian 5.10.103-1 (2022-03-07) x86_64 GNU/Linux
Architecture	x86_64
OpenZFS Version	2.0.3-9

Describe the problem you're observing

When piping data from zstdcat to zfs receive, the pipeline deadlocks approximately 1 time in 1000. Additionally, attaching to the zfs receive process with strace -p causes it to exit a few seconds later with "cannot receive incremental stream: incomplete stream".

Describe how to reproduce the problem

I wrote a blog post going into detail about the situation and my investigation into it.

It appears to me that the zfs process is not reading from stdin itself, but is instead delegating this work to kernel_read within the kernel. I believe that zfs send is (was?) doing the same; for instance, #11445 described an issue with zfs send not working when piped to /dev/null, and #13133 concerns using a wrapper thread for zfs send (at least when things aren't being sent to a pipe).

I wonder if there is something about how libzfs_set_pipe_max, calling fcntl with F_SETPIPE_SZ, interacts with the kernel code.
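For readers unfamiliar with the call in question, here is a minimal sketch of querying and enlarging a pipe's buffer on Linux. It is not the libzfs implementation: try_grow_pipe is a hypothetical helper written for illustration, and the only real interfaces it relies on are fcntl with F_GETPIPE_SZ and F_SETPIPE_SZ.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes F_GETPIPE_SZ / F_SETPIPE_SZ in <fcntl.h> */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libzfs): try to grow the pipe behind
 * `fd` to at least `want` bytes. Returns the resulting capacity in bytes,
 * or -1 if the capacity cannot be queried (for example, `fd` is not a pipe).
 */
static int
try_grow_pipe(int fd, int want)
{
	int cur = fcntl(fd, F_GETPIPE_SZ);

	if (cur == -1)
		return (-1);
	/* A refused resize is not fatal; the old capacity is reported. */
	if (cur < want && fcntl(fd, F_SETPIPE_SZ, want) != -1)
		cur = fcntl(fd, F_GETPIPE_SZ);	/* kernel may round up */
	return (cur);
}
```

Calling try_grow_pipe(STDIN_FILENO, 1 << 20) at the start of a reader is roughly the pattern under suspicion here: the resize happens on the reader's side while the writer may already be running.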

Include any warning/errors/backtraces from the system logs

I checked and there are none.

rincebrain · 2022-06-20T05:30:36Z

You and #13309 should be friends, including the backstory in #13232.

But briefly, Linux has a bug, they ignored a patch to fix it, and nobody particularly cares enough to try again because LKML tends to vomit fire and worse things at anyone who mentions ZFS around them, so nobody can have larger pipe sizes on Linux.

jgoerzen · 2022-06-20T13:40:37Z

Wow, yes indeed. I spent a lot of time searching but didn't manage to turn up either of those, somehow.

I am patching out the F_SETPIPE_SZ in 2.0.3 and will observe if that appears to fix it. I'll let it bake for a few days and report back here.

The system in question is a slower x86_64 one, a Core i5-5200U which is moderately popular as a sort of small PC in a fanless configuration. It is perfect for receiving ZFS backups, which is its primary purpose for me.

I am still at a loss as to why I never saw this bug when the pipeline was being kicked off by the shell, but did when it was being kicked off by Filespooler; perhaps, since it seems to be a race, the faster pace at which my Rust-based program was able to work through the queue may have had something to do with it.

Interestingly, I had inserted cat into the pipeline, which significantly reduced, but did not eliminate, the incidence of this. I at first thought maybe cat was reblocking, but after inspecting its source and strace output, don't believe it was. Perhaps it had something to do with helping to avoid triggering the race.

rincebrain · 2022-06-20T14:14:13Z

If I understand the bug correctly, it only triggers if you call F_SETPIPE_SZ when the writer has put a nonzero amount of data, but not yet a full unit's worth, into the pipe. That is why the world isn't on fire screaming about this: you need either a very slow (but nonzero) or otherwise very strange write pattern to hit it. It doesn't come up in, say, the CI or most of my testbeds, but my poor little SPARC (440 MHz, 1c1t) and Raspberry Pis were not so fortunate.
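The trigger condition described above can be made concrete with a small sketch. Assuming the hazard is resizing a pipe that already holds buffered bytes, a caller could check for pending data with ioctl(FIONREAD) and skip the resize when any is found. This is only an illustration: the check is inherently racy, it is not what OpenZFS does, and grow_pipe_if_empty is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes F_SETPIPE_SZ in <fcntl.h> */
#endif
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: enlarge the pipe behind `fd` to `want` bytes only
 * when no data is currently buffered in it.
 * Returns 1 if the resize succeeded, 0 if it was skipped (data pending)
 * or refused by the kernel, and -1 if the pending count cannot be read.
 */
static int
grow_pipe_if_empty(int fd, int want)
{
	int pending = 0;

	if (ioctl(fd, FIONREAD, &pending) == -1)
		return (-1);
	if (pending > 0)
		return (0);	/* writer already queued data: leave size alone */
	/* Racy: a writer may queue data between the check and this call. */
	return (fcntl(fd, F_SETPIPE_SZ, want) != -1 ? 1 : 0);
}
```

Because of that race window, a check like this could at best reduce how often the problem is hit; it cannot rule it out.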

jgoerzen · 2022-06-20T15:17:38Z

This could very well explain why I never saw it before I switched to processing data with Filespooler.

Previously, the pipeline was roughly gpg -q -d < file | zstdcat | zfs receive.

Now, Filespooler invokes gpg -q -d < file | zstdcat and, crucially, reads a few bytes (probably around 120) from that pipe. Only THEN does it spawn zfs receive and hook up the pipe between zstdcat and zfs receive.

I suspect this increases the likelihood of the condition you described, because now the gpg/zstdcat pipeline will already have data ready to be read by the time zfs receive is invoked, rather than those two programs forking and initializing at about the same time as zfs receive.
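To illustrate why the header read matters, here is a rough C approximation of that hand-off (Filespooler itself is written in Rust, and this is not its code). The hypothetical helper pending_after_header consumes a header from the pipe and then reports how many bytes are already buffered for whichever process reads next.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: consume `header_len` bytes from the pipe `fd`
 * (standing in for the header a spooler reads), then report how many
 * bytes are still buffered for the next reader.
 * Returns the pending byte count, or -1 on error or early EOF.
 */
static int
pending_after_header(int fd, size_t header_len)
{
	char buf[256];
	int pending = 0;

	while (header_len > 0) {
		size_t chunk = header_len < sizeof (buf) ?
		    header_len : sizeof (buf);
		ssize_t got = read(fd, buf, chunk);

		if (got <= 0)
			return (-1);
		header_len -= (size_t)got;
	}
	if (ioctl(fd, FIONREAD, &pending) == -1)
		return (-1);
	return (pending);
}
```

A nonzero result means the next process to inherit the read end starts with data already in the pipe, which is the state in which a resize is suspected to be risky.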

Edit: Also I am very impressed at you running ZFS on a 440MHz SPARC!

jgoerzen · 2022-06-22T14:18:38Z

I have not experienced any deadlocks since I patched out F_SETPIPE_SZ. I think this is the proper resolution - thank you!

jgoerzen · 2022-08-20T02:57:05Z

Any word on when this might be merged? Thanks!

stale · 2023-09-17T13:31:01Z

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

jgoerzen · 2024-01-17T13:30:26Z

At some point, the fix was merged; 2.2.2 no longer has this issue, and contains this code:

        unsigned int cur = fcntl(infd, F_GETPIPE_SZ);
        /*
         * Sadly, Linux has an unfixed deadlock if you do SETPIPE_SZ on a pipe
         * with data in it.
         * cf. #13232, https://bugzilla.kernel.org/show_bug.cgi?id=212295
         *
         * And since the problem is in waking up the writer, there's nothing
         * we can do about it from here.
         *
         * So if people want to, they can set this, but they
         * may regret it...
         */
        if (getenv("ZFS_SET_PIPE_MAX") == NULL)
                return (cur);
        if (cur < max && fcntl(infd, F_SETPIPE_SZ, max) != -1)
                cur = max;

So, unless ZFS_SET_PIPE_MAX is given, the behavior will be correct. That is, it's correct by default. I can confirm deadlocks have gone away in 2.2.2.

I don't know when this was merged; it was still there in 2.1.11.

rincebrain · 2024-01-17T16:41:41Z

It's also a little moot because torvalds/linux@e95aada got merged, which should theoretically make that obsolete, eventually.

PS: a30927f was the commit in master, and 2.1.8 had e84a2ed. So you shouldn't be able to hit that on 2.1.11...

jgoerzen added the Type: Defect label Jun 19, 2022

rincebrain added the Component: Send/Recv label Jun 20, 2022

rincebrain mentioned this issue Jun 20, 2022: zfs-2.1.5 patchset #13532 (Merged)

stale bot added the Status: Stale label Sep 17, 2023

jgoerzen closed this as completed Jan 17, 2024
