zfs receive deadlocks when zstdcat piped to it #13571

Closed
jgoerzen opened this issue Jun 19, 2022 · 9 comments
Labels
Component: Send/Recv "zfs send/recv" feature Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@jgoerzen

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian |
| Distribution Version | bullseye |
| Kernel Version | 5.10.0-12-amd64 #1 SMP Debian 5.10.103-1 (2022-03-07) x86_64 GNU/Linux |
| Architecture | x86_64 |
| OpenZFS Version | 2.0.3-9 |

Describe the problem you're observing

When piping data from zstdcat to zfs receive, the pipeline deadlocks roughly once in every 1,000 runs. Additionally, attaching to the zfs receive process with strace -p causes it to exit a few seconds later with the error: cannot receive incremental stream: incomplete stream.

Describe how to reproduce the problem

I wrote a blog post going into detail about the situation and my investigation into it.

It appears to me that the zfs process does not read from stdin itself, but instead delegates this work to kernel_read within the kernel. I believe zfs send does (or did) the same: for instance, #11445 described an issue with zfs send not working when piped to /dev/null, and #13133 concerns using a wrapper thread for zfs send (at least when the output is not going to a pipe).

I wonder if there is something about how libzfs_set_pipe_max, calling fcntl with F_SETPIPE_SZ, interacts with the kernel code.
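For readers unfamiliar with the call in question, here is a minimal sketch of querying and growing a pipe's capacity with fcntl on Linux. The helper name grow_pipe is hypothetical and this is not the libzfs_set_pipe_max implementation; it only shows the F_GETPIPE_SZ / F_SETPIPE_SZ pair that the function is described as using.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes F_GETPIPE_SZ / F_SETPIPE_SZ in <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Fallback values from the Linux headers, in case _GNU_SOURCE came too late. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Hypothetical helper (not the libzfs function): try to grow the pipe
 * behind fd to at least `want` bytes. Returns the resulting capacity,
 * or -1 if fd does not refer to a pipe.
 */
static long
grow_pipe(int fd, long want)
{
	long cur;

	assert(want > 0);
	cur = fcntl(fd, F_GETPIPE_SZ);
	if (cur == -1)
		return (-1);
	/* The kernel may round the request up, so re-query after resizing. */
	if (cur < want && fcntl(fd, F_SETPIPE_SZ, (int)want) != -1)
		cur = fcntl(fd, F_GETPIPE_SZ);
	return (cur);
}
```

If the resize is refused (for example because of the system's pipe size limit), the helper simply reports the unchanged capacity.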

Include any warning/errors/backtraces from the system logs

I checked and there are none.

@jgoerzen jgoerzen added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jun 19, 2022
@rincebrain
Contributor

rincebrain commented Jun 20, 2022

You and #13309 should be friends, including the backstory in #13232.

But briefly, Linux has a bug, they ignored a patch to fix it, and nobody particularly cares enough to try again because LKML tends to vomit fire and worse things at anyone who mentions ZFS around them, so nobody can have larger pipe sizes on Linux.

@rincebrain rincebrain added the Component: Send/Recv "zfs send/recv" feature label Jun 20, 2022
@jgoerzen
Author

Wow, yes indeed. I spent a lot of time searching but didn't manage to turn up either of those, somehow.

I am patching out the F_SETPIPE_SZ in 2.0.3 and will observe if that appears to fix it. I'll let it bake for a few days and report back here.

The system in question is a slower x86_64 one, a Core i5-5200U which is moderately popular as a sort of small PC in a fanless configuration. It is perfect for receiving ZFS backups, which is its primary purpose for me.

I am still at a loss as to why I never saw this bug when the pipeline was being kicked off by the shell, but did when it was being kicked off by Filespooler; perhaps, since it seems to be a race, the faster pace at which my Rust-based program was able to work through the queue may have had something to do with it.

Interestingly, I had inserted cat into the pipeline, which significantly reduced, but did not eliminate, the incidence of this. I at first thought maybe cat was reblocking, but after inspecting its source and strace output, don't believe it was. Perhaps it had something to do with helping to avoid triggering the race.

@rincebrain
Contributor

If I understand the bug correctly, it only triggers if you call F_SETPIPE_SZ when the writer has put a nonzero amount of data into the pipe, but not yet a full unit's worth. That is why the world isn't on fire screaming about this: you need a very slow (but nonzero) or otherwise very strange write pattern to hit it. It doesn't come up in, say, the CI or most of my testbeds, but my poor little SPARC (440 MHz, 1c1t) and Raspberry Pis were not so fortunate.
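To make the triggering condition concrete, the sketch below checks whether a pipe already holds unread data before attempting a resize. The helper names are hypothetical, and this is an illustration of the condition only, not a workaround: the check-then-resize sequence is racy (the writer can add data between the two calls), and nothing in this thread establishes that it avoids the kernel bug.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes F_SETPIPE_SZ in <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Fallback value from the Linux headers, in case _GNU_SOURCE came too late. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif

/* Number of unread bytes currently sitting in the pipe, or -1 on error. */
static int
pipe_pending_bytes(int fd)
{
	int n = 0;

	if (ioctl(fd, FIONREAD, &n) == -1)
		return (-1);
	return (n);
}

/*
 * Hypothetical guard: resize only when the pipe looks empty.
 * Returns 1 if the resize was performed, 0 if it was skipped or failed.
 * The check is inherently racy, so treat this as illustration only.
 */
static int
resize_pipe_if_empty(int fd, int want)
{
	assert(want > 0);
	if (pipe_pending_bytes(fd) != 0)
		return (0);
	return (fcntl(fd, F_SETPIPE_SZ, want) != -1);
}
```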

@rincebrain rincebrain mentioned this issue Jun 20, 2022
@jgoerzen
Author

jgoerzen commented Jun 20, 2022

This could very well explain why I never saw it before I switched to processing data with Filespooler.

Previously, the pipeline was roughly gpg -q -d < file | zstdcat | zfs receive.

Now, Filespooler invokes gpg -q -d < file | zstdcat and, crucially, reads a few bytes (probably around 120) from that pipeline's output. Only THEN does it spawn zfs receive and hook up the pipe between zstdcat and zfs receive.

I suspect this increases the likelihood of the condition you described, because now the gpg/zstdcat pipeline will already have data ready to be read by the time zfs receive is invoked, rather than those two programs forking and initializing at about the same time as zfs receive.

Edit: Also I am very impressed at you running ZFS on a 440MHz SPARC!
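The hand-off pattern described above (consume a short header from a pipe, then start a child with the same pipe as its stdin) can be sketched in C as follows. This is not Filespooler's code, which is written in Rust; the function names are hypothetical and error handling is deliberately minimal (an interrupted read or wait is simply treated as a failure).

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read exactly n bytes from fd. Returns 0 on success, -1 on EOF or error. */
static int
read_exact(int fd, char *buf, size_t n)
{
	size_t got = 0;

	while (got < n) {
		ssize_t r = read(fd, buf + got, n - got);
		if (r <= 0)
			return (-1);
		got += (size_t)r;
	}
	return (0);
}

/*
 * Consume an n-byte header from the pipe, then run argv with the same
 * pipe as its stdin. By the time the child starts, the upstream writer
 * has very likely already queued more data in the pipe.
 * Returns the child's exit status, or -1 on failure.
 */
static int
peek_then_spawn(int pipe_rd, char *hdr, size_t n, char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);
	if (read_exact(pipe_rd, hdr, n) == -1)
		return (-1);
	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Child: make the pipe its stdin, then exec the receiver. */
		if (dup2(pipe_rd, STDIN_FILENO) == -1)
			_exit(127);
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```

In the scenario from this thread, argv would name zfs receive, and any resize of the pipe that the child performs at startup happens while the pipe already contains data.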

@jgoerzen
Author

I have not experienced any deadlocks since I patched out F_SETPIPE_SZ. I think this is the proper resolution - thank you!

@jgoerzen
Author

Any word on when this might be merged? Thanks!

@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Sep 17, 2023
@jgoerzen
Author

At some point, the fix was merged; 2.2.2 no longer has this issue, and contains this code:

        unsigned int cur = fcntl(infd, F_GETPIPE_SZ);
        /*
         * Sadly, Linux has an unfixed deadlock if you do SETPIPE_SZ on a pipe
         * with data in it.
         * cf. #13232, https://bugzilla.kernel.org/show_bug.cgi?id=212295
         *
         * And since the problem is in waking up the writer, there's nothing
         * we can do about it from here.
         *
         * So if people want to, they can set this, but they
         * may regret it...
         */
        if (getenv("ZFS_SET_PIPE_MAX") == NULL)
                return (cur);
        if (cur < max && fcntl(infd, F_SETPIPE_SZ, max) != -1)
                cur = max;

So, unless ZFS_SET_PIPE_MAX is set in the environment, the pipe size is left unchanged and the deadlock is avoided. That is, the behavior is correct by default. I can confirm the deadlocks have gone away in 2.2.2.

I don't know when this was merged; it was still there in 2.1.11.

@rincebrain
Contributor

rincebrain commented Jan 17, 2024

It's also a little moot because torvalds/linux@e95aada got merged, which should theoretically make that obsolete, eventually.

PS: a30927f was the commit in master, and 2.1.8 had e84a2ed. So you shouldn't be able to hit that on 2.1.11...
