-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Linux 4.1 compat regarding loop device on ZFS #3651
Conversation
Depends on openzfs/spl#468 |
@tuxoko Nice fix! Thanks for tackling this issue. |
update: account for |
Tested-by: Lukas Wunner [email protected] WFM, great job @tuxoko! |
void *paddr; | ||
cnt = MIN(bv->bv_len - skip, n); | ||
|
||
paddr = zfs_kmap_atomic(bv->bv_page, KM_USER1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsafe on older kernels because interrupts must be disabled to prevent another thread from using the CPU's KM_USER1 slot. This is likely okay on newer kernels where the slots are dynamically allocated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true, KM_USER* are not used in interrupt context since... well it's "user". So it's perfectly safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When CONFIG_PREEMPT is set, the kernel scheduler should be able to preempt us. If the scheduler preempts us and schedules us to another CPU or runs something else on this CPU that uses this, the value will be clobbered. See the comment for __schedule
:
http://lxr.free-electrons.com/source/kernel/sched/core.c#L2693
All we need is an IRQ to occur when the scheduler thinks something else should run. As the comment for __schedule()
indicates, the code in arch/x86/entry/entry_64.S will call into the scheduler on return from the IRQ and preempt_schedule_irq()
will be called. That will call schedule()
and something else can have the CPU.
You are correct that there is code in the mainline kernel doing this, but I think those routines will need to be changed for the same reason. The reason why they have not yet been changed is that these events are rare enough that no one has encountered them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tuxoko It seems that I used the wrong term. It is unsafe to use kmap_atomic in a preemptible context. An interrupt context is definitely unsafe. Also, my statement about it being okay on newer kernels was wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryao
kmap_atomic does turn off preemption itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tuxoko Where does it do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryao
http://lxr.free-electrons.com/source/include/linux/uaccess.h#L16
7 /*
8 * These routines enable/disable the pagefault handler in that
9 * it will not take any locks and go straight to the fixup table.
10 *
11 * They have great resemblance to the preempt_disable/enable calls
12 * and in fact they are identical; this is because currently there is
13 * no other way to make the pagefault handlers do this. So we do
14 * disable preemption but we don't necessarily care about that.
15 */
16 static inline void pagefault_disable(void)
@ryao
|
@tuxoko POSIX doesn't apply to kernel code and the other platforms are using |
@ryao |
Starting from linux-2.6.37, {kmap,kunmap}_atomic takes 1 argument instead of 2. We use zfs_{kmap,kunmap}_atomic as wrappers and always take 2 argument, but ignore the 2nd for newer kernel. Signed-off-by: Chunwei Chen <[email protected]>
This looks fine to me. I have tested this on Linux 4.1.3 by applying it to spl-0.6.4.2. Using losetup on a from my root pool works fine with it, but hangs my laptop without it. |
working with sys-kernel/aufs-sources-4.1.5 |
@tuxoko aside from fixing the bug this is a really nice improvement! The patch looks good to me but let me manually to it against a wider range of kernels before merging it. I also agree there's no issue with using |
Causes trouble with Chromium, Chrome and bookmark handling: https://forums.gentoo.org/viewtopic-t-1026884.html [ZFS loop fix [upstream] breaks chromium bookmark handling ?] the bookmarks can't be displayed or are even corrupt (they are in the .config subfolder but Chromium asks for bookmark import and shows no bookmarks), rolling back a previously created snapshot (on other kernel, or without modifying bookmarks) helps but this isn't a long-term solution. I hope that there wasn't any additional breakage or file corruption :/ |
@tuxoko with a trivial fix to the SPL part of this patch set it builds cleanly on all the buildbots and passes all the automated testing. @kernelOfTruth thanks for commenting on the issue with chromium. We'll definitely want to explain that before merging these changes. Do you happen to have another reproducer which is easier to run which shows the issue? |
@kernelOfTruth |
@tuxoko like so: ? unfortunately no :( |
@kernelOfTruth This would be very strange if every thing except chromium works fine. |
@tuxoko alright, will test again once I did at least one backup & modify accordingly thanks |
@kernelOfTruth Edit: Also, I couldn't reproduce the bug with linux-4.1.5 on my ubuntu VM. |
Just saw that the zfs branch I was observing these issues also contains #3672 (configure bdi_setup_and_register: don't blow config stack #3672) https://github.com/kernelOfTruth/zfs/commits/zfs_kOT_13.08.2015 (minus the last commit) so that actually could be the real cause (I'm running a branch without it and this pull-request - having accidentally forgotten to add), it was too late the last time I wrote, so oversaw it :( It happens with 4.1.4 to 4.1.6: That's the range I started testing these changes and when I observed it, first I added the patches with the upgrade to 4.1.5, then also 4.1.6 - finally went back to 4.1.4, the "working" kernel and updated ZFS to the state used with the 4.1.5+ kernel and also observed it there. First I had assumed that it was related to the up-to-date chrome & chromium versions, GCC (went also back from 5.2 to 4.9.3) or even CFLAGS - but to no avail. Only with going back to a previous patchset iteration without the loop fix and @chrisrd 's patch the behavior disappeared for good. Once I've time again (currently /home is being scrubbed and I'm busy) I'll also test it against latest zfs master to see whether it's caused by the other patch stacks I'm using. sorry for potentially leading you on a wrong track |
There's also something fishy going on with git's history (actually bash history) - not sure if it's connected but suddenly lots of entries are missing up to February of this year ... @behlendorf unfortunately that's the only reproducer so far - I'll see if I can encircle it either to this pull request or @chrisrd 's that reduces stack usage (will add that solely in the next testing round) |
alright, just pulled the latest 2 commits from master into the tree from @chrisrd and it's definitely this pull-request (the loop fix) that causes this issue with chromium, will proceed to remove the
statement and re-test ... |
@kernelOfTruth |
@tuxoko sure, will do I just did a test with iov_iter_advance removed and more conservative CFLAGS and that lead to no change so now adding the new commit Thanks ! |
@kernelOfTruth |
@tuxoko I was slowly suspecting that the problem has to do with ABD, sorry, should have mentioned explicitly that I'm running a branch with it Thanks |
@tuxoko: thanks for the heads up, i just got to the point where i realized this PR might be what's causing #3684 and the obscene amount of data corruption i was seeing - inodes were being mauled on disk which hadn't been written to since this was deployed(gcc claimed it couldnt create elfs, git packs rended like bunnies in a blender, aptitude package index nuked, and a pool created with this exhibiting the worst behaviors, all sorts of fun stuff). This at least makes me think the SSE vectorized parity calculation branch is not the culprit, but i'll need to do more testing when my team gets to the DC and resets the "very locked-up" system i've been abusing for this stuff. |
That explains things. Has anyone observed problems when running with just this patch? |
When I tested the patch on July 31, a kernel compile failed 100% reproducably when linking vmlinux. I didn't report this here because I was unsure what the cause was: It was very hot in Germany back then (around 40° C) and I suspected that the RAM was either permanently damaged or had temporary issues due to the temperatures. I reverted the patch, waited for the machine to cool down, then retried a compile with "make -j1" to keep the temperature low and it worked fine. The patch did solve the loopback issue, this I had verified successfully. To this day I'm not sure what caused the kernel compile to fail but reading the other reports here I do get the impression this may be linked. I had used the vanilla ZoL code out of Turbo's Debian dailies repo, I believe it was zfsonlinux/pkg-zfs@2e97348. |
Thus far I haven't been able to reproduce any issues when testing the updated version of this patch. It consistently passes:
I'm going to leave much of the above testing running but I think it's safe to say the reported issues were due to mixing this with ABD which was known not to work. I'd love to get this merged in the next few days so any additional testing people can offer as always is appreciated! |
Starting from Linux 4.1 allows iov_iter with bio_vec to be passed into iter_read/iter_write. Notably, the loop device will pass bio_vec to backend filesystem. However, current ZFS code assumes iovec without any check, so it will always crash when using loop device. With the restructured uio_t, we can safely pass bio_vec in uio_t with UIO_BVEC set. The uio* functions are modified to handle bio_vec case separately. The const uio_iov causes some warning in xuio related stuff, so explicit convert them to non const. Signed-off-by: Chunwei Chen <[email protected]>
Update: nothing changed, just squashed the previous bvec fix. |
I've been running this happily as 4.1.5-aufs on an X4540, Mellanox ConnectX EN 10GbE + Mellanox ConnectX VPI QDR IB both raidz2 (zil/l2arc on NVMe Intel 750) and raid10 (12x scratch disks) under varying loads (TBs of data in/out) without any issues. |
After a weekend of testing without observing any issues and independent confirmation of the fix I'm comfortable merging this fix. Merged as: Thanks @tuxoko. 5475aad Linux 4.1 compat: loop device on ZFS |
Starting from Linux 4.1 allows iov_iter with bio_vec to be passed into
iter_read/iter_write. Notably, the loop device will pass bio_vec to backend
filesystem. However, current ZFS code assumes iovec without any check, so it
will always crash when using loop device.
With the restructured uio_t, we can safely pass bio_vec in uio_t with UIO_BVEC
set. The uio* functions are modified to handle bio_vec case separately.
The const uio_iov causes some warning in xuio related stuff, so explicit
convert them to non const.
Signed-off-by: Chunwei Chen [email protected]