CPU stall when disk is removed #3652
I'm experiencing a similar situation in a system with multiple I/O errors on different hard drives.
OS: Ubuntu Linux 14.04 LTS x64. Each drive is encrypted via LUKS. These 12 Device Mapper devices are configured as 2x raidz2 (6 disks each).
Got something similar on a pre-production machine running CentOS 7.1 (downgraded to 0.6.4.2 since I hit the bug). The issue is, however, perfectly reproducible in a test environment running under VirtualBox: when a disk is detached and a write operation occurs on the pool, I get a CPU stall and soft lock-up notices on the console (CPU#0 stuck for 22s [z_null_int:717]). A 'top' on the host shows a thread hung at 100% in kernel mode.
Is anyone planning to tackle this one?
@ab17, @kaazoo, @adessemond just to be clear: not every one of you is using LUKS?
@kernelOfTruth, I'm not using LUKS. That is for encryption, right?
I built kernel 4.1.7 with the latest ZoL GitHub release to see if this was related to kernel 3.18.x.
Hello,
@kernelOfTruth no LUKS here either.
I built the very latest release today. Kernel 3.18.21. Still the same problem. You can use a virtual or physical machine to test this. Remove a pool disk and then do some write IO to the pool. You might need to remove/re-add/remove a few times to see it happen. It does not always happen on the first disk removal.
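For anyone trying to reproduce this, here is a minimal sketch of the procedure described above, assuming a throwaway VM with two spare virtual disks; the pool name and device names are placeholders, not taken from the original reports:

# Build a small test pool; the default mountpoint will be /testpool.
zpool create testpool mirror /dev/sdb /dev/sdc

# Hot-remove one member disk via the standard Linux sysfs interface.
echo 1 > /sys/block/sdc/device/delete

# Drive some write I/O at the pool; on affected builds this is the point
# where the CPU stall / spinning z_null_int thread shows up.
dd if=/dev/zero of=/testpool/stress.bin bs=1M count=1024 conv=fsync

As noted above, it may take several remove/re-add cycles before the stall appears.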
I have not spent much time with ZFS, so take this with a grain of salt. With zpool events I see no additional events generated (I am not sure if that is intended, but I think there should be more). With zpool status I see that the pool is still reported healthy and that, in effect, nothing happened, and that is a bug: I tested the same "disk eject" with FreeNAS, where the pool immediately goes degraded. So I think that because the system does not know the pool is down/degraded, it cannot handle a subsequent reattach of the disk or the next IO operation.
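For reference, the checks described above amount to roughly the following (the pool name is a placeholder); zpool events -f follows the event stream live while the disk is pulled in another terminal:

# Follow the event stream while the disk is pulled elsewhere.
zpool events -f

# Afterwards, see whether ZFS noticed the missing vdev at all.
zpool events
zpool status -v testpool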
@xatian do you run openRC? systemd? With openRC 0.17, events are being generated:
Yeah, I am using openRC 0.17. Output of zpool events:
TIME CLASS
Sep 23 2015 22:11:02.291999909 resource.fs.zfs.statechange
Sep 23 2015 22:11:02.291999909 resource.fs.zfs.statechange
Sep 23 2015 22:11:02.291999909 resource.fs.zfs.statechange
Sep 23 2015 22:11:02.291999909 resource.fs.zfs.statechange
Sep 23 2015 22:11:02.291999909 resource.fs.zfs.statechange
Sep 23 2015 22:11:02.291999909 resource.fs.zfs.statechange
Sep 23 2015 22:11:03.087999901 ereport.fs.zfs.config.sync
Sep 23 2015 22:11:03.487999897 ereport.fs.zfs.config.sync
Sep 23 2015 22:11:05.003999880 resource.fs.zfs.statechange
Sep 23 2015 22:11:05.003999880 resource.fs.zfs.statechange
Sep 23 2015 22:11:05.711999872 ereport.fs.zfs.config.sync
Sep 23 2015 22:11:06.087999868 ereport.fs.zfs.config.sync
Sep 23 2015 22:11:07.439999854 resource.fs.zfs.statechange
Sep 23 2015 22:11:07.439999854 resource.fs.zfs.statechange
Sep 23 2015 22:11:07.439999854 resource.fs.zfs.statechange
Sep 23 2015 22:11:08.071999847 ereport.fs.zfs.config.sync
Sep 23 2015 22:11:08.455999843 ereport.fs.zfs.config.sync
When I pull a disk there are no additional events shown and, like I said, zpool status still shows everything as fine.
[ 292.755888] INFO: rcu_sched self-detected stall on CPU
[ 292.755892] 4: (5249 ticks this GP) idle=3bd/140000000000001/0 softirq=2448/2448 fqs=5249
[ 292.755893] (t=5250 jiffies g=2236 c=2235 q=21023)
[ 292.755895] Task dump for CPU 4:
[ 292.755896] z_null_int R running task 0 2173 2 0x00000008
[ 292.755897] ffffffff81c35c80 ffff88087fd03da8 ffffffff810753fe 0000000000000004
[ 292.755899] ffffffff81c35c80 ffff88087fd03dc8 ffffffff81077a38 ffff88087fd03e08
[ 292.755900] 0000000000000005 ffff88087fd03df8 ffffffff81091fe0 ffff88087fd130c0
[ 292.755901] Call Trace:
[ 292.755902] <IRQ> [<ffffffff810753fe>] sched_show_task+0xae/0x120
[ 292.755908] [<ffffffff81077a38>] dump_cpu_task+0x38/0x40
[ 292.755910] [<ffffffff81091fe0>] rcu_dump_cpu_stacks+0x90/0xd0
[ 292.755912] [<ffffffff81094d23>] rcu_check_callbacks+0x423/0x700
[ 292.755913] [<ffffffff8109d744>] ? update_wall_time+0x234/0x650
[ 292.755915] [<ffffffff81097134>] update_process_times+0x34/0x60
[ 292.755917] [<ffffffff810a5231>] tick_sched_handle.isra.18+0x31/0x40
[ 292.755918] [<ffffffff810a527c>] tick_sched_timer+0x3c/0x70
[ 292.755920] [<ffffffff81097791>] __run_hrtimer.isra.34+0x41/0xf0
[ 292.755921] [<ffffffff81097f05>] hrtimer_interrupt+0xc5/0x1e0
[ 292.755923] [<ffffffff8102b454>] local_apic_timer_interrupt+0x34/0x60
[ 292.755924] [<ffffffff8102bacc>] smp_apic_timer_interrupt+0x3c/0x60
[ 292.755926] [<ffffffff816f9cea>] apic_timer_interrupt+0x6a/0x70
[ 292.755927] <EOI> [<ffffffff816f8859>] ? _raw_spin_unlock_irqrestore+0x9/0x10
[ 292.755934] [<ffffffffa0003f6a>] taskq_cancel_id+0x31a/0x530 [spl]
[ 292.755935] [<ffffffff81075a60>] ? wake_up_state+0x10/0x10
[ 292.755937] [<ffffffffa0003d70>] ? taskq_cancel_id+0x120/0x530 [spl]
[ 292.755940] [<ffffffff8106d7e4>] kthread+0xc4/0xe0
[ 292.755941] [<ffffffff8106d720>] ? kthread_create_on_node+0x170/0x170
[ 292.755943] [<ffffffff816f8e48>] ret_from_fork+0x58/0x90
[ 292.755944] [<ffffffff8106d720>] ? kthread_create_on_node+0x170/0x170
BTW, the Fedora Server 22 machine I also tested with is running systemd and showed exactly the same behaviour.
Thanks @behlendorf
Kernel 3.18.21. I've tried this:
That helps. A lot. I can remove and re-add the disk several times. But eventually, after several removals and re-adds, I get the CPU stall again:
This time, the system does not lock up. There is no z_null_int process running at 100% either. However, no writes can take place to the pool. A reset of the system is required.
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side-effect of making vdev_disk_io_start() synchronous for certain I/Os. This in turn resulted in vdev_disk_io_start() being able to re-dispatch zio's, which would result in RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and may explain the performance regressions reported in both openzfs#3829 and openzfs#3780. This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags. Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices where there is no benefit to queuing to aggregate the I/O. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#3780 Issue openzfs#3829 Issue openzfs#3652
@behlendorf @kernelOfTruth: echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic did not help in my case (even when forced at system startup before importing anything). Will test the patch and report.
@behlendorf your fix listed above for #3833 solves the issue on my side. It still needs to be tested on my real hardware, but so far there are no more CPU stalls when a virtual disk is removed.
@behlendorf I tried the patch this morning. The pool did not go into degraded mode (the expected behaviour) when the disk was removed and write IO took place. The CPU stall issue is gone, but the pool is no longer writable.
"zpool clear" command hangs. No write IO can take place to the pool. A reset of the system is required. |
@behlendorf an update to the previous test. I assumed I was supposed to test with the default spl_taskq_thread_dynamic = 1. I have now tested again with spl_taskq_thread_dynamic = 0 and completed a total of 20 tests where I remove and re-add the disk. In each case the pool goes into degraded mode as expected. A re-add resilvers OK, so I can remove again. It works exactly as it should. So to confirm: #3833 appears to fix the problem when spl_taskq_thread_dynamic = 0.
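A sketch of what one remove/re-add iteration of that test looks like, using the same placeholder pool and device names as earlier; the exact SCSI host rescan path depends on the controller:

# Remove the disk and force some write I/O so the failure is noticed.
echo 1 > /sys/block/sdc/device/delete
dd if=/dev/zero of=/testpool/junk bs=1M count=256 conv=fsync
zpool status testpool        # should now report DEGRADED

# Re-add: rescan the SCSI host so the disk reappears, then bring it online.
echo "- - -" > /sys/class/scsi_host/host0/scan
zpool online testpool sdc
zpool status testpool        # watch the resilver run to completion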
@adessemond @ab17 thanks for the quick testing turnaround. I think I've got a good idea what the lingering issue is, but I'd like to address that in a separate patch. It's an issue which has always existed but was masked in 0.6.4.x because we didn't have dynamic taskqs. We'll get #3833 in right away, but @ab17 you may need to run with spl_taskq_thread_dynamic = 0 in the meantime.
@behlendorf no problem. Thank you. I'll run with spl_taskq_thread_dynamic = 0 for now.
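For anyone else applying the same workaround, the parameter can be set at runtime and also made persistent across reboots; the file name spl.conf below is just a convention, any .conf file under /etc/modprobe.d works:

# Runtime setting (may only affect taskqs created after this point).
echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic

# Persistent setting, applied whenever the spl module is loaded.
echo "options spl spl_taskq_thread_dynamic=0" > /etc/modprobe.d/spl.conf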
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side-effect of making vdev_disk_io_start() synchronous for certain I/Os. This in turn resulted in vdev_disk_io_start() being able to re-dispatch zio's, which would result in RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and explains the performance regressions reported in both #3829 and #3780. This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags. Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices where there is no benefit to queuing to aggregate the I/O. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3652 Issue #3780 Issue #3785 Issue #3817 Issue #3821 Issue #3829 Issue #3832 Issue #3870
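The rotational/non-rotational distinction the commit keys off comes from the block layer's queue attribute; a device's classification can be checked directly (device name below is a placeholder):

# 1 = rotational (spinning disk), 0 = non-rotational (SSD or similar).
cat /sys/block/sda/queue/rotational

# Or for all block devices at once:
lsblk -o NAME,ROTA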
Just for the record, I think I was hitting this too. My server has one drive that is starting to go bad and once in a while hits an unreadable sector. Rsyncing data from an mdadm array to the new ZFS pool would lock up after a while, although I never removed or added any drives. Pinging the machine worked fine, as did anything still running in RAM, but as soon as something hit the disk it was dead and the only fix was magic SysRq or a hard power-off. I do not have any dmesg logs, unfortunately, so I cannot be sure, since the machine is remote. Running echo 3 > drop_caches once in a while appeared to make it last longer before locking up, but I am not sure why or how, or whether it was just luck for a while. I applied ef5b2e1 and this patch locally and the machine has been fine for many hours since. I have rsync'd more data over to the ZFS pool and it should probably have had an error by now, so it appears fixed. @behlendorf since spl_taskq_thread_dynamic = 0 is required to make things stable, should the default be set back to 0 until the other issue is fixed too, so that the next point release works correctly without user intervention?
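For completeness, the cache-drop workaround mentioned above presumably refers to the standard kernel knob at /proc/sys/vm/drop_caches (run as root); it only seemed to postpone the hang and is not a fix:

sync
echo 3 > /proc/sys/vm/drop_caches   # drop page cache plus dentries and inodes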
@perfinion I was considering the same thing myself. However, from what I've seen reported, the f5b2e1 fix addresses the vast majority of issues. Setting
I'm getting a pool lockup when:
z_null_int process takes 100% CPU:
We get this at the console just after removal of the disk:
A hard reset of the machine is required.
I've experienced this on multiple machines with:
Kernel version is 3.18.19.
This only appeared on the ZFS master release from 13th July onwards.
I don't know the specific commit.
The latest release on 29th July is still affected by this issue. I confirmed that today.
The latest SPL master is not affected and is OK to use.
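A minimal sketch of how symptoms like these can be confirmed on a machine that is still responsive (command lines are illustrative, not taken from the original report):

# Confirm the z_null_int kernel thread is pegged at 100% CPU.
top -b -H -n 1 | grep z_null_int

# Dump its in-kernel stack (run as root).
cat /proc/$(pgrep z_null_int | head -n 1)/stack

# The console messages land in the kernel log as RCU stall / soft lockup reports.
dmesg | grep -i -E "rcu_sched|soft lockup"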