Infinite loop in metaslab_group_alloc #1720

Closed
ryao opened this issue Sep 12, 2013 · 6 comments

@ryao
Contributor

ryao commented Sep 12, 2013

My system deadlocked on me. It could just be that the hardware is failing as the system is dying, but I managed to get a backtrace from a stall that RCU detected. The stall detection uses a fairly simple mechanism: the scheduler updates a per-CPU counter each time a CPU is scheduled a new thread, and RCU checks that counter each time it runs. RCU runs at natural preemption points provided by primitives such as mutexes, which is why those primitives show up in the backtrace whenever RCU detects a stall. This does not mean that the mutex operation is stuck, only that the thread is not being rescheduled (usually because something is wrong).

With that in mind, it appears that the code was looping infinitely in metaslab_group_alloc(), which did not appear in the backtrace because it is a static function. We should probably add a loop counter to detect excessive iterations, print useful diagnostic information, and panic the system.
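Something along these lines is what I have in mind. To be clear, this is only a sketch: the helper name, the iteration limit, and the panic message are made up for illustration and are not actual metaslab.c code.

    /*
     * Hypothetical guard against an unbounded allocation loop.  Assumes the
     * usual SPL/illumos kernel headers (<sys/types.h>, <sys/cmn_err.h>);
     * the limit and the message are illustrative only.
     */
    #define	METASLAB_ALLOC_MAX_ITERS	100000ULL

    static inline void
    metaslab_alloc_loop_check(uint64_t *iters, uint64_t asize)
    {
    	/* Count one more pass; panic with diagnostics if we exceed the cap. */
    	if (++(*iters) > METASLAB_ALLOC_MAX_ITERS) {
    		cmn_err(CE_PANIC, "metaslab_group_alloc() made %llu "
    		    "passes without progress (asize=%llu)",
    		    (u_longlong_t)*iters, (u_longlong_t)asize);
    	}
    }

The search loop in metaslab_group_alloc() would then initialize a counter to zero before looping and call this helper once per pass.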

INFO: rcu_sched self-detected stall on CPU { 0}  (t=21009 jiffies g=3837706 c=3837705 q=129263)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU: 0 PID: 29583 Comm: BrowserBlocking Tainted: P           O 3.11.0 #1
Hardware name: System manufacturer P5K Deluxe/P5K Deluxe, BIOS 1005    12/16/2008
task: ffff880095212050 ti: ffff8801883fe000 task.ti: ffff8801883fe000
RIP: 0010:[<ffffffff81305b51>]  [<ffffffff81305b51>] delay_tsc+0x1e/0x49
RSP: 0018:ffff88022fc03df0  EFLAGS: 00000046
RAX: 000000009a0d05d1 RBX: 0000000000002710 RCX: 000000009a0d04fd
RDX: 000000000002d76b RSI: 0000000000000000 RDI: 00000000002b50c0
RBP: ffff88022fc03df0 R08: 000000009a0d0596 R09: 0000000000000000
R10: 0000000000002544 R11: 000000000000036d R12: ffff88022fc0d7d0
R13: ffffffff81627540 R14: ffff88022fc0cfa8 R15: ffff8801883fffd8
FS:  00007ffc5d3df700(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000004ca21daf1d0 CR3: 0000000187645000 CR4: 00000000000407f0
Stack:
 ffff88022fc03e00 ffffffff81305ac4 ffff88022fc03e10 ffffffff81305aea
 ffff88022fc03e28 ffffffff8102457c ffffffff81627540 ffff88022fc03e80
 ffffffff8108ff3f ffff88022fc03e70 ffff8801883fe000 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffff81305ac4>] __delay+0xa/0xc
 [<ffffffff81305aea>] __const_udelay+0x24/0x26
 [<ffffffff8102457c>] arch_trigger_all_cpu_backtrace+0x69/0x71
 [<ffffffff8108ff3f>] rcu_check_callbacks+0x1b7/0x4cb
 [<ffffffff8103c828>] update_process_times+0x38/0x66
 [<ffffffff8106b8f7>] tick_sched_handle+0x32/0x34
 [<ffffffff8106b9fa>] tick_sched_timer+0x35/0x53
 [<ffffffff8104cf4b>] __run_hrtimer.isra.27+0x4e/0xa7
 [<ffffffff8104d410>] hrtimer_interrupt+0xdf/0x1c8
 [<ffffffff810230b1>] local_apic_timer_interrupt+0x4f/0x52
 [<ffffffff81023430>] smp_apic_timer_interrupt+0x3a/0x4b
 [<ffffffff8148a58a>] apic_timer_interrupt+0x6a/0x70
 <EOI> 
 [<ffffffff81486c8e>] ? __mutex_unlock_slowpath+0xf4/0xfb
 [<ffffffff81486c9e>] mutex_unlock+0x9/0xb
 [<ffffffffa08112b3>] metaslab_alloc+0x456/0x7da [zfs]
 [<ffffffffa085dc7a>] zio_alloc_zil+0x5f/0x159 [zfs]
 [<ffffffffa0858866>] zil_add_block+0x2bd/0x3d2 [zfs]
 [<ffffffffa08592f1>] zil_commit+0x495/0x61b [zfs]
 [<ffffffffa0851b6e>] zfs_fsync+0xc2/0xe2 [zfs]
 [<ffffffffa0861705>] zpl_putpage+0x112/0x38c [zfs]
 [<ffffffff810ea090>] vfs_fsync_range+0x18/0x20
 [<ffffffff810ea0af>] vfs_fsync+0x17/0x19
 [<ffffffff810ea1dc>] do_fsync+0x30/0x53
 [<ffffffff810ea401>] SyS_fdatasync+0xe/0x12
 [<ffffffff814899d2>] system_call_fastpath+0x16/0x1b
Code: f7 e2 48 8d 7a 01 e8 89 ff ff ff 5d c3 55 48 89 e5 65 8b 34 25 1c b0 00 00 66 66 90 0f ae e8 0f 31 89 c1 66 66 90 0f ae e8 0f 31 <48> c1 e2 20 89 c0 48 09 c2 41 89 d0 29 ca 39 fa 73 17 f3 90 65 
NMI backtrace for cpu 2
CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           O 3.11.0 #1 

That is all that was printed on the console before I forced a hard reboot. I obtained the output after the fact from my laptop, which is connected to my desktop via a serial line.

@behlendorf
Contributor

If it is possible for us to spin in metaslab_group_alloc(), that would go a long way towards explaining some of the reported hangs in this area of the code. It would have to be a tight loop; otherwise we'd be making calls to the scheduler and you wouldn't see the stall. This might be possible if a very large value for 'd' was passed in. If you can use gdb to resolve the exact line for metaslab_alloc+0x456, that may provide some more insight.
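For example, once you have a zfs.ko built with debug info, something along these lines should resolve it (the module path is just an example and may differ on your system):

    $ gdb /lib/modules/$(uname -r)/extra/zfs/zfs.ko
    (gdb) info line *(metaslab_alloc+0x456)
    (gdb) list *(metaslab_alloc+0x456)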

@ryao
Contributor Author

ryao commented Sep 12, 2013

I seem to have built this kernel with a generic configuration by mistake. Unfortunately, the generic configuration omits debug symbols.

@ryao
Contributor Author

ryao commented Oct 17, 2013

I may have found the cause. e51be06 was supposed to have ported illumos/illumos-gate@03f8c36, but it did not. I discovered this when comparing our metaslab.c to the illumos metaslab.c.

@ryao
Contributor Author

ryao commented Oct 17, 2013

It seems that e51be06 did properly port the 3552 patch, but the 3552 changes introduced a race condition regression, which was later fixed upstream by illumos/illumos-gate@03f8c36. That fix cited Illumos issue 3552 but lacked an issue number of its own, so all of us mistook it for the Illumos 3552 fix that we already had.

@dweeezil
Contributor

@ryao Good catch! Definitely something to keep in mind in the future.

ryao pushed a commit to ryao/zfs that referenced this issue Oct 18, 2013
3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread (fix race condition)

Ported-by: Richard Yao <[email protected]>

References:
https://www.illumos.org/issues/3552
illumos/illumos-gate@03f8c36

Porting notes:

This fixes an upstream regression that was introduced in
openzfs/zfs@e51be06, which ported
the Illumos 3552 changes. This fix was added to upstream rather quickly,
but at the time of the port, no one spotted it and the race was rare
enough that it passed our regression test. I discovered this when
comparing our metaslab.c to the illumos metaslab.c.

Closes:
openzfs#1687
openzfs#1720
openzfs#1747
@behlendorf
Contributor

@ryao Nice catch, this does appear to cleanly explain the recently introduced metaslab issues. I just wish upstream would have included some comment about the symptoms or the exact race in the commit message.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
3552 condensing one space map burns 3 seconds of CPU in spa_sync()
     thread (fix race condition)

References:
  https://www.illumos.org/issues/3552
  illumos/illumos-gate@03f8c36

Ported-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>

Porting notes:

This fixes an upstream regression that was introduced in commit
openzfs/zfs@e51be06, which
ported the Illumos 3552 changes. This fix was added to upstream
rather quickly, but at the time of the port, no one spotted it and
the race was rare enough that it passed our regression tests. I
discovered this when comparing our metaslab.c to the illumos
metaslab.c.

Without this change it is possible for metaslab_group_alloc() to
consume a large amount of cpu time.  Since this occurs under a
mutex in a rcu critical section the kernel will log this to the
console as a self-detected cpu stall as follows:

  INFO: rcu_sched self-detected stall on CPU { 0}
  (t=60000 jiffies g=11431890 c=11431889 q=18271)

Closes openzfs#1687
Closes openzfs#1720
Closes openzfs#1731
Closes openzfs#1747