Infinite loop in metaslab_group_alloc #1720

Closed
ryao opened this issue Sep 12, 2013 · 6 comments

@ryao
Contributor

ryao commented Sep 12, 2013

My system deadlocked on me. It could just be that the hardware is failing as the system is dying, but I managed to get a backtrace from a stall that RCU detected. The stall detection uses a fairly simple mechanism: the scheduler updates a per-CPU counter each time a CPU is scheduled a new thread, and RCU checks that counter each time it runs. RCU runs at natural preemption points provided by primitives such as mutexes, which is why those primitives show up in the backtrace whenever RCU detects a stall. This does not mean that the mutex operation is stuck, only that the thread is not being rescheduled (usually because something is wrong).

With that in mind, it appears that the code was looping infinitely in metaslab_group_alloc(), which did not appear in the backtrace because it is a static function. We should probably add a loop counter to detect excessive iterations, print useful diagnostic information, and panic the system.
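Something along these lines is what I have in mind. To be clear, this is only a sketch: the helper name, the iteration limit, and the panic message are made up for illustration and are not actual metaslab.c code.

    /*
     * Hypothetical guard against an unbounded allocation loop.  Assumes the
     * usual SPL/illumos kernel headers (<sys/types.h>, <sys/cmn_err.h>);
     * the limit and the message are illustrative only.
     */
    #define	METASLAB_ALLOC_MAX_ITERS	100000ULL

    static inline void
    metaslab_alloc_loop_check(uint64_t *iters, uint64_t asize)
    {
    	/* Count one more pass; panic with diagnostics if we exceed the cap. */
    	if (++(*iters) > METASLAB_ALLOC_MAX_ITERS) {
    		cmn_err(CE_PANIC, "metaslab_group_alloc() made %llu "
    		    "passes without progress (asize=%llu)",
    		    (u_longlong_t)*iters, (u_longlong_t)asize);
    	}
    }

The search loop in metaslab_group_alloc() would then initialize a counter to zero before looping and call this helper once per pass.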

INFO: rcu_sched self-detected stall on CPU { 0}  (t=21009 jiffies g=3837706 c=3837705 q=129263)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU: 0 PID: 29583 Comm: BrowserBlocking Tainted: P           O 3.11.0 #1
Hardware name: System manufacturer P5K Deluxe/P5K Deluxe, BIOS 1005    12/16/2008
task: ffff880095212050 ti: ffff8801883fe000 task.ti: ffff8801883fe000
RIP: 0010:[<ffffffff81305b51>]  [<ffffffff81305b51>] delay_tsc+0x1e/0x49
RSP: 0018:ffff88022fc03df0  EFLAGS: 00000046
RAX: 000000009a0d05d1 RBX: 0000000000002710 RCX: 000000009a0d04fd
RDX: 000000000002d76b RSI: 0000000000000000 RDI: 00000000002b50c0
RBP: ffff88022fc03df0 R08: 000000009a0d0596 R09: 0000000000000000
R10: 0000000000002544 R11: 000000000000036d R12: ffff88022fc0d7d0
R13: ffffffff81627540 R14: ffff88022fc0cfa8 R15: ffff8801883fffd8
FS:  00007ffc5d3df700(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000004ca21daf1d0 CR3: 0000000187645000 CR4: 00000000000407f0
Stack:
 ffff88022fc03e00 ffffffff81305ac4 ffff88022fc03e10 ffffffff81305aea
 ffff88022fc03e28 ffffffff8102457c ffffffff81627540 ffff88022fc03e80
 ffffffff8108ff3f ffff88022fc03e70 ffff8801883fe000 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffff81305ac4>] __delay+0xa/0xc
 [<ffffffff81305aea>] __const_udelay+0x24/0x26
 [<ffffffff8102457c>] arch_trigger_all_cpu_backtrace+0x69/0x71
 [<ffffffff8108ff3f>] rcu_check_callbacks+0x1b7/0x4cb
 [<ffffffff8103c828>] update_process_times+0x38/0x66
 [<ffffffff8106b8f7>] tick_sched_handle+0x32/0x34
 [<ffffffff8106b9fa>] tick_sched_timer+0x35/0x53
 [<ffffffff8104cf4b>] __run_hrtimer.isra.27+0x4e/0xa7
 [<ffffffff8104d410>] hrtimer_interrupt+0xdf/0x1c8
 [<ffffffff810230b1>] local_apic_timer_interrupt+0x4f/0x52
 [<ffffffff81023430>] smp_apic_timer_interrupt+0x3a/0x4b
 [<ffffffff8148a58a>] apic_timer_interrupt+0x6a/0x70
 <EOI> 
 [<ffffffff81486c8e>] ? __mutex_unlock_slowpath+0xf4/0xfb
 [<ffffffff81486c9e>] mutex_unlock+0x9/0xb
 [<ffffffffa08112b3>] metaslab_alloc+0x456/0x7da [zfs]
 [<ffffffffa085dc7a>] zio_alloc_zil+0x5f/0x159 [zfs]
 [<ffffffffa0858866>] zil_add_block+0x2bd/0x3d2 [zfs]
 [<ffffffffa08592f1>] zil_commit+0x495/0x61b [zfs]
 [<ffffffffa0851b6e>] zfs_fsync+0xc2/0xe2 [zfs]
 [<ffffffffa0861705>] zpl_putpage+0x112/0x38c [zfs]
 [<ffffffff810ea090>] vfs_fsync_range+0x18/0x20
 [<ffffffff810ea0af>] vfs_fsync+0x17/0x19
 [<ffffffff810ea1dc>] do_fsync+0x30/0x53
 [<ffffffff810ea401>] SyS_fdatasync+0xe/0x12
 [<ffffffff814899d2>] system_call_fastpath+0x16/0x1b
Code: f7 e2 48 8d 7a 01 e8 89 ff ff ff 5d c3 55 48 89 e5 65 8b 34 25 1c b0 00 00 66 66 90 0f ae e8 0f 31 89 c1 66 66 90 0f ae e8 0f 31 <48> c1 e2 20 89 c0 48 09 c2 41 89 d0 29 ca 39 fa 73 17 f3 90 65 
NMI backtrace for cpu 2
CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           O 3.11.0 #1 

That is all that was printed on the console before I forced a hard reboot. I obtained the output after the fact from my laptop, which is connected to my desktop via a serial line.

@behlendorf
Contributor

If it is possible for us to spin in metaslab_group_alloc(), that would go a long way towards explaining some of the reported hangs in this area of the code. It would have to be a tight loop; otherwise we'd be making calls to the scheduler and you wouldn't see the stall. This might be possible if a very large value for 'd' was passed in. If you can use gdb to resolve the exact line for metaslab_alloc+0x456, that may provide some more insight.
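For example, once you have a zfs.ko built with debug info, something along these lines should resolve it (the module path is just an example and may differ on your system):

    $ gdb /lib/modules/$(uname -r)/extra/zfs/zfs.ko
    (gdb) info line *(metaslab_alloc+0x456)
    (gdb) list *(metaslab_alloc+0x456)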

@ryao
Contributor Author

ryao commented Sep 12, 2013

I seem to have built this kernel with a generic configuration by mistake. Unfortunately, the generic configuration omits debug symbols.

@ryao
Contributor Author

ryao commented Oct 17, 2013

I may have found the cause. e51be06 was supposed to have ported illumos/illumos-gate@03f8c36, but it did not. I discovered this when comparing our metaslab.c to the illumos metaslab.c.

@ryao
Contributor Author

ryao commented Oct 17, 2013

It seems that e51be06 did properly port the 3552 patch, but the 3552 changes introduced a race condition regression, which was later fixed upstream by illumos/illumos-gate@03f8c36. That fix cited Illumos issue 3552 but lacked an issue number of its own, so all of us mistook it for the Illumos 3552 fix that we already had.

@dweeezil
Contributor

@ryao Good catch! Definitely something to keep in mind in the future.

ryao pushed a commit to ryao/zfs that referenced this issue Oct 18, 2013
3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread (fix race condition)

Ported-by: Richard Yao <[email protected]>

References:
https://www.illumos.org/issues/3552
illumos/illumos-gate@03f8c36

Porting notes:

This fixes an upstream regression that was introduced in
openzfs/zfs@e51be06, which ported
the Illumos 3552 changes. This fix was added to upstream rather quickly,
but at the time of the port, no one spotted it and the race was rare
enough that it passed our regression test. I discovered this when
comparing our metaslab.c to the illumos metaslab.c.

Closes:
openzfs#1687
openzfs#1720
openzfs#1747
@behlendorf
Contributor

@ryao Nice catch, this does appear to cleanly explain the recently introduced metaslab issues. I just wish upstream would have included some comment about the symptoms or the exact race in the commit message.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
3552 condensing one space map burns 3 seconds of CPU in spa_sync()
     thread (fix race condition)

References:
  https://www.illumos.org/issues/3552
  illumos/illumos-gate@03f8c36

Ported-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>

Porting notes:

This fixes an upstream regression that was introduced in commit
openzfs/zfs@e51be06, which
ported the Illumos 3552 changes. This fix was added to upstream
rather quickly, but at the time of the port, no one spotted it and
the race was rare enough that it passed our regression tests. I
discovered this when comparing our metaslab.c to the illumos
metaslab.c.

Without this change it is possible for metaslab_group_alloc() to
consume a large amount of cpu time.  Since this occurs under a
mutex in a rcu critical section the kernel will log this to the
console as a self-detected cpu stall as follows:

  INFO: rcu_sched self-detected stall on CPU { 0}
  (t=60000 jiffies g=11431890 c=11431889 q=18271)

Closes openzfs#1687
Closes openzfs#1720
Closes openzfs#1731
Closes openzfs#1747