This repository has been archived by the owner on Feb 26, 2020. It is now read-only.

Kernel panic - not syncing: Watchdog detected hard LOCKUP [exception RIP: spl_slab_reclaim+544] #233

Closed
inevity opened this issue Apr 30, 2013 · 1 comment


inevity commented Apr 30, 2013

Hi all,
We hit a kernel panic (hard lockup on CPU 2). Can anyone help?

 SPL: Loaded module v0.6.0-rc13
 ZFS: Loaded module v0.6.0-rc13, ZFS pool version 28, ZFS filesystem version 5
 kernel: zunicode: module license 'CDDL' taints kernel.

crash output:

      KERNEL: 2.6.32-220.el6.x86_64
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Fri Apr 26 16:56:13 2013
      UPTIME: 39 days, 02:03:01
LOAD AVERAGE: 30.46, 9.19, 3.93
       TASKS: 501
    NODENAME:
     RELEASE: 2.6.32-220.el6.x86_64
     VERSION: #1 SMP Tue Dec 6 19:48:22 GMT 2011
     MACHINE: x86_64 (2133 Mhz)
      MEMORY: 24 GB
       PANIC: "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2"
         PID: 3452
     COMMAND: "arc_adapt"
        TASK: ffff88062bb6ea80  [THREAD_INFO: ffff8805f78e4000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 3452 TASK: ffff88062bb6ea80 CPU: 2 COMMAND: "arc_adapt"
#0 [ffff88003e647b00] machine_kexec at ffffffff81031fcb
#1 [ffff88003e647b60] crash_kexec at ffffffff810b8f72
#2 [ffff88003e647c30] panic at ffffffff814ec328
#3 [ffff88003e647cb0] watchdog_overflow_callback at ffffffff810d8fad
#4 [ffff88003e647cd0] __perf_event_overflow at ffffffff8110a89d
#5 [ffff88003e647d70] perf_event_overflow at ffffffff8110ae54
#6 [ffff88003e647d80] intel_pmu_handle_irq at ffffffff8101e096
#7 [ffff88003e647e90] perf_event_nmi_handler at ffffffff814f09d9
#8 [ffff88003e647ea0] notifier_call_chain at ffffffff814f2525
#9 [ffff88003e647ee0] atomic_notifier_call_chain at ffffffff814f258a
#10 [ffff88003e647ef0] notify_die at ffffffff81096bce
#11 [ffff88003e647f20] do_nmi at ffffffff814f01a3
#12 [ffff88003e647f50] nmi at ffffffff814efab0

[exception RIP: _spin_lock+30]
RIP: ffffffff814ef31e  RSP: ffff88003e643ec8  RFLAGS: 00000097
RAX: 0000000000006f7b  RBX: ffff8805fb260000  RCX: ffff880630bbe040
RDX: 0000000000006f7a  RSI: ffff8805facb1c00  RDI: ffff8805fb2680a8
RBP: ffff88003e643ec8   R8: 8080000000000000   R9: 0000000000000001
R10: 000000000000000f  R11: 0000000000000020  R12: 0000000000000002
R13: 00000000ffffffff  R14: ffffffff81a831d0  R15: 0000000000000020
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

--- ---
#13 [ffff88003e643ec8] _spin_lock at ffffffff814ef31e
#14 [ffff88003e643ed0] spl_cache_flush at ffffffffa0546a72 [spl]
#15 [ffff88003e643f50] spl_magazine_age at ffffffffa05470e0 [spl]
#16 [ffff88003e643f60] generic_smp_call_function_interrupt at ffffffff810a7060
#17 [ffff88003e643fa0] smp_call_function_interrupt at ffffffff8102a287
#18 [ffff88003e643fb0] call_function_interrupt at ffffffff8100bdd3

--- ---
#19 [ffff8805f78e5c38] call_function_interrupt at ffffffff8100bdd3

[exception RIP: spl_slab_reclaim+544]
RIP: ffffffffa05462e0  RSP: ffff8805f78e5ce0  RFLAGS: 00000202
RAX: 0000000000006f7a  RBX: ffff8805f78e5d80  RCX: 0000000000000005
RDX: 0000000000006f7a  RSI: 0000000003ffffff  RDI: ffff8805fb268090
RBP: ffffffff8100bdce   R8: 8080000000000000   R9: 0000000000000001
R10: 0000000000000000  R11: 000000000000000a  R12: ffff880500000ac0
R13: ffff880000000000  R14: ffff8805f78e5c70  R15: ffff88003e653b40
ORIG_RAX: ffffffffffffff03  CS: 0010  SS: 0018

#20 [ffff8805f78e5d88] spl_kmem_cache_reap_now at ffffffffa05465b0 [spl]
#21 [ffff8805f78e5de8] __spl_kmem_cache_generic_shrinker at ffffffffa0547acb [spl]
#22 [ffff8805f78e5e18] spl_kmem_reap at ffffffffa0547bb7 [spl]
#23 [ffff8805f78e5e38] zpl_prune_sbs at ffffffffa067b4e8 [zfs]
#24 [ffff8805f78e5e58] arc_adjust_meta at ffffffffa05d9200 [zfs]
#25 [ffff8805f78e5ea8] arc_adapt_thread at ffffffffa05d932a [zfs]
#26 [ffff8805f78e5eb8] thread_generic_wrapper at ffffffffa054a6c8 [spl]
#27 [ffff8805f78e5ee8] kthread at ffffffff81090886
#28 [ffff8805f78e5f48] kernel_thread at ffffffff8100c14a

log output:

<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
<4>Pid: 3452, comm: arc_adapt Tainted: P ---------------- T 2.6.32-220.el6.x86_64 #1
<4>Call Trace:
<4> [] ? panic+0x78/0x143
<4> [] ? watchdog_overflow_callback+0xcd/0xd0
<4> [] ? __perf_event_overflow+0x9d/0x230
<4> [] ? perf_event_overflow+0x14/0x20
<4> [] ? intel_pmu_handle_irq+0x336/0x550
<4> [] ? kprobe_exceptions_notify+0x16/0x430
<4> [] ? perf_event_nmi_handler+0x39/0xb0
<4> [] ? notifier_call_chain+0x55/0x80
<4> [] ? atomic_notifier_call_chain+0x1a/0x20
<4> [] ? notify_die+0x2e/0x30
<4> [] ? do_nmi+0x173/0x2b0
<4> [] ? nmi+0x20/0x30
<4> [] ? _spin_lock+0x1e/0x30
<4> <> [] ? spl_cache_flush+0x52/0x2c0 [spl]
<4> [] ? __do_softirq+0x11a/0x1d0
<4> [] ? spl_magazine_age+0x0/0x60 [spl]
<4> [] ? spl_magazine_age+0x50/0x60 [spl]
<4> [] ? generic_smp_call_function_interrupt+0x90/0x1b0
<4> [] ? smp_call_function_interrupt+0x27/0x40
<4> [] ? call_function_interrupt+0x13/0x20
<4> [] ? spl_slab_reclaim+0x220/0x3f0 [spl]
<4> [] ? arc_buf_data_free+0x38/0xc0 [zfs]
<4> [] ? zpl_prune_sbs+0x0/0x50 [zfs]
<4> [] ? spl_kmem_cache_reap_now+0x100/0x1e0 [spl]
<4> [] ? dispose_list+0xfc/0x120
<4> [] ? zpl_prune_sbs+0x0/0x50 [zfs]
<4> [] ? __spl_kmem_cache_generic_shrinker+0x4b/0xe0 [spl]
<4> [] ? spl_kmem_reap+0x27/0x30 [spl]
<4> [] ? zpl_prune_sbs+0x48/0x50 [zfs]
<4> [] ? arc_adjust_meta+0x120/0x1e0 [zfs]
<4> [] ? arc_adapt_thread+0x0/0xd0 [zfs]
<4> [] ? arc_adapt_thread+0x0/0xd0 [zfs]
<4> [] ? arc_adapt_thread+0x6a/0xd0 [zfs]
<4> [] ? thread_generic_wrapper+0x68/0x80 [spl]
<4> [] ? thread_generic_wrapper+0x0/0x80 [spl]
<4> [] ? kthread+0x96/0xa0
<4> [] ? child_rip+0xa/0x20
<4> [] ? kthread+0x0/0xa0
<4> [] ? child_rip+0x0/0x20

crash> ps | grep UN
566 2 1 ffff88062963b500 UN 0.0 0 0 [kjournald]
2683 1 1 ffff88062809a0c0 UN 0.0 78676 1336 master
2963 1 3 ffff880620d63580 UN 0.1 27116 14536 ccAMRd
3028 1 3 ffff88062963f500 UN 0.1 33420 26684 ccDMd
3267 1 1 ffff88062c9d74c0 UN 0.0 228440 2460 gdm-smartcard-w
3363 1 1 ffff88062c3ad580 UN 0.0 2504 588 ccMFTTd
4819 1 1 ffff8804498cf500 UN 0.0 339128 1600 rs:main Q:Reg
5376 1 1 ffff88062bb94a80 UN 0.0 400752 10044 glusterfs
5387 1 1 ffff8805f58a7580 UN 0.1 334584 22624 glusterfs
6717 1 3 ffff88062b33aac0 UN 0.1 31360 22056 ccTAd
10703 2683 3 ffff8805ff7fcb00 UN 0.0 78756 3228 pickup
16686 2963 1 ffff8802bec69500 UN 0.1 23652 21120 copUpdater
22970 1 3 ffff8805f6083540 UN 0.1 242428 15072 glusterd
22997 1 1 ffff88062d2f0a80 UN 0.1 1257164 27220 glusterfsd
23009 1 3 ffff8805f43fcb40 UN 0.1 310960 30684 glusterfs
23180 1 1 ffff8802bec68ac0 UN 0.1 1257164 27220 glusterfsd
24610 1 3 ffff8801d1f58080 UN 0.0 94976 1220 storageclient
24617 24612 3 ffff8805b2452b40 UN 0.0 4996 208 ps
24643 25277 1 ffff88062d06b4c0 UN 0.0 4996 208 ps
24650 25277 1 ffff880302506b00 UN 0.0 4996 212 ps
24655 25277 1 ffff8803de834080 UN 0.0 4996 212 ps
25277 1 3 ffff8804498ce080 UN 0.1 39856 30776 ccNGd

@behlendorf
Contributor

This was fixed post-rc13. If this is a chronic problem, my suggestion would be to update to 0.6.1; it includes quite a number of improvements and bug fixes.
