This repository has been archived by the owner on Feb 26, 2020. It is now read-only.

Kernel panic - not syncing: Watchdog detected hard LOCKUP [exception RIP: spl_slab_reclaim+544] #233

Closed
inevity opened this issue Apr 30, 2013 · 1 comment


inevity commented Apr 30, 2013

Hi all,
We hit a kernel panic (hard lockup on CPU 2). Can anyone help?

 SPL: Loaded module v0.6.0-rc13
 ZFS: Loaded module v0.6.0-rc13, ZFS pool version 28, ZFS filesystem version 5
 kernel: zunicode: module license 'CDDL' taints kernel.

crash output:

      KERNEL: 2.6.32-220.el6.x86_64
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Fri Apr 26 16:56:13 2013
      UPTIME: 39 days, 02:03:01
LOAD AVERAGE: 30.46, 9.19, 3.93
       TASKS: 501
    NODENAME:
     RELEASE: 2.6.32-220.el6.x86_64
     VERSION: #1 SMP Tue Dec 6 19:48:22 GMT 2011
     MACHINE: x86_64 (2133 Mhz)
      MEMORY: 24 GB
       PANIC: "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2"
         PID: 3452
     COMMAND: "arc_adapt"
        TASK: ffff88062bb6ea80  [THREAD_INFO: ffff8805f78e4000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 3452 TASK: ffff88062bb6ea80 CPU: 2 COMMAND: "arc_adapt"
#0 [ffff88003e647b00] machine_kexec at ffffffff81031fcb
#1 [ffff88003e647b60] crash_kexec at ffffffff810b8f72
#2 [ffff88003e647c30] panic at ffffffff814ec328
#3 [ffff88003e647cb0] watchdog_overflow_callback at ffffffff810d8fad
#4 [ffff88003e647cd0] __perf_event_overflow at ffffffff8110a89d
#5 [ffff88003e647d70] perf_event_overflow at ffffffff8110ae54
#6 [ffff88003e647d80] intel_pmu_handle_irq at ffffffff8101e096
#7 [ffff88003e647e90] perf_event_nmi_handler at ffffffff814f09d9
#8 [ffff88003e647ea0] notifier_call_chain at ffffffff814f2525
#9 [ffff88003e647ee0] atomic_notifier_call_chain at ffffffff814f258a
#10 [ffff88003e647ef0] notify_die at ffffffff81096bce
#11 [ffff88003e647f20] do_nmi at ffffffff814f01a3
#12 [ffff88003e647f50] nmi at ffffffff814efab0

[exception RIP: _spin_lock+30]
RIP: ffffffff814ef31e  RSP: ffff88003e643ec8  RFLAGS: 00000097
RAX: 0000000000006f7b  RBX: ffff8805fb260000  RCX: ffff880630bbe040
RDX: 0000000000006f7a  RSI: ffff8805facb1c00  RDI: ffff8805fb2680a8
RBP: ffff88003e643ec8   R8: 8080000000000000   R9: 0000000000000001
R10: 000000000000000f  R11: 0000000000000020  R12: 0000000000000002
R13: 00000000ffffffff  R14: ffffffff81a831d0  R15: 0000000000000020
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

--- ---
#13 [ffff88003e643ec8] _spin_lock at ffffffff814ef31e
#14 [ffff88003e643ed0] spl_cache_flush at ffffffffa0546a72 [spl]
#15 [ffff88003e643f50] spl_magazine_age at ffffffffa05470e0 [spl]
#16 [ffff88003e643f60] generic_smp_call_function_interrupt at ffffffff810a7060
#17 [ffff88003e643fa0] smp_call_function_interrupt at ffffffff8102a287
#18 [ffff88003e643fb0] call_function_interrupt at ffffffff8100bdd3

--- ---
#19 [ffff8805f78e5c38] call_function_interrupt at ffffffff8100bdd3

[exception RIP: spl_slab_reclaim+544]
RIP: ffffffffa05462e0  RSP: ffff8805f78e5ce0  RFLAGS: 00000202
RAX: 0000000000006f7a  RBX: ffff8805f78e5d80  RCX: 0000000000000005
RDX: 0000000000006f7a  RSI: 0000000003ffffff  RDI: ffff8805fb268090
RBP: ffffffff8100bdce   R8: 8080000000000000   R9: 0000000000000001
R10: 0000000000000000  R11: 000000000000000a  R12: ffff880500000ac0
R13: ffff880000000000  R14: ffff8805f78e5c70  R15: ffff88003e653b40
ORIG_RAX: ffffffffffffff03  CS: 0010  SS: 0018

#20 [ffff8805f78e5d88] spl_kmem_cache_reap_now at ffffffffa05465b0 [spl]
#21 [ffff8805f78e5de8] __spl_kmem_cache_generic_shrinker at ffffffffa0547acb [spl]
#22 [ffff8805f78e5e18] spl_kmem_reap at ffffffffa0547bb7 [spl]
#23 [ffff8805f78e5e38] zpl_prune_sbs at ffffffffa067b4e8 [zfs]
#24 [ffff8805f78e5e58] arc_adjust_meta at ffffffffa05d9200 [zfs]
#25 [ffff8805f78e5ea8] arc_adapt_thread at ffffffffa05d932a [zfs]
#26 [ffff8805f78e5eb8] thread_generic_wrapper at ffffffffa054a6c8 [spl]
#27 [ffff8805f78e5ee8] kthread at ffffffff81090886
#28 [ffff8805f78e5f48] kernel_thread at ffffffff8100c14a

log output:

<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
<4>Pid: 3452, comm: arc_adapt Tainted: P ---------------- T 2.6.32-220.el6.x86_64 #1
<4>Call Trace:
<4> [] ? panic+0x78/0x143
<4> [] ? watchdog_overflow_callback+0xcd/0xd0
<4> [] ? __perf_event_overflow+0x9d/0x230
<4> [] ? perf_event_overflow+0x14/0x20
<4> [] ? intel_pmu_handle_irq+0x336/0x550
<4> [] ? kprobe_exceptions_notify+0x16/0x430
<4> [] ? perf_event_nmi_handler+0x39/0xb0
<4> [] ? notifier_call_chain+0x55/0x80
<4> [] ? atomic_notifier_call_chain+0x1a/0x20
<4> [] ? notify_die+0x2e/0x30
<4> [] ? do_nmi+0x173/0x2b0
<4> [] ? nmi+0x20/0x30
<4> [] ? _spin_lock+0x1e/0x30
<4> <> [] ? spl_cache_flush+0x52/0x2c0 [spl]
<4> [] ? __do_softirq+0x11a/0x1d0
<4> [] ? spl_magazine_age+0x0/0x60 [spl]
<4> [] ? spl_magazine_age+0x50/0x60 [spl]
<4> [] ? generic_smp_call_function_interrupt+0x90/0x1b0
<4> [] ? smp_call_function_interrupt+0x27/0x40
<4> [] ? call_function_interrupt+0x13/0x20
<4> [] ? spl_slab_reclaim+0x220/0x3f0 [spl]
<4> [] ? arc_buf_data_free+0x38/0xc0 [zfs]
<4> [] ? zpl_prune_sbs+0x0/0x50 [zfs]
<4> [] ? spl_kmem_cache_reap_now+0x100/0x1e0 [spl]
<4> [] ? dispose_list+0xfc/0x120
<4> [] ? zpl_prune_sbs+0x0/0x50 [zfs]
<4> [] ? __spl_kmem_cache_generic_shrinker+0x4b/0xe0 [spl]
<4> [] ? spl_kmem_reap+0x27/0x30 [spl]
<4> [] ? zpl_prune_sbs+0x48/0x50 [zfs]
<4> [] ? arc_adjust_meta+0x120/0x1e0 [zfs]
<4> [] ? arc_adapt_thread+0x0/0xd0 [zfs]
<4> [] ? arc_adapt_thread+0x0/0xd0 [zfs]
<4> [] ? arc_adapt_thread+0x6a/0xd0 [zfs]
<4> [] ? thread_generic_wrapper+0x68/0x80 [spl]
<4> [] ? thread_generic_wrapper+0x0/0x80 [spl]
<4> [] ? kthread+0x96/0xa0
<4> [] ? child_rip+0xa/0x20
<4> [] ? kthread+0x0/0xa0
<4> [] ? child_rip+0x0/0x20

crash> ps | grep UN
566 2 1 ffff88062963b500 UN 0.0 0 0 [kjournald]
2683 1 1 ffff88062809a0c0 UN 0.0 78676 1336 master
2963 1 3 ffff880620d63580 UN 0.1 27116 14536 ccAMRd
3028 1 3 ffff88062963f500 UN 0.1 33420 26684 ccDMd
3267 1 1 ffff88062c9d74c0 UN 0.0 228440 2460 gdm-smartcard-w
3363 1 1 ffff88062c3ad580 UN 0.0 2504 588 ccMFTTd
4819 1 1 ffff8804498cf500 UN 0.0 339128 1600 rs:main Q:Reg
5376 1 1 ffff88062bb94a80 UN 0.0 400752 10044 glusterfs
5387 1 1 ffff8805f58a7580 UN 0.1 334584 22624 glusterfs
6717 1 3 ffff88062b33aac0 UN 0.1 31360 22056 ccTAd
10703 2683 3 ffff8805ff7fcb00 UN 0.0 78756 3228 pickup
16686 2963 1 ffff8802bec69500 UN 0.1 23652 21120 copUpdater
22970 1 3 ffff8805f6083540 UN 0.1 242428 15072 glusterd
22997 1 1 ffff88062d2f0a80 UN 0.1 1257164 27220 glusterfsd
23009 1 3 ffff8805f43fcb40 UN 0.1 310960 30684 glusterfs
23180 1 1 ffff8802bec68ac0 UN 0.1 1257164 27220 glusterfsd
24610 1 3 ffff8801d1f58080 UN 0.0 94976 1220 storageclient
24617 24612 3 ffff8805b2452b40 UN 0.0 4996 208 ps
24643 25277 1 ffff88062d06b4c0 UN 0.0 4996 208 ps
24650 25277 1 ffff880302506b00 UN 0.0 4996 212 ps
24655 25277 1 ffff8803de834080 UN 0.0 4996 212 ps
25277 1 3 ffff8804498ce080 UN 0.1 39856 30776 ccNGd

@behlendorf
Contributor

This was fixed post-rc13. If this is a chronic problem, my suggestion would be to update to 0.6.1; it includes quite a number of improvements and bug fixes.
