soft lockups and NULL pointer dereferences after upgrade to ubuntu 12.04 #837
We just had another dereference when about 200 processes were accessing ZFS concurrently (this is our normal load). Load jumped quickly from ~20 (normal) to 140 and the system froze (we have those symptoms every time we experience a dereference).
Today around 12pm there were 14 soft lockups and this error showed up in the syslog:
And there's a log entry longer than usual:
As this seems to be a freeing issue, we're trying to limit the ARC size to 20GB now.
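For anyone landing here with the same symptom: the ARC cap is set via the `zfs_arc_max` module parameter. A minimal sketch, assuming a stock zfsonlinux install (the file name `zfs.conf` is just a convention; 21474836480 is 20 GiB in bytes, matching the figure above):

```shell
# Hedged sketch: cap the ARC at 20 GiB (20 * 1024^3 = 21474836480 bytes).
# zfs_arc_max is the standard zfsonlinux module parameter; the
# modprobe.d file makes the setting persist across module reloads.
echo "options zfs zfs_arc_max=21474836480" | sudo tee /etc/modprobe.d/zfs.conf

# On a running system the value can also be changed without a reboot:
echo 21474836480 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
```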
One other thing came to my mind: when we upgraded to Ubuntu 12.04 we copied the root partition from a reiserfs partition to a new xfs root partition. As I look through the other issues, ZFS and xfs seem to have big trouble working together...
Today we had to restart the server twice, the dmesg output:
Btw: after seeing lots of problems with xfs and ZFS on the same system in the issues, we moved the root fs from xfs to ext4 the day before. What can we do to help fix this?
Is there any chance that you could post the zfs.ko kernel module somewhere online with an accompanying panic message? That would let me disassemble it to get a better idea of where the NULL pointer dereference occurs.
ryao, I've just sent you an e-mail with a panic message and the compiled module.
Several hours after the NULL pointer dereference:
Issues are fixed now (the system has 8 days of uptime) after we downgraded to zfs 0.6.0.56 and spl 0.6.0.56.
I suspect commit 302f753 might be responsible for your issue. This change was introduced in 0.6.0.62; if you get a chance, could you try 0.6.0.61?
We just installed 0.6.0.61 on our server. If there is any trouble we'll let you know.
There we go … some hours after we installed version 0.6.0.61:
That's surprising... the changes between 0.6.0.56 and 0.6.0.61 are for the most part pretty safe. None of them look like they could cause this. Is there any chance you could bisect the remaining couple of tags and determine which one introduced the issue?

$ git log upstream/0.6.0.56..upstream/0.6.0.61 --oneline
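For anyone unfamiliar with the workflow being requested here, bisection between two tags can be scripted with `git bisect run`. A toy sketch below; the repository, tag names and the "bug" are all synthetic, since with the real tree the test step would be "rebuild zfs/spl, reboot, try to reproduce the lockup" rather than a one-line grep:

```shell
# Toy sketch of bisecting between two tags. Everything here is
# synthetic: a throwaway repo where one commit flips a file from
# "ok" to "broken", mimicking a regression between releases.
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name you
echo ok > state && git add state && git commit -qm "0.6.0.56" && git tag 0.6.0.56
git commit -q --allow-empty -m "0.6.0.57"
git commit -q --allow-empty -m "0.6.0.58"
echo broken > state && git commit -qam "0.6.0.59"   # the offending commit
git commit -q --allow-empty -m "0.6.0.61" && git tag 0.6.0.61
git bisect start 0.6.0.61 0.6.0.56 >/dev/null
# the run script's exit code decides: 0 = good build, non-zero = bad build
git bisect run sh -c 'grep -q ok state' 2>/dev/null | grep "first bad commit"
```

`git bisect run` walks the history automatically and prints which commit "is the first bad commit"; in the real case each step would be a module rebuild and a stress test instead of the grep.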
Short summary: In March 2012 we switched to ZFS 0.6.0.56 on Ubuntu 10.04; no problems. In July 2012 we upgraded Ubuntu to 12.04. After the upgrade the first problems with ZFS occurred, so we thought it might be the version and tried different versions. After we tried version 0.6.0.56 we had an uptime of ~8 days, but then it failed again … We experienced high load average after a null pointer dereference, and after a reboot a null pointer dereference immediately showed up again. Finally we compiled and installed version 0.6.0.61, but still no solution.
I'm wondering if this might be the same issue I'm facing, which I previously thought had to do with me mounting server shares on my client computer. But the last couple of lockups haven't happened with shares mounted; they have happened while numerous threads were having fun with the zfs pool. This last one happened while testing 160+ GB files' md5sums while par2'ing a 160+ GB tar archive while watching Avengers in XBMC. Usually I'm left with a completely unresponsive system, or at least one I cannot get to via SSH or a TTY. But this time I actually have SSH access; any process that attempts to access my pool will hang indefinitely, though. Syslog mentions this "Ooops", which interestingly enough mentions arc_adapt:
I think I'll try that 0.6.0.56 version you're talking about (if possible through the PPA), or perhaps some ARC fiddling.
Just a thought... Could this simply be ZFS running out of memory? This isn't exactly enterprise-grade hardware I'm running here, and the memory isn't that impressive. I have 8GB of RAM, a RAIDZ3 zpool of 19 2TB disks (plus a 60GB SSD cache), a max ARC size of 1024 MB, and a 4 GB swap partition. As this guy says - https://groups.google.com/a/zfsonlinux.org/forum/?fromgroups=#!topic/zfs-discuss/8dDfYK1p1oc - that doesn't necessarily mean that it won't use up more memory, and adding to that I'm md5summing like hell (on mostly smaller files) while par2'ing a 160+ GB file. I have no idea how much RAM par2 was allocating for that, but probably a significant chunk. All memory has now been released from the hanging md5sum and par2 processes.
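To check whether the ARC is actually honoring a cap like that, its current size can be read from the `arcstats` kstat. A minimal sketch; on a live system the file is `/proc/spl/kstat/zfs/arcstats`, and the sample excerpt below is fabricated so the snippet runs anywhere, even without ZFS loaded:

```shell
# Sketch: reading the current ARC size from arcstats. On a live system
# read /proc/spl/kstat/zfs/arcstats directly; this fabricated excerpt
# just mimics its three-column name/type/data layout.
cat > arcstats.sample <<'EOF'
name                            type data
size                            4    1073741824
c_max                           4    1073741824
EOF
awk '$1 == "size" {printf "ARC size: %d MiB\n", $3 / 1048576}' arcstats.sample
# -> ARC size: 1024 MiB
```

Here `size` is the ARC's current footprint and `c_max` its configured ceiling; if `size` regularly sits at `c_max` while the rest of the workload is memory-hungry, an out-of-memory explanation becomes more plausible.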
Hm. Actually, neither of them is using any measurable memory at all :/ Thought I'd reboot and retry the whole thing while watching top. Next thing will be to try to get that .56 version installed then :)
We just got another null pointer dereference, but the stack trace seems to be new for me:
I'd like to note that my issue stemmed from problems with Intel SpeedStep vs. a rather low-quality consumer motherboard. I no longer have this issue after disabling SpeedStep. So please disregard my messages, as the problems I experienced were not related to ZFS. Sorry for any inconvenience and thank you so much for native ZFS :) |
@phillipp No promises this will fix your issue, but a very small race was recently closed in the spl condition variable code. Since this is where your NULL dereference is, you may want to pick up the fix and see if you're still able to reproduce the problem. See commit openzfs/spl@3c60f50
We moved all data off the ZFS pool and back to reiserfs because of the problems...
Totally understandable. |
@phillipp It might be too late to ask, but if that kernel is still around, would you load the raw vmlinux into gdb and run
Yep, sorry, it's too late...
Since we've lost our test case and there have been a lot of zfs fixes since this was last updated I'm going to close the issue. We can reopen it or file a new issue if similar symptoms are observed with the latest code. |
On our largest system, we now have lots of soft lockups and null pointer dereferences (sometimes more than two per day), which makes it VERY hard to maintain uptime greater than 95%.
We noticed that the problems started just after upgrading to Ubuntu 12.04. Before that, we had no problems. You can see traces of the soft lockups and the null pointer dereferences (as in ticket #805) below. I wrote a ticket for the NULL pointer dereferences already, but this seems to be related to the Ubuntu upgrade (10.10 => 12.04) we did before.
The system has 64GB of RAM. Maybe freeing memory is an issue with that amount?
zpool, zfs, spl+zfs version (and one error), uname
dmesg:
Syslog: