I created a build of 0.7.0 RC3 with PR5824 to investigate the PR's impact on our performance problems (see #4880 for the performance impact).
We run a Xen virtualised system with ZFS hosted in our driver domain, providing zvols for each of our hosted guests. This performance testing had been running for approximately one week when, last night, all access to the server through the virtualised networking was lost and all VMs were 'frozen'.
This is akin to the lock-up ZFS errors we have seen with 0.6.4.2, but analysing the virtual CPUs assigned to our disk domain did not show any issues. However, the cause was found in kern.log on the next reboot; see the trace below:
@koplover thanks for filing the issue. This looks like some kind of memory corruption, but it's not clear how it could happen. This area of the code is also largely unrelated to the changes from PR5824. Definitely let us know if you see this again.
@behlendorf no problem, I'll keep an eye on it. I'd prefer to try PR5824 against the 0.6.5.x stream, as that is more stable than the 0.7.0 RCs and issues such as this may not occur there - and if they do, they are much more critical, as people will really be putting that stream into production.
When I tried initially, merging the PR onto 0.6.5.9 was not straightforward. I'll take a look and see if it is now in the daily builds for the 0.6.5.x stream.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213467] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213482] IP: [] buf_hash_find+0xc1/0x130 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213519] PGD 1a50bc067 PUD 25b0f4067 PMD 0
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213527] Oops: 0000 [#1] SMP
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213532] Modules linked in: xen_blkback xen_netback drbd xt_addrtype xt_owner xt_multiport xt_hl xt_tcpudp xt_conntrack xt_NFLOG nfnetlink_log nfnetlink ip6table_mangle iptable_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel dm_multipath aes_x86_64 lrw gf128mul scsi_dh glue_helper ablk_helper cryptd lru_cache libcrc32c xenfs xen_privcmd zfs(POE) zunicode(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) dm_mirror dm_region_hash dm_log raid0 multipath linear dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid10 xor raid6_pq nvme ahci libahci [last unloaded: drbd]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213620] CPU: 2 PID: 23967 Comm: z_wr_int_5 Tainted: P B OE 3.19.0-39-zdomu #44~14.04.1
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213627] task: ffff880284de93a0 ti: ffff8802265c4000 task.ti: ffff8802265c4000
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213632] RIP: e030:[] [] buf_hash_find+0xc1/0x130 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213658] RSP: e02b:ffff8802265c7b18 EFLAGS: 00010202
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213663] RAX: 0000000000000007 RBX: ffff8802265c7b70 RCX: 972d0935c2d85410
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213669] RDX: 0000000000000008 RSI: 0000000000025083 RDI: 000000009f7c4a10
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213675] RBP: ffff8802265c7b58 R08: 972d0935c2d85410 R09: 000000018040003a
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213680] R10: ffffea0009ee6e80 R11: ffffffffc02f1dca R12: ffff8801a005db18
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213686] R13: ffffffffc058d5f0 R14: 009e7e0ecceee9e6 R15: 000000000006ea7a
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213696] FS: 0000000000000000(0000) GS:ffff880294d00000(0000) knlGS:0000000000504d80
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213703] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213707] CR2: 0000000000000007 CR3: 00000002417ba000 CR4: 0000000000002660
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213712] Stack:
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213715] ffff880286ee7400 0000000000025083 0000000000000000 ffff8801a005db18
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213723] ffff8802866d4000 0000000000000000 00000000000254ac 0000000000000000
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213731] ffff8802265c7b88 ffffffffc0400e65 ffff8802866d4000 0000000000000000
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213739] Call Trace:
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213768] [] arc_freed+0x25/0x90 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213823] [] zio_free_sync+0x45/0x140 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213834] [] ? __slab_free+0x101/0x2c1
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213882] [] zio_free+0xc3/0x120 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213919] [] dsl_free+0x11/0x20 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213954] [] dsl_dataset_block_kill+0x248/0x450 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213984] [] dbuf_write_done+0x172/0x1c0 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214009] [] arc_write_done+0x236/0x3d0 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214055] [] zio_done+0x2f9/0xc10 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214070] [] ? spl_kmem_free+0x2a/0x40 [spl]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214115] [] zio_execute+0x9c/0x100 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214128] [] taskq_thread+0x243/0x480 [spl]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214137] [] ? wake_up_state+0x20/0x20
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214148] [] ? taskq_cancel_id+0x120/0x120 [spl]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214155] [] kthread+0xc9/0xe0
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214162] [] ? kthread_create_on_node+0x1c0/0x1c0
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214171] [] ret_from_fork+0x58/0x90
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214178] [] ? kthread_create_on_node+0x1c0/0x1c0
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214182] Code: 38 c1 48 8b 05 c9 a2 16 00 4a 8b 04 f8 48 85 c0 74 40 49 8b 14 24 48 8b 75 c8 eb 0f 66 0f 1f 44 00 00 48 8b 40 20 48 85 c0 74 27 <48> 39 10 75 f2 49 8b 7c 24 08 48 39 78 08 75 e7 48 3b 70 10 75
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214247] RIP [] buf_hash_find+0xc1/0x130 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214272] RSP
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214276] CR2: 0000000000000007
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214283] ---[ end trace 071f472bc78f6c36 ]---
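For context on where this faults: buf_hash_find() looks up an ARC header by walking a per-bucket singly linked chain and comparing each entry's DVA, birth txg and spa. The sketch below is a simplified, self-contained paraphrase of that lookup (the structure layout, field names and toy hash are illustrative assumptions, not the actual OpenZFS source); it is only meant to show how a single corrupted next pointer in a chain, such as the bogus 0x7 the oops reports as both RAX and the fault address (CR2), would blow up on the first field comparison inside the chain walk, which is consistent with the memory-corruption theory above.

/*
 * Simplified, self-contained paraphrase of the ARC hash lookup that the
 * trace faults in (buf_hash_find).  Names and layout here are illustrative
 * assumptions, NOT the actual OpenZFS structures or source.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct blk_dva { uint64_t dva_word[2]; } blk_dva_t;

typedef struct buf_hdr {
	blk_dva_t        hdr_dva;        /* block address being tracked */
	uint64_t         hdr_birth;      /* birth transaction group */
	const void      *hdr_spa;        /* owning pool */
	struct buf_hdr  *hdr_hash_next;  /* singly linked per-bucket chain */
} buf_hdr_t;

#define HT_SIZE 256
static buf_hdr_t *ht_table[HT_SIZE];

static unsigned
buf_hash(const void *spa, const blk_dva_t *dva, uint64_t birth)
{
	/* Toy hash; the real code hashes the spa, DVA words and birth together. */
	uint64_t h = (uintptr_t)spa ^ dva->dva_word[0] ^ dva->dva_word[1] ^ birth;
	return ((unsigned)(h % HT_SIZE));
}

/*
 * Walk the bucket's chain looking for a header with a matching DVA,
 * birth txg and spa.  If any hdr_hash_next pointer in the chain has been
 * overwritten with garbage (e.g. a small integer like 0x7), the very first
 * field comparison on that bogus pointer dereferences near-NULL memory,
 * which is the kind of failure the oops above shows at buf_hash_find+0xc1
 * with a fault address of 0x7.
 */
static buf_hdr_t *
buf_hash_find_sketch(const void *spa, const blk_dva_t *dva, uint64_t birth)
{
	unsigned idx = buf_hash(spa, dva, birth);
	buf_hdr_t *hdr;

	/* The real lookup also takes a per-bucket hash mutex here. */
	for (hdr = ht_table[idx]; hdr != NULL; hdr = hdr->hdr_hash_next) {
		if (hdr->hdr_dva.dva_word[0] == dva->dva_word[0] &&
		    hdr->hdr_dva.dva_word[1] == dva->dva_word[1] &&
		    hdr->hdr_birth == birth && hdr->hdr_spa == spa)
			return (hdr);
	}
	return (NULL);
}

int
main(void)
{
	blk_dva_t dva = { { 0x25083, 0x0 } };
	buf_hdr_t hdr = { .hdr_dva = { { 0x25083, 0x0 } }, .hdr_birth = 42,
	    .hdr_spa = (void *)0x1, .hdr_hash_next = NULL };

	/* Insert one header, then look it up again. */
	ht_table[buf_hash(hdr.hdr_spa, &dva, 42)] = &hdr;
	printf("found: %p\n", (void *)buf_hash_find_sketch(hdr.hdr_spa, &dva, 42));
	return (0);
}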