Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZFS seems to be causing kernel BUG #1687

Closed
pepoluan opened this issue Aug 30, 2013 · 13 comments
Closed

ZFS seems to be causing kernel BUG #1687

pepoluan opened this issue Aug 30, 2013 · 13 comments
Milestone

Comments

@pepoluan
Copy link

My email server has been experiencing kernel BUGs since the past 2 days. There has been 4 (four) incidents until now. Prakash Surya (of LLNL) at the ZoL forum suggested I paste the BUG traces here.

So, hereby I pasted the first BUG of every incident. (Search for "Incident No" to quickly find the start of each Incident). I also uploaded the syslog's (grepped to show only kernel: messages) to MediaFire: http://www.mediafire.com/folder/3puddxgab2ydk/BUGs_01

Incident No. 1:
Aug 28 12:17:34 mail-3 kernel: [130936.076081] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss_h/0:14959]
Aug 28 12:17:34 mail-3 kernel: [130936.076103] Modules linked in: tcp_diag inet_diag nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc xenfs lp parport tcp_westwood zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl(O) zlib_deflate
Aug 28 12:17:34 mail-3 kernel: [130936.076124] CPU 0
Aug 28 12:17:34 mail-3 kernel: [130936.076127] Modules linked in: tcp_diag inet_diag nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc xenfs lp parport tcp_westwood zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl(O) zlib_deflate
Aug 28 12:17:34 mail-3 kernel: [130936.076141]
Aug 28 12:17:34 mail-3 kernel: [130936.076149] Pid: 14959, comm: z_wr_iss_h/0 Tainted: P O 3.2.0-52-virtual #78-Ubuntu
Aug 28 12:17:34 mail-3 kernel: [130936.076154] RIP: e030:[] [] mutex_lock+0x1b/0x40
Aug 28 12:17:34 mail-3 kernel: [130936.076165] RSP: e02b:ffff88026c0f1bd0 EFLAGS: 00000246
Aug 28 12:17:34 mail-3 kernel: [130936.076171] RAX: ffff88025db3a900 RBX: ffff880266c7fa80 RCX: ffff880266c7fa80
Aug 28 12:17:34 mail-3 kernel: [130936.076173] RDX: ffff880267b433c0 RSI: 8000000000000000 RDI: ffff880266c7fa80
Aug 28 12:17:34 mail-3 kernel: [130936.076175] RBP: ffff88026c0f1be0 R08: 00000004affb8000 R09: 0000000000000010
Aug 28 12:17:34 mail-3 kernel: [130936.076177] R10: 0000000000000010 R11: 0000000000000001 R12: ffffc900683058f0
Aug 28 12:17:34 mail-3 kernel: [130936.076179] R13: ffff880267b43300 R14: 000000000000e000 R15: 000000095ff70000
Aug 28 12:17:34 mail-3 kernel: [130936.076189] FS: 00007ffa0648c700(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
Aug 28 12:17:34 mail-3 kernel: [130936.076191] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 28 12:17:34 mail-3 kernel: [130936.076193] CR2: 00007ffa0607018c CR3: 000000003ba5f000 CR4: 0000000000000660
Aug 28 12:17:34 mail-3 kernel: [130936.076196] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 28 12:17:34 mail-3 kernel: [130936.076199] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 28 12:17:34 mail-3 kernel: [130936.076201] Process z_wr_iss_h/0 (pid: 14959, threadinfo ffff88026c0f0000, task ffff8802653ddb80)
Aug 28 12:17:34 mail-3 kernel: [130936.076203] Stack:
Aug 28 12:17:34 mail-3 kernel: [130936.076207] 0000000000000000 ffffc900683058f0 ffff88026c0f1d20 ffffffffa010dc78
Aug 28 12:17:34 mail-3 kernel: [130936.076219] 0000000000000000 0000000000000002 ffff88026c0f1c90 ffffffffa0131600
Aug 28 12:17:34 mail-3 kernel: [130936.076223] ffff880200000002 0000003000000000 ffff880200000000 ffffffffa0130e10
Aug 28 12:17:34 mail-3 kernel: [130936.076227] Call Trace:
Aug 28 12:17:34 mail-3 kernel: [130936.076308] [] metaslab_alloc+0x338/0xb30 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076339] [] ? vdev_mirror_io_start+0x220/0x4e0 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076365] [] ? vdev_config_sync+0x180/0x180 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076396] [] ? zio_cons+0x140/0x140 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076421] [] zio_dva_allocate+0x8c/0x410 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076446] [] ? zio_push_transform+0x51/0xb0 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076471] [] ? zio_taskq_member.isra.5+0x75/0xb0 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076496] [] zio_execute+0xba/0x160 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076513] [] taskq_thread+0x236/0x4b0 [spl]
Aug 28 12:17:34 mail-3 kernel: [130936.076520] [] ? try_to_wake_up+0x200/0x200
Aug 28 12:17:34 mail-3 kernel: [130936.076528] [] ? task_done+0x160/0x160 [spl]
Aug 28 12:17:34 mail-3 kernel: [130936.076535] [] kthread+0x8c/0xa0
Aug 28 12:17:34 mail-3 kernel: [130936.076540] [] kernel_thread_helper+0x4/0x10
Aug 28 12:17:34 mail-3 kernel: [130936.076544] [] ? int_ret_from_sys_call+0x7/0x1b
Aug 28 12:17:34 mail-3 kernel: [130936.076548] [] ? retint_restore_args+0x5/0x6
Aug 28 12:17:34 mail-3 kernel: [130936.076552] [] ? gs_change+0x13/0x13
Aug 28 12:17:34 mail-3 kernel: [130936.076553] Code: 4c 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb f0 ff 0f <79> 05 e8 6e 03 00 00 65 48 8b 04 25 00 c5 00 00 4c 8b 65 f8 48
Aug 28 12:17:34 mail-3 kernel: [130936.076616] Call Trace:
Aug 28 12:17:34 mail-3 kernel: [130936.076641] [] metaslab_alloc+0x338/0xb30 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076667] [] ? vdev_mirror_io_start+0x220/0x4e0 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076693] [] ? vdev_config_sync+0x180/0x180 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076717] [] ? zio_cons+0x140/0x140 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076742] [] zio_dva_allocate+0x8c/0x410 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076767] [] ? zio_push_transform+0x51/0xb0 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076792] [] ? zio_taskq_member.isra.5+0x75/0xb0 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076817] [] zio_execute+0xba/0x160 [zfs]
Aug 28 12:17:34 mail-3 kernel: [130936.076824] [] taskq_thread+0x236/0x4b0 [spl]
Aug 28 12:17:34 mail-3 kernel: [130936.076828] [] ? try_to_wake_up+0x200/0x200
Aug 28 12:17:34 mail-3 kernel: [130936.076835] [] ? task_done+0x160/0x160 [spl]
Aug 28 12:17:34 mail-3 kernel: [130936.076838] [] kthread+0x8c/0xa0
Aug 28 12:17:34 mail-3 kernel: [130936.076842] [] kernel_thread_helper+0x4/0x10
Aug 28 12:17:34 mail-3 kernel: [130936.076845] [] ? int_ret_from_sys_call+0x7/0x1b
Aug 28 12:17:34 mail-3 kernel: [130936.076848] [] ? retint_restore_args+0x5/0x6
Aug 28 12:17:34 mail-3 kernel: [130936.076852] [] ? gs_change+0x13/0x13

Incident No. 2:
Aug 29 17:25:30 mail-3 kernel: [66052.076079] BUG: soft lockup - CPU#0 stuck for 22s! [axigen:5814]
Aug 29 17:25:30 mail-3 kernel: [66052.076099] Modules linked in: tcp_diag inet_diag xenfs nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport tcp_westwood zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl(O) zlib_deflate
Aug 29 17:25:30 mail-3 kernel: [66052.076121] CPU 0
Aug 29 17:25:30 mail-3 kernel: [66052.076126] Modules linked in: tcp_diag inet_diag xenfs nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport tcp_westwood zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl(O) zlib_deflate
Aug 29 17:25:30 mail-3 kernel: [66052.076140]
Aug 29 17:25:30 mail-3 kernel: [66052.076145] Pid: 5814, comm: axigen Tainted: P O 3.2.0-52-virtual #78-Ubuntu
Aug 29 17:25:30 mail-3 kernel: [66052.076149] RIP: e030:[] [] avl_first+0x29/0x40 [zavl]
Aug 29 17:25:30 mail-3 kernel: [66052.076158] RSP: e02b:ffff8801e233dc68 EFLAGS: 00000286
Aug 29 17:25:30 mail-3 kernel: [66052.076160] RAX: ffff880264072bc0 RBX: 0000000000000000 RCX: 00000000000000c0
Aug 29 17:25:30 mail-3 kernel: [66052.076162] RDX: ffff880264072cc0 RSI: 8000000000000000 RDI: ffff88026e352860
Aug 29 17:25:30 mail-3 kernel: [66052.076164] RBP: ffff8801e233dc68 R08: 00000004affb8000 R09: 0000000000000010
Aug 29 17:25:30 mail-3 kernel: [66052.076166] R10: 0000000000000010 R11: 0000000000000001 R12: ffffc9012cae0008
Aug 29 17:25:30 mail-3 kernel: [66052.076169] R13: ffff880264072b00 R14: 0000000000003000 R15: 000000095ff70000
Aug 29 17:25:30 mail-3 kernel: [66052.076180] FS: 00007f2b793af700(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
Aug 29 17:25:30 mail-3 kernel: [66052.076185] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 29 17:25:30 mail-3 kernel: [66052.076187] CR2: 00007fe76422518c CR3: 000000026c286000 CR4: 0000000000000660
Aug 29 17:25:30 mail-3 kernel: [66052.076190] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 29 17:25:30 mail-3 kernel: [66052.076192] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 29 17:25:30 mail-3 kernel: [66052.076195] Process axigen (pid: 5814, threadinfo ffff8801e233c000, task ffff88022978c4a0)
Aug 29 17:25:30 mail-3 kernel: [66052.076197] Stack:
Aug 29 17:25:30 mail-3 kernel: [66052.076201] ffff8801e233dda8 ffffffffa010dc81 ffffffffffffffff 007a9e0eefb7cfc1
Aug 29 17:25:30 mail-3 kernel: [66052.076208] ffff8801e233dcc8 ffff8801e233dd28 00000000019f013c ffffffffade1e06a
Aug 29 17:25:30 mail-3 kernel: [66052.076212] 0000000001160f56 ffffffff81cdad80 ffff8801e233dcc8 ffffffff8100a820
Aug 29 17:25:30 mail-3 kernel: [66052.076216] Call Trace:
Aug 29 17:25:30 mail-3 kernel: [66052.076292] [] metaslab_alloc+0x341/0xb30 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076303] [] ? xen_clocksource_read+0x20/0x30
Aug 29 17:25:30 mail-3 kernel: [66052.076307] [] ? xen_clocksource_get_cycles+0x9/0x10
Aug 29 17:25:30 mail-3 kernel: [66052.076339] [] zio_alloc_zil+0x60/0x180 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076364] [] zil_lwb_write_start+0x2e9/0x390 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076390] [] zil_commit+0x44d/0x6a0 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076416] [] zfs_fsync+0x91/0x110 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076441] [] zpl_fsync+0x59/0x80 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076447] [] ? fput+0x25/0x30
Aug 29 17:25:30 mail-3 kernel: [66052.076454] [] do_fsync+0x56/0x80
Aug 29 17:25:30 mail-3 kernel: [66052.076458] [] sys_fsync+0x10/0x20
Aug 29 17:25:30 mail-3 kernel: [66052.076463] [] system_call_fastpath+0x16/0x1b
Aug 29 17:25:30 mail-3 kernel: [66052.076464] Code: 00 00 55 48 89 e5 66 66 66 66 90 48 8b 17 31 c0 48 8b 4f 10 48 85 d2 75 0c eb 1d 0f 1f 80 00 00 00 00 48 89 c2 48 8b 02 48 85 c0 <75> f5 48 85 d2 74 08 48 89 d0 48 29 c8 5d c3 31 c0 5d c3 0f 1f
Aug 29 17:25:30 mail-3 kernel: [66052.076524] Call Trace:
Aug 29 17:25:30 mail-3 kernel: [66052.076553] [] metaslab_alloc+0x341/0xb30 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076557] [] ? xen_clocksource_read+0x20/0x30
Aug 29 17:25:30 mail-3 kernel: [66052.076561] [] ? xen_clocksource_get_cycles+0x9/0x10
Aug 29 17:25:30 mail-3 kernel: [66052.076586] [] zio_alloc_zil+0x60/0x180 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076611] [] zil_lwb_write_start+0x2e9/0x390 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076636] [] zil_commit+0x44d/0x6a0 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076662] [] zfs_fsync+0x91/0x110 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076686] [] zpl_fsync+0x59/0x80 [zfs]
Aug 29 17:25:30 mail-3 kernel: [66052.076689] [] ? fput+0x25/0x30
Aug 29 17:25:30 mail-3 kernel: [66052.076693] [] do_fsync+0x56/0x80
Aug 29 17:25:30 mail-3 kernel: [66052.076696] [] sys_fsync+0x10/0x20
Aug 29 17:25:30 mail-3 kernel: [66052.076699] [] system_call_fastpath+0x16/0x1b

Incident No. 3:
Aug 30 06:48:48 mail-3 kernel: [46184.187094] BUG: soft lockup - CPU#0 stuck for 23s! [axigen:17296]
Aug 30 06:48:48 mail-3 kernel: [46184.187127] Modules linked in: tcp_diag inet_diag xenfs nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport tcp_westwood zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl(O) zlib_deflate
Aug 30 06:48:48 mail-3 kernel: [46184.187246] CPU 0
Aug 30 06:48:48 mail-3 kernel: [46184.187252] Modules linked in: tcp_diag inet_diag xenfs nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport tcp_westwood zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl(O) zlib_deflate
Aug 30 06:48:48 mail-3 kernel: [46184.187375]
Aug 30 06:48:48 mail-3 kernel: [46184.187404] Pid: 17296, comm: axigen Tainted: P O 3.2.0-52-virtual #78-Ubuntu
Aug 30 06:48:48 mail-3 kernel: [46184.187424] RIP: e030:[] [] mutex_unlock+0x14/0x20
Aug 30 06:48:48 mail-3 kernel: [46184.187467] RSP: e02b:ffff8801cfec3c68 EFLAGS: 00000202
Aug 30 06:48:48 mail-3 kernel: [46184.187478] RAX: 8000000000000000 RBX: 0000000000000000 RCX: 00000000000000c0
Aug 30 06:48:48 mail-3 kernel: [46184.187504] RDX: ffff880263bf47c0 RSI: 8000000000000000 RDI: ffff88026bb01180
Aug 30 06:48:48 mail-3 kernel: [46184.187518] RBP: ffff8801cfec3c68 R08: 00000004affb8000 R09: 0000000000000010
Aug 30 06:48:48 mail-3 kernel: [46184.187535] R10: 0000000000000010 R11: 0000000000000001 R12: ffffc90113c5d008
Aug 30 06:48:48 mail-3 kernel: [46184.187547] R13: ffff880263bf4700 R14: 0000000000020000 R15: 000000095ff70000
Aug 30 06:48:48 mail-3 kernel: [46184.187572] FS: 00007f6bf4103700(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
Aug 30 06:48:48 mail-3 kernel: [46184.187583] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 30 06:48:48 mail-3 kernel: [46184.187596] CR2: 00007f53f43429b8 CR3: 00000001e1f20000 CR4: 0000000000000660
Aug 30 06:48:48 mail-3 kernel: [46184.187608] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 30 06:48:48 mail-3 kernel: [46184.187620] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 30 06:48:48 mail-3 kernel: [46184.187632] Process axigen (pid: 17296, threadinfo ffff8801cfec2000, task ffff8801d937adc0)
Aug 30 06:48:48 mail-3 kernel: [46184.187646] Stack:
Aug 30 06:48:48 mail-3 kernel: [46184.187663] ffff8801cfec3da8 ffffffffa010deec ffffffffffffffff 007b04283b7bce8d
Aug 30 06:48:48 mail-3 kernel: [46184.187696] ffff8801cfec3cc8 ffff8801cfec3d28 0000000000e53050 ffffffffade0d687
Aug 30 06:48:48 mail-3 kernel: [46184.187744] 000000001f962f20 ffffffff81cdad80 ffff8801cfec3cc8 ffffffff8100a820
Aug 30 06:48:48 mail-3 kernel: [46184.187807] Call Trace:
Aug 30 06:48:48 mail-3 kernel: [46184.187948] [] metaslab_alloc+0x5ac/0xb30 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.187980] [] ? xen_clocksource_read+0x20/0x30
Aug 30 06:48:48 mail-3 kernel: [46184.188005] [] ? xen_clocksource_get_cycles+0x9/0x10
Aug 30 06:48:48 mail-3 kernel: [46184.188102] [] zio_alloc_zil+0x60/0x180 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.188195] [] zil_lwb_write_start+0x2e9/0x390 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.188284] [] zil_commit+0x44d/0x6a0 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.188376] [] zfs_fsync+0x91/0x110 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.188466] [] zpl_fsync+0x59/0x80 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.188492] [] do_fsync+0x56/0x80
Aug 30 06:48:48 mail-3 kernel: [46184.188517] [] sys_fsync+0x10/0x20
Aug 30 06:48:48 mail-3 kernel: [46184.188534] [] system_call_fastpath+0x16/0x1b
Aug 30 06:48:48 mail-3 kernel: [46184.188552] Code: 00 00 4c 8b 65 f8 48 89 43 18 48 8b 5d f0 c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 c7 47 18 00 00 00 00 f0 ff 07 <7f> 05 e8 d5 02 00 00 5d c3 0f 1f 00 55 48 89 e5 66 66 66 66 90
Aug 30 06:48:48 mail-3 kernel: [46184.188970] Call Trace:
Aug 30 06:48:48 mail-3 kernel: [46184.189049] [] metaslab_alloc+0x5ac/0xb30 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.189072] [] ? xen_clocksource_read+0x20/0x30
Aug 30 06:48:48 mail-3 kernel: [46184.189087] [] ? xen_clocksource_get_cycles+0x9/0x10
Aug 30 06:48:48 mail-3 kernel: [46184.189170] [] zio_alloc_zil+0x60/0x180 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.189256] [] zil_lwb_write_start+0x2e9/0x390 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.189342] [] zil_commit+0x44d/0x6a0 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.189430] [] zfs_fsync+0x91/0x110 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.189521] [] zpl_fsync+0x59/0x80 [zfs]
Aug 30 06:48:48 mail-3 kernel: [46184.189542] [] do_fsync+0x56/0x80
Aug 30 06:48:48 mail-3 kernel: [46184.189556] [] sys_fsync+0x10/0x20
Aug 30 06:48:48 mail-3 kernel: [46184.189574] [] system_call_fastpath+0x16/0x1b

Incident No.4:
Aug 30 09:02:53 mail-3 kernel: [ 6428.080074] BUG: soft lockup - CPU#0 stuck for 22s! [axigen:7693]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080094] Modules linked in: tcp_diag inet_diag xenfs nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport tcp_westwood zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl(O) zlib_deflate
Aug 30 09:02:53 mail-3 kernel: [ 6428.080116] CPU 0
Aug 30 09:02:53 mail-3 kernel: [ 6428.080116] Modules linked in: tcp_diag inet_diag xenfs nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport tcp_westwood zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl(O) zlib_deflate
Aug 30 09:02:53 mail-3 kernel: [ 6428.080136]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080140] Pid: 7693, comm: axigen Tainted: P O 3.2.0-52-virtual #78-Ubuntu
Aug 30 09:02:53 mail-3 kernel: [ 6428.080143] RIP: e030:[] [] mutex_unlock+0x14/0x20
Aug 30 09:02:53 mail-3 kernel: [ 6428.080154] RSP: e02b:ffff880213de7c68 EFLAGS: 00000202
Aug 30 09:02:53 mail-3 kernel: [ 6428.080157] RAX: 8000000000000000 RBX: 0000000000000000 RCX: 00000000000000c0
Aug 30 09:02:53 mail-3 kernel: [ 6428.080159] RDX: ffff880265c2cbc0 RSI: 8000000000000000 RDI: ffff88026e623300
Aug 30 09:02:53 mail-3 kernel: [ 6428.080161] RBP: ffff880213de7c68 R08: 00000004affb8000 R09: 0000000000000010
Aug 30 09:02:53 mail-3 kernel: [ 6428.080163] R10: 0000000000000010 R11: 0000000000000001 R12: ffffc9015abd7008
Aug 30 09:02:53 mail-3 kernel: [ 6428.080165] R13: ffff880265c2cb00 R14: 0000000000003000 R15: 000000095ff70000
Aug 30 09:02:53 mail-3 kernel: [ 6428.080175] FS: 00007f52adefb700(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
Aug 30 09:02:53 mail-3 kernel: [ 6428.080178] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 30 09:02:53 mail-3 kernel: [ 6428.080180] CR2: 00007f51a8bcc000 CR3: 000000026bf59000 CR4: 0000000000000660
Aug 30 09:02:53 mail-3 kernel: [ 6428.080183] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 30 09:02:53 mail-3 kernel: [ 6428.080185] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 30 09:02:53 mail-3 kernel: [ 6428.080188] Process axigen (pid: 7693, threadinfo ffff880213de6000, task ffff880236baadc0)
Aug 30 09:02:53 mail-3 kernel: [ 6428.080190] Stack:
Aug 30 09:02:53 mail-3 kernel: [ 6428.080199] ffff880213de7da8 ffffffffa010deec ffffffffffffffff 007b188212898ab9
Aug 30 09:02:53 mail-3 kernel: [ 6428.080206] ffff880213de7cc8 ffff880213de7d28 00000000002b06f0 ffffffffade01bce
Aug 30 09:02:53 mail-3 kernel: [ 6428.080211] 000000003024f667 ffffffff81cdad80 ffff880213de7cc8 ffffffff8100a820
Aug 30 09:02:53 mail-3 kernel: [ 6428.080215] Call Trace:
Aug 30 09:02:53 mail-3 kernel: [ 6428.080288] [] metaslab_alloc+0x5ac/0xb30 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080296] [] ? xen_clocksource_read+0x20/0x30
Aug 30 09:02:53 mail-3 kernel: [ 6428.080300] [] ? xen_clocksource_get_cycles+0x9/0x10
Aug 30 09:02:53 mail-3 kernel: [ 6428.080334] [] zio_alloc_zil+0x60/0x180 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080359] [] zil_lwb_write_start+0x2e9/0x390 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080385] [] zil_commit+0x44d/0x6a0 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080410] [] zfs_fsync+0x91/0x110 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080435] [] zpl_fsync+0x59/0x80 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080441] [] do_fsync+0x56/0x80
Aug 30 09:02:53 mail-3 kernel: [ 6428.080444] [] sys_fsync+0x10/0x20
Aug 30 09:02:53 mail-3 kernel: [ 6428.080448] [] system_call_fastpath+0x16/0x1b
Aug 30 09:02:53 mail-3 kernel: [ 6428.080450] Code: 00 00 4c 8b 65 f8 48 89 43 18 48 8b 5d f0 c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 c7 47 18 00 00 00 00 f0 ff 07 <7f> 05 e8 d5 02 00 00 5d c3 0f 1f 00 55 48 89 e5 66 66 66 66 90
Aug 30 09:02:53 mail-3 kernel: [ 6428.080521] Call Trace:
Aug 30 09:02:53 mail-3 kernel: [ 6428.080546] [] metaslab_alloc+0x5ac/0xb30 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080550] [] ? xen_clocksource_read+0x20/0x30
Aug 30 09:02:53 mail-3 kernel: [ 6428.080553] [] ? xen_clocksource_get_cycles+0x9/0x10
Aug 30 09:02:53 mail-3 kernel: [ 6428.080578] [] zio_alloc_zil+0x60/0x180 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080603] [] zil_lwb_write_start+0x2e9/0x390 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080628] [] zil_commit+0x44d/0x6a0 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080653] [] zfs_fsync+0x91/0x110 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080678] [] zpl_fsync+0x59/0x80 [zfs]
Aug 30 09:02:53 mail-3 kernel: [ 6428.080682] [] do_fsync+0x56/0x80
Aug 30 09:02:53 mail-3 kernel: [ 6428.080685] [] sys_fsync+0x10/0x20
Aug 30 09:02:53 mail-3 kernel: [ 6428.080689] [] system_call_fastpath+0x16/0x1b

@pepoluan
Copy link
Author

Forgot to add details:

uname -a returns:

Linux mail-3 3.2.0-52-virtual #78-Ubuntu SMP Fri Jul 26 16:45:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

dpkg-query -l | awk '$1$2 ~ /zfs|spl/ {print $1,$2,$3}' returns:

ii libzfs1 0.6.2-1precise
ii spl 0.6.2-1
precise
ii spl-dkms 0.6.2-1precise
ii ubuntu-zfs 7
precise
ii zfs-auto-snapshot 1.0.8-0ubuntu1precise1
ii zfs-dkms 0.6.2-1
precise
ii zfsutils 0.6.2-1~precise

@DeHackEd
Copy link
Contributor

Unfortunately when memory gets full problems arise.

I'd suggest manually limiting your ARC size (default is 1/2 of RAM, maybe make it 1/4). You can do this by changing the kernel module parameter zfs_arc_max to a value in bytes.

@atonkyra
Copy link

atonkyra commented Sep 1, 2013

Are you using PREEMPT? If not, please try with PREEMPT_VOLUNTARY.

This may be a duplicate of #1682, #1604 and #1620

@pepoluan
Copy link
Author

pepoluan commented Sep 1, 2013

@atonkyra well, I'm a newbie of ZoL, so I have to honestly admit that I do not understand what you meant by PREEMPT. Is it perhaps a kernel config?

@DeHackEd won't that cause a significant drop in performance? Will creating a ZRAM-backed L2ARC help alleviate the performance drop?

@atonkyra
Copy link

atonkyra commented Sep 1, 2013

@pepoluan yes, that is a kernel config parameter, zcat /proc/config.gz | grep PREEMPT might give you some clues if you didn't build the kernel yourself.

@pepoluan
Copy link
Author

Sorry for not answering sooner.

@atonkyra hmm, I'm reluctant to build my own kernel. Not that I'm unfamiliar with the process -- I used to use Gentoo -- but because I'm afraid I'll get out of sync with whatever updates being pushed for Ubuntu.

That said, I just want to add the following:

I think the issue might be related to RAM fragmentation. In the server where I saw the kernel BUG, I suspect the RAM is extremely fragmented, because I am running Perdition there. With more than 2000 total users, I see a huge amount of churn re: Perdition processes. A lot of processes exist only for several seconds before terminating.

On another server, also busy but without process churn, ZFS has been extremely stable.

@atonkyra
Copy link

In my case a mysql replication running on a VM was enough to trigger the bug. You, however, didn't answer my question about the PREEMPT being on or off so please provide that detail :)

@pepoluan
Copy link
Author

Ah, sorry about that. grep "PREEMPT" /boot/config-$(uname -r) returns:

# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

Just to reiterate, uname -a is:

Linux mail-3 3.2.0-52-virtual #78-Ubuntu SMP Fri Jul 26 16:45:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

@atonkyra
Copy link

You could install a desktop kernel and test with it. Ubuntu has those in their repository IIRC and they should have voluntary PREEMPT set.

@ColdCanuck
Copy link
Contributor

AFAIK Ubuntu std. install server kernels have voluntary PREEMPT set as well.

Two recent installs

root@zephyr:# uname -a
Linux zephyr 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@zephyr:# grep "PREEMPT" /boot/config-$(uname -r)

CONFIG_PREEMPT_RCU is not set

CONFIG_PREEMPT_NOTIFIERS=y

CONFIG_PREEMPT_NONE is not set

CONFIG_PREEMPT_VOLUNTARY=y

CONFIG_PREEMPT is not set

root@zebra:# uname -a
Linux zebra 3.5.0-39-generic #60
precise1-Ubuntu SMP Wed Aug 14 15:38:41 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@zebra:~# grep "PREEMPT" /boot/config-$(uname -r)

CONFIG_PREEMPT_RCU is not set

CONFIG_PREEMPT_NOTIFIERS=y

CONFIG_PREEMPT_NONE is not set

CONFIG_PREEMPT_VOLUNTARY=y

CONFIG_PREEMPT is not set

On Sep 19,2013, at 11:41 , atonkyra wrote:

You could install a desktop kernel and test with it. Ubuntu has those in their repository IIRC and they should have voluntary PREEMPT set.


Reply to this email directly or view it on GitHub.

@behlendorf
Copy link
Contributor

This also may be related to #1720.

ryao pushed a commit to ryao/zfs that referenced this issue Oct 18, 2013
…read (fix race condition)

Ported-by: Richard Yao <[email protected]>

References:
https://www.illumos.org/issues/3552
illumos/illumos-gate@03f8c36

Porting notes:

This fixes an upstream regression that was introduced in
openzfs/zfs@e51be06, which ported
the Illumos 3552 changes. This fix was added to upstream rather quickly,
but at the time of the port, no one spotted it and the race was rare
enough that it passed our regression test. I discovered this when
comparing our metaslab.c to the illumos metaslab.c.

Closes:
openzfs#1687
openzfs#1720
openzfs#1747
@ryao
Copy link
Contributor

ryao commented Oct 18, 2013

I believe that #1797 will fix this.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
3552 condensing one space map burns 3 seconds of CPU in spa_sync()
     thread (fix race condition)

References:
  https://www.illumos.org/issues/3552
  illumos/illumos-gate@03f8c36

Ported-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>

Porting notes:

This fixes an upstream regression that was introduced in commit
openzfs/zfs@e51be06, which
ported the Illumos 3552 changes. This fix was added to upstream
rather quickly, but at the time of the port, no one spotted it and
the race was rare enough that it passed our regression tests. I
discovered this when comparing our metaslab.c to the illumos
metaslab.c.

Without this change it is possible for metaslab_group_alloc() to
consume a large amount of cpu time.  Since this occurs under a
mutex in a rcu critical section the kernel will log this to the
console as a self-detected cpu stall as follows:

  INFO: rcu_sched self-detected stall on CPU { 0}
  (t=60000 jiffies g=11431890 c=11431889 q=18271)

Closes openzfs#1687
Closes openzfs#1720
Closes openzfs#1731
Closes openzfs#1747
@rnavarro
Copy link

Not sure if this helps anyone, but I ran into the same issue.

In my case, I had the linux-image-virtual package installed, the kernels that come with this package don't have voluntary PREEMPT set.

Instead, if you install linux-image-generic, those images will have voluntary PREEMPT set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants