0.7.0-rc3 kernel lockups when running bonnie++ benchmark #5932

Closed
SenH opened this issue Mar 27, 2017 · 4 comments


SenH commented Mar 27, 2017

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 12.04.5 LTS |
| Linux Kernel | 3.13.0-110-generic |
| Architecture | x86_64 |
| ZFS Version | 0.7.0-rc3_154_g7b0dc2a31 |
| SPL Version | 0.7.0-rc3_7_gbf8abea |

Describe the problem you're observing

After installing 0.7.0-rc3, my system consistently locks up during a bonnie++ benchmark. All ZFS commands hang, and the only solution is to power cycle the system. Since this system is my home NAS, I did not investigate further and rolled back to 0.6.5.9. I can provide the full syslog (which repeats the lockup messages until reboot) if necessary.

The bonnie++ benchmark completes successfully on 0.6.5.9.

Describe how to reproduce the problem

  • Installed 0.7.0-rc3
  • Ran the bonnie++ benchmark: bonnie++ -f -b -s 16G:1048576 -n 0 -d /mnt/tank

Include any warning/errors/backtraces from the system logs

[   48.201760] BUG: soft lockup - CPU#0 stuck for 22s! [z_null_int:2470]
[   48.203492] Modules linked in: ctr ccm sit ip_tunnel tunnel4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_TCPMSS ip6table_mangle ip6_tables xt_pkttype xt_recent ipt_REJECT xt_LOG xt_limit xt_state iptable_filter ipt_MASQUERADE xt_nat xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack zfs(POX) iptable_mangle iptable_raw zavl(POX) ip_tables x_tables zcommon(POX) znvpair(POX) icp(POX) spl(OX) zunicode(POX) joydev kvm_amd arc4 kvm amd64_edac_mod edac_core edac_mce_amd k10temp i2c_piix4 mac_hid ath9k_htc mac80211 ath9k_common shpchp ath9k_hw ath cfg80211 8021q mrp garp stp llc lp parport pata_acpi hid_generic usbhid hid ahci mvsas tg3 libsas ptp pata_atiixp scsi_transport_sas pps_core libahci ast ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea
[   48.217143] CPU: 0 PID: 2470 Comm: z_null_int Tainted: P           OX 3.13.0-110-generic #157~precise1-Ubuntu
[   48.219241] Hardware name: HP ProLiant MicroServer, BIOS O41     07/29/2011
[   48.221342] task: ffff880210a4c800 ti: ffff88020fd56000 task.ti: ffff88020fd56000
[   48.223486] RIP: 0010:[<ffffffffa068b017>]  [<ffffffffa068b017>] abd_iter_map+0x17/0x90 [zfs]
[   48.225789] RSP: 0018:ffff88020fd57be8  EFLAGS: 00000246
[   48.227971] RAX: ffff8802137b1578 RBX: 0000000000000001 RCX: ffffc90011fae000
[   48.230138] RDX: 000000000001c000 RSI: 000000000001c000 RDI: ffff88020fd57c48
[   48.232262] RBP: ffff88020fd57be8 R08: 000000000001c000 R09: ffffffffa068b0f0
[   48.234382] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[   48.236491] R13: ffff88020fd57be8 R14: 0000000000000000 R15: 0000000000000000
[   48.238594] FS:  00007fe9f0d42740(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[   48.240713] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   48.242858] CR2: 0000000001fff018 CR3: 0000000001c0d000 CR4: 00000000000007f0
[   48.245037] Stack:
[   48.247207]  ffff88020fd57ca8 ffffffffa068c66c 0000000000000000 0000000000004000
[   48.249451]  0000000000000000 ffff8802141de400 ffffc90011fae000 0000000000028000
[   48.251725]  ffff8800df3ddc08 0000000000018000 0000000000018000 0000000000000000
[   48.254001] Call Trace:
[   48.256351]  [<ffffffffa068c66c>] abd_iterate_func2+0x15c/0x210 [zfs]
[   48.258716]  [<ffffffffa068c741>] abd_copy_off+0x21/0x30 [zfs]
[   48.261053]  [<ffffffffa0701cb1>] vdev_cache_write+0x151/0x1c0 [zfs]
[   48.263376]  [<ffffffffa07074c5>] ? vdev_queue_io_done+0x195/0x1e0 [zfs]
[   48.265690]  [<ffffffffa0748738>] zio_vdev_io_done+0x188/0x1c0 [zfs]
[   48.267989]  [<ffffffffa074b5b8>] zio_execute+0xa8/0x110 [zfs]
[   48.270179]  [<ffffffffa065c86f>] taskq_thread+0x25f/0x550 [spl]
[   48.272329]  [<ffffffff810a3620>] ? try_to_wake_up+0x210/0x210
[   48.274452]  [<ffffffffa065c610>] ? taskq_dispatch+0x210/0x210 [spl]
[   48.276550]  [<ffffffff810934c9>] kthread+0xc9/0xe0
[   48.278632]  [<ffffffff81093400>] ? flush_kthread_worker+0xb0/0xb0
[   48.280716]  [<ffffffff8177b5a8>] ret_from_fork+0x58/0x90
[   48.282785]  [<ffffffff81093400>] ? flush_kthread_worker+0xb0/0xb0

[ 1756.397397] BUG: unable to handle kernel paging request at ffffc90020556000
[ 1756.403139] IP: [<ffffffff8138fa4d>] memcpy+0xd/0x110
[ 1756.408768] PGD 217019067 PUD 21701a067 PMD 205d76067 PTE 0
[ 1756.414362] Oops: 0000 [#1] SMP 
[ 1756.419834] Modules linked in: ipmi_watchdog ipmi_devintf ipmi_si ctr ccm sit ip_tunnel tunnel4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_TCPMSS ip6table_mangle ip6_tables xt_pkttype xt_recent ipt_REJECT xt_LOG xt_limit xt_state iptable_filter ipt_MASQUERADE xt_nat zfs(POX) xt_tcpudp zavl(POX) iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw zcommon(POX) ip_tables x_tables znvpair(POX) icp(POX) spl(OX) zunicode(POX) kvm_amd kvm joydev arc4 k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 mac_hid ath9k_htc mac80211 ath9k_common ath9k_hw ath shpchp cfg80211 8021q mrp garp stp llc lp parport pata_acpi hid_generic tg3 usbhid ptp pps_core pata_atiixp hid mvsas libsas ahci scsi_transport_sas libahci ast ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea
[ 1756.458972] CPU: 0 PID: 2373 Comm: z_wr_int_3 Tainted: P           OX 3.13.0-110-generic #157~precise1-Ubuntu
[ 1756.464486] Hardware name: HP ProLiant MicroServer, BIOS O41     07/29/2011
[ 1756.469951] task: ffff8802117de000 ti: ffff8800dd24c000 task.ti: ffff8800dd24c000
[ 1756.475359] RIP: 0010:[<ffffffff8138fa4d>]  [<ffffffff8138fa4d>] memcpy+0xd/0x110
[ 1756.480733] RSP: 0018:ffff8800dd24dbe0  EFLAGS: 00010246
[ 1756.486023] RAX: ffffc90016bbc000 RBX: 0000000000008000 RCX: 0000000000001000
[ 1756.491278] RDX: 0000000000000000 RSI: ffffc90020556000 RDI: ffffc90016bbc000
[ 1756.496461] RBP: ffff8800dd24dbe8 R08: 0000000000008000 R09: ffffffffa097c0f0
[ 1756.501584] R10: 0000000000000bf4 R11: ffff88020eea55b0 R12: 0000000000008000
[ 1756.506647] R13: ffffffffa097c0f0 R14: 0000000000000000 R15: 0000000000000000
[ 1756.511643] FS:  00007fa5742ff700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[ 1756.516596] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1756.521470] CR2: ffffc90020556000 CR3: 00000000d8131000 CR4: 00000000000007f0
[ 1756.526327] Stack:
[ 1756.531070]  ffffffffa097c0fe ffff8800dd24dca8 ffffffffa097d697 ffff8800df51dd40
[ 1756.535845]  000000000001f000 0000000000000000 ffff880212100e28 ffffc90016bbc000
[ 1756.540554]  0000000000080000 ffff8800c3a78078 0000000000000000 0000000000000000
[ 1756.545195] Call Trace:
[ 1756.549906]  [<ffffffffa097c0fe>] ? abd_copy_off_cb+0xe/0x20 [zfs]
[ 1756.554579]  [<ffffffffa097d697>] abd_iterate_func2+0x187/0x210 [zfs]
[ 1756.559172]  [<ffffffffa097d741>] abd_copy_off+0x21/0x30 [zfs]
[ 1756.563702]  [<ffffffffa09f2cb1>] vdev_cache_write+0x151/0x1c0 [zfs]
[ 1756.568151]  [<ffffffffa09f84c5>] ? vdev_queue_io_done+0x195/0x1e0 [zfs]
[ 1756.572536]  [<ffffffffa0a39738>] zio_vdev_io_done+0x188/0x1c0 [zfs]
[ 1756.576852]  [<ffffffffa0a3c5b8>] zio_execute+0xa8/0x110 [zfs]
[ 1756.580956]  [<ffffffffa043386f>] taskq_thread+0x25f/0x550 [spl]
[ 1756.584963]  [<ffffffff810a3620>] ? try_to_wake_up+0x210/0x210
[ 1756.588923]  [<ffffffffa0433610>] ? taskq_dispatch+0x210/0x210 [spl]
[ 1756.592773]  [<ffffffff810934c9>] kthread+0xc9/0xe0
[ 1756.596534]  [<ffffffff81093400>] ? flush_kthread_worker+0xb0/0xb0
[ 1756.600220]  [<ffffffff8177b5a8>] ret_from_fork+0x58/0x90
[ 1756.603825]  [<ffffffff81093400>] ? flush_kthread_worker+0xb0/0xb0
[ 1756.607349] Code: 2b 43 50 88 43 4e 48 83 c4 08 5b 5d c3 90 e8 fb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 
[ 1756.614706] RIP  [<ffffffff8138fa4d>] memcpy+0xd/0x110
[ 1756.618175]  RSP <ffff8800dd24dbe0>
[ 1756.621542] CR2: ffffc90020556000
[ 1756.649915] ---[ end trace f2d058bbce92f97a ]---
SenH changed the title from "0.7.0-rc3 kernel lockups" to "0.7.0-rc3 kernel lockups when running bonnie++ benchmark" Mar 27, 2017
behlendorf added this to the 0.7.0 milestone Mar 27, 2017
behlendorf added the "Type: Regression" label (indicates a functional regression) Mar 27, 2017
@behlendorf

@SenH thanks for reporting this issue. Can the problem be consistently reproduced with bonnie++? @tuxoko @ironMann we're going to need to resolve this issue before making a 0.7 tag.


tuxoko commented Mar 27, 2017

The offset arguments in vdev_cache_write seem wrong:

-                       bcopy((char *)zio->io_data + start - io_start,
-                           ve->ve_data + start - ve->ve_offset, end - start);
+                       abd_copy_off(ve->ve_abd, zio->io_abd, start - io_start,
+                           start - ve->ve_offset, end - start);
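
To make the suspected mix-up concrete, here is a minimal sketch using flat buffers instead of real ABDs. It assumes abd_copy_off takes its arguments as (destination, source, destination offset, source offset, size); that order is not stated in this thread, so treat it as an assumption. The helpers copy_off and cache_update are hypothetical stand-ins written for illustration and are not ZFS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical flat-buffer stand-in for abd_copy_off(). Assumed argument
 * order: destination, source, destination offset, source offset, size.
 */
static void
copy_off(char *dst, size_t dst_len, const char *src, size_t src_len,
    size_t doff, size_t soff, size_t size)
{
	/* A real ABD may span scattered pages; this sketch only checks bounds. */
	assert(doff + size <= dst_len);
	assert(soff + size <= src_len);
	memcpy(dst + doff, src + soff, size);
}

/*
 * Copy the overlap between an I/O covering [io_start, io_start + io_len)
 * and a cache entry covering [ve_offset, ve_offset + ve_len) into the
 * cache entry. Returns the number of bytes copied.
 */
static size_t
cache_update(char *ve_data, size_t ve_offset, size_t ve_len,
    const char *io_data, size_t io_start, size_t io_len)
{
	size_t io_end = io_start + io_len;
	size_t ve_end = ve_offset + ve_len;
	size_t start = io_start > ve_offset ? io_start : ve_offset;
	size_t end = io_end < ve_end ? io_end : ve_end;

	if (start >= end)
		return (0);

	/*
	 * The destination offset is relative to the cache entry and the
	 * source offset is relative to the I/O buffer, matching the
	 * pointer arithmetic in the removed bcopy() call.
	 */
	copy_off(ve_data, ve_len, io_data, io_len,
	    start - ve_offset, start - io_start, end - start);
	return (end - start);
}
```

For example, with a cache entry covering [16, 24) and a 4-byte I/O covering [20, 24), the call copies 4 bytes from source offset 0 to destination offset 4. If the two offsets are passed in the other order, as in the `+` lines of the diff, the same call asks for source bytes 4 through 7 of a 4-byte buffer and the bounds assertion in this sketch fires. That would be consistent with the faulting memcpy in the second trace.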

@behlendorf

@SenH this has been fixed in master.

behlendorf pushed a commit to LLNL/zfs that referenced this issue Mar 28, 2017
The offset arguments is wrong when changing to abd_copy_off in a6255b7

Reviewed-by: George Melikov
Reviewed-by: Brian Behlendorf
Reviewed-by: Gvozden Neskovic
Signed-off-by: Chunwei Chen
Closes openzfs#5932 
Closes openzfs#5936

SenH commented Mar 29, 2017

Thanks for looking into it and fixing this so fast. Several runs of bonnie++ seem to complete without my kernel locking up.
