Broken ZIL (during heavy testing of #6566) #6856

Closed
sempervictus opened this issue Nov 12, 2017 · 9 comments

@sempervictus
Contributor

While trying to diagnose a DVA error seen in ztest on some hosts, I seem to have broken a production pool's intent log. The pool is a raidz consisting of 5 SSDs on dm-crypt - a pretty much universal setup around here. During one of the crashes seen yesterday, something bad happened to the ZIL such that import fails, crashing the system spectacularly:

[Sun Nov 12 01:51:55 2017] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [dmu_objset_find:6907]
[Sun Nov 12 01:51:55 2017] Modules linked in: serpent_sse2_x86_64 serpent_generic lrw glue_helper ablk_helper cryptd jc42 nls_iso8859_1 nls_cp437 vfat fat ext4 crc16 jbd2 fscrypto mbcache iTCO_wdt iTCO_vendor_support gpio_ich ppdev ttm drm_kms_helper intel_powerclamp drm coretemp intel_cstate input_leds evdev i2c_algo_bit fb_sys_fops led_class joydev syscopyarea mousedev mac_hid sysfillrect usbmouse usbkbd sysimgblt psmouse uio_pdrv_genirq pcspkr uio i2c_i801 parport_pc i2c_smbus ioatdma i7core_edac parport button edac_core shpchp i5500_temp ipmi_ssif dca lpc_ich acpi_cpufreq sch_fq_codel 8021q garp mrp iscsi_scst scst_local scst dlm dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic kvm_intel kvm irqbypass ipmi_devintf ipmi_si ipmi_msghandler br_netfilter bridge stp llc ip_tables x_tables
[Sun Nov 12 01:51:55 2017]  zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) zlib_deflate xts gf128mul algif_skcipher af_alg dm_crypt dm_mod sd_mod mlx4_en hid_generic usbhid hid uas usb_storage uhci_hcd serio_raw atkbd libps2 crc32c_intel mlx4_core ahci devlink libahci ehci_pci ehci_hcd usbcore e1000e mpt3sas usb_common ptp raid_class scsi_transport_sas pps_core i8042 serio fjes
[Sun Nov 12 01:51:55 2017] CPU: 0 PID: 6907 Comm: dmu_objset_find Tainted: P           OEL  4.9.61 #1
[Sun Nov 12 01:51:55 2017] Hardware name: Intel Thurley/Greencity, BIOS GEMAV200 01/28/2011
[Sun Nov 12 01:51:55 2017] task: ffff8807eb65ee00 task.stack: ffffc9003e910000
[Sun Nov 12 01:51:55 2017] RIP: 0010:[<ffffffffa0626649>]  [<ffffffffa0626649>] zil_claim_log_record+0x19/0x60 [zfs]
[Sun Nov 12 01:51:55 2017] RSP: 0000:ffffc9003e913a38  EFLAGS: 00000246
[Sun Nov 12 01:51:55 2017] RAX: 0000000000000000 RBX: ffffc90036d86efe RCX: 0000000002305788
[Sun Nov 12 01:51:55 2017] RDX: 0000000000000000 RSI: ffffc90036d86efe RDI: ffff880813d56800
[Sun Nov 12 01:51:55 2017] RBP: ffffc9003e913a58 R08: ffff8807e0324b58 R09: 00000001802e001c
[Sun Nov 12 01:51:55 2017] R10: 0100000000000000 R11: 0100000000000000 R12: ffff880813d56800
[Sun Nov 12 01:51:55 2017] R13: 0000000000000000 R14: 0000000002305788 R15: ffffffffffffffff
[Sun Nov 12 01:51:55 2017] FS:  0000000000000000(0000) GS:ffff88081fa00000(0000) knlGS:0000000000000000
[Sun Nov 12 01:51:55 2017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Nov 12 01:51:55 2017] CR2: 00000061b0b41000 CR3: 0000000001d5a000 CR4: 00000000000006f0
[Sun Nov 12 01:51:55 2017] Stack:
[Sun Nov 12 01:51:55 2017]  0000000000000000 ffffffffa0626630 ffffc90036d86efe 00000002540ee182
[Sun Nov 12 01:51:55 2017]  ffffc9003e913c68 ffffffffa06280b8 ffffc90001800190 ffffc9003e913aec
[Sun Nov 12 01:51:55 2017]  ffffc9003e913af8 ffffffffa0626630 ffffc90037d97000 0000000000000000
[Sun Nov 12 01:51:55 2017] Call Trace:
[Sun Nov 12 01:51:55 2017]  [<ffffffffa0626630>] ? zil_claim_log_block+0x90/0x90 [zfs]
[Sun Nov 12 01:51:55 2017]  [<ffffffffa06280b8>] zil_parse+0x688/0x910 [zfs]
[Sun Nov 12 01:51:55 2017]  [<ffffffffa0626630>] ? zil_claim_log_block+0x90/0x90 [zfs]
[Sun Nov 12 01:51:55 2017]  [<ffffffffa06265a0>] ? zil_replaying+0x70/0x70 [zfs]
[Sun Nov 12 01:51:55 2017]  [<ffffffffa0628d91>] zil_check_log_chain+0xe1/0x1a0 [zfs]
[Sun Nov 12 01:51:55 2017]  [<ffffffffa0574e11>] dmu_objset_find_dp_impl+0x151/0x400 [zfs]
[Sun Nov 12 01:51:55 2017]  [<ffffffffa05750e6>] dmu_objset_find_dp_cb+0x26/0x40 [zfs]
[Sun Nov 12 01:51:55 2017]  [<ffffffffa014cca1>] taskq_thread+0x2b1/0x4d0 [spl]
[Sun Nov 12 01:51:55 2017]  [<ffffffff810a0160>] ? wake_up_q+0x90/0x90
[Sun Nov 12 01:51:55 2017]  [<ffffffffa014c9f0>] ? task_done+0xa0/0xa0 [spl]
[Sun Nov 12 01:51:55 2017]  [<ffffffff81095ddd>] kthread+0xfd/0x120
[Sun Nov 12 01:51:55 2017]  [<ffffffff81095ce0>] ? kthread_parkme+0x40/0x40
[Sun Nov 12 01:51:55 2017]  [<ffffffff8180c5e7>] ret_from_fork+0x37/0x50

Tried to import with different revisions of ZFS compatible with the feature flags on that pool (luckily not a crypto pool), all with the same result. The -m and -F flags also give nada. Google searches for ignoring the ZIL lead to SLOG-related issues, though I seem to recall that someone had a patch allowing import while dropping the current ZIL state to avoid this sort of mess.

I'm pushing a restore from just before this all started back to another pool in the system, but it would be nice not to be missing a couple of days of delta.
@dweeezil, @prakashsurya, @behlendorf: do you folks happen to recall if and where such a ZIL-drop patch might live? Probably something we want to document and push up in the search results if it exists.
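
In the meantime, a minimal sketch of what I plan to try next, assuming the damage really is confined to the intent log (device directory and pool name below are placeholders for our setup):

    # A read-only import skips ZIL claim/replay entirely, so it may dodge the
    # crashing zil_claim path; -N leaves the datasets unmounted.
    zpool import -d /dev/mapper -o readonly=on -N poolname

    # For a writable import, the zil_replay_disable module parameter only skips
    # replay, not log claiming, so it probably won't help with this particular
    # panic, but it's cheap to test:
    echo 1 > /sys/module/zfs/parameters/zil_replay_disable
    zpool import -d /dev/mapper -N poolname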

@sempervictus
Contributor Author

Looking at the replay headers in zdb, it looks like they're all at 0:

zdb -eip /dev/ poolname|egrep -v 'ZPL|ZVOL|^$'
Dataset mos [META], ID 0, cr_txg 4, 2.18G, 23914 objects
    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
Verified large_blocks feature refcount of 0 is correct
Verified large_dnode feature refcount of 0 is correct
Verified sha512 feature refcount of 24 is correct
Verified skein feature refcount of 0 is correct
Verified edonr feature refcount of 0 is correct
Verified userobj_accounting feature refcount of 947 is correct

I'm rather curious to see how this can be tracked down with zdb.
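
For reference, a rough sketch of how I'd dig further with zdb (pool name and device path are placeholders; repeating -i is supposed to increase the ZIL detail):

    # Dump intent-log information for every dataset in the exported pool.
    zdb -e -p /dev/ -iii poolname

    # Dump the labels/uberblock to see which txg an import would land on.
    zdb -l /dev/mapper/one-of-the-crypt-vdevs
    zdb -e -p /dev/ -u poolname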

@sempervictus
Contributor Author

sempervictus commented Nov 13, 2017

Adding -hh to the invocation produces a ton of:

  dsname: 'pool-name/lxc/sv-so00/%recv'
  dsid: 3721
  history internal str: 'parent=sv-so00'
  internal_name: 'clone swap'
  history txg: 955
  history time: 1486892356
  history hostname: 'sys-name'

Pulling the TXGs from that output, sorting them, and using either of the last two via zpool import -FX -T 36719543 -N pool-ID results in
cannot import 'pool-name': one or more devices is currently unavailable
which is a bit odd, as all vdevs are mapped into the recovery VM (as raw volumes) and show as ONLINE in zpool import.
The same command using -FXn works, though "works" is kind of subjective for a noop.
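
One more thing that may be worth a shot when rewind complains about unavailable devices, hedged since I'm not certain readonly is combinable with extreme rewind on every revision (path and GUID are placeholders):

    # Point the import explicitly at the directory holding the mapped vdevs and
    # import by GUID, read-only, at the chosen txg.
    zpool import -d /dev/mapper -o readonly=on -N -FX -T 36719543 <pool-GUID>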

@sempervictus
Contributor Author

As I go through and re-add PRs to the master branch to figure out where the errors started, I'm beginning to think that the crypto fmt changes might be related, since I've been building most of the recent stacks with the datto/crypt_disk_fmt branch pulled in. @tcaputi: am I doing something obscenely dumb pulling off that branch?
The functional testing approach does have drawbacks though - I may miss, or have already missed, the relevant commit, as the DVA error is sometimes consistently present and sometimes goes dormant for a bit (all of that testing runs on fresh VMs with zloop creating fresh pools).
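
For what it's worth, each bisection round looks roughly like this (a sketch; branch names, paths, and zloop flags are from memory and may differ between revisions):

    # Rebuild master plus exactly one candidate PR, then hammer it with zloop.
    git checkout master
    git merge --no-ff <candidate-pr-branch>
    sh autogen.sh && ./configure && make -s -j"$(nproc)"
    sudo ./scripts/zloop.sh -t 3600 -f /var/tmp/zloop   # ~1h of ztest runs on fresh pools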

@sempervictus
Contributor Author

Hmm, so I'm somewhat at the end of my rope here - I have all the data from the pool staged from recovery on an adjacent SSD pool, and will need to pull the trigger on running labelclear on the constituent vdevs sometime this morning to seed a new pool with the restored data, unless I can recover.
@sages: What data, if any, would you like me to pull from this oddly corrupted pool to help prevent this situation for others down the line?
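
For the record, the reseeding step would be roughly the following (device and dataset names are placeholders, and labelclear is destructive, hence the hesitation):

    # Wipe the old ZFS labels from each dm-crypt mapping, rebuild the raidz,
    # and send the restored data back from the adjacent pool.
    for dev in /dev/mapper/crypt-ssd{0..4}; do
        zpool labelclear -f "$dev"
    done
    zpool create newpool raidz /dev/mapper/crypt-ssd{0..4}
    zfs send -R backuppool/restore@latest | zfs recv -F newpool/restore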

@tcaputi
Contributor

tcaputi commented Nov 13, 2017

@tcaputi: am i doing something obscenely dumb pulling off that branch?

I would be surprised if that branch broke the ZIL, simply because it doesn't really touch that code path; that said, the code there is really raw and completely untested. In fact, I have since decided to take a different approach and have created a new branch for it, so it is unlikely that the code from that branch will end up getting merged.

@sempervictus
Contributor Author

@tcaputi: as I mentioned in #6566, I think it's actually the improved scrub branch that's causing this. The PR has been in an approved state for quite a while, just running as part of our test and now prod stacks, but it may have developed unhappy interactions in the interim. Without the now-abandoned branch and the scrub improvements stuff, I am no longer seeing the source DVA error which started all of this. Have you guys seen any crashes in ztest on that branch?

@tcaputi
Contributor

tcaputi commented Nov 13, 2017

We haven't seen any crashes in ztest or production tests in quite some time on that patch (or at least nothing that doesn't occasionally pop up in other PRs). The scrub code doesn't really interact with the ZIL in any new ways, so I would be very surprised if it caused any issues.

@sempervictus
Contributor Author

So I destroyed the pool, restored from an older snap, and after another crash I again see:

[Fri Nov 17 20:51:50 2017] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [dmu_objset_find:5344]
[Fri Nov 17 20:51:50 2017] Modules linked in: serpent_sse2_x86_64 serpent_generic lrw glue_helper ablk_helper cryptd jc42 nls_iso8859_1 nls_cp437 vfat fat ext4 crc16 jbd2 fscrypto mbcache iTCO_wdt iTCO_vendor_support gpio_ich ppdev ttm psmouse drm_kms_helper evdev intel_powerclamp pcspkr joydev coretemp mousedev input_leds led_class intel_cstate drm uio_pdrv_genirq uio mac_hid i2c_algo_bit usbkbd fb_sys_fops syscopyarea usbmouse sysfillrect sysimgblt i2c_i801 i5500_temp button i7core_edac edac_core ioatdma i2c_smbus lpc_ich shpchp dca acpi_cpufreq parport_pc parport ipmi_ssif sch_fq_codel 8021q garp mrp iscsi_scst scst_local scst dlm dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic kvm_intel kvm irqbypass ipmi_devintf ipmi_si ipmi_msghandler br_netfilter bridge stp llc ip_tables x_tables
[Fri Nov 17 20:51:50 2017]  zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) zlib_deflate xts gf128mul algif_skcipher af_alg dm_crypt dm_mod sd_mod mlx4_en hid_generic usbhid hid uas usb_storage uhci_hcd serio_raw atkbd libps2 crc32c_intel mlx4_core ahci ehci_pci devlink libahci ehci_hcd mpt3sas usbcore e1000e usb_common raid_class ptp scsi_transport_sas pps_core i8042 serio fjes
[Fri Nov 17 20:51:50 2017] CPU: 11 PID: 5344 Comm: dmu_objset_find Tainted: P           OE   4.9.62 #1
[Fri Nov 17 20:51:50 2017] Hardware name: Intel Thurley/Greencity, BIOS GEMAV200 01/28/2011
[Fri Nov 17 20:51:50 2017] task: ffff8807f68f6e00 task.stack: ffffc9003b104000
[Fri Nov 17 20:51:50 2017] RIP: 0010:[<ffffffffa0693718>]  [<ffffffffa0693718>] zil_parse+0x688/0x910 [zfs]
[Fri Nov 17 20:51:50 2017] RSP: 0000:ffffc9003b107a68  EFLAGS: 00000293
[Fri Nov 17 20:51:50 2017] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000104dc
[Fri Nov 17 20:51:50 2017] RDX: 0000000000000000 RSI: ffffc90030302efe RDI: ffff8807f8cb0000
[Fri Nov 17 20:51:50 2017] RBP: ffffc9003b107c68 R08: ffff8808019e7dc0 R09: 00000001802e0022
[Fri Nov 17 20:51:50 2017] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0692470
[Fri Nov 17 20:51:50 2017] R13: ffffc90030302efe R14: 00000000b5f7e6e5 R15: ffffffffffffffff
[Fri Nov 17 20:51:50 2017] FS:  0000000000000000(0000) GS:ffff88081fbc0000(0000) knlGS:0000000000000000
[Fri Nov 17 20:51:50 2017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Nov 17 20:51:50 2017] CR2: 0000034584004038 CR3: 0000000001d70000 CR4: 00000000000006f0
[Fri Nov 17 20:51:50 2017] Stack:
[Fri Nov 17 20:51:50 2017]  ffffc90001800190 ffffc9003b107aec ffffc9003b107af8 ffffffffa0692470
[Fri Nov 17 20:51:50 2017]  ffffc90031313000 0000000000000000 ffffffffa06923e0 ffff8807f8cb02d0
[Fri Nov 17 20:51:50 2017]  00000000000024d3 000000000000000a ffffffffffffffff ffffc90031332f48
[Fri Nov 17 20:51:50 2017] Call Trace:
[Fri Nov 17 20:51:50 2017]  [<ffffffffa0692470>] ? zil_claim_log_block+0x90/0x90 [zfs]
[Fri Nov 17 20:51:50 2017]  [<ffffffffa06923e0>] ? zil_replaying+0x70/0x70 [zfs]
[Fri Nov 17 20:51:50 2017]  [<ffffffffa0693f91>] zil_check_log_chain+0xe1/0x1a0 [zfs]
[Fri Nov 17 20:51:50 2017]  [<ffffffffa05dcfc1>] dmu_objset_find_dp_impl+0x151/0x400 [zfs]
[Fri Nov 17 20:51:50 2017]  [<ffffffffa05dd296>] dmu_objset_find_dp_cb+0x26/0x40 [zfs]
[Fri Nov 17 20:51:50 2017]  [<ffffffffa0109ca1>] taskq_thread+0x2b1/0x4d0 [spl]
[Fri Nov 17 20:51:50 2017]  [<ffffffff810a01e0>] ? wake_up_q+0x90/0x90
[Fri Nov 17 20:51:50 2017]  [<ffffffffa01099f0>] ? task_done+0xa0/0xa0 [spl]
[Fri Nov 17 20:51:50 2017]  [<ffffffff81095e4d>] kthread+0xfd/0x120
[Fri Nov 17 20:51:50 2017]  [<ffffffff81095d50>] ? kthread_parkme+0x40/0x40
[Fri Nov 17 20:51:50 2017]  [<ffffffff8180c667>] ret_from_fork+0x37/0x50
[Fri Nov 17 20:51:50 2017] Code: 00 00 80 48 0b 85 18 fe ff ff 4d 89 e5 49 89 c4 48 8b 8d 68 fe ff ff 4c 89 ee 48 8b 95 70 fe ff ff 48 8b bd 60 fe ff ff 41 ff d4 <85> c0 75 25 49 8b 4d 18 49 83 c6 01 49 01 dd 48 89 8d 78 fe ff

when I try to import after a reboot. Not great.
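
To keep the box usable while this gets sorted out, the plan in the recovery VM is roughly the following (a sketch; the exact unit names depend on the distro/init setup):

    # Stop the cache-file driven import at boot so the crash doesn't recur,
    # then attempt a manual read-only import from a shell.
    systemctl disable --now zfs-import-cache.service zfs-import-scan.service
    mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
    zpool import -d /dev/mapper -o readonly=on -N poolname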

@gmelikov
Member

Closed as stale.
