Linux 4.1 oops with loop devices on ZFS #3511

Closed
l1k opened this issue Jun 20, 2015 · 9 comments

@l1k
Contributor

l1k commented Jun 20, 2015

Mounting a DVD image located on a ZFS dataset revealed a regression in 4.1-rc7. It worked with 4.0, and it still works with 4.1 if the file backing the loop device is located on a plain FAT partition rather than on a zpool. The zpool is layered on top of dm-crypt, but I believe that's irrelevant.

To reproduce, run dd if=/dev/zero of=testfs bs=1M count=32 && losetup /dev/loop0 testfs in a directory on a ZFS dataset, then wait briefly to get the following crash, triggered by the workqueue item cache_reap:

[  136.618967] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  136.620647] IP: [<ffffffff811aeb13>] free_block+0xd3/0x1a0
[  136.622303] PGD 0 
[  136.623909] Oops: 0002 [#1] SMP 
[  136.625512] Modules linked in: netconsole configfs efivarfs snd_hda_codec_hdmi b43 mac80211 cfg80211 uvcvideo videobuf2_vmalloc snd_hda_codec_cirrus ssb videobuf2_memops x86_pkg_temp_thermal btusb nls_utf8 intel_powerclamp videobuf2_core rng_core btbcm intel_rapl v4l2_common snd_hda_codec_generic nls_cp437 joydev btintel tg3 pcmcia iosf_mbi pcmcia_core coretemp iTCO_wdt ptp vfat iTCO_vendor_support bluetooth applesmc efi_pstore kvm_intel videodev input_polldev rfkill snd_hda_intel pps_core bcm5974 fat snd_hda_controller crc16 media libphy snd_hda_codec sdhci_pci hid_appleir kvm snd_hda_core pcspkr sdhci efivars evdev mmc_core snd_hwdep sg i2c_i801 lpc_ich mfd_core bcma mei_me snd_pcm sbs battery snd_timer mei sbshc video snd apple_bl xhci_pci ac button xhci_hcd soundcore shpchp processor thermal_sys loop zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) dm_crypt blowfish_generic blowfish_x86_64 blowfish_common ecb des_generic cast5_avx_x86_64 cast5_generic cast_common cbc twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg hid_apple hid_generic usbhid hid sr_mod cdrom sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ahci firewire_ohci libahci aesni_intel ehci_pci aes_x86_64 lrw gf128mul ehci_hcd glue_helper libata ablk_helper cryptd scsi_mod firewire_core crc_itu_t usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod autofs4
[  136.645759] CPU: 3 PID: 157 Comm: kworker/3:2 Tainted: P           O    4.1.0-1-amd64 #1 Debian 4.1-rc7-1~exp1
[  136.648007] Hardware name: Apple Inc. MacBookPro9,1/Mac-4B7AC7E43945597E, BIOS MBP91.88Z.00D3.B08.1208081132 08/08/2012
[  136.650307] Workqueue: events cache_reap
[  136.652590] task: ffff8804564f8350 ti: ffff880456f58000 task.ti: ffff880456f58000
[  136.654892] RIP: 0010:[<ffffffff811aeb13>]  [<ffffffff811aeb13>] free_block+0xd3/0x1a0
[  136.657241] RSP: 0018:ffff880456f5bd28  EFLAGS: 00010046
[  136.659580] RAX: ffff8804559c8a00 RBX: 000077ff80000000 RCX: 0000000000000000
[  136.661945] RDX: ffffea0011567200 RSI: ffff88046e2dd998 RDI: ffff88045d8001c0
[  136.664315] RBP: ffff88046e2dda30 R08: ffff880456f5bd68 R09: ffff88045d8011c0
[  136.666684] R10: 0000000080000000 R11: ffffea0000000000 R12: ffff88045d8011c8
[  136.669056] R13: ffff88045d8011e8 R14: ffffea001169c020 R15: ffff88046e2dd970
[  136.671430] FS:  0000000000000000(0000) GS:ffff88046e2c0000(0000) knlGS:0000000000000000
[  136.673819] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  136.676229] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000001407e0
[  136.678676] Stack:
[  136.681100]  ffff88046e2dd960 ffff88045d8011c0 0000000000000018 0000000000000000
[  136.683579]  ffff88045d8001c0 ffffffff811afb27 0000000000000000 0000001800000000
[  136.686073]  ffff880456f5bd68 ffff880456f5bd68 ffff880400000000 ffff88045d8011c0
[  136.688595] Call Trace:
[  136.691091]  [<ffffffff811afb27>] ? drain_array+0xa7/0x140
[  136.693605]  [<ffffffff811b020c>] ? cache_reap+0x6c/0x260
[  136.696128]  [<ffffffff81089832>] ? process_one_work+0x152/0x440
[  136.698670]  [<ffffffff8108a3bb>] ? worker_thread+0x6b/0x570
[  136.701206]  [<ffffffff8108a350>] ? rescuer_thread+0x3b0/0x3b0
[  136.703739]  [<ffffffff8108f863>] ? kthread+0xd3/0xf0
[  136.706250]  [<ffffffff8108f790>] ? kthread_create_on_node+0x180/0x180
[  136.708758]  [<ffffffff8157c7f2>] ? ret_from_fork+0x42/0x70
[  136.711253]  [<ffffffff8108f790>] ? kthread_create_on_node+0x180/0x180
[  136.713753] Code: d1 48 c1 e9 0c 48 c1 e1 06 4c 01 d9 4c 8b 31 48 89 ca 41 f7 c6 00 80 00 00 0f 85 c9 00 00 00 4c 8b 72 20 48 8b 4a 28 49 89 4e 08 <4c> 89 31 48 b9 00 01 10 00 00 00 ad de 48 89 4a 20 48 2b 42 08 
[  136.720887] RIP  [<ffffffff811aeb13>] free_block+0xd3/0x1a0
[  136.723601]  RSP <ffff880456f5bd28>
[  136.726286] CR2: 0000000000000000
[  136.728954] ---[ end trace d51d9b17c30b0751 ]---

Alternatively, run mkfs -t ext2 /dev/loop0 immediately after losetup to get a crash in kfree called from kernfs_fop_write+0xaa in systemd-udevd, followed by a second one in another systemd-udevd process and a third one called from dsl_dir_tempreserve_clear+0xfd. The system then keeps spewing oopses every few seconds, but console switching and writing to the filesystem no longer work, so I had to capture this with netconsole:

[  101.437956] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  101.439601] IP: [<ffffffff811afc3d>] kfree+0x7d/0x310
[  101.441372] PGD 0 
[  101.442984] Oops: 0000 [#1] SMP 
[  101.444612] Modules linked in: netconsole configfs snd_hda_codec_hdmi b43 mac80211 cfg80211 ssb rng_core pcmcia x86_pkg_temp_thermal intel_powerclamp pcmcia_core joydev intel_rapl uvcvideo snd_hda_codec_cirrus btusb iTCO_wdt iosf_mbi videobuf2_vmalloc snd_hda_codec_generic videobuf2_memops btbcm iTCO_vendor_support videobuf2_core sdhci_pci coretemp btintel v4l2_common tg3 nls_utf8 nls_cp437 snd_hda_intel ptp videodev snd_hda_controller applesmc vfat kvm_intel sdhci efi_pstore snd_hda_codec bluetooth input_polldev fat pps_core i2c_i801 kvm media libphy mmc_core bcm5974 rfkill pcspkr crc16 hid_appleir evdev lpc_ich snd_hda_core efivars sg snd_hwdep bcma mfd_core snd_pcm battery sbs xhci_pci snd_timer sbshc xhci_hcd apple_bl snd mei_me video ac button mei soundcore shpchp processor thermal_sys loop zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) dm_crypt blowfish_generic blowfish_x86_64 blowfish_common ecb des_generic cast5_avx_x86_64 cast5_generic cast_common cbc twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg hid_apple hid_generic usbhid hid sr_mod cdrom sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ahci ehci_pci libahci aesni_intel aes_x86_64 lrw gf128mul ehci_hcd glue_helper libata ablk_helper firewire_ohci cryptd scsi_mod firewire_core crc_itu_t usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod autofs4
[  101.462904] CPU: 0 PID: 1119 Comm: systemd-udevd Tainted: P           O    4.1.0-1-amd64 #1 Debian 4.1-rc7-1~exp1
[  101.465168] Hardware name: Apple Inc. MacBookPro9,1/Mac-4B7AC7E43945597E, BIOS MBP91.88Z.00D3.B08.1208081132 08/08/2012
[  101.467449] task: ffff880452316d20 ti: ffff880457e6c000 task.ti: ffff880457e6c000
[  101.469767] RIP: 0010:[<ffffffff811afc3d>]  [<ffffffff811afc3d>] kfree+0x7d/0x310
[  101.472091] RSP: 0018:ffff880457e6fd78  EFLAGS: 00010046
[  101.474532] RAX: ffffea00022df480 RBX: ffff88008b7d2300 RCX: 0000000000000000
[  101.476859] RDX: ffff88010b7d2300 RSI: 0000000000000000 RDI: ffff88008b7d2300
[  101.479232] RBP: 0000000000000286 R08: 0000000000000000 R09: 0000000000001f1e
[  101.481623] R10: 0000000000000246 R11: 0000000000000000 R12: ffffffff81242b2a
[  101.483980] R13: ffff880455dbf858 R14: 0000000000000000 R15: ffff880457e6ff20
[  101.486353] FS:  00007f53b86bd880(0000) GS:ffff88046e200000(0000) knlGS:0000000000000000
[  101.488753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  101.491201] CR2: 0000000000000000 CR3: 000000045890c000 CR4: 00000000001407f0
[  101.493667] Stack:
[  101.496108]  ffff88045ba2d400 00000000468822f8 0000000000000000 0000000000000007
[  101.498775]  ffff880452290470 0000000000000007 ffff880455dbf858 fffffffffffffff2
[  101.501335]  ffff880457e6ff20 ffff880455dbf840 ffff88008b7d2300 0000000000000007
[  101.503876] Call Trace:
[  101.506415]  [<ffffffff81242b2a>] ? kernfs_fop_write+0xaa/0x180
[  101.508939]  [<ffffffff811cb7f3>] ? __vfs_write+0x23/0xf0
[  101.511468]  [<ffffffff811ce2c2>] ? __sb_start_write+0x42/0xf0
[  101.513984]  [<ffffffff81261581>] ? security_file_permission+0x21/0xa0
[  101.516485]  [<ffffffff811cbef4>] ? vfs_write+0xa4/0x1b0
[  101.518992]  [<ffffffff811ccc62>] ? SyS_write+0x42/0xb0
[  101.521485]  [<ffffffff8157c3b2>] ? system_call_fast_compare_end+0xc/0x6b
[  101.523957] Code: ff ff 48 01 da 48 0f 42 05 f1 43 66 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 01 c8 48 8b 08 80 e5 80 0f 85 cf 01 00 00 4c 8b 70 30 <4d> 8b 3e 65 4c 03 3d c0 a5 e5 7e 83 3d 15 6d 75 00 01 7e 42 48 
[  101.529781] RIP  [<ffffffff811afc3d>] kfree+0x7d/0x310
[  101.532511]  RSP <ffff880457e6fd78>
[  101.535232] CR2: 0000000000000000
[  101.537940] ---[ end trace 5d5dbedc45330379 ]---
[  102.108389] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  102.109601] IP: [<ffffffff811afc3d>] kfree+0x7d/0x310
[  102.110814] PGD 0 
[  102.110959] systemd-udevd[1608]: starting version 215
[  102.113221] Oops: 0000 [#2] SMP 
[  102.114407] Modules linked in: netconsole configfs snd_hda_codec_hdmi b43 mac80211 cfg80211 ssb rng_core pcmcia x86_pkg_temp_thermal intel_powerclamp pcmcia_core joydev intel_rapl uvcvideo snd_hda_codec_cirrus btusb iTCO_wdt iosf_mbi videobuf2_vmalloc snd_hda_codec_generic videobuf2_memops btbcm iTCO_vendor_support videobuf2_core sdhci_pci coretemp btintel v4l2_common tg3 nls_utf8 nls_cp437 snd_hda_intel ptp videodev snd_hda_controller applesmc vfat kvm_intel sdhci efi_pstore snd_hda_codec bluetooth input_polldev fat pps_core i2c_i801 kvm media libphy mmc_core bcm5974 rfkill pcspkr crc16 hid_appleir evdev lpc_ich snd_hda_core efivars sg snd_hwdep bcma mfd_core snd_pcm battery sbs xhci_pci snd_timer sbshc xhci_hcd apple_bl snd mei_me video ac button mei soundcore shpchp processor thermal_sys loop zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) dm_crypt blowfish_generic blowfish_x86_64 blowfish_common ecb des_generic cast5_avx_x86_64 cast5_generic cast_common cbc twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg hid_apple hid_generic usbhid hid sr_mod cdrom sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ahci ehci_pci libahci aesni_intel aes_x86_64 lrw gf128mul ehci_hcd glue_helper libata ablk_helper firewire_ohci cryptd scsi_mod firewire_core crc_itu_t usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod autofs4
[  102.141880] CPU: 0 PID: 1606 Comm: systemd-udevd Tainted: P      D    O    4.1.0-1-amd64 #1 Debian 4.1-rc7-1~exp1
[  102.145481] Hardware name: Apple Inc. MacBookPro9,1/Mac-4B7AC7E43945597E, BIOS MBP91.88Z.00D3.B08.1208081132 08/08/2012
[  102.149103] task: ffff88007d052290 ti: ffff8804469d0000 task.ti: ffff8804469d0000
[  102.152742] RIP: 0010:[<ffffffff811afc3d>]  [<ffffffff811afc3d>] kfree+0x7d/0x310
[  102.156407] RSP: 0018:ffff8804469d3bf8  EFLAGS: 00010046
[  102.160101] RAX: ffffea00022df480 RBX: ffff88008b7d24a0 RCX: 0000000000000000
[  102.163797] RDX: ffff88010b7d24a0 RSI: ffff8804590d8600 RDI: ffff88008b7d24a0
[  102.167505] RBP: 0000000000000286 R08: 0000000000000000 R09: 000000000000000a
[  102.171210] R10: 0000000000000246 R11: 0000000000000001 R12: ffffffff812124d5
[  102.174904] R13: ffff88045bea5060 R14: 0000000000000000 R15: ffff88045b0ae4c8
[  102.178562] FS:  0000000000000000(0000) GS:ffff88046e200000(0000) knlGS:0000000000000000
[  102.182243] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  102.185961] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000001407f0
[  102.189632] Stack:
[  102.193204]  ffff880455d83e80 ffff8804590d86d8 ffff88045b0ae4c8 ffff88008c4d9f40
[  102.196762]  ffff880455d83e80 ffff8804590d8630 ffff880455d83e80 ffffffff81210343
[  102.200223]  ffff8804590d8680 ffff8804590d8600 0000000000000008 ffff88045b0ae4c8
[  102.203608] Call Trace:
[  102.206911]  [<ffffffff81210343>] ? ep_remove+0xa3/0xc0
[  102.210153]  [<ffffffff812124d5>] ? signalfd_release+0x15/0x20
[  102.213311]  [<ffffffff811cd7fc>] ? __fput+0xcc/0x1e0
[  102.216413]  [<ffffffff8108df64>] ? task_work_run+0xd4/0xf0
[  102.219498]  [<ffffffff81073abc>] ? do_exit+0x37c/0xa90
[  102.222550]  [<ffffffff811e74c1>] ? generic_update_time+0x71/0xd0
[  102.225565]  [<ffffffff811ce23d>] ? __sb_end_write+0x2d/0x70
[  102.228562]  [<ffffffff81074249>] ? do_group_exit+0x39/0xb0
[  102.231543]  [<ffffffff8107f89d>] ? get_signal+0x2bd/0x7d0
[  102.234532]  [<ffffffff810134e3>] ? do_signal+0x23/0xb30
[  102.237518]  [<ffffffff811cbe2c>] ? vfs_read+0x10c/0x130
[  102.240485]  [<ffffffff81014068>] ? do_notify_resume+0x78/0x90
[  102.243469]  [<ffffffff8157c584>] ? int_signal+0x12/0x17
[  102.246532] Code: ff ff 48 01 da 48 0f 42 05 f1 43 66 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 01 c8 48 8b 08 80 e5 80 0f 85 cf 01 00 00 4c 8b 70 30 <4d> 8b 3e 65 4c 03 3d c0 a5 e5 7e 83 3d 15 6d 75 00 01 7e 42 48 
[  102.253256] RIP  [<ffffffff811afc3d>] kfree+0x7d/0x310
[  102.256441]  RSP <ffff8804469d3bf8>
[  102.259619] CR2: 0000000000000000
[  102.262784] ---[ end trace 5d5dbedc4533037a ]---
[  102.262789] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  102.262797] IP: [<ffffffff811afc3d>] kfree+0x7d/0x310
[  102.262800] PGD 449695067 PUD 449692067 PMD 0 
[  102.262802] Oops: 0000 [#3] SMP 
[  102.262844] Modules linked in: netconsole configfs snd_hda_codec_hdmi b43 mac80211 cfg80211 ssb rng_core pcmcia x86_pkg_temp_thermal intel_powerclamp pcmcia_core joydev intel_rapl uvcvideo snd_hda_codec_cirrus btusb iTCO_wdt iosf_mbi videobuf2_vmalloc snd_hda_codec_generic videobuf2_memops btbcm iTCO_vendor_support videobuf2_core sdhci_pci coretemp btintel v4l2_common tg3 nls_utf8 nls_cp437 snd_hda_intel ptp videodev snd_hda_controller applesmc vfat kvm_intel sdhci efi_pstore snd_hda_codec bluetooth input_polldev fat pps_core i2c_i801 kvm media libphy mmc_core bcm5974 rfkill pcspkr crc16 hid_appleir evdev lpc_ich snd_hda_core efivars sg snd_hwdep bcma mfd_core snd_pcm battery sbs xhci_pci snd_timer sbshc xhci_hcd apple_bl snd mei_me video ac button mei soundcore shpchp processor thermal_sys loop zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) dm_crypt blowfish_generic blowfish_x86_64 blowfish_common ecb des_generic cast5_avx_x86_64 cast5_generic cast_common cbc twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg hid_apple hid_generic usbhid hid sr_mod cdrom sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ahci ehci_pci libahci aesni_intel aes_x86_64 lrw gf128mul ehci_hcd glue_helper libata ablk_helper firewire_ohci cryptd scsi_mod firewire_core crc_itu_t usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod autofs4
[  102.262879] CPU: 5 PID: 1514 Comm: rs:main Q:Reg Tainted: P      D    O    4.1.0-1-amd64 #1 Debian 4.1-rc7-1~exp1
[  102.262881] Hardware name: Apple Inc. MacBookPro9,1/Mac-4B7AC7E43945597E, BIOS MBP91.88Z.00D3.B08.1208081132 08/08/2012
[  102.262882] task: ffff8804522cec20 ti: ffff88044894c000 task.ti: ffff88044894c000
[  102.262886] RIP: 0010:[<ffffffff811afc3d>]  [<ffffffff811afc3d>] kfree+0x7d/0x310
[  102.262888] RSP: 0018:ffff88044894faf8  EFLAGS: 00010046
[  102.262889] RAX: ffffea00022df480 RBX: ffff88008b7d2820 RCX: 0000000000000000
[  102.262890] RDX: ffff88010b7d2820 RSI: 0000000000000020 RDI: ffff88008b7d2820
[  102.262891] RBP: 0000000000000282 R08: 0000000000001130 R09: ffff88044da735d8
[  102.262892] R10: 0000000000000202 R11: 0000000000000000 R12: ffffffffa037f9ed
[  102.262893] R13: ffff880447f15630 R14: 0000000000000000 R15: ffff880449b7abc0
[  102.262895] FS:  00007f1ca3f6a700(0000) GS:ffff88046e340000(0000) knlGS:0000000000000000
[  102.262896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  102.262898] CR2: 0000000000000000 CR3: 0000000449694000 CR4: 00000000001407e0
[  102.262899] Stack:
[  102.262901]  ffff880447f155c8 000000000009e010 ffff880455d1fdc0 00000000000000c0
[  102.262904]  0000000000000009 0000000000000286 0000000000000000 0000000005401992
[  102.262906]  ffff8804496ab6a0 ffff88008b7d2820 ffff880447f15620 0000000000000010
[  102.262906] Call Trace:
[  102.262940]  [<ffffffffa037f9ed>] ? dsl_dir_tempreserve_clear+0xfd/0x160 [zfs]
[  102.262963]  [<ffffffffa0369e7f>] ? dmu_tx_commit+0xff/0x1d0 [zfs]
[  102.262988]  [<ffffffffa03e5e24>] ? zfs_write+0x5c4/0xb80 [zfs]
[  102.262992]  [<ffffffff81577660>] ? __schedule+0x2a0/0x920
[  102.262998]  [<ffffffff810e733e>] ? futex_wait+0x23e/0x270
[  102.263019]  [<ffffffffa03fc7f8>] ? zpl_write+0xb8/0x130 [zfs]
[  102.263023]  [<ffffffff811cb7f3>] ? __vfs_write+0x23/0xf0
[  102.263026]  [<ffffffff811ce2c2>] ? __sb_start_write+0x42/0xf0
[  102.263031]  [<ffffffff81261581>] ? security_file_permission+0x21/0xa0
[  102.263034]  [<ffffffff811cbef4>] ? vfs_write+0xa4/0x1b0
[  102.263037]  [<ffffffff81579f4e>] ? mutex_lock+0xe/0x30
[  102.263040]  [<ffffffff811ccc62>] ? SyS_write+0x42/0xb0
[  102.263044]  [<ffffffff8157c3b2>] ? system_call_fast_compare_end+0xc/0x6b
[  102.263067] Code: ff ff 48 01 da 48 0f 42 05 f1 43 66 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 01 c8 48 8b 08 80 e5 80 0f 85 cf 01 00 00 4c 8b 70 30 <4d> 8b 3e 65 4c 03 3d c0 a5 e5 7e 83 3d 15 6d 75 00 01 7e 42 48 
[  102.263071] RIP  [<ffffffff811afc3d>] kfree+0x7d/0x310
[  102.263071]  RSP <ffff88044894faf8>
[  102.263072] CR2: 0000000000000000
[  102.263074] ---[ end trace 5d5dbedc4533037b ]---

Where kernfs_fop_write+0xaa looks like this:

0xffffffff81242b2a is in kernfs_fop_write (/root/airlied/fs/kernfs/file.c:326).
321             mutex_unlock(&of->mutex);
322     out_free:
323             if (buf != of->prealloc_buf)
324                     kfree(buf);
325             return len;
326     }
327     
328     static void kernfs_vma_open(struct vm_area_struct *vma)
329     {
330             struct file *file = vma->vm_file;

And dsl_dir_tempreserve_clear+0xfd looks like this (it's the invocation of list_head() in the while loop condition of dsl_dir_tempreserve_clear(), which calls list_empty()):

0x1d0d is in dsl_dir_tempreserve_clear (/usr/src/linux-headers-4.1.0-1-common/include/linux/list.h:189).
184      * list_empty - tests whether a list is empty
185      * @head: the list to test.
186      */
187     static inline int list_empty(const struct list_head *head)
188     {
189             return head->next == head;
190     }
191     
192     /**
193      * list_empty_careful - tests whether a list is empty and not being modified
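For reference, list_empty() in the listing above only compares head->next with head, so it reads through the head pointer unconditionally. A kernel list_head is an intrusive, circular, doubly linked list whose empty state is a head pointing at itself. The user-space sketch below re-implements a small, simplified subset of include/linux/list.h to illustrate this; it is not the kernel header, and list_first_or_null() is a hypothetical helper loosely modeled on the list_head() wrapper described above (presumably returning NULL once the list is empty), not the actual SPL code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space copy of the kernel's intrusive list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head whose next and prev both point at itself. */
static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry in just before head, i.e. at the tail of the circular list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

/* Unlink entry; the kernel poisons the pointers, this sketch NULLs them. */
static void list_del(struct list_head *entry)
{
    assert(entry->next != NULL && entry->prev != NULL); /* no double delete */
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

/* Same test as list.h:189: reads through head without any NULL check. */
static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Hypothetical helper: first entry, or NULL once the list is drained. */
static struct list_head *list_first_or_null(struct list_head *head)
{
    return list_empty(head) ? NULL : head->next;
}
```

A drain loop built on this, such as while ((e = list_first_or_null(&h)) != NULL) { list_del(e); ... }, is only safe if head itself still points at valid memory and every entry is unlinked before its containing object is freed.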
@FransUrbo
Contributor

I've seen this in the test suite as well, but never had the time to investigate it.

@behlendorf
Contributor

@l1k thanks for the detailed issue report and particularly the reproducer. Once someone has time to look carefully into this, that should make it much easier to determine what's wrong. I briefly looked at the stacks, and on the surface there's nothing special going on here: just normal IO, and very little of the stack actually implicates ZFS. Still, it needs to be explained.

@anthonyde

This merge from 4.1-rc6 may be relevant: torvalds/linux@0f1e5b5

@DorianGray

I think I found other ways to reproduce this issue: while using Docker 1.7 on a ZFS root, I get an oops if I use devmapper or aufs as the backing store, but not with zfs or vfs. It's also instantly reproducible: start the binary and the kernel panics. Kernel 4.1.3.

@l1k
Contributor Author

l1k commented Jul 26, 2015

Bisected to torvalds/linux@aa4d861 ("block: loop: switch to VFS ITER_BVEC"). Unfortunately, the commit message is somewhat terse, but the commit changes loop.c to expect the requests enqueued by blk_mq to contain bvec instead of kvec structures.

The commit also refactors the code that transforms data read from or written to a loop device; this code is used for encrypted loop devices. However, the loop devices I've tested this with are unencrypted, i.e. they use the transfer_none transformation (which the commit eliminates to make the code a bit simpler). I believe the culprit is not the refactored code but rather the transition from kvec to bvec.

So, the commit looks sane, and it does work if the file backing the loop device is located on a FAT partition instead of a ZFS dataset, yet reverting the commit fixes the issue.

Hm, are requests backed by kvec if the file backing the loop device is located on a ZFS dataset, and by bvec otherwise? Why?
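To make the kvec/bvec distinction concrete: a kvec (like a user-space iovec) describes a segment by a virtual address and a length, whereas a bio_vec describes it by a struct page pointer, an offset into that page and a length, so the page has to be mapped before the data can be touched. Code that assumes every segment is an iovec will misread a bio_vec array. The sketch below uses simplified user-space stand-ins; the *_sketch structs, the fake page type and page_address_sketch() are hypothetical and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096

/* Hypothetical stand-in for struct page: here it simply owns its data. */
struct page_sketch {
    unsigned char data[PAGE_SIZE_SKETCH];
};

/* Like struct kvec / struct iovec: a virtual address plus a length. */
struct kvec_sketch {
    void *iov_base;
    size_t iov_len;
};

/* Like struct bio_vec: a page, a length and an offset within that page. */
struct bio_vec_sketch {
    struct page_sketch *bv_page;
    unsigned int bv_len;
    unsigned int bv_offset;
};

/* Stand-in for page_address()/kmap(): yields the page's mapped address. */
static unsigned char *page_address_sketch(struct page_sketch *page)
{
    return page->data;
}

/* Gather bytes from kvec segments: each segment already is an address. */
static size_t copy_from_kvecs(unsigned char *dst,
                              const struct kvec_sketch *v, size_t n)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        memcpy(dst + done, v[i].iov_base, v[i].iov_len);
        done += v[i].iov_len;
    }
    return done;
}

/* Gather bytes from bio_vec segments: translate (page, offset) first. */
static size_t copy_from_bvecs(unsigned char *dst,
                              const struct bio_vec_sketch *v, size_t n)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char *src = page_address_sketch(v[i].bv_page);

        assert(v[i].bv_offset + v[i].bv_len <= PAGE_SIZE_SKETCH);
        memcpy(dst + done, src + v[i].bv_offset, v[i].bv_len);
        done += v[i].bv_len;
    }
    return done;
}
```

Both functions copy the same bytes, but passing a bio_vec array to copy_from_kvecs() would treat bv_page as a data address and the packed length/offset fields as a length, which is the kind of confusion that corrupts memory.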

@behlendorf
Contributor

@l1k nice job bisecting this to the offending commit. I had a quick look, and one thing I noticed is that this patch modifies the code to do I/O via .iter_write() instead of .write(). That should work fine, but I notice in the stack traces that zpl_write() was called instead of zpl_iter_write(), which isn't what I'd expect. That might be a good place to start investigating this.

@dweeezil
Contributor

I'll note that the terse commit comment for torvalds/linux@aa4d861 is due to it being part of a large merge commit (torvalds/linux@4fc8adcf).

There have also been a bunch of other wads of VFS changes which may bear some looking at (in reverse chronological order):
torvalds/linux@1dc51b82 (post-4.1)
torvalds/linux@052b398a (post-4.1)
torvalds/linux@9ec3a646f (in 4.1)
torvalds/linux@4fc8adcf (in 4.1 - patch mentioned above)
torvalds/linux@fa927894 (in 4.1 - looks like it doesn't change anything which matters to ZoL)
torvalds/linux@ca2ec326 (in 4.1)

I think it would be worth skimming the changes in each of these commits to see whether they've got anything which would affect ZoL.

In fact, I'll note that XFS was updated in most of these (at least the first 4 in the list). In my experience, that's a good place to start for whoever might look into this.

In summary, it looks like we've got some more 4.1 compatibility stuff to deal with.

@kernelOfTruth
Contributor

I only occasionally use loop devices, but running into a BUG or even a lockup would be a real show-stopper, so I took a look at the mailing list:

A possibly related upstream thread involving XFS and loop:

http://marc.info/?l=linux-mm&m=143745156221454&w=2 [[regression 4.2-rc3] loop: xfstests xfs/073 deadlocked in low memory conditions]

The problem seems to be merely triggered, not caused, by

block: loop: switch to VFS ITER_BVEC

http://marc.info/?l=linux-mm&m=143745156221454&w=2

The problem, fundamentally, is that mpage_readpages() does a
GFP_KERNEL allocation, rather than paying attention to the inode's
mapping gfp mask, which is set to GFP_NOFS.

That looks the root cause, and I guess the issue is just triggered
after commit aa4d86163e4(block: loop: switch to VFS ITER_BVEC)
which changes splice to bvec iterator.

Potential fix: http://marc.info/?l=linux-kernel&m=143746915525411&w=2
Perhaps ZFS would also need a fix?
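The quoted diagnosis comes down to allocation-flag masking: GFP_NOFS is GFP_KERNEL without __GFP_FS, so an allocation that ignores the mapping's mask may re-enter filesystem code during memory reclaim. The sketch below models only that masking idea; the *_SKETCH bit values and alloc_flags_for_mapping() are hypothetical, do not match the kernel's real encoding, and are not necessarily what the linked patch does.

```c
#include <assert.h>

/* Hypothetical flag bits; the kernel's real __GFP_* values differ. */
enum {
    GFP_WAIT_SKETCH = 1u << 0, /* allocation may sleep */
    GFP_IO_SKETCH   = 1u << 1, /* reclaim may start physical I/O */
    GFP_FS_SKETCH   = 1u << 2, /* reclaim may call into filesystem code */
};

#define GFP_KERNEL_SKETCH (GFP_WAIT_SKETCH | GFP_IO_SKETCH | GFP_FS_SKETCH)
#define GFP_NOFS_SKETCH   (GFP_WAIT_SKETCH | GFP_IO_SKETCH)

/* Hypothetical helper: restrict the requested flags to what the mapping
 * allows, instead of using the requested flags unconditionally. */
static unsigned int alloc_flags_for_mapping(unsigned int requested,
                                            unsigned int mapping_mask)
{
    return requested & mapping_mask;
}

/* Nonzero if an allocation with these flags may recurse into the FS. */
static int may_enter_fs(unsigned int flags)
{
    return (flags & GFP_FS_SKETCH) != 0;
}
```

With a mapping mask of GFP_NOFS_SKETCH, a GFP_KERNEL_SKETCH request is reduced to GFP_NOFS_SKETCH and can no longer recurse into the filesystem, which is the behavior the quoted analysis says mpage_readpages() was missing.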

behlendorf pushed a commit to openzfs/spl that referenced this issue Aug 24, 2015
Starting from Linux 4.1, bio_vec will be allowed to pass into filesystem via
iter_read/iter_write, so we add a bio_vec field in uio_t to hold it, and use
UIO_BVEC in segflg to determine which "vec".

Also, to be consistent to newer kernel, we make iovec and bio_vec immutable,
and make uio act as an iterator with the new uio_skip field indicating number
of bytes to skip in the first segment.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs/zfs#3511
Issue openzfs/zfs#3640
Closes #468
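The restructured uio_t described in this commit message can be pictured roughly as follows. This is a simplified user-space sketch, not the actual SPL definition: only uio_skip and UIO_BVEC are named in the commit message, while the *_sketch types, the other field names and uio_advance_sketch() are assumptions for illustration. It shows how keeping the segment arrays immutable and recording a skip offset into the first remaining segment lets the uio act as an iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct iovec and struct bio_vec. */
struct iovec_sketch { void *iov_base; size_t iov_len; };
struct bio_vec_sketch { void *bv_page; unsigned int bv_len; unsigned int bv_offset; };

enum uio_seg_sketch { UIO_USERSPACE_SKETCH, UIO_SYSSPACE_SKETCH, UIO_BVEC_SKETCH };

/* The segment arrays are const and never modified; progress is recorded
 * by advancing the array pointer and uio_skip instead. */
struct uio_sketch {
    union {
        const struct iovec_sketch *uio_iov;    /* live unless UIO_BVEC */
        const struct bio_vec_sketch *uio_bvec; /* live when UIO_BVEC */
    };
    int uio_iovcnt;                 /* segments remaining */
    enum uio_seg_sketch uio_segflg; /* selects the live union member */
    size_t uio_resid;               /* bytes remaining overall */
    size_t uio_skip;                /* bytes already consumed in segment 0 */
};

/* Length of remaining segment i, whichever representation is in use. */
static size_t seg_len(const struct uio_sketch *uio, int i)
{
    return uio->uio_segflg == UIO_BVEC_SKETCH ? uio->uio_bvec[i].bv_len
                                              : uio->uio_iov[i].iov_len;
}

/* Consume n bytes, stepping past segments as they are exhausted. */
static void uio_advance_sketch(struct uio_sketch *uio, size_t n)
{
    assert(n <= uio->uio_resid);
    uio->uio_resid -= n;
    n += uio->uio_skip; /* count from the start of segment 0 */
    while (uio->uio_iovcnt > 0 && n >= seg_len(uio, 0)) {
        n -= seg_len(uio, 0); /* whole segment consumed */
        if (uio->uio_segflg == UIO_BVEC_SKETCH)
            uio->uio_bvec++;
        else
            uio->uio_iov++;
        uio->uio_iovcnt--;
    }
    uio->uio_skip = n; /* partial progress into the new segment 0 */
}
```

Because the arrays themselves are never written, the same iovec or bio_vec array handed in by the kernel's iov_iter can be walked without copying or mutating it.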
tomgarcia pushed a commit to tomgarcia/zfs that referenced this issue Aug 25, 2015
Starting from Linux 4.1 allows iov_iter with bio_vec to be passed into
iter_read/iter_write. Notably, the loop device will pass bio_vec to backend
filesystem. However, current ZFS code assumes iovec without any check, so it
will always crash when using loop device.

With the restructured uio_t, we can safely pass bio_vec in uio_t with UIO_BVEC
set. The uio* functions are modified to handle bio_vec case separately.

The const uio_iov causes some warning in xuio related stuff, so explicit
convert them to non const.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#3511
Closes openzfs#3640
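The commit message says the uio* functions now handle the bio_vec case separately. A rough user-space sketch of what that means for a uiomove()-style copy follows. The names (uiomove_sketch(), seg_base(), the *_sketch types and the flat page model) are hypothetical; real kernel code would map the page (e.g. with kmap()) and use copy_to_user()/copy_from_user() for user-space iovecs, which this sketch replaces with plain memcpy(). As in uiomove(), UIO_READ copies from the buffer into the uio and UIO_WRITE copies from the uio into the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page: just a buffer we can "map". */
struct page_sketch { unsigned char data[4096]; };

struct iovec_sketch { void *iov_base; size_t iov_len; };
struct bio_vec_sketch {
    struct page_sketch *bv_page;
    unsigned int bv_len;
    unsigned int bv_offset;
};

enum uio_seg_sketch { UIO_SYSSPACE_SKETCH, UIO_BVEC_SKETCH };
enum uio_rw_sketch { UIO_READ_SKETCH, UIO_WRITE_SKETCH };

struct uio_sketch {
    union {
        const struct iovec_sketch *uio_iov;
        const struct bio_vec_sketch *uio_bvec;
    };
    int uio_iovcnt;
    enum uio_seg_sketch uio_segflg;
    size_t uio_resid;
    size_t uio_skip; /* bytes already consumed in the first segment */
};

/* Address and length of the first remaining segment, per segflg. */
static unsigned char *seg_base(const struct uio_sketch *uio, size_t *len)
{
    if (uio->uio_segflg == UIO_BVEC_SKETCH) {
        const struct bio_vec_sketch *bv = uio->uio_bvec;
        *len = bv->bv_len;
        return bv->bv_page->data + bv->bv_offset; /* "map" the page */
    }
    *len = uio->uio_iov->iov_len;
    return uio->uio_iov->iov_base;
}

/* Copy up to n bytes between p and the uio, advancing the uio. */
static void uiomove_sketch(void *p, size_t n, enum uio_rw_sketch rw,
                           struct uio_sketch *uio)
{
    unsigned char *buf = p;

    while (n > 0 && uio->uio_resid > 0 && uio->uio_iovcnt > 0) {
        size_t len;
        unsigned char *seg = seg_base(uio, &len);
        size_t cnt = len - uio->uio_skip;

        if (cnt > n)
            cnt = n;
        if (cnt > uio->uio_resid)
            cnt = uio->uio_resid;
        if (rw == UIO_READ_SKETCH)
            memcpy(seg + uio->uio_skip, buf, cnt); /* buffer -> uio */
        else
            memcpy(buf, seg + uio->uio_skip, cnt); /* uio -> buffer */

        buf += cnt;
        n -= cnt;
        uio->uio_resid -= cnt;
        uio->uio_skip += cnt;
        if (uio->uio_skip == len) { /* segment exhausted: step past it */
            if (uio->uio_segflg == UIO_BVEC_SKETCH)
                uio->uio_bvec++;
            else
                uio->uio_iov++;
            uio->uio_iovcnt--;
            uio->uio_skip = 0;
        }
    }
}
```

The only place the two representations differ is seg_base(); everything else (clamping, skip tracking, advancing) is shared, which keeps the bio_vec handling localized.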
@cgarwood82

Looking forward to this release. I had to downgrade my kernel, zfs-git, zfs-utils-git and spl-git packages on Arch to get my server stable.

I was seeing the same null pointer dereference as in this bug: #3640

Thanks for the hard work guys.

8 participants