Null dereference while trying to zpool remove #16786

Open
pshirshov opened this issue Nov 20, 2024 · 2 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@pshirshov

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | NixOS |
| Distribution Version | 24.11.20241011.a28d979 (Vicuna) |
| Kernel Version | 6.6.56 |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.2.6-1 zfs-kmod-2.2.6-1 |

Describe the problem you're observing

I have the following pool:

❯ zpool status -v
  pool: storage-main
 state: ONLINE
  scan: scrub repaired 0B in 13:09:39 with 0 errors on Fri Nov  1 14:59:55 2024
remove: Removal of vdev 1 copied 8.33T in 12h16m, completed on Wed Nov 20 07:31:18 2024
	22.5M memory used for removed device mappings
config:

	NAME                                   STATE     READ WRITE CKSUM
	storage-main                           ONLINE       0     0     0
	 mirror-0                             ONLINE       0     0     0
	   virtio-ST16T-ZL267DH30000C0-part1  ONLINE       0     0     0
	   virtio-WD16T-3WHSW4HP-part1        ONLINE       0     0     0
	 mirror-5                             ONLINE       0     0     0
	   virtio-ST24-ZYD1GEJA               ONLINE       0     0     0
	   virtio-WDC24-65JWH7WB              ONLINE       0     0     0
	 mirror-6                             ONLINE       0     0     0
	   virtio-ST24-ZYD0EMCT               ONLINE       0     0     0
	   virtio-WDC24-65JWJ9JB              ONLINE       0     0     0
	logs	
	 mirror-4                             ONLINE       0     0     0
	   virtio-WD-2T-21210P800016-part1    ONLINE       0     0     0
	   virtio-WD-2T-21210P800027-part1    ONLINE       0     0     0
	cache
	 virtio-SAMSUNG-256G-S42VNF0-part1    ONLINE       0     0     0

errors: No known data errors

I'm trying to:

 zpool remove storage-main mirror-0

The kernel immediately prints a stack trace:

[  317.434903] BUG: kernel NULL pointer dereference, address: 0000000000000088
[  317.435083] #PF: supervisor read access in kernel mode
[  317.435174] #PF: error_code(0x0000) - not-present page
[  317.435219] PGD 80000001d797e067 P4D 80000001d797e067 PUD 1d7900067 PMD 0
[  317.435268] Oops: 0000 [#1] PREEMPT SMP PTI
[  317.435313] CPU: 1 PID: 5543 Comm: zpool Tainted: P           O       6.6.56 #1-NixOS
[  317.435372] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  317.435435] RIP: 0010:vdev_passivate+0x10c/0x190 [zfs]
[  317.436545] Code: 00 00 00 31 d2 eb 09 48 83 c2 01 48 39 fa 74 3a 49 8b 0c d0 48 39 cb 74 ee 48 81 79 60 60 88 a0 c0 74 e4 48 8b b1 98 2b 00 00 <48> 3b 86 88 00 00 00 75 d4 48 83 b9 d0 2c 00 00 00 0f 84 17 ff ff
[  317.436710] RSP: 0018:ffffb5850ae2fd70 EFLAGS: 00010202
[  317.436821] RAX: ffff99955ebdf400 RBX: ffff9995564ac000 RCX: ffff9995563c8000
[  317.436903] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000007
[  317.436952] RBP: ffff99955afc4000 R08: ffff9995456d0500 R09: 0000000000000000
[  317.437005] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb5850ae2fdd0
[  317.437052] R13: ffff999558a9e800 R14: ffff9995567ac000 R15: ffff9995d0391880
[  317.437091] FS:  00007fe4a6cc17c0(0000) GS:ffff999c9fa80000(0000) knlGS:0000000000000000
[  317.437142] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  317.437173] CR2: 0000000000000088 CR3: 00000001ce6b8001 CR4: 0000000000770ee0
[  317.437223] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  317.437256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  317.437291] PKRU: 55555554
[  317.437310] Call Trace:
[  317.437335]  <TASK>
[  317.437366]  ? __die+0x23/0x80
[  317.437429]  ? page_fault_oops+0x171/0x500
[  317.437471]  ? __schedule+0x404/0x1430
[  317.437525]  ? exc_page_fault+0x71/0x160
[  317.437556]  ? asm_exc_page_fault+0x26/0x30
[  317.437613]  ? vdev_passivate+0x10c/0x190 [zfs]
[  317.438651]  spa_vdev_remove+0x7f6/0x9c0 [zfs]
[  317.439236]  ? spa_open_common+0x27f/0x440 [zfs]
[  317.439837]  zfs_ioc_vdev_remove+0x5b/0xa0 [zfs]
[  317.440408]  zfsdev_ioctl_common+0x878/0x9b0 [zfs]
[  317.440975]  ? kvmalloc_node+0x43/0xe0
[  317.441039]  zfsdev_ioctl+0x53/0xe0 [zfs]
[  317.441565]  __x64_sys_ioctl+0x9c/0xe0
[  317.441625]  do_syscall_64+0x39/0x90
[  317.441684]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[  317.441733] RIP: 0033:0x7fe4a739b79f
[  317.441889] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 28 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  317.442965] RSP: 002b:00007fff7de93ca0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  317.443856] RAX: ffffffffffffffda RBX: 0000000012c96670 RCX: 00007fe4a739b79f
[  317.444743] RDX: 00007fff7de94110 RSI: 0000000000005a0c RDI: 0000000000000003
[  317.445623] RBP: 00007fff7de976f0 R08: 0000000000000000 R09: 0000000000000000
[  317.446500] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000012c991e0
[  317.447377] R13: 00007fff7de94110 R14: 00007fff7de93d10 R15: 0000000012c8f4e0
[  317.448079]  </TASK>
[  317.448601] Modules linked in: xt_MASQUERADE xt_mark nft_chain_nat nf_nat veth af_packet skx_edac_common nfit edac_core libnvdimm cfg80211 cbc encrypted_keys trusted asn1_encoder tee tpm rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common kvm_intel kvm snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi irqbypass crc32_pclmul iTCO_wdt snd_hda_codec polyval_clmulni polyval_generic intel_pmc_bxt gf128mul ghash_clmulni_intel watchdog sha512_ssse3 snd_hda_core sha256_ssse3 sha1_ssse3 aesni_intel snd_hwdep crypto_simd snd_pcm cryptd xt_comment snd_timer rapl i2c_i801 joydev snd i2c_smbus psmouse soundcore lpc_ich mousedev tiny_power_button evdev intel_agp qxl intel_gtt xt_conntrack drm_ttm_helper ttm button input_leds led_class mac_hid nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 serio_raw ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc fuse efi_pstore
[  317.449043]  configfs nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 hid_generic usbhid hid sr_mod cdrom virtio_net virtio_scsi net_failover failover virtio_blk ahci libahci xhci_pci xhci_pci_renesas libata atkbd libps2 xhci_hcd vivaldi_fmap scsi_mod crct10dif_pclmul crct10dif_common scsi_common i8042 virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev rtc_cmos serio dm_mod dax btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq zfs(PO) spl(O) virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring
[  317.457518] CR2: 0000000000000088
[  317.458076] ---[ end trace 0000000000000000 ]---
[  317.458527] RIP: 0010:vdev_passivate+0x10c/0x190 [zfs]
[  317.459718] Code: 00 00 00 31 d2 eb 09 48 83 c2 01 48 39 fa 74 3a 49 8b 0c d0 48 39 cb 74 ee 48 81 79 60 60 88 a0 c0 74 e4 48 8b b1 98 2b 00 00 <48> 3b 86 88 00 00 00 75 d4 48 83 b9 d0 2c 00 00 00 0f 84 17 ff ff
[  317.460606] RSP: 0018:ffffb5850ae2fd70 EFLAGS: 00010202
[  317.461063] RAX: ffff99955ebdf400 RBX: ffff9995564ac000 RCX: ffff9995563c8000
[  317.461518] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000007
[  317.461977] RBP: ffff99955afc4000 R08: ffff9995456d0500 R09: 0000000000000000
[  317.462450] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb5850ae2fdd0
[  317.462914] R13: ffff999558a9e800 R14: ffff9995567ac000 R15: ffff9995d0391880
[  317.463390] FS:  00007fe4a6cc17c0(0000) GS:ffff999c9fa80000(0000) knlGS:0000000000000000
[  317.463865] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  317.464347] CR2: 0000000000000088 CR3: 00000001ce6b8001 CR4: 0000000000770ee0
[  317.464836] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  317.465327] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  317.465810] PKRU: 55555554

I don't have a reproducer for this, as it only happens on a live pool. The previous removal of another mirror completed successfully.
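
For anyone reading the trace, the fault address can be cross-checked against the register dump. Below is a minimal Python sketch that uses only the `Code:` line and the register values quoted above. The decoder is deliberately narrow (it handles just the `REX.W 3B /r` compare form with a 32-bit displacement), and the helper names are made up for illustration.

```python
# Values copied from the Oops above.
CODE_LINE = (
    "Code: 00 00 00 31 d2 eb 09 48 83 c2 01 48 39 fa 74 3a 49 8b 0c d0 "
    "48 39 cb 74 ee 48 81 79 60 60 88 a0 c0 74 e4 48 8b b1 98 2b 00 00 "
    "<48> 3b 86 88 00 00 00 75 d4 48 83 b9 d0 2c 00 00 00 0f 84 17 ff ff"
)
REGS = {"RAX": 0xFFFF99955EBDF400, "RCX": 0xFFFF9995563C8000, "RSI": 0x0}
CR2 = 0x88

# Register numbering used by the ModRM byte when no REX.R/REX.B bit is set.
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def faulting_bytes(code_line):
    """Return the bytes from the <..> marker onward (the faulting instruction)."""
    tokens = code_line.split()[1:]  # drop the leading "Code:"
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_cmp_r64_mem(insn):
    """Decode 'CMP r64, [base + disp32]' only; anything else is rejected."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x3B:
        raise ValueError("only REX.W 3B /r is handled in this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("expected [base + disp32] without a SIB byte")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REG64[reg], REG64[rm], disp


reg, base, disp = decode_cmp_r64_mem(faulting_bytes(CODE_LINE))
fault_address = REGS[base] + disp
print(f"cmp {reg}, [{base} + {disp:#x}] -> reads {fault_address:#x}, CR2 = {CR2:#x}")
```

The faulting instruction compares RAX with the 8 bytes at RSI + 0x88, and RSI is 0, which matches CR2 = 0x88. The instruction just before it (`48 8b b1 98 2b 00 00`) loads RSI from RCX + 0x2b98, so the NULL pointer came from a field of the object RCX points to. Which source-level field that is cannot be determined from the trace alone.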

@pshirshov pshirshov added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 20, 2024
@pshirshov
Author

Potentially relevant: #13552

It seems that:

  1. The problem is caused by the presence of the indirect-1 vdev, which holds the mappings for the previously evacuated mirror.
  2. This is a bug: OpenZFS tries to treat the indirect vdev as a regular one in
    vdev_passivate(vdev_t *vd, uint64_t *txg)
  3. Once a mirror has been removed from a pool, no other mirror can be removed, and the pool has to be recreated.
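
The hypothesis in the list above can be illustrated with a toy model. This is not OpenZFS code: the classes, field names, and both functions below are hypothetical stand-ins, and the sketch only assumes that the check walks the top-level vdevs and reads each one's allocation group. A top-level entry without such a group (modelled here as `mg=None`) plays the role of the vdev that is wrongly treated as a regular one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    """Hypothetical stand-in for a vdev's allocation group."""
    alloc_class: str


@dataclass
class Vdev:
    """Hypothetical stand-in for a top-level vdev."""
    name: str
    mg: Optional[Group] = None  # None models a vdev that cannot allocate
    noalloc: bool = False


def is_last_allocating_unguarded(children: List[Vdev], vd: Vdev) -> bool:
    """Assumes every sibling has a group; fails on one that does not."""
    for cvd in children:
        if cvd is vd:
            continue
        if cvd.mg.alloc_class != "normal":  # AttributeError when mg is None
            continue
        if not cvd.noalloc:
            return False
    return True


def is_last_allocating_guarded(children: List[Vdev], vd: Vdev) -> bool:
    """Same walk, but siblings without a group are skipped."""
    for cvd in children:
        if cvd is vd or cvd.mg is None:
            continue
        if cvd.mg.alloc_class != "normal":
            continue
        if not cvd.noalloc:
            return False
    return True


# Layout loosely mirroring the pool in this report (names only).
pool = [
    Vdev("mirror-0", Group("normal")),
    Vdev("indirect-1"),  # left behind by the earlier removal
    Vdev("mirror-4", Group("log")),
    Vdev("mirror-5", Group("normal")),
    Vdev("mirror-6", Group("normal")),
]

try:
    is_last_allocating_unguarded(pool, pool[0])
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True  # Python's analogue of the NULL dereference

print("unguarded walk failed:", unguarded_failed)
print("mirror-0 is last allocating vdev:", is_last_allocating_guarded(pool, pool[0]))
```

In the model, the unguarded walk fails as soon as it reaches the entry without a group, while the guarded walk skips it and correctly reports that other allocating vdevs remain. Whether the real function fails for the same reason would need to be confirmed against the OpenZFS source.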

@Domini

Domini commented Jan 14, 2025

I think I've encountered this or a similar issue, which may be easier to reproduce thanks to a simpler pool layout (unfortunately, the pool is no longer available, so I cannot provide further logs).

Originally it was a simple RAID0 pool of two whole disks with a mirrored log on separate partitions of two other disks (i.e. zpool create tank /dev/disk/by-id/disk1 /dev/disk/by-id/disk2 log mirror /dev/disk/by-id/disk3-part2 /dev/disk/by-id/disk4-part2). Long ago I removed the mirrored log, so right before this issue it was a simple RAID0 pool of two whole disks (i.e. zpool create tank /dev/disk/by-id/disk1 /dev/disk/by-id/disk2).

I added the third disk to the pool (i.e. zpool add tank /dev/disk/by-id/disk5). Then I removed the first disk (i.e. zpool remove tank /dev/disk/by-id/disk1) - it succeeded as expected.

Then I tried to remove the second disk (i.e. zpool remove tank /dev/disk/by-id/disk2). The command itself was killed by the OS (outputting Killed) and the pool broke. After that, anything related to this pool (e.g. zpool status) blocked forever and never produced any output. Even shutdown -r now blocked forever, and a hard reset was required.

After a forced reboot, the pool came up without a hitch, with no data lost (at least as far as I could tell). Trying to remove the second disk again led to the same result.
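
The sequence described above could be turned into a reproduction attempt along these lines. This is a dry-run sketch only: it builds and prints the commands without executing them. The sparse-file paths, the working directory, and the 1G size are hypothetical stand-ins for the real disks, and the name `mirror-2` for the log mirror is an assumption that should be checked against `zpool status` before use. There is no guarantee that file-backed vdevs trigger the same failure.

```python
import shlex


def removal_sequence(pool="tank", workdir="/var/tmp/zrepro"):
    """Build (but do not run) the commands for the sequence described above."""
    # Hypothetical sparse files standing in for disk1..disk5.
    disk = {n: f"{workdir}/disk{n}" for n in range(1, 6)}
    cmds = [["mkdir", "-p", workdir]]
    cmds += [["truncate", "-s", "1G", path] for path in disk.values()]
    cmds += [
        # Two striped data vdevs plus a mirrored log, as in the original pool.
        ["zpool", "create", pool, disk[1], disk[2], "log", "mirror", disk[3], disk[4]],
        # Assumption: the log mirror is listed as mirror-2.
        ["zpool", "remove", pool, "mirror-2"],
        ["zpool", "add", pool, disk[5]],
        ["zpool", "remove", pool, disk[1]],  # reported to succeed
        ["zpool", "remove", pool, disk[2]],  # reported to be killed and hang the pool
    ]
    return cmds


if __name__ == "__main__":
    for argv in removal_sequence():
        print(shlex.join(argv))
```

If the printed commands are run by hand, the first data-disk removal should be allowed to finish (as shown by zpool status) before the last command is issued, and only on a throwaway machine, since the report describes the pool hanging until a hard reset.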

After several attempts, I had to recreate the pool (because I needed to physically detach both disks ASAP) and restore the data from backup (which was orders of magnitude slower than simply relocating data from the disk being removed). It took quite a while and was a major PITA.

Long story short, here's the system log output generated when executing the removal of the second disk (i.e. zpool remove tank /dev/disk/by-id/disk2).

I don't recall exactly which ZFS version the pool was created on; it was around the time of Fedora 35, IIRC. The ZFS version during the removal attempt was 2.2.7 on Fedora 41.

I hope it helps.

zpool-remove.log
