
zpool remove of top-level mirror or vdev still does not work #13552

Open
haraldrudell opened this issue Jun 13, 2022 · 10 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments


haraldrudell commented Jun 13, 2022

System information

Type                   Version/Name
Distribution Name      Ubuntu
Distribution Version   20.04 focal
Kernel Version         5.13.0-41-generic
Architecture           amd64
OpenZFS Version        zfs-2.1.4-0york0~20.04, zfs-kmod-2.0.6-1ubuntu2.1

Describe the problem you're observing

zpool remove fails even though the documented requirements appear to be fulfilled

Describe how to reproduce the problem

use my pool…
zfs provides no way to get more detail: no log and no helpful error messages

For example, zfs could tell me how many more bytes I need, right?

Include any warning/errors/backtraces from the system logs

I am trying to migrate a pool from disk to partition

  • with usb, disks may come back under a new /dev/sd name, so if vdevs are not partitions, a reboot is required
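For reference, the migration I'm attempting looks roughly like this (the device names here are placeholders, not the real ones):

zpool add zro /dev/disk/by-partuuid/<new-partition-uuid>   # add the partition as a new top-level vdev
zpool remove zro <old-whole-disk-vdev>                      # then evacuate and remove the old whole-disk vdev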

To remove a top-level mirror or simple vdev (I tried both), the ashift must match, there must be no raidz, and encryption keys must be loaded

  • The error message for no encryption key is still the unhelpful “permission denied”

THE ERROR

date --rfc-3339=second && zpool remove zro b7f9667a-343c-4fec-aeb1-2ed5fd9f7319
2022-06-13 00:01:00-07:00
cannot remove b7f9667a-343c-4fec-aeb1-2ed5fd9f7319: out of space

THE FACTS

zdb -C zro | less
path: '/dev/disk/by-partuuid/b7f9667a-343c-4fec-aeb1-2ed5fd9f7319'
ashift: 16
asize: 2000384688128
path: '/dev/disk/by-partuuid/202d6b64-987c-406d-b64e-81e6357a9721'
ashift: 16
asize: 2000393076736

NOTE: 202d… is apparently larger than b7f9… The data used to be stored on b7f9… alone, and now it does not fit onto a larger vdev?

zpool list -poalloc zro
        ALLOC
1828179083264

HERE’S THE SIZE OF 202d… as a POOL:

zpool list -posize zrw220612
         SIZE
1992864825344

APPARENT SPACE AVAILABLE:
1992864825344 - 1828179083264 = 164685742080 bytes ≈ 153 GiB

PERCENTAGE SPACE AVAILABLE:
164685742080 / 1992864825344 ≈ 8.2%

#11409 lists some apparently undocumented requirements

#11356 has some more

Because the second device is actually larger, this is probably some other bug hiding behind the out-of-space error

I noticed you lose about 16 GiB by going from full disk to a partition

zdb -C also lists 3 indirect devices that should be removable using zfs remap; that command is not available

SUGGESTION
give zpool remove a -f flag that ignores these highly inaccurate predictions: if it fails, it fails. remove is already aborted if an I/O error is encountered, so all it costs to try is the wait for it to fail

make the error message actionable: say how many more bytes are needed and what values were determined
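For example (hypothetical wording, not an existing message), something like this would already be actionable:

cannot remove b7f9667a-343c-4fec-aeb1-2ed5fd9f7319: out of space
  (evacuation requires <X> bytes, only <Y> bytes free on remaining top-level vdevs; <X - Y> more bytes needed)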

haraldrudell added the Type: Defect (Incorrect behavior, e.g. crash, hang) label Jun 13, 2022

rincebrain commented Jun 13, 2022

Well, you're running ZFS 2.0.6 kernel modules with ZFS 2.1.4 userland, so surprising things may ensue. Did you install jonathonf's PPA but not install the zfs-dkms package along with the others, maybe? That's usually what has happened when I see that.
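A quick way to check is zfs version (it prints both the userland and kernel-module versions); assuming the PPA setup, installing the DKMS package and rebooting should bring them back in sync, roughly:

zfs version                  # shows e.g. zfs-2.1.4-... and zfs-kmod-2.0.6-...
sudo apt install zfs-dkms    # builds the matching kernel module (assumes the PPA provides zfs-dkms)
sudo reboot                  # boot with the newly built module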

It would be helpful if you included the (full, not just the parts you think are relevant) output of zpool list -vp on the pool you're trying zpool remove on.

Finally, an -f to override the space checks would be a very bad idea, IMO - people wouldn't bother reporting issues or investigating why it refused, they'd just start using -f, and then complain when it broke midway through if they did, indeed, not have sufficient space.

(zfs remap was removed in 6e91a72, but also didn't do what you seem to think it does - it forced the remapping of specific things you used zpool remove on immediately, but it did not delete indirect vdevs entirely; those will, as far as I know, never go away for the lifetime of the pool, they just don't get shown, like when you do zpool remove on cache or log devices, and the indirection table for each will eventually be empty, ideally.)

Edit: You also don't inherently have to lose ~any space going from "full disk" to "partition", AFAIK - since "full disk" just means "we made the partition table, set a bit in the device metadata saying that we manage the partition table and can rewrite it, and hide the 'part1' or similar in the zpool status output", you're using a partition either way. What does the partition table look like?
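Something along these lines will show it (substitute the actual disk device):

lsblk -o NAME,SIZE,PARTUUID /dev/sdX
sudo parted /dev/sdX unit s print    # exact partition boundaries in sectors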


haraldrudell commented Jun 13, 2022

Retried with fresh install
Ubuntu 22.04 jammy
5.13.0-48-generic
zfs-2.1.2-1ubuntu3 zfs-kmod-2.0.6-1ubuntu2.1

same FAIL outcome

— it had also failed with two sets of mirrors prior

zpool list -vp zro
NAME                                             SIZE          ALLOC           FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zro                                     3985729650688  1828179083264  2157550567424        -         -      3     45   1.00    ONLINE  -
  b7f9667a-343c-4fec-aeb1-2ed5fd9f7319  1992864825344  1828145594368  164719230976        -         -      6     91      -    ONLINE
  202d6b64-987c-406d-b64e-81e6357a9721  1992864825344  33488896  1992831336448        -         -      0      0      -    ONLINE

@rincebrain
Contributor

That's the same kind of mismatched kernel module version as before, and 22.04 doesn't ship that kernel version, so I think you installed something else.

@haraldrudell
Author

Fixed that, too; unfortunately the error is still there. I will try the second pool, too.

date --rfc-3339=second && zpool remove zro b7f9667a-343c-4fec-aeb1-2ed5fd9f7319
2022-06-15 03:21:18-07:00
cannot remove b7f9667a-343c-4fec-aeb1-2ed5fd9f7319: out of space
zpool list -vp zro
NAME                                             SIZE          ALLOC           FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zro                                     3985729650688  1828159422464  2157570228224        -         -      3     45   1.00    ONLINE  -
  b7f9667a-343c-4fec-aeb1-2ed5fd9f7319  1992864825344  1828139237376  164725587968        -         -      6     91      -    ONLINE
  202d6b64-987c-406d-b64e-81e6357a9721  1992864825344  20185088  1992844640256        -         -      0      0      -    ONLINE
echo $(date --rfc-3339=second) $(hostname -s) $(lsb_release --codename --release --short) $(uname --kernel-release) $(uptime --since) && echo $(zfs version)
2022-06-15 03:21:32-07:00 c68z 22.04 jammy 5.15.0-37-generic 2022-06-15 01:57:46
zfs-2.1.2-1ubuntu3 zfs-kmod-2.1.2-1ubuntu3


haraldrudell commented Jun 15, 2022

The purpose of this exercise was to change the vdevs from being whole disks to being the single partition on each of those same disks.
I expected the vdevs to shrink from this, but it turns out the vdevs are enlarged by about 9 GiB.
Because I expected the vdevs to shrink, I used striping on the first pool, which caused the trouble.
For the second pool, now knowing that the vdevs grow, I used a single top-level mirror and then resized online to the new vdev, so that worked.
For the first pool, it is still the case that the second stripe cannot be removed, but resilvering is still in progress and that may yet help.

Unexpectedly, I could add a smaller device to the first stripe, making it a mirror. It appears zfs is confused about vdev sizes.
If I understand this right, some vdevs are both current and indirect.

The goal below is for zro to consist of the single vdev 202d6b64

I do have an offline backup, so once the layout is satisfactory, the data can be verified

date --rfc-3339=second && zpool status zro                                                                 
2022-06-15 11:18:24-07:00                                                                                                   
  pool: zro                                                                                                                 
 state: ONLINE                                                                                                              
status: One or more devices is currently being resilvered.  The pool will                                                   
        continue to function, possibly in a degraded state.                                                                 
action: Wait for the resilver to complete.                                                                                  
  scan: resilver in progress since Wed Jun 15 11:10:25 2022                                                                 
        1.66T scanned at 3.55G/s, 71.4G issued at 153M/s, 1.66T total                                                       
        74.1G resilvered, 4.19% done, 03:02:29 to go                                                                        
remove: Removal of vdev 2 copied 8.25M in 0h0m, completed on Sat Jun 11 11:36:35 2022                                       
        2.98K memory used for removed device mappings                                                                       
config:                                                                                                                     
                                                                                                                            
        NAME                                      STATE     READ WRITE CKSUM                                                
        zro                                       ONLINE       0     0     0                                                
          mirror-0                                ONLINE       0     0     0                                                
            b7f9667a-343c-4fec-aeb1-2ed5fd9f7319  ONLINE       0     0     0                                                
            202d6b64-987c-406d-b64e-81e6357a9721  ONLINE       0     0     0  (resilvering)                                 
          692da33c-0f8e-47ff-a7bf-8d6770772469    ONLINE       0     0     0                                                
                                                                                                                            
errors: No known data errors                                                                                                
zdb -C zro                                                                                                 
                                                                                                                            
MOS Configuration:                                                                                                          
        version: 5000
        name: 'zro'
        state: 0
        txg: 13168461
        pool_guid: 14578202207106007368
        errata: 0
        hostid: 1532761246
        hostname: 'c68z'
        com.delphix:has_per_vdev_zaps
        vdev_children: 5
        vdev_tree:
            type: 'root'
            id: 0
            guid: 14578202207106007368
            create_txg: 4
            children[0]:
                type: 'mirror'
                id: 0
                guid: 5986449157720229279
                whole_disk: 0
                metaslab_array: 136
                metaslab_shift: 34
                ashift: 16
                asize: 2000384688128
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 135
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 4608714181282964506
                    path: '/dev/disk/by-partuuid/b7f9667a-343c-4fec-aeb1-2ed5fd9f7319'
                    whole_disk: 0
                    DTL: 531
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 529
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 10232177178490955932
                    path: '/dev/disk/by-partuuid/202d6b64-987c-406d-b64e-81e6357a9721'
                    whole_disk: 0
                    DTL: 2489
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 2406
                    resilver_txg: 13168458
            children[1]:
                type: 'indirect'
                id: 1
                guid: 2079469523163655316
                whole_disk: 0
                metaslab_array: 0
                metaslab_shift: 34
                ashift: 16
                asize: 2000393076736
                is_log: 0                                                                                                   
                com.delphix:indirect_object: 324                                                                            
                com.delphix:indirect_births: 326                                                                            
                create_txg: 13144933                                                                                        
                com.delphix:vdev_zap_top: 5                                                                                 
            children[2]:                                                                                                    
                type: 'indirect'                                                                                            
                id: 2                                                                                                       
                guid: 17684715946461903184                                                                                  
                whole_disk: 0                                                                                               
                metaslab_array: 0                                                                                           
                metaslab_shift: 34                                                                                          
                ashift: 16                                                                                                  
                asize: 2010039975936                                                                                        
                is_log: 0
                com.delphix:indirect_object: 15
                com.delphix:indirect_births: 17
                com.delphix:prev_indirect_vdev: 3
                create_txg: 13145756
                com.delphix:vdev_zap_top: 396
            children[3]:
                type: 'indirect'
                id: 3
                guid: 11721561378520130680
                whole_disk: 0
                metaslab_array: 0
                metaslab_shift: 34
                ashift: 16
                asize: 2000393076736
                is_log: 0
                com.delphix:indirect_object: 176
                com.delphix:indirect_births: 178
                com.delphix:prev_indirect_vdev: 1
                create_txg: 13146201
                com.delphix:vdev_zap_top: 525
            children[4]:
                type: 'disk'
                id: 4
                guid: 4756522072335310867
                path: '/dev/disk/by-partuuid/692da33c-0f8e-47ff-a7bf-8d6770772469'
                whole_disk: 0
                metaslab_array: 405
                metaslab_shift: 34
                ashift: 16
                asize: 2000393076736
                is_log: 0
                DTL: 518
                create_txg: 13146842
                com.delphix:vdev_zap_leaf: 395
                com.delphix:vdev_zap_top: 399
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
            com.delphix:device_removal

@haraldrudell
Author

Now this command fails:

date --rfc-3339=second && zpool remove zro 692da33c-0f8e-47ff-a7bf-8d6770772469                         
2022-06-15 14:20:53-07:00                                                                                                  
cannot remove 692da33c-0f8e-47ff-a7bf-8d6770772469: out of space

when it should work. Below are the vdev sizes:

date --rfc-3339=second && zpool list -vp zro
2022-06-15 14:24:01-07:00
NAME                                             SIZE          ALLOC           FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zro                                     3985729650688  1828199137280  2157530513408        -         -      3     45   1.00    ONLINE  -
  202d6b64-987c-406d-b64e-81e6357a9721  1992864825344  1828151361536  164713463808        -         -      6     91      -    ONLINE
  692da33c-0f8e-47ff-a7bf-8d6770772469  1992864825344  47775744  1992817049600        -         -      0      0      -    ONLINE

@behlendorf
Contributor

This sounds like a duplicate of #11356. There was some work proposed in PR #11409 to tighten up the space requirements, but this ended up being a bit trickier than expected and the change has not yet been finalized.

@haraldrudell
Author

Hey, I managed to fix it!!

I added a random 426 GiB SSD as a stripe in order to remove the big bulky duplicated vdev.

After that, both stripes could be removed, leaving me with the pool organization I wanted.

Free space on the final pool is 153 GiB of 1.81 TiB.
When there was an extra 426 GiB available, the operation could complete.
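In command form, the workaround was roughly the following (the temporary device name is a placeholder):

zpool add zro <temporary-426GiB-ssd>       # temporarily add a spare top-level vdev
zpool remove zro <unwanted-stripe-vdev>    # the removals that previously failed now succeed
zpool remove zro <temporary-426GiB-ssd>    # finally evacuate and drop the temporary vdev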

@haraldrudell
Author

After completion, I could verify the data to be OK using:
zpool scrub and
diff --recursive --brief --no-dereference
I had read that checksums are not verified during vdev evacuation via zpool remove
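Concretely, something like this (dataset and backup paths are placeholders):

zpool scrub zro
zpool status zro                           # re-run until the scrub has finished with no errors
diff --recursive --brief --no-dereference /zro/<dataset> /backup/<dataset>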

@fredcooke

Thank you @haraldrudell for your tip on adding MORE space to then REMOVE space :-D This worked for me, too. I will detail my experience below for @behlendorf and @ahrens to consider:

Context: all my HDDs are in USB boxes outside - this is sub-ideal, but it's the best I have for now.

My full set is as follows:

ZFS 2.0.4 from the PPA on Ubuntu 20.04 - this is what I installed on, and I don't want to change it without some off-site snapshot sync going on first.

3x zpools:

  1. 4TB SSD nvme unmirrored zpool
  2. 2x 4TB HDD over USB mirrored zpool to receive snapshots from 1
  3. 8TB SSD + 8TB HDD over USB + 10TB HDD over USB = 3 way mirror

I had to shut down the whole machine for a few weeks, brought it up again last night for the first time, and had trouble with 3 out of the 4 USB HDDs.

The two BIG ones had to resilver from the big SSD once I told them to come back online, but it didn't take super long since they're fairly empty. Not ideal, and a function of not doing an offline operation prior to shut down and an improper unmount AFAICT. Work in progress.
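For the record, "telling them to come back online" was just the standard command (pool/device names omitted):

zpool online <pool> <device>    # resilvering of the missed writes then starts automatically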

One of the 4TB HDDs was fine in the 2-way mirror and allowed that zpool and the FSes within to function in a degraded state. The other 4TB HDD was suddenly called sde rather than the ZFS ID or the disk ID which I set them up with. This one was unhappy and FAULTED. I tried power cycling the drive and other stuff, but no dice.

Remediation process:

I incorrectly thought remove was the command to take a drive out of a mirror - it turns out detach is the correct command. Remove didn't work, so I went and bought a 6TB drive and another USB case to help me more easily figure out which drive was which, since they all looked similar in /dev. Then I made my second, symmetrical mistake and used add to add a top-level vdev inside the zpool instead of attach to increase the mirror count, oops.

Then I tried to remove the freshly added 6TB: out of space - what? This cannot be; no data was written to this pool, since it was a read-only snapshot target for duplicating the SSD-only pool anyway. How can there be insufficient space to strip the empty metadata tree from this freshly added drive and abandon it? It turns out there was ample space: 0.5TB free on the single still-working 4TB disk now trusted with all secondary snapshot storage from the SSD.

I asked on IRC (libera, RIP freenode) and three characters there assisted me: one linking this ticket, another telling me to detach, and a third in various other ways. It felt like desperation, but I did the following to mimic @haraldrudell's fix:

  1. detach the "bad" 4TB from the mirror, creating an unmirrored single 4TB disk of data and an empty 6TB single disk as top-level vdevs
  2. add the "bad" drive back in as a top-level vdev to make the bad math work out okay - now 3 top-level unmirrored vdevs, 2 empty, one 7/8ths full, 10.5TB free
  3. remove the new 6TB top-level vdev - this now succeeded and moved, wait for it, 3 megabytes of data off the drive...
  4. attach the new 6TB to the old/good 4TB, creating a mirror once again and beginning the resilvering process on the 6TB disk, success
  5. remove the "bad" 4TB top-level vdev - 0.5TB is now suddenly enough, 2.5 megabytes of data off the drive...
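In command form, that sequence was roughly the following (pool and device names are illustrative, not my real ones):

zpool detach tank <bad-4tb>              # 1: drop the faulted disk from the mirror
zpool add tank <bad-4tb>                 # 2: re-add it as a separate top-level vdev (spare space for the math)
zpool remove tank <new-6tb>              # 3: evacuate the accidentally added 6TB top-level vdev
zpool attach tank <good-4tb> <new-6tb>   # 4: rebuild the mirror onto the 6TB, resilver starts
zpool remove tank <bad-4tb>              # 5: evacuate the re-added 4TB top-level vdev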

Which left me where I am: with 2 copies of my data, a third recreating on a brand new IronWolf Pro 6TB, and the old "bad" 4TB available to do what I please with: add it to the SSD to create a hybrid mirror so my snapshot etiquette isn't as critical, similar to the big 3-way mirror.

Observations:

  • The math is off by more than five orders of magnitude.
  • There is no way to tell ZFS zpool remove "trust me" with a -f or similar on remove, or even a --manual-spare-space-math-ymmv.
  • The error when trying to remove a member of the mirror was unhelpful; "hey, idiot, did you mean detach?" would be better - I'd rather be put in my place by smart software than have to put it in its place with dirty workarounds :-D
  • My disk was okay the entire time, but ZFS got super unhappy with it due to pointing at the wrong device or something, and it seemed impossible to get it to have a nice chat with the actual disk - no activity while the error count spiralled upward

I'm happy now, but I would describe this as a really poor user experience, triggered by my own shoddy USB setup and my own misuse of commands; however, recovering was way harder than it needed to be because of the combination of bad math && no override - please make the math accurate (surely it can just scan the vdev and KNOW exactly how much space it will take to get rid of it?) and/or let us tell it to try anyway; don't block us with bad math and a lack of override. This is a bad combination.

Thanks @haraldrudell for documenting your experience here, I thought your fix was crazy, but it seemed to make sense, and in the end it worked perfectly for me. Thanks @behlendorf and @ahrens for all your hard work that allowed me to have this stack in the first place. Much appreciated. Hopefully this monologue is in some way useful. Cheers.
