Spurious re-silver to same device immediately after previous re-silver completed #9378

Closed
HastyMarly opened this issue Sep 30, 2019 · 1 comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

HastyMarly commented Sep 30, 2019

System information

| Type                 | Version/Name         |
| -------------------- | -------------------- |
| Distribution Name    | Archlinux            |
| Distribution Version | Rolling              |
| Linux Kernel         | 5.3.1-arch1-1-ARCH   |
| Architecture         | x86_64               |
| ZFS Version          | 0.8.0-284_g73d7820bb |
| SPL Version          | 0.8.0-284_g73d7820bb |

Describe the problem you're observing

Spurious resilver to same device immediately after previous resilver completed.

Describe how to reproduce the problem

All I did was run

zpool replace tank 272646635384699889 /dev/mapper/sdb1crypt

Once the initial resilver completed, it apparently started over from scratch without exiting.

Include any warning/errors/backtraces from the system logs

% zpool status -v
pool: tank
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Sep 29 18:58:47 2019
1.61T scanned at 254M/s, 784G issued at 121M/s, 2.88T total
784G resilvered, 26.60% done, 0 days 05:05:31 to go
config:

    NAME                      STATE     READ WRITE CKSUM
    tank                      DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        replacing-0           DEGRADED     0     0     0
          272646635384699889  UNAVAIL      0     0     0  was /dev/mapper/sdb1crypt/old
          sdb1crypt           ONLINE       0     0     0  (resilvering)
        sda1crypt             ONLINE       0     0     0

errors: No known data errors

jgallag88 (Contributor) commented

This sounds like it could be the same issue as #9155.

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 20, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Jan 22, 2020
If a device is participating in an active resilver, then it will have a
non-empty DTL. Operations like vdev_{open,reopen,probe}() can cause the
resilver to be restarted (or deferred to be restarted later), which is
unnecessary if the DTL is still covered by the current scan range. This
is similar to the logic in vdev_dtl_should_excise() where the DTL can
only be excised if its max txg is in the resilvered range.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: John Poduska <[email protected]>
Issue openzfs#840
Closes openzfs#9155
Closes openzfs#9378
Closes openzfs#9551
Closes openzfs#9588
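
The decision described in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from the referenced commit: the types `dtl_summary_t` and `scan_range_t` and the function `resilver_restart_needed()` are hypothetical stand-ins, and the inclusive comparison against the scan's upper txg bound is an assumption about what "covered by the current scan range" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for illustration.
 * These are NOT the real OpenZFS structures.
 */
typedef struct dtl_summary {
	bool     empty;    /* true if the device has no missing txgs */
	uint64_t max_txg;  /* highest txg recorded as missing */
} dtl_summary_t;

typedef struct scan_range {
	bool     resilver_active; /* a resilver is currently running */
	uint64_t max_txg;         /* highest txg the running resilver repairs */
} scan_range_t;

/*
 * Decide whether reopening or probing a device should restart
 * (or defer) a resilver. Assumption: the DTL counts as "covered"
 * when its max txg is at or below the scan's upper bound.
 */
bool
resilver_restart_needed(const dtl_summary_t *dtl, const scan_range_t *scan)
{
	assert(dtl != NULL && scan != NULL);

	if (dtl->empty)
		return (false);	/* nothing is missing, nothing to resilver */

	if (!scan->resilver_active)
		return (true);	/* missing data and no resilver is running */

	/* The running resilver already covers the DTL: do not restart. */
	return (dtl->max_txg > scan->max_txg);
}
```

Under these assumptions, a device that is already being resilvered (as `sdb1crypt` is in the report above) would not trigger a restart when it is reopened, as long as its DTL falls inside the range the running scan is repairing.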
tonyhutter pushed a commit that referenced this issue Jan 23, 2020