
"attempt to access beyond end of device" and devices failing #15932

Closed
i3v opened this issue Feb 25, 2024 · 14 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@i3v

i3v commented Feb 25, 2024

System information

Type Version/Name
Distribution Name CentOS Linux release
Distribution Version 7.5.1804 (Core)
Kernel Version 3.10.0-862.9.1.el7.x86_64
Architecture x86_64
OpenZFS Version 0.7.9
Hardware Dell DSS7500

Describe the problem you're observing

I am looking at a system that has not been maintained since 2021 or so...
And I'm seeing an endless wall of similar error messages (roughly 1 GB of them in my /var/log/messages).
The messages are:

Feb 24 22:36:37 dell-storage kernel: sdaq: rw=14, want=14675306560, limit=4294967296
Feb 24 22:36:37 dell-storage kernel: attempt to access beyond end of device
Feb 24 22:36:37 dell-storage kernel: sdac: rw=14, want=14675306696, limit=4294967296
Feb 24 22:36:37 dell-storage kernel: attempt to access beyond end of device
Feb 24 22:36:37 dell-storage kernel: sdaq: rw=14, want=14675306688, limit=4294967296
Feb 24 22:36:37 dell-storage kernel: attempt to access beyond end of device
Feb 24 22:36:37 dell-storage kernel: sdac: rw=14, want=14675306888, limit=4294967296
Feb 24 22:36:37 dell-storage kernel: attempt to access beyond end of device
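
For reference (assuming these kernel values are in 512-byte sectors): want=14675306560 × 512 B ≈ 7.5 TB, which is well within an 8 TB drive, while limit=4294967296 × 512 B = 2 TiB - exactly the strange capacity that lsblk reports for these disks further below.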

I know #7906 is an old thread about a very old zfs version that no one cares about anymore...
But, at the very least, I would like to report that I'm getting similar messages on 0.7.9 too.
This is happening during a nightmarish resilvering:

[root@dell-storage tmp]# ZPOOL_SCRIPTS_AS_ROOT=1 zpool status -c upath,defect,nonmed,ucor,health
  pool: tank30
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Feb 20 23:53:35 2024
        111T scanned out of 235T at 329M/s, 110h18m to go
        9.29T resilvered, 47.04% done
config:

        NAME                                     STATE     READ WRITE CKSUM                                       upath  defect  nonmed                                                                     health
        tank30                                   DEGRADED     0     0 13.4M
          <skipping unrelated vdevs here>
          raidz2-3                               DEGRADED     0     0 26.7M
            scsi-35000c50094bacf9f               DEGRADED     0     0     0  too many errors  (resilvering)    /dev/sdy     825    2862                                                                         OK
            scsi-35000c50094ba731b               DEGRADED     0     0     0  too many errors  (resilvering)    /dev/sdz     250    1204                                                                         OK
            replacing-2                          DEGRADED     0     0 12.3M
              scsi-35000c50094baa9e3             UNAVAIL      0     0     0  (resilvering)                    /dev/sdaa       -       -                                                                          -
              scsi-35000c500a74f4453             ONLINE       0     0     0  (resilvering)                    /dev/sdcl       0     270                                                                         OK
            scsi-35000c50094ba1f2f               DEGRADED     0     0     0  too many errors  (resilvering)   /dev/sdab     149     667                                                                         OK
            spare-4                              ONLINE       0     0 35.7K
              scsi-35000c50094ba6eff             ONLINE   13.4M 13.4M     0  (resilvering)                    /dev/sdac   47648     400  DATA_CHANNEL_IMPENDING_FAILURE_DATA_ERROR_RATE_TOO_HIGH_[asc=5d,_ascq=32]
              scsi-35000c500a75223a3             ONLINE       0     0     0  (resilvering)                    /dev/sdce       0      35                                                                         OK
            spare-5                              DEGRADED     0     0 48.9M
              scsi-35000c50094b5a07b             DEGRADED 13.4M 13.4M     0  too many errors  (resilvering)   /dev/sdaq   16559    2449  DATA_CHANNEL_IMPENDING_FAILURE_DATA_ERROR_RATE_TOO_HIGH_[asc=5d,_ascq=32]
              scsi-35000c500a74a311b             ONLINE       0     0     0  (resilvering)                    /dev/sdcf       0     223                                                                         OK
          <skipping unrelated vdevs here>
          raidz2-6                               DEGRADED     0     0     0
            ata-ST10000VN0008-2JJ101_ZPW0690L    ONLINE       0     0     0                                   /dev/sdbc       -       -                                                                     PASSED
            ata-ST10000VN0008-2JJ101_ZPW07746    ONLINE       0     0     0                                   /dev/sdax       -       -                                                                     PASSED
            ata-ST10000VN0008-2JJ101_ZPW07P9B    ONLINE       0     0     0                                   /dev/sdav       -       -                                                                     PASSED
            ata-ST10000VN0004-1ZD101_ZA28VCRF    ONLINE       0     0     4                                   /dev/sdbg       -       -                                                                     PASSED
            replacing-4                          DEGRADED     0     0     0
              ata-ST10000VN0004-1ZD101_ZA290AX1  FAULTED      0     0     1  too many errors                  /dev/sdbe       -       -                                                                     PASSED
              ata-ST10000VE0008-2KX101_ZHZ5V5VL  ONLINE       0     0     0  (resilvering)                    /dev/sdaz       -       -                                                                     PASSED
            ata-ST10000VN0004-1ZD101_ZA28YF4Z    ONLINE       0     0     1                                   /dev/sdbb       -       -                                                                     PASSED
        spares
          scsi-35000c500a75223a3                 INUSE     currently in use                                   /dev/sdce       0      35                                                                         OK
          scsi-35000c500a74a311b                 INUSE     currently in use                                   /dev/sdcf       0     223                                                                         OK
          scsi-35000c500a74a3ddb                 AVAIL                                                        /dev/sdcg       0     227                                                                         OK
          scsi-35000c500a74f44c3                 AVAIL                                                        /dev/sdch       0     209                                                                         OK
          scsi-35000c500a6b32383                 AVAIL                                                        /dev/sdci       0     117                                                                         OK
          scsi-35000c500a74a3d4b                 AVAIL                                                        /dev/sdcj       0     114                                                                         OK
          scsi-35000c500a74a3a1f                 AVAIL                                                        /dev/sdck       0     346                                                                         OK

errors: 14011285 data errors, use '-v' for a list

Note that there's a mix of 8TB and 10TB drives in the pool (maybe this is somehow related to the issue as well).

Just before I started that resilvering, things were already scary, but not that bad:

  pool: tank30
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
	repaired.
  scan: scrub repaired 0B in 39h21m with 0 errors on Fri Aug 27 02:40:44 2021
config:

	NAME                                         STATE     READ WRITE CKSUM                         upath  defect  nonmed                                                                      health
	tank30                                       DEGRADED     0     0     0                                                                                                                          
	  <skipping unrelated vdevs here>                                                                                                                                                                
	  raidz2-3                                   DEGRADED     0     0     0                                                                                                                          
	    scsi-35000c50094bacf9f-part1             ONLINE       0     0     0                      /dev/sdy     796    2830                                                                          OK
	    scsi-35000c50094ba731b-part1             ONLINE       0     0     0                      /dev/sdz     248    1199                                                                          OK
	    scsi-35000c50094baa9e3-part1             FAULTED      0     0     0                     /dev/sdaa    7412   48588   DATA_CHANNEL_IMPENDING_FAILURE_DATA_ERROR_RATE_TOO_HIGH_[asc=5d,_ascq=32]
	    scsi-35000c50094ba1f2f-part1             ONLINE       0     0     0                     /dev/sdab     147     663                                                                          OK
	    scsi-35000c50094ba6eff-part1             ONLINE       0     0     0                     /dev/sdac    9315     387   DATA_CHANNEL_IMPENDING_FAILURE_DATA_ERROR_RATE_TOO_HIGH_[asc=5d,_ascq=32]
	    scsi-35000c50094b5a07b-part1             ONLINE       0     0     0                     /dev/sdaq     428    2443                                                                          OK
	  <skipping unrelated vdevs here>                                                                                                                                                                
	  raidz2-6                                   DEGRADED     0     0     0                                                                                                                          
	    ata-ST10000VN0008-2JJ101_ZPW0690L-part1  ONLINE       0     0     0                     /dev/sdbc       -       -                                                                      PASSED
	    ata-ST10000VN0008-2JJ101_ZPW07746-part1  ONLINE       0     0     0                     /dev/sdax       -       -                                                                      PASSED
	    ata-ST10000VN0008-2JJ101_ZPW07P9B-part1  ONLINE       0     0     0                     /dev/sdav       -       -                                                                      PASSED
	    ata-ST10000VN0004-1ZD101_ZA28VCRF-part1  ONLINE       0     0     4                     /dev/sdbg       -       -                                                                      PASSED
	    ata-ST10000VN0004-1ZD101_ZA290AX1-part1  FAULTED      0     0     1  too many errors    /dev/sdbe       -       -                                                                      PASSED
	    ata-ST10000VN0004-1ZD101_ZA28YF4Z-part1  ONLINE       0     0     1                     /dev/sdbb       -       -                                                                      PASSED

errors: No known data errors

thus I "happily" started zpool replace, then added a few spares, then manually started two additional zpool replace (with spare disks this time).

I'm not sure whether those messages actually have the same origin as reported in #7906 (for 0.7.10), because there's something weird with these disks:

[root@dell-storage tmp]# lsblk | egrep "sdaq|sdac|sdy"
sdy              65:128  0   7.3T  0 disk
├─sdy1           65:129  0   7.3T  0 part
└─sdy9           65:137  0     8M  0 part
sdac             65:192  0     2T  0 disk
├─sdac1          65:193  0   7.3T  0 part
└─sdac9          65:201  0     8M  0 part
sdaq             66:160  0     2T  0 disk
├─sdaq1          66:161  0   7.3T  0 part
└─sdaq9          66:169  0     8M  0 part

Note that sdac and sdaq are suddenly 2TB disks now, while sdy looks perfectly normal. They were all reporting 7.3T before the resilvering started.
The only thing I found that resembles this "weird capacity" is this post. Interestingly, it is also related to zfs.
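
For the record, here's a quick way to cross-check where that 2TB figure comes from - a sketch using standard tools (assuming sg3_utils and smartmontools are installed), comparing the size the kernel currently believes against what the drive itself reports:

# size the Linux block layer currently believes, in 512-byte sectors
blockdev --getsz /dev/sdaq
cat /sys/block/sdaq/size

# size the drive itself reports via SCSI READ CAPACITY(16)
sg_readcap --long /dev/sdaq

# drive identity and user capacity as printed by smartctl
smartctl -i /dev/sdaq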

sdaa is a completely different story - it simply disappeared from the system at about 30% resilvering progress.

I was monitoring the process, and it looked like those "attempt to access beyond end of device" messages started a few minutes before lsblk started to show this weird 2TB. Thus, I'm not actually sure what is the cause and what is just the effect here.

All three of these disks (sdaa, sdac, sdaq) were struggling to read from about 18% of the resilvering onwards:

  • according to zpool iostat, the read speed frequently dropped to about 1 MB/s (but was mostly around 50 MB/s)
  • according to iostat -x, one of them was at ~100% utilization while all other disks were nearly idle
  • up to about ~30% of the resilvering (when the last of these 3 disks failed), there were no permanent errors

I guess the CRC errors for the 10TB disks could be resolved with this. But even without that, there's a ton of nonmed errors on many disks, which, AFAIU, means there could be something wrong with cabling / firmware / etc. (that is, something that could be fixed).
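
Just in case it's the link rather than the media, here's a sketch of how I'd check the SAS PHY counters (invalid DWORDs, disparity errors, loss of sync), which usually make cabling/backplane trouble visible - assuming smartmontools and SAS drives:

# per-PHY link error counters for a SAS drive
smartctl -l sasphy /dev/sdac

# "Non-medium error count", which AFAIU is what the nonmed column above is based on
smartctl -a /dev/sdac | grep -i 'non-medium'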


Personal concerns:

  1. There's no backup and the data is valuable.
  2. Now I wish I had started by doing a ddrescue [1], [2] of all disks in raidz2-3, even though some people discourage that [3]. I'm not really sure. And I found no "official guidance" (if there is any at all) on this.
  3. I stopped zed and prevented any user data access. There's no rush.
  4. For now, I plan to just wait for the resilvering process to finish and then reboot. I know it would just log most remaining files (that happen to be on raidz2-3) as "permanent errors", but I'm afraid I would lose even what's "already resilvered" if I reboot now (because, AFAIU, the resilvering would restart from scratch and maybe even discard any results of the previous unfinished resilvering).
    1. I think there's a chance that after a reboot at least one of (sdaa, sdac, sdaq) would come back online, which would allow a subsequent zfs resilver to rescue the data.
    2. If none of them come back, I'll probably be looking for some block-by-block data rescue service for them.
    3. Maybe I should upgrade to 0.7.12 after a reboot. I'm afraid to upgrade further for now.
    4. However, I'm afraid that sdaa would be automatically removed from the pool once the resilvering finishes, so that zfs would just ignore it, even if it were working fine and contained data that could be put to good use... I'm not sure whether that would actually happen, how to work around it if it does, and whether there's a way to prevent it. Theoretically, AFAIU, the original device (if different from the replacement) will be removed from the pool, but I've also...
    5. I'm not even sure whether zfs would actually try to read from any of those 3 HDDs even if they all came back online (or whether it would consider that it has already tried that and proved there's no point).

Any help is greatly appreciated...

Describe how to reproduce the problem

No idea.

Include any warning/errors/backtraces from the system logs

Some fragments are included above. I can provide more if anyone's interested.

@i3v i3v added the Type: Defect Incorrect behavior (e.g. crash, hang) label Feb 25, 2024
@rincebrain
Contributor

rincebrain commented Feb 25, 2024

The last two times I saw something like this, it was because:

  • someone's SSDs were bugging out and resetting into some weird debug mode where they showed up as like, 4M devices or something, where they were only supposed to do that for factory programming, so there's not really much to be done about "the disk effectively is gone and something else took its place"
  • someone had moved a disk from being directly attached to within a USB enclosure that couldn't count as high as the end of the disk's LBA count, so it truncated the size to [total number of sectors] - [max number of sectors it could support], leading to weird things like a 1.67T disk or the like. (As I recall, the top result if you googled that weird disk size was reports of people encountering that case.)

The disk showing a size of exactly 2^32 sectors suggests something somewhere got confused about the disk size, and that the confusion is below ZFS, since Linux is what's claiming the disk is only that big at this point; ZFS doesn't do anything more intimate with disks than "send a discard request" or "write a partition table and wait for the OS to rescan it", really.

So I would suggest you investigate how on earth those disks are reporting a 2T actual size, and whether that matches the specs of the device, since if the pool is made entirely of 8 and 10T disks, you should not be able to attach a 2T disk no matter how confused ZFS got.

Also, since 0.7 hasn't been updated since 2019, I would strongly doubt anyone is going to look at any bug you find even if it is in ZFS unless you try it on a version like 2.1 or 2.2 that is still getting fixes, to confirm it's not something that is long-fixed.

To be clear, I don't think, at least with the information at hand, this is a bug in ZFS, since the disks themselves appear to be reporting 2T, and if the disk says it's not big enough, ZFS can't do all that much about it. But separate from that, if you do end up finding a bug in ZFS somewhere along the way, that would be my expectation.

@i3v
Author

i3v commented Feb 25, 2024

Indeed, I've seen some discussions about USB disks and 2TB. But there's no USB involved here, and I don't think anyone has physically touched any disks since 2021. Still, the idea that the disks entered some "special" mode looks like a very plausible explanation to me. And I still have a faint hope that power-cycling the system would bring them back to normal mode...

I agree that the 2TB value comes from something way below zfs, but it looks like that weird condition was somehow triggered by zfs doing the resilvering. Maybe this is somewhat similar to "Unsuitable SSD/NVMe hardware for ZFS - WD BLACK SN770 and others" (except that the ST8000NM0185 is not something that "was not marketed as server-grade and thus not intended for zfs", as one might say of the SN770). It is likely that zfs isn't doing anything wrong - it's just the hardware that isn't working properly under specific circumstances. But anyway, this seems to be some sort of related issue. Maybe it would even manifest itself in some modern zfs version eventually. Thus, maybe, if there's nothing to fix, are there some suggestions about damage control and data recovery?

Any comments on my plan for "how to proceed with this", in particular?

@rincebrain
Contributor

I'd reboot now, since you're just going to have to resilver again once the disks come back anyway, really, and then make sure you force it not to defer resilvering some of them, with a forcible zpool resilver, if resilver_defer is enabled on the pool.
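
Something along these lines (a sketch; the resilver_defer feature and the zpool resilver command only exist on 0.8 and later):

# check whether the deferred-resilver feature is enabled/active on the pool
zpool get feature@resilver_defer tank30

# forcibly restart the resilver, folding any deferred resilver into it
zpool resilver tank30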

@i3v
Author

i3v commented Feb 25, 2024

OK, thanks for the suggestion! But...

  • I think resilver_defer was introduced in 0.8.0. Before that, the general suggestion was to just stop the ZED daemon, AFAIU.
  • Would I just lose everything if I reboot now and all those disks (sdaa, sdac, sdaq) turn out to be completely fried? I mean, including the data that the current resilver has already managed to copy to the new disks... The data from the unfinished resilvering would just be discarded in that old version, wouldn't it?
  • Is it "recommended" or "discouraged" to update zfs in such a situation, with the resilvering ongoing? The package? The pool version? Only a minor version upgrade (0.7.12 or 0.7.13), or all the way up to the modern version (2.2.3)?

BTW, all ST8000NM0185 drives have PT51 firmware. Dell provides an "Urgent" PT55 firmware, dated 15 Mar 2022. They say it "Fix[es] an issue that can make the drive unresponsive following a Hard Reset during writes", plus some other things that look related. But they explicitly discourage installing it with a degraded RAID or the like...
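
For reference, checking the running firmware revision is straightforward (a sketch, assuming smartmontools; for SAS drives smartctl prints it on the Revision line):

smartctl -i /dev/sdy | egrep -i 'vendor|product|revision'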

@IvanVolosyuk
Contributor

I would say a cascading failure of disks can be caused by the extra load on the PSU due to more and more disk / disk controller activity. I would rather stop, upgrade the PSU, and ddrescue the failing disks. Resilver state should persist across a reboot, but I don't know that for sure.

@rincebrain
Contributor

Resilver state persists across a reboot, yes; it checkpoints every so often, depending on whether it's pre- or post-sequential-scrub.

That suggestion makes no sense in this context. That's like saying you should fix your failing disks by doing a dance around them - it has no relation to the problem at hand.

It shouldn't cause any problems to upgrade, and if it does, it's a bug.

@i3v
Author

i3v commented Feb 25, 2024

@IvanVolosyuk, thanks for your suggestion!

  • I guess the PSUs must be fine...
    • Just from my experience with Dell PSUs in general.
    • Because this 90-bay system has fewer than 40 disks actively spinning.
    • There were SMART health messages regarding two particular HDDs even before the resilvering (see the listings in the first post).
    • ... the original devices in raidz2-3 are in slots 24, 25, 26, 27, 28, 42. This is indeed a thing to check, in case there's something wrong with those slots.
  • Although resilvering progress should indeed persist across a normal reboot... AFAIU, it would be restarted because the previously missing device would be re-attached.
    • At least, it seems to be restarted when a device is forcefully brought back to ONLINE from FAULTED via zpool clear tank device.

@rincebrain
Contributor

Yes, it would restart, but the entire point of my advice was TO FORCIBLY TRIGGER THE RESTART, since that feature otherwise would make you trigger a resilver again after it finished what it was doing, which is usually not desirable, and the authors of that feature really didn't design it well.

@i3v
Author

i3v commented Feb 25, 2024

@rincebrain,
Don't people usually not want "one more resilver pass" simply because they don't want to wait for it to finish and don't expect any benefit from additional passes? (BTW, is there any way to see how many "resilver passes" are currently queued?)

I think my situation here is just the opposite:

  1. I want to "secure" what's already done by the first stage of the resilver (at least those first 30% that had been successfully resilvered in the beginning). Maybe more, because some files are probably just entirely on another vdevs (added a bit later). Otherwise, I might loose all that if sdaa, sdac, sdaq prove completely fried.
    • As a valuable bonus, this would also give me a complete list of the faulty files that I would not be able to restore without any of those 3 disks (and all other files would be "secure" already)
  2. I do want a second resilver pass to happen, after I reboot (and, hopefully, some of those failing disks would be online). I expect that any additional passes of the resilvering would not destroy anything that the "first resilvering" gained, but only if the first resilvering would be completely finished.
    • I'm really not sure where I got this idea from. Is it just my imagination?
    • I just hope that sdaa would not be detached from the pool just because the "replacing" is done, so that the second resilvering pass would still have a chance to read whatever useful data is on it. I really wonder whether a disk that I set to be replaced (with a non-spare disk) is automatically detached. And, if so, is there a way to "bring it back to the attention of the resilvering process"?
  3. Ideally, if those failing disks come back online, I want to "cherry-pick" the files from the "Permanent errors" list before doing a full second resilver (see the sketch after this list). Assuming that zfs always tries to recover errors using any available redundancy, I wonder if it would help to just try to read the contents of each file from the "Permanent errors" list, instead of (or at least before) making the disks read through all those presumably bad sectors where they already failed once.
    • Actually, I'm a bit puzzled that something like this isn't recommended as a standard approach...
    • Admittedly, that topic was more about normal reads rather than a disk-replacement/resilvering scenario. I really wonder whether the disk being replaced counts as "available redundancy" in this case.
    • AFAIU, at least in the old versions, it is possible to have one resilver just started and one more also scheduled (deferred). AFAIU, what you're suggesting is how to get around that pitfall, to avoid the second resilver.
  4. I do get your advice to stop the resilvering loop by manually triggering the resilvering once again... But I assume that a "looping resilver" is not the real problem for me right now.
  5. I do get that the more I read from the failing disks the worse their condition gets, and for now I'm just spinning them without actually recovering the data. This is not good.
    • But, in my experience, failing disks usually degrade really fast only when trying to read bad sectors (writing to bad sectors kills them even faster). For now, they don't seem to struggle with reading anything, judging by %util from iostat -x 60, even though the overall resilvering speed is pretty slow for some reason.
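
Here's a sketch of the "cherry-pick" idea from point 3 (the path below is just a placeholder - zpool status -v prints the actual list):

# list the files currently flagged with permanent errors
zpool status -v tank30

# force a read of each affected file, so zfs can repair it from whatever
# redundancy is still available (substitute the real paths from the list above)
cat "/tank30/some/affected/file" > /dev/null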

Or is any of this (or all of this) complete nonsense?

@rincebrain
Contributor

No, you're not losing progress on resilvering by restarting. Even if the checkpoint were old, it still wrote the new data.

Please start a discussion on the mailing list or the Discussions tab if you want to ask questions about how ZFS works, this is a bug report, and does not appear to be a bug in ZFS.

@i3v
Author

i3v commented Feb 25, 2024

OK, fair...
I think this would be the thread for my particular situation.
Thanks for all your input!

@mariaczi

@i3v I'm not sure if you have solved your problem, but please check what you have in the smartctl report for this drive.

smartctl for my drive(s) reports:

Vendor (Seagate/Hitachi) factory information
number of hours powered up = 42560.73
number of minutes until next internal SMART test = 53

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/     errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2259921988      215         0  2259922203        215    1782696.488           0
write:           0        0       279        279        279     214060.692           0
verify: 3854985165      152         0  3854985317        152    1982400.039           0

Non-medium error count: 901

Maybe newer firmware for your hard disk is available. I have had a different kind of issue with ST8000NM0185 on firmware PT51 (https://www.dell.com/support/home/pl-pl/drivers/driversdetails?driverid=6421f).

Seagate PT55 for model number(s) ST8000NM0185., A00

Release Date:
December 15, 2021

Default Log File Name:
6421F_A00

Reboot Required:
No

Description:
This release contains firmware version PT55 for supplier model number ST8000NM0185.

Supported Device(s):

Seagate Makara Plus 8.0TB HDD SAS 12Gbps 3.5 512e ISE
Model Number: ST8000NM0185
Vendor PN: 2FF212-150
Regulatory Model Number: STR007

HD,8TB,2E,S12,3.5,S,MP
Model Number: ST8000NM0185
Regulatory Model Number: STR007
Vendor PN: 2FF212-150

Fixes / Enhancements:
Fixes:

  • Fix an issue that can make the drive unresponsive following a Hard Reset during writes.
  • Fix issues that can cause command timeout.
  • Fix issues that can cause Assert condition.
  • Fix an issue that can cause IOEDC error.

Enhancements:

  • Improve performance for certain workloads.
  • Improve error logging / sense code reporting accuracy.
  • Improve SAS spec compliance.

Important Device Information:

  • Do not run other applications while executing Update Packages.

@i3v
Author

i3v commented Jun 1, 2024

@mariaczi ,

  1. Yep, the HDD firmware is outdated. I've already mentioned the PT51 firmware above. Note that Dell does not recommend updating firmware while your disk array is degraded. I tend to agree with them.
  2. Sorry, but I'm not sure what your idea about smartctl is.
  3. "Resilver with one HDD missing and then resilver again?" (#15926) is the thread about my particular "data recovery situation".
  4. I managed to recover some data using the "reading" technique I described. After that, the disks were sent to a data recovery service. They managed to read some data from the dying disks (not everything, but hopefully enough to rebuild), but I don't have the data here yet (nor do I have those dying HDDs). I hope that after I plug in the clones of those drives, zfs will be able to rebuild itself. That would prove that the original "attempt to access beyond end of device" was a hardware/firmware issue, not a zfs-level logical issue.

@i3v
Author

i3v commented Aug 10, 2024

So, to wrap this up:

  1. We sent 4 HDDs (all from raidz2-3) to a good data recovery company.
    a. scsi-35000c50094baa9e3 - sdaa - all data recovered. Easy to read.
    b. scsi-35000c50094b5a07b - sdaq - all data recovered
    c. scsi-35000c50094bacf9f - sdy - 29MB not recoverable.
    d. scsi-35000c50094ba6eff - sdac - about 69GB proved to be "not easy to recover". Damaged heads. Never bothered to try harder.
    That is, the drives that were showing 2T and causing the "attempt to access beyond end of device" messages were actually in pretty bad physical shape.

  2. After we installed the 3 clones and initiated the rebuild (btw, I was only able to import the pool with zpool import -f tank30), one more disk (not from raidz2-3) started to fail, which (surprisingly) caused the resilver to restart (which looks much like "resilvering continually restarts on read error and cannot offline bad drive" #6613).
    a. From /var/log/messages:

    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: [sdu] CDB: Read(16) 88 00 00 00 00 02 09 52 de d8 00 00 01 00 00 00
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: handle(0x002f), sas_address(0x5000c50094ba17f9), phy(22)
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: enclosure logical id(0x5f01faf0fd3ca17e), slot(22) 
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: enclosure level(0x0000), connector name(     )
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: task abort: SUCCESS scmd(ffff911ed2ddebc0)
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: [sdu] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: [sdu] CDB: Read(16) 88 00 00 00 00 02 09 52 de d8 00 00 01 00 00 00
    Jul 28 21:15:32 dell-storage kernel: blk_update_request: I/O error, dev sdu, sector 8746360536
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: attempting task abort! scmd(ffff9119361ecc40)
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: CDB: Mode Sense(6) 1a 00 3f 00 04 00
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: handle(0x002f), sas_address(0x5000c50094ba17f9), phy(22)
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: enclosure logical id(0x5f01faf0fd3ca17e), slot(22) 
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: enclosure level(0x0000), connector name(     )
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: task abort: SUCCESS scmd(ffff9119361ecc40)
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: attempting task abort! scmd(ffff911ab0ed6680)
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: CDB: Mode Sense(6) 1a 00 3f 00 04 00
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: handle(0x002f), sas_address(0x5000c50094ba17f9), phy(22)
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: enclosure logical id(0x5f01faf0fd3ca17e), slot(22) 
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: enclosure level(0x0000), connector name(     )
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: task abort: SUCCESS scmd(ffff911ab0ed6680)
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: attempting task abort! scmd(ffff911c7a02c380)
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: CDB: Mode Sense(6) 1a 00 3f 00 04 00
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: handle(0x002f), sas_address(0x5000c50094ba17f9), phy(22)
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: enclosure logical id(0x5f01faf0fd3ca17e), slot(22) 
    Jul 28 21:15:32 dell-storage kernel: scsi target0:0:20: enclosure level(0x0000), connector name(     )
    Jul 28 21:15:32 dell-storage kernel: sd 0:0:20:0: task abort: SUCCESS scmd(ffff911c7a02c380)
    Jul 28 21:17:57 dell-storage kernel: sd 0:0:20:0: attempting task abort! scmd(ffff9116a5181500)
    Jul 28 21:17:57 dell-storage kernel: sd 0:0:20:0: [sdu] CDB: Read(16) 88 00 00 00 00 02 14 a2 eb d8 00 00 00 c0 00 00
    

    b. From zpool history:

    2024-07-28.21:17:07 [txg:37096744] scan aborted, restarting errors=14946162 [on dell-storage.slb.com]
    2024-07-28.21:17:12 [txg:37096744] scan setup func=2 mintxg=3 maxtxg=37096743 [on dell-storage.slb.com]
    <…>
    2024-07-28.21:42:38 [txg:37096810] scan aborted, restarting errors=0 [on dell-storage.slb.com]
    2024-07-28.21:42:43 [txg:37096810] scan setup func=2 mintxg=3 maxtxg=37096743 [on dell-storage.slb.com]
    
  3. I had to zpool offline that disk. After that, the resilver finished normally - zfs reports no damaged files.

I'm happy that zfs proved able to survive this "real-life resiliency test".

@i3v i3v closed this as completed Aug 10, 2024