ZFS io error when disks are in idle/standby/spindown mode #4713
Comments
@johnkeates thanks for filing this. I spent a little time looking at this, so let me post a possible solution to resolve it. This problem should only impact devices on which Linux is explicitly performing power management. Drives which spin up/down due to internal power management shouldn't see this issue. We're going to need to add some code to check the |
Same issue with SATA disks behind a SAS controller. Just to illustrate: on my home NAS with 6 SATA disks behind mpt3sas (IT mode), it takes 9 seconds to wake them all in parallel (hdparm --read-sector in parallel, until I get a sector back from all of them), and 44 seconds to wake them sequentially, as happens with ZFS now. 9 vs 44 is quite a difference. |
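(A minimal sketch of the parallel wake-up described above, assuming bash and that the pool members are /dev/sda through /dev/sdf; adjust the device list to your layout.)
# wake every member drive in parallel instead of one after another
for dev in /dev/sd{a..f}; do
  hdparm --read-sector 0 "$dev" > /dev/null &
done
wait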
Hey guys, the same for me; have a look at these issues: #3785 Here I reported some things about my situation and my environment: After HDD standby I also have to wait for about one minute. I use 8x 4 TB WD Red in a RAIDZ2 behind an LSI SAS3008 (mpt3sas). Greetings Hoppel118 |
Spinning many drives up in parallel may be very unfriendly towards the power supply. To make matters worse, I think it is not uncommon for drive motors to use the +5 V line, which on most (consumer) PSUs has rather low current ratings. Better RAID cards are usually careful during boot to stagger drive spin-up, often by a few seconds at least. |
Same issue too with SATA disks behind an LSI Logic SAS2008, running CentOS 7.1, when disks are in standby mode.
uname -a: Linux prolib 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
modinfo zfs: filename: /lib/modules/3.10.0-514.6.1.el7.x86_64/extra/zfs.ko
dmesg: [109797.582453] sd 0:0:4:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
zpool events: Feb 16 2017 21:23:05.232852408 ereport.fs.zfs.io
zpool events -v: Feb 16 2017 21:23:05.265852655 ereport.fs.zfs.io |
I'm having this issue too. So are there no bad consequences, apart from the terrifying IO error mail? |
Is there any further news on this issue? I replaced one of the drives in my pool and have now been getting roughly hourly emails from root complaining of an IO error. Is there any way to quiet this error down, if it hasn't been fixed yet, while still receiving actual errors? |
I am also really curious whether someone figured out what is going on with this. I don't even use ZFS and am experiencing the same errors (got to this thread via Google); I assume it is kernel dependent. I recently bought several LSI SAS2008 controllers, was running CentOS 7.3, and was seeing these errors in my logs every so often. I finally narrowed it down to any SATA drive in sleep mode being accessed: while it was waking, the errors would occur. I switched my (CentOS 7.3) boot drive out and went to CentOS 6.8, and the errors never occur. These particular errors seem to go back years, and it looks like no progress has been made on fixing them. I did find one thread where people with errors switched the firmware on their LSI SAS2008 controllers back down to P19 and were OK (I am on 20). I will probably try that next. |
I tried P19 but had the same results. I also tried CentOS 6.9 and do not have the issue. The only difference I can see at this point is kernel 2.x vs 3.x/4.x; I also tried installing a 4.x kernel on CentOS 7.3 and the problem was still there. So I would assume those of you seeing this issue with ZFS would NOT see it on a 2.x-kernel Linux distro using the LSI SAS2008 with SATA drives. Again, my issue is that for ANY SATA drive connected to the LSI SAS2008 (running P19 or P20), when the drive is asleep, access to the drive through the OS creates /var/log/messages errors similar to those listed above: [ 3647.748383] sd 1:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK I also noticed that on some of my drives that spin up a tad quicker, sometimes the error does not get generated. |
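(A hedged way to reproduce the situation described above: force a drive into standby, then read from it while watching the kernel log. The device name is an example.)
hdparm -y /dev/sdd                            # put the drive into standby immediately
sleep 5
dd if=/dev/sdd of=/dev/null bs=1M count=1 &   # any read that forces a wake-up
dmesg -w | grep -iE 'sdd|mpt2sas'             # watch for the FAILED Result lines while it spins up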
One other thing I checked was the version of mpt2sas (modinfo mpt2sas). On CentOS 6.9, where I am unable to reproduce the issue, it is version 20.102.00.00, and on CentOS 7.3, where I am able to reproduce the issue, it is the same version 20.102.00.00. The only difference at this point that I can see is kernel 2.x (where it does not occur) vs 3.x/4.x (where it occurs). |
I don't have any hardware that 2.x runs on, but I see the same with comparable mpt2sas versions. It doesn't seem to matter what version I use. |
John, what HBA model(s) are you using? I assume SATA drives on a SAS/SATA controller? |
Mostly SAS2008 controllers, all with SATA drives. I have one with SAS drives, and that one is not giving any trouble. |
John, do you know what firmware level you are running on your HBAs? And I am not sure if you all know this, but if you disable drive spin-down, the errors never occur for me on the problematic 3.x/4.x kernels, so I assume the same would hold true for ZFS. Not ideal, and a waste of money if you have a large number of disks (the reason I went back to a 2.x kernel). Most of the time when people report this issue with SAS2008-based HBAs and SATA drives, everyone assumes cable issues or power supply issues. I wonder if that is why no one seems to get to the bottom of this combination issue. |
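(A minimal sketch of the spin-down workaround mentioned above, with an example device name; -S 0 disables the drive's standby timer and -B 255 turns APM off entirely on drives that support it.)
hdparm -S 0 -B 255 /dev/sda   # keep the drive from ever spinning down
To make it persistent, the same flags can go into /etc/hdparm.conf or a udev rule, depending on the distro.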
I solved it in a few ways:
All of them work. But the issue remains: ZFS should not be sad because a disk went into standby. I'm mostly running this combination: LSISAS2308: FWVersion(15.00.00.00), ChipRevision(0x05), BiosVersion(07.29.00.00) |
So basically you're just keeping your drives from falling asleep to avoid the errors. One thing I have seen from numerous threads out there is that no one has reported issues (any corruption) from this, at least at a JBOD level, just the annoying fact that the errors are always there if you like to save power/noise/heat. Not sure how it would impact ZFS, since it is outside its control. It also seems to be specific to LSI; I have seen multiple types of HBAs, all LSI based, where people report this. I increased verbosity on SCSI logging on one box where I have the 3.x kernel, where it errors out when the drive is accessed, and on another where I have the 2.x kernel, where it works fine. In both cases they are drives on an LSI 9211-8i controller with P19 firmware running in IT mode; the drive was put into standby, then I just refreshed the directory to force it to read from the drive. BOTH kernels are using the 20.102.00.00 mpt2sas module. 3.x kernel: (3.10.0-514.26.2.el7.x86_64) 2.x kernel: (2.6.32-696.3.2.el6.x86_64) |
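(For reference, a hedged sketch of one way to raise SCSI logging verbosity at runtime; the mask is a packed per-facility bitfield, so the value below is illustrative only and the scsi_logging_level(8) helper from sg3_utils can compute a proper one.)
cat /sys/module/scsi_mod/parameters/scsi_logging_level            # current packed mask
echo 0x1000 > /sys/module/scsi_mod/parameters/scsi_logging_level  # illustrative value only; build a real mask with scsi_logging_level(8)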
Hey guys, I am not sure about the comment and milestone from @behlendorf on 31 May 2016: is this issue solved with 0.7.0? When will version 0.7.0 be released for Debian Jessie? I am using openmediavault 3 (Debian Jessie) with the following ZoL version:
Greetings Hoppel |
Hey, I'm also still having the same problems as @hoppel118. @behlendorf: Is there any progress on the solution mentioned in #4713 (comment)? The errors in the kernel log and the constant CHECKSUM errors I'm getting are rather frightening. |
Just wanted to leave another note regarding this. As I said previously, this issue does NOT occur with CentOS 6.9 (which uses a 2.x kernel), so I assume that if you can use ZFS in that environment, this issue would not occur. I can easily reproduce it on CentOS 7.x with kernels 3.x and 4.x. This weekend I took one of my CentOS 6.9 servers, swapped the boot drive, and installed Ubuntu 16.04.3 LTS with kernel 4.4.0-87-generic. The version of mpt2sas under this version of Ubuntu was 12.100.00.00. As soon as my JBOD disks go to sleep and I access them, the issue still occurs while they are waking up: Sep 2 16:14:43 misc02 kernel: [ 713.411876] sd 9:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK I feel very confident something was introduced at some point in the 3.x kernel series that is still present in 4.x and causes this issue (and that also manifests itself for you ZFS users). |
This was added to the 0.8 milestone so we don't lose track of it. @brianmduncan my guess is this was likely introduced in the 3.9 kernel with the CONFIG_PM_RUNTIME option. |
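(A quick hedged check for whether a running kernel was built with the option mentioned above; the config file location varies by distro, and newer kernels fold CONFIG_PM_RUNTIME into CONFIG_PM.)
grep -E 'CONFIG_PM(_RUNTIME)?=' /boot/config-"$(uname -r)" 2>/dev/null || zcat /proc/config.gz | grep -E 'CONFIG_PM(_RUNTIME)?='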
Just wanted to chime in that I'm affected by this problem too (and am not a ZFS user either). My kernel (4.13.4-1-ARCH) definitely has both CONFIG_PM and "[SCSI] Fix 'Device not ready' issue on mpt2sas". I'm going to try to debug my kernel in the next few days. |
Here's the first debug log I was able to produce: https://pastebin.com/Kg0vdCf7 |
Just wanted to let you guys know that I think I've come across a temporary fix until I've figured this out completely. You need to compile your own kernel, but change
This constant is the absolute minimum lifetime a SCSI command gets to do its thing, and the default setting of 7 seconds is just too short for most hard drives to spin up. I've now set it to a higher value. The downside of this is that you might get an unresponsive/laggy system in case of a broken drive that cannot complete any commands anymore. However, since this constant was last changed in git in 2007 (when SGv3 was added for Linux 2.4 and 2.5), I doubt that it is actually the source of this problem. I'm going to keep looking some more. |
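(For illustration only: the constant's name was elided above, so the sketch below assumes it is BLK_MIN_SG_TIMEOUT in include/linux/blkdev.h, which is defined as 7*HZ and matches the 7-second minimum described; treat that mapping, and the new value, as assumptions.)
cd /usr/src/linux
grep -rn "BLK_MIN_SG_TIMEOUT" include/linux/blkdev.h   # locate the 7*HZ definition
# edit the header and raise the value, e.g. from (7 * HZ) to (30 * HZ), then rebuild:
make -j"$(nproc)" bzImage modules
sudo make modules_install && sudo make install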
Great find!!! If I get a chance this weekend I will restore one of my Linux boxes back to CentOS 7.x and try to compile a custom kernel with the adjustment. I hope you can find the original source, and that this can be reported and patched in the kernels moving forward. This particular issue has been around for years, and I found it odd that no one has squashed it. Usually people say it is a hardware issue and leave it at that. I was starting to believe that I would need to find an alternative to my LSI SAS2008 controllers when I needed to move forward to any distro using a kernel above 3.x. |
@red-scorp I asked because I primarily use ZFS as NAS storage, and when bad disks are not marked as "bad" quickly enough it leads to degradation of the virtual machines placed on it. So setting a high timeout is not a good idea. And as I wrote previously, Solaris on the same hardware doesn't have any problem with it. |
@d-helios same usage for me, and I use Ubuntu 18.04. It does not drop disks from my array, but it gives I/O errors periodically. My solution was to use another controller and forget about the built-in SAS3008 chip. |
Quick update from me: I'm trying out a different fix, disk APM settings using the hdparm -B flag. Around the same time as upgrading to a 4.x kernel system (OMV4) I also added a startup script to configure the disks, such as setting NCQ, idle spin-down time (60 min), etc. Among these commands is disk APM; reading the man pages and guides on using hdparm to set APM, I believe I misunderstood, as most of them state that APM values > 127 will not allow spin-down. I of course wanted my disks to spin down, so I had set APM to 127, which is rather low from a performance standpoint. One anecdotal observation is that since the upgrade, array spin-up from sleep seems "slow" and of course gets stuck when disk I/O is stuck. Anyway, I think the low APM setting has an impact on how long it takes a disk to spin up and respond to commands. I've tested higher APM settings to see if a disk will still go to sleep, and am now running at 254, which is in theory the highest value before APM is turned off. Disks still go to sleep after 60 min of idle, which is good, but I will report back if the issue continues. |
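(A hedged sketch of the kind of startup tuning described above; device names and values are examples. -B 254 is the highest APM level before APM is disabled, and -S 242 sets a 60-minute standby timer, since hdparm encodes values 241-251 as 1-11 units of 30 minutes.)
for dev in /dev/sd{a..h}; do
  hdparm -B 254 -S 242 "$dev"   # high APM level, spin down after 60 minutes idle
done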
@chinesestunna, can you post your log, please?
Did you try to enable debug?
#define MPT_DEBUG 0x00000001
#define MPT_DEBUG_MSG_FRAME 0x00000002
#define MPT_DEBUG_SG 0x00000004
#define MPT_DEBUG_EVENTS 0x00000008
#define MPT_DEBUG_EVENT_WORK_TASK 0x00000010
#define MPT_DEBUG_INIT 0x00000020
#define MPT_DEBUG_EXIT 0x00000040
#define MPT_DEBUG_FAIL 0x00000080
#define MPT_DEBUG_TM 0x00000100
#define MPT_DEBUG_REPLY 0x00000200
#define MPT_DEBUG_HANDSHAKE 0x00000400
#define MPT_DEBUG_CONFIG 0x00000800
#define MPT_DEBUG_DL 0x00001000
#define MPT_DEBUG_RESET 0x00002000
#define MPT_DEBUG_SCSI 0x00004000
#define MPT_DEBUG_IOCTL 0x00008000
#define MPT_DEBUG_SAS 0x00020000
#define MPT_DEBUG_TRANSPORT 0x00040000
#define MPT_DEBUG_TASK_SET_FULL 0x00080000
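(A hedged note: these MPT_DEBUG bits appear to come from the older MPT Fusion driver headers; with the mpt2sas/mpt3sas drivers discussed in this thread, the usual knob is the logging_level module parameter, settable at load time or per HBA at runtime. The mask value and host number below are illustrative only.)
modprobe mpt3sas logging_level=0x310                    # illustrative mask, set at load time
echo 0x310 > /sys/class/scsi_host/host0/logging_level   # or per-HBA at runtime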
|
@d-helios I have not tried enabling debug; I'll capture the syslog next time something seems wrong or a drive drops, for analysis. Generally it's similar to what others have posted here, except I have an expander that gets reset when the disks seem to take too long to wake up. |
The server VM has been running since Wednesday; so far the logs still show a sprinkling of I/O read errors and one expander reset. It seems like the same issue as before, so it'll be a matter of time before a disk doesn't "respond" quickly enough and drops: |
Replicating this in 4.14 with mpt2sas too. Using 0.8.1 (latest revision stable). |
Seeing this in 5.3 with mpt3sas version 29.100, SAS2008 with firmware P20, ZoL 0.8.3. |
Interestingly enough, I don't see it any more on 5.2 |
This issue still seems to happen on random read/write work. I've got a 9207-8i with P20 firmware, Ubuntu 19.10, 5.3.0-40-generic. I bought the HBA to try to fix an error I was getting with the motherboard's onboard SATA (ASRock X570 PRO4). I have a suspicion it's the hard drives somehow. It's very easy to trigger the error by using either fio or rsyncing a large directory. I've attached error logs for both the built-in SATA as well as the SAS HBA... About to try rebooting with mpt3sas.msix_disable=1 after the thousandth scrub of this pool... Honestly, I've had pretty terrible experiences with ZFS so far. It really doesn't like USB hard drive pools either; I think the second log has a couple of instances of one of those crashing as well (I have a 3-way mirror external HD pool I was using to hold files while transferring data). Maybe it's just that the USB drive sucks. It's always the same one that seems to drop out, sooo... |
FWIW, I've not had the issue anymore with mpt3sas 33.100, kernel >=5.4.0-31 (Ubuntu 20.04), on current ZoL master. |
I'm reliably reproducing a similar issue. Easy way for me to repro:
Steps taken so far for troubleshooting:
Errors are consistently repeatable across all above configurations and occur ~7-9 seconds after a request is sent to a sleeping device. This happens regardless of any of the above changes being made. Here's an example of the errors received whenever a read request hits a sleeping drive:
Multipath should not be the cause here, as the errors occur even without multipath. If multipath is in use, it will immediately drop the associated link when either error occurs. All relevant multipath timeouts (dev_loss_tmo, checker_timeout) are set to 30. This happens regardless of path_selector mode. Sometimes no paths drop, sometimes one, sometimes both. Below is an example of multipath in use and both paths dropping when a data request attempts to wake a drive:
The 4246's also throw a
(Note: I had changed polling_interval to 2 for this example; it was 10 for the earlier examples. This is why the path came back after 2 / 10 seconds respectively.) So far I'm stumped on this one... |
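(For anyone comparing settings, a hedged example of inspecting the effective multipath timeouts mentioned above.)
multipathd show config | grep -E 'dev_loss_tmo|checker_timeout|polling_interval'
multipath -ll   # current path states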
Your drive is aborting the command and returning ASC/ASCQ 0x98/0x6 or 0x98/0x1. These do not seem to be registered with T10 at https://www.t10.org/lists/asc-num.htm |
That may be the Netapp 4486 enclosure/sleds issuing those aborts. It has dual drive sleds which have a pair of SATA (He12) behind a Marvell 88SF9210, which bridges to SAS multipath. The other enclosure I tried was a Netapp 4246, which shouldn't have as much 'in the way', but I didn't do read tests to trigger the errors in that config (only experienced the resets at the same 7-8 seconds after requests, in that case smartctl -x). I'll try to trigger some read timeouts and see what happens there. |
I am still on Debian Stretch. I have to update my OS to Buster and check whether the problem is solved with the latest versions. That will take "some" time. ;) Regards Hoppel |
It's not solved. Just happened to me last night. |
Hm.... Bummer... |
(This is a copy & paste from my comment in issue #4638, just in case someone finds this issue through a search engine while looking for a workaround.) I got hit by this problem as well, running ZFS 0.8.4 on Linux 5.4 (Arch Linux LTS kernel) with eight 14 TB SATA disks in RAIDZ2 behind an LSI 2308 controller flashed to IT mode. Whenever I turn on hd-idle and let it spin down the disks (they sit idle 20 hours per day), ZFS will complain loudly in the kernel log during wake-up. After a couple of days of testing, many read and, most worryingly, also write and even checksum errors had occurred (zpool status). Scrub could correct all problems, but this needed to be fixed ASAP. I solved the problem by doing away with the LSI and buying a JMicron JMB585 5-port SATA controller card instead. These chips have existed since about 2018, so they are relatively new. No extra driver is needed; the card will run with any even remotely recent stock AHCI driver. Since the switch, no more errors have occurred at all, even though I aggressively put disks into standby when not in use. As far as I can see the card also has no PCIe bottleneck, because it can use PCIe 3.0 with two lanes, supposedly reaching 1700 MByte/s transfer rates. That should be good enough for 5 modern HDDs. There are mostly Chinese no-names out there, US$ 30-40 in 2020; I recommend getting a card with a largish black heatsink though, to preclude thermal issues. There appear to be no electrolytic capacitors on these cards, so they might even be very long-term stable (10+ years). |
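(For completeness, a hedged sketch of the hd-idle usage being described; device names and timeouts are illustrative. With hd-idle, -i sets the idle timeout in seconds and -a selects a specific disk for the -i that follows it.)
hd-idle -i 0 -a sda -i 1200 -a sdb -i 1200   # no default spin-down, but spin sda/sdb down after 20 minutes idle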
I'm having the same problem with the onboard SAS3008 of my Supermicro X11SSL-CF motherboard. I'm on Debian Buster with zfs 0.8.6-1~bpo10+1 from buster-backports. One thing I plan to do is flash the SAS3008 to P16-V17 firmware which is the latest posted by Supermicro.
|
@RichieB2B the problems disappeared for me on 5.8 or newer kernels from backports. I suggest giving them a try. |
Thanks @kobuki I upgraded the firmware to P16 and the Linux kernel to 5.9.0-0.bpo.5-amd64 from buster-backports, and I have not seen the errors since.
# sas3ircu 0 DISPLAY
Avago Technologies SAS3 IR Configuration Utility.
Version 17.00.00.00 (2018.04.02)
Copyright (c) 2009-2018 Avago Technologies. All rights reserved.
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
Controller type : SAS3008
BIOS version : 8.37.00.00
Firmware version : 16.00.10.00
Channel description : 1 Serial Attached SCSI
# modinfo mpt3sas
filename: /lib/modules/5.9.0-0.bpo.5-amd64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko
alias: mpt2sas
version: 34.100.00.00
license: GPL
description: LSI MPT Fusion SAS 3.0 Device Driver
|
Based on the positive results I have been seeing in this thread, I decided to give it a try. Yes, the problem is gone for me running Ubuntu with kernel 5.9.0-050900-generic. But none of my drives go to sleep now on my LSI 9211 controller, so at least for me it is no different than running a different distro with an older kernel version: if I just keep my drives awake, I never see those errors. Even though I set my drives to fall asleep, they never seem to, so of course none of these errors. At least that was my experience.
|
Had good results running 5.11.8, so far much better than 5.4.86. Running P19 on the LSI SAS 9201-16i & SAS 9211-8i controllers. Thanks for the updates @kobuki @RichieB2B and @brianmduncan. |
@jonathan-molyneux: Scrubbing was never a problem; waking up the drives was always the issue. Let your drives sleep and see whether they wake up properly or not. |
Maybe not related to your problems... but if you are using Supermicro servers + LSI 3008 + sas3ircu, please check your backplane (BPN-SAS3-216EL1) firmware: |
I have had the same issue on FreeBSD for years. It is fixable most of the time by disabling the hard drive's APM (advanced power management) and/or EPC (extended power conditions) options. A really good how-to on disabling/modifying drive power management can be found here: https://serverfault.com/questions/1047331/how-do-i-disable-hard-disk-spin-down-or-head-parking-in-freebsd But still, disabling APM/EPC doesn't count as a valid solution. |
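(A hedged FreeBSD sketch of the APM part of that workaround, with an example device name; the EPC side has its own camcontrol epc subcommand, covered in the linked answer.)
camcontrol apm ada0 -l 254   # 254 = highest APM level, i.e. the least aggressive power management on drives that honour it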
I'm seeing this issue on RHEL 8.5 (4.18.0-348.7.1.el8_5) with LSI 9300-8e HBAs and SuperMicro 846 shelves. Heavy IO to the drives will slowly increase the read/write errors in zpool status until ZFS steps in and starts resilvering. For the number of drives I have in RAIDZ2, resilvering daily isn't really feasible because it takes longer than a day to resilver. I need to update my 9300-8e's (SAS 3008) firmware but haven't been able to yet. |
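(For anyone watching the same symptoms, hedged commands for following the error counters and clearing them after a spurious wake-up burst; the pool name is an example.)
zpool status -v tank   # per-vdev READ/WRITE/CKSUM counters
zpool events -f        # follow ereport.fs.zfs.io events as they arrive
zpool clear tank       # reset the counters once you're satisfied the errors were spurious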
Installing linux-generic-hwe-20.04 (upgrades the kernel from 5.4.x to 5.11.x) on my Ubuntu 20.04.3 fixed the issue (I think, haven't seen an error after resilver and scrub). I still have no idea what exactly changed in these versions that caused this to magically fix itself. I was actually getting errors in I've tried swapping the SAS2SATA cable (but kept the position of the drive on the same cable #, which might be why). And I'm also using an LSI card, which might've been doing something funky with the driver in older kernels. Either way, the problem is hopefully solved, I'm just curious as to why. |
Whenever one or more disks in one of my pools are sleeping because they were idle, ZFS (via ZED) spams me with IO errors (via email, because that's how I set it up).
It's always this kind of error with only the vpath, vguid and eid changing:
dmesg shows:
Scrubbing gives no 'fixes', as there is no data corruption and no pools get unhappy, just a few errors. As far as I can see, either ZFS isn't waiting long enough for the disks to spin up (they actually spin up on access), or it tries some command before checking that the disk is ready for it.
The pool status:
I can disable spindown/standby, but not all pools are always in use; some are only archives. I enabled standby timeouts before the IO errors came along, so to me it sounds like ZFS or ZoL doesn't deal with spindown or standby very well?
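(For context on the email setup mentioned above, a hedged excerpt of the kind of zed.rc settings involved; variable names are taken from recent /etc/zfs/zed.d/zed.rc files, and older 0.6.x releases may name some of them differently.)
ZED_EMAIL_ADDR="root"           # where ZED sends event mail
ZED_NOTIFY_INTERVAL_SECS=3600   # throttle repeated notifications for the same event class
ZED_NOTIFY_VERBOSE=0            # only mail events ZED considers noteworthy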
Additional data:
Using version 0.6.5.6-2 with Linux 4.5.0-2-amd64.