0.6.5.6 - I/O timeout during disk spin up #4638
Hey guys, I also see this error for my pool, and I only see it in the syslog in combination with my ZFS HDDs. It always happens when my HDDs have to wake up after a spindown (127). There are no errors in the HDDs' SMART information.
My hardware specs: Mainboard: Supermicro X11SSH-CTF. My HBA is PCI passed through to the KVM guest; the mpt3sas modules are blacklisted on the host system. Host OS - "Proxmox":
Guest OS (KVM) - openmediavault 3.0.41:
As you can see, I also use the Proxmox kernel in the KVM guest. I use the following ZFS packages, which have a dependency on the openmediavault plugin "openmediavault-zfs":
If you need any other information, please tell me what you need. Thanks and greetings, Hoppel
After disabling spindown and rebooting the KVM guest I don't see these messages anymore. But I want to spin down my HDDs.
OK, I tried another thing. I use 8x 4 TB WD Red HDDs behind my LSI SAS3008 controller. I read that there is a tool to deactivate the automatic spindown in the HDDs' firmware, so I downloaded "idle3-tools" onto my openmediavault (Debian Jessie) KVM guest. The default value for my disks was:
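(The original output isn't preserved here. As a purely illustrative example, reading the idle3 timer with idle3-tools looks roughly like this; the device name and the reported value are placeholders, not the actual values from this system.)

idle3ctl -g /dev/sda
Idle3 timer set to 80 (0x50)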
So I decided to deactivate the default spindown with the following command for all 8 disks:
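(Again, the exact command isn't preserved. The idle3-tools way to disable the timer would be something along these lines, looping over all eight disks; the device range is an assumption.)

for d in /dev/sd[a-h]; do idle3ctl -d "$d"; done
# the new setting only takes effect after the drives have been power cycled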
I power cycled the server completely, started again and had a look at the result with the following command:
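(Presumably the same read command as above; with the timer disabled, idle3ctl reports something like the following.)

idle3ctl -g /dev/sda
Idle3 timer is disabled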
At this stage of the configuration I don't see any issues/errors in the syslog while opening a Samba share backed by a ZFS file system. After that I configured my "/etc/hdparm.conf" via the openmediavault web UI in the following way:
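(The actual file contents aren't shown above. A minimal /etc/hdparm.conf entry for a 20-minute spindown would look roughly like this, with the device path being a placeholder; a spindown_time of 240 means 240 x 5 s = 1200 s.)

/dev/disk/by-id/ata-WDC_WD40EFRX-EXAMPLE {
    # 240 * 5 s = 20 minutes until standby
    spindown_time = 240
}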
This way openmediavault is set up to spin down the disks after 20 minutes. Now I see the following on the command line:
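(Output not preserved; checking the power state with hdparm right after configuring would presumably still show the drives active, e.g.:)

hdparm -C /dev/sda

/dev/sda:
 drive state is:  active/idle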
20 minutes later I see the following on the command line:
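(Again illustrative: after the 20-minute timeout the same check should report standby.)

hdparm -C /dev/sda

/dev/sda:
 drive state is:  standby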
So we can see that the spindown controlled by openmediavault works fine. Now I opened a file from one of my Samba shares with ZFS as the underlying file system. I can see the disks spinning up with hdparm, and I see the following messages in the logfile again. Complete syslog: http://pastebin.com/9A300u3R
So that didn't help at all, and I brought it back to the default values:
What do you think about this? A last check could be to clone the KVM guest to bare metal and check the whole thing again. Maybe it has something to do with KVM or with PCI passthrough. But for this I need some time. Greetings, Hoppel
These issues describe the same thing: #4713. Greetings, Hoppel
I also encountered this issue.
There might be a problem between ZFS and the mpt3sas driver. Is it possible for you to reduce your spin-up time in your controller BIOS? Maybe it's possible for you to stagger spin-up, two or three disks at a time. This should be possible if your controller is flashed to IT mode and if your PSU is powerful enough. For me it's not possible to check this at the moment, because I use a beta firmware from Supermicro where the option to spin up only some disks at a time is not available. How many disks do you use behind your SAS2008 controller for ZFS? How long do you have to wait until all disks have spun up? Greetings, Hoppel
I can't test the BIOS right now; the server would have to be rebooted.
Same problem on a Z87 Extreme11/ac -> 22x SATA3 (16x SAS3 12.0 Gb/s + 6x SATA3 6.0 Gb/s) from the LSI SAS 3008 controller + 3X24R expander. OS: Ubuntu 18.04 dev
ZFS hangs on spin-up of SATA HDDs, so I assume it's a problem between the LSI controller driver and ZFS. I'll try BIOS updates; let's see if that fixes the problem. UPDATE: I've updated the motherboard BIOS and flashed the SAS controller to IT mode with the newest firmware available for the 9300 card. This did not help with the disk spin-up problem. Funnily enough, it's not only ZFS that freezes, but hddtemp and smartctl too. This issue might be related not to ZFS but to misbehavior of mpt3sas itself. Please let me know if you have found any solution or workaround for the disks freezing on spin-up.
I have the same issue with SAS drives. My configuration:
kernel parameters:
Notice:
@d-helios one ugly hack to paper over the issue is to tweak
(tweak as appropriate for you). You probably do not need the modprobe, as the module will normally be loaded by this time, but for testing (systemctl stop/start with rmmod) it is needed.
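(The exact snippet from this comment isn't preserved above. Purely as an illustration of the kind of hack being described - a unit that loads the module and then relaxes a timeout before the pool is imported - something like the following could be used; the service name, the choice of knob (the per-device SCSI command timeout) and the value are my assumptions, not necessarily what was originally suggested.)

# /etc/systemd/system/sas-spinup-workaround.service (hypothetical unit)
[Unit]
Description=Load mpt3sas and relax SCSI command timeouts before ZFS import
Before=zfs-import-cache.service

[Service]
Type=oneshot
ExecStart=/sbin/modprobe mpt3sas
# assumed knob: give sleeping disks more time to spin up before the kernel
# declares an I/O error (the default is usually 30 seconds)
ExecStart=/bin/sh -c 'for t in /sys/block/sd*/device/timeout; do echo 120 > "$t"; done'

[Install]
WantedBy=multi-user.target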
Would it make sense to "enhance" this behavior? What would be the "right thing" to do when we reach the end of a timeout and it's possible to import the pool in a degraded manner?
This is manifesting on an X10SDV-based system (mpt2sas controller built-in) for me. I'm testing a kernel with no CONFIG_PM, but so far messing with the controller BIOS settings might be the only way.
@red-scorp I've been troubleshooting a very similar issue over here. Did you ever resolve yours? If so, how? Thanks in advance for any tips/info...
Nope, I switched to another controller. Unfortunately, this bug still hasn't been fixed. I now use that controller for Linux md-raid, where it also has some disk sleep issues; basically I had to disable sleep on the disks altogether.
Could you give me some more detail on your expander setup? What is the actual hardware/enclosure in use? Are you using multipath? Did it happen on the SAS drives, the SATA drives, or both? My issues might be pointing in the direction of the SAS-SATA multipath bridges in this repurposed NetApp gear I'm using.
As you know, the Z87 Extreme11/ac uses an LSI SAS 3008 controller plus a 3X24R expander. The board itself uses, unusually for this kind of setup, plain SATA connectors, so I had to use a cable converting SATA to SFF-8087 and a very primitive Chinese rack-mounted RAID case which, as far as I know, only routes the signals from the SFF connector to the drives themselves. I'm pretty sure the case and its backplanes are really simple, so you may assume the drives are connected directly to the SATA ports on the motherboard. I also use only SATA drives in this setup. All of them are 3 TB; some of them are 8+ years old and some were new at the time of experimenting. This is a naturally growing system which has held my personal data for ages. Now I use https://www.amazon.de/dp/B00YHE2IPU/ref=pe_3044161_185740101_TE_item (SATA card) and https://www.amazon.de/gp/product/B0050SLTPC (SAS card) to talk to 24x 3 TB HDDs with ZFS without any issues. The mentioned LSI SAS setup is attached to a Linux MD-RAID10 with 16 cheap SSDs with sleep disabled. As software I use Ubuntu 18.04 LTS at the moment; at the time of my writing above it was one of the earlier releases.
Same here :( Proxmox 6.2 (based on Debian 10), kernel 5.4.44, ZoL 0.8.4. During disk spin-up I get this:
"zpool status" shows errors after this. I am using SAS disks on an LSI/Broadcom SAS3416 controller.
I am still interested in a solution. Regards, Hoppel
In case someone finds this issue through a search engine, looking for a workaround: I got hit by this problem as well, running ZFS 0.8.4 on Linux 5.4 (Arch Linux LTS kernel) with eight 14 TB SATA disks in RAIDZ2 behind an LSI 2308 controller flashed to IT mode. Whenever I turn on hd-idle and let it spin down the disks (they sit idle 20 h per day), ZFS complains loudly in the kernel log during wakeup. After a couple of days of testing, many read and, most worryingly, also write and even checksum errors had occurred (zpool status). A scrub could correct all problems, but this needed to be fixed asap.

I solved the problem by doing away with the LSI and buying a JMicron JMB585 5-port SATA controller card instead. These chips have existed since about 2018, so they are relatively new. No extra driver is needed; the card will run with any even remotely recent stock AHCI driver. Since the switch no more errors have occurred at all, even though I aggressively put disks into standby when not in use. As far as I can see the card also has no PCIe bottleneck, because it can use PCIe 3.0 with two lanes, supposedly reaching 1700 MByte/s transfer rates, which should be good enough for 5 modern HDDs. These are mostly Chinese no-names, US$ 30-40 in 2020; I recommend getting a card with a largish black heatsink though, to preclude thermal issues. There appear to be no electrolytic capacitors on these cards, so they might even be very long-term stable (10+ years).
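(Not shown in the comment, but for context: a typical hd-idle invocation that disables the default timeout and spins down individual disks after 20 minutes of inactivity looks roughly like this; the device names are placeholders.)

# -i 0 disables the default timeout, each -a/-i pair sets a per-disk idle time in seconds
hd-idle -i 0 -a sda -i 1200 -a sdb -i 1200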
See #3856 for further details. The bug is still present:
I'm still experiencing this bug with 0.6.5.6 on Ubuntu 16.04, with the same chipset (SAS2008).
This is what I did:
I created a zpool on the device and sent it to sleep (hdparm -y), then started writing a file to it. The result was:
[59526.359997] sd 0:0:1:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59526.360003] sd 0:0:1:0: [sda] CDB:
[59526.360006] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59526.360022] blk_update_request: I/O error, dev sda, sector 824769380
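(The exact commands aren't quoted in the comment. A rough sketch of the reproduction described above, with pool and device names as placeholders:)

zpool create -f testpool /dev/sda   # single-disk pool on the SAS2008-attached drive
hdparm -y /dev/sda                  # put the drive into standby immediately
cp /root/bigfile /testpool/         # write to the pool so the drive has to spin up
dmesg | tail                        # watch for I/O errors during spin-up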
I created an ext3 filesystem on the device and sent it to sleep, then started the file copy again. Result: no messages in dmesg.
I also compared the original file with the copied one; they are identical. So this bug has to do with ZFS and is not resolved. Any ideas? Do you need further information @behlendorf?