
Odd MD pseudo-crash related to ZFS memory issue? #1619

Closed
cousins opened this issue Jul 30, 2013 · 4 comments

@cousins

cousins commented Jul 30, 2013

Just checking to see if anybody else has had anything similar to this happen:

I have a large ZFS pool (60 4 TB disks in six 10-disk raidz2 groups) that I have been rsyncing data to from a couple of other systems. While on vacation and checking in (I know, always a bad idea) I noticed one of the rsyncs had hung but the others were still going. While investigating I found that certain commands would give me "I/O error" messages. Then I found that the mirrored OS volume was degraded, with /dev/sda2 having been thrown out. sda1 and sda3 were still active in their mirrors though. /usr/bin and /usr/sbin showed I/O errors and my vacation wasn't much fun for a while.

I eventually booted from a DVD and poked around. The md devices were fine. The underlying hardware was fine. The file system was fine. I booted into the OS (Centos 6.4) again and everything is fine again. I didn't have to add /dev/sda2 back into the mirror and have it sync. It was just fine.
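
For reference, this is roughly how the mirror state can be checked and a dropped member re-added, in case it helps anyone else (a sketch; the md device name below is just an example):

```sh
# Overall md state; a degraded mirror shows up as something like [U_]
cat /proc/mdstat

# Detailed state of one array (example device name)
mdadm --detail /dev/md1

# If a member really had been kicked out, it could be re-added with:
#   mdadm --manage /dev/md1 --re-add /dev/sda2
```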

My guess (shared by a tech-support person at the vendor we bought the hardware from) is that ZFS somehow tromped on memory and put the root volume in a very weird state. Looking at the logs, I don't see any entries since the 19th.

Has anyone seen anything similar to this?

Thanks,

Steve

@tomposmiko

On 07/30/2013 10:48 PM, cousins wrote:

> My guess (along with a tech-support person from the vendor we bought the hardware from) is that ZFS somehow tromped on memory, and put the root volume in a very weird state.

Why do you suspect zfs?

> Has anyone seen anything similar to this?

I saw similar issues many times with md raid.

tamas

@cousins
Author

cousins commented Jul 30, 2013

Hi Tamas,

We've been having a fair amount of trouble with ZFSonLinux that makes me a bit gun-shy: #1179 and openzfs/spl#247.

I've used MD for over 10 years on many systems and I've never seen this behavior before.

I admit that I have no proof that ZFS had anything to do with this. That is why I'm asking if anyone has seen anything like this. Just trying to get more information.

Steve

@tomposmiko

I saw similar (not exactly the same) issues when there was a crappy HDD (check SMART), a bad SATA/power connector, or a crappy HBA or its driver.
When I rebooted those machines, everything worked fine for a couple of hours or days, sometimes weeks.

In a similar case with HW RAID I saw timeouts in both the Linux and controller logs.

In such a case, disabling the HDD cache can help (if it's a HW RAID array).
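
For example, SMART status and the drive write cache can be checked and toggled with something like this (the device name is just a placeholder):

```sh
# Print SMART health, attributes and the error log for a drive
smartctl -a /dev/sda

# Turn off the drive's volatile write cache
hdparm -W 0 /dev/sda
```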

@behlendorf
Contributor

Even if this was caused by ZFS, without additional information to go on there's not much that can be done.
