We have a test machine running Ubuntu 14.04 LTS, using the zfs-native-stable PPA (0.6.3-2~trusty). It's used to test ZFS stuff (error handling, stability, etc.), and the flaky old 750G Seagate drives I found in a corner are rather perfectly suited for this ;)
This issue is related to the other disk-goes-bad-lockup issues, but I think this is a different bug.
I have a raidz1 test pool, so naturally things go bad when I hit double failures (these drives can be relied upon to be unreliable).
However, this time it was resilvering when I hit what I suspect was a double-fault. The kernel log says:
[138671.416334] SPLError: 1850:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'test750G' has encountered an uncorrectable I/O failure and has been suspended.
Which is perfectly okay; I expect that to happen in a single-parity raid when I lose too much redundancy.
However, zpool status says:
  pool: test750G
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jun 25 09:04:26 2014
        1.57T scanned out of 5.35T at 65.5M/s, 16h49m to go
        199G resilvered, 29.32% done
config:

        NAME             STATE     READ WRITE CKSUM
        test750G         DEGRADED    57     0     0
          raidz1-0       DEGRADED   233     0     0
            slot9        ONLINE       0     0     0  (resilvering)
            slot10       ONLINE      24     3     0
            slot11       ONLINE       0     0     0  (resilvering)
            slot12       ONLINE       0     0     0  (resilvering)
            slot13       ONLINE       0     0     0  (resilvering)
            slot14       ONLINE       0     0     0
            slot15       ONLINE       0     0     0  (resilvering)
            replacing-7  FAULTED      0     0     0
              old        FAULTED     48   398     0  too many errors
              slot16     ONLINE       0     0     0  (resilvering)

errors: 31 data errors, use '-v' for a list
If I remember correctly, DEGRADED is not a suspended state.
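For anyone who wants to check for the same mismatch on their own machine, here is a rough sketch of the cross-check in Python. It only parses text: the kernel log line and the pool/state lines are the ones quoted above, while the function names and the choice to treat ONLINE and DEGRADED as "not suspended" are my own assumptions, not anything taken from the ZFS tools.

```python
import re

# Message format copied from the SPLError line in the kernel log above.
SUSPEND_RE = re.compile(
    r"Pool '([^']+)' has encountered an uncorrectable I/O failure "
    r"and has been suspended"
)
POOL_RE = re.compile(r"^\s*pool:\s*(\S+)", re.MULTILINE)
STATE_RE = re.compile(r"^\s*state:\s*(\S+)", re.MULTILINE)

# Assumption: these states mean the pool is presented as still functioning.
NOT_SUSPENDED_STATES = {"ONLINE", "DEGRADED"}


def suspended_pools(kernel_log):
    """Return the names of pools the kernel log reports as suspended."""
    return set(SUSPEND_RE.findall(kernel_log))


def reported_state(zpool_status):
    """Return (pool, state) from zpool status text, or None if not found."""
    pool = POOL_RE.search(zpool_status)
    state = STATE_RE.search(zpool_status)
    if pool is None or state is None:
        return None
    return pool.group(1), state.group(1)


def find_mismatch(kernel_log, zpool_status):
    """Describe the disagreement between the two sources, or return None."""
    parsed = reported_state(zpool_status)
    if parsed is None:
        return None
    pool, state = parsed
    if pool in suspended_pools(kernel_log) and state in NOT_SUSPENDED_STATES:
        return (f"{pool}: kernel log reports suspension, "
                f"zpool status reports {state}")
    return None


KERNEL_LOG = (
    "[138671.416334] SPLError: 1850:0:(spl-err.c:67:vcmn_err()) WARNING: "
    "Pool 'test750G' has encountered an uncorrectable I/O failure and has "
    "been suspended.\n"
)
ZPOOL_STATUS = "  pool: test750G\n state: DEGRADED\n"

print(find_mismatch(KERNEL_LOG, ZPOOL_STATUS))
```

In practice the two strings would come from saved copies of the kernel log and the zpool status output; how those are collected is left out here on purpose.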
I don't really know if this is a side effect of something deadlocked somewhere. I have a lot of "task blocked for more than 120 seconds" messages logged after the pool suspended message, but they might just as well be a side effect of the pool being suspended:
@ZNikke We have a similar issue open but I don't mind keeping this open as a duplicate for now. It would be helpful if you could post all the console stacks somewhere for analysis.
All stack traces are logged after the SPLError and are identical to the one I posted here; the only difference is the timestamp. I think it's the kernel that triggers on 120s-hung-io and then dumps a stack trace as part of the error message, ish.
Everything logged before the SPLError is scsi noise about the HDD behaving badly.
It's like the pool got suspended, but something (the resilver perhaps?) wants to sync state or something.
If this is a duplicate of a known issue, I'm sorry for the noise.
However, if this is a new issue, let me know if I can provide more information for you to be able to pinpoint it.