Checksum errors may not be counted #11545
I like the idea of having operations in (2) and (3) clear out the duplicate tracking cache so that new errors can be detected and counted.
Both (2) and (3) make sense to me. As for (1), I agree with @don-brady that we'll need to make sure that reporting all duplicates won't make the FMA / ZED code trigger-happy. This is a little out of scope, but @ahrens' use of the term "recent" errors reminded me of this. It's come up a couple of times that the documentation isn't clear that the READ, WRITE, and CKSUM error counters are in no way persistent. That is, they're lost after an export/import. This is the long-standing behavior, but I can see how users could find it unexpected. It'd be nice to update the documentation along with this change to make it clear that only "recent errors" are reported and that the counters are not persistent.
Fix regression seen in issue #11545 where checksum errors were not being counted or showing up in a zpool event.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #11609
System information
Describe the problem you're observing
If a block is damaged after being repaired once, when it is repaired for the second time, the checksum error is not reported. This causes confusion (e.g. while testing) because there is no visibility into the checksum errors that are being detected (and potentially corrected).
This is a change in behavior caused by #10861. I understand the desire to limit the rate of event generation since we keep so few of them. However:
- … (vs_checksum_errors): it doesn't cost anything to count to a large number.
- … (zpool scrub) or errors are discarded (zpool clear), it would be reasonable to report the error again (even to generate another event).

I'd suggest that we make at least one (and perhaps all) of the following changes:
- … when zpool clear is run

Describe how to reproduce the problem
1. zpool create ... raidz ...
2. Silently damage one disk (dd of=/dev/dsk/...)
3. zpool scrub
   Scrub reports that it repaired some space, and the vdev reports some checksum errors:
4. Silently damage one disk AGAIN (dd of=/dev/dsk/...)
5. zpool scrub AGAIN
   Scrub reports that it repaired some space, BUT the vdev reports no checksum errors:
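To make the suggestions above concrete, here is a toy model of per-vdev checksum-error accounting with duplicate suppression, replaying the damage / repair / damage-again sequence from the reproduction steps. It is a minimal sketch under loud assumptions: `VdevErrorModel`, `repaired`, `clear`, `replay`, and the `"blk-42"` block key are all hypothetical names. The real OpenZFS duplicate tracking (introduced by #10861) keys and ages out its entries differently and is not shown here. `checksum_errors` merely stands in for `vs_checksum_errors`.

```python
from dataclasses import dataclass, field


@dataclass
class VdevErrorModel:
    """Toy model of checksum-error accounting with duplicate suppression.
    Hypothetical: this is NOT the OpenZFS implementation."""

    count_duplicates: bool = False  # suggestion (1): count every error
    checksum_errors: int = 0        # stands in for vs_checksum_errors
    events: list = field(default_factory=list)                # generated events
    _recent: set = field(default_factory=set, repr=False)     # duplicate cache

    def checksum_error(self, block):
        """Record a checksum error detected on `block`."""
        duplicate = block in self._recent
        # Current behavior: a duplicate is neither counted nor reported.
        if not duplicate or self.count_duplicates:
            self.checksum_errors += 1
        if not duplicate:
            # Only the first sighting of a block generates an event.
            self._recent.add(block)
            self.events.append(("checksum", block))

    def repaired(self):
        """After a repair (e.g. by a scrub), forget cached blocks so that a
        later error on the same block is counted and reported again."""
        self._recent.clear()

    def clear(self):
        """Model of `zpool clear`: discard the counter and the cache."""
        self.checksum_errors = 0
        self._recent.clear()


def replay(model, reset_between):
    """Damage the same block, optionally repair, then damage it again.
    Returns (counted errors, generated events)."""
    model.checksum_error("blk-42")
    if reset_between:
        model.repaired()
    model.checksum_error("blk-42")
    return model.checksum_errors, len(model.events)


if __name__ == "__main__":
    print(replay(VdevErrorModel(), reset_between=False))  # (1, 1)
    print(replay(VdevErrorModel(), reset_between=True))   # (2, 2)
    print(replay(VdevErrorModel(count_duplicates=True), reset_between=False))  # (2, 1)
```

In this model, the first call reproduces the reported symptom (the second error is silently dropped). Resetting the cache on repair or `zpool clear` restores both the count and the event, while counting duplicates restores the count without generating extra events, which is the trade-off raised in the comments about FMA / ZED noise.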
Include any warning/errors/backtraces from the system logs
@don-brady @behlendorf