
Handle block pointers with a corrupt logical size #2685

Closed
wants to merge 1 commit

Conversation

behlendorf
Contributor

The general strategy used by ZFS to verify that blocks are valid
is to checksum everything. This has the advantage of being extremely
robust and generically applicable regardless of the contents of
the block. If a block's checksum is valid then its contents are
trusted by the higher layers.

This system works exceptionally well as long as bad data is never
written with a valid checksum. However, if this does somehow
occur due to a software bug or a memory bit-flip on a non-ECC
system it may result in a kernel panic.

One such place where this could occur is if somehow the logical
size stored in a block pointer exceeds the maximum block size.
This will result in an attempt to allocate a buffer greater than
the maximum block size causing a system panic.

To prevent this from happening the arc_read() function has been
updated to detect this specific case. If a block pointer with an
invalid logical size is passed it will treat the block as if it
contained a checksum error.

Signed-off-by: Brian Behlendorf [email protected]
Issue #2678
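
For illustration, here is a minimal user-space sketch of the failure mode and the proposed guard (the constants and helper names are simplified stand-ins for the real ZFS macros, not the merged patch): the logical size is decoded by shifting the on-disk field, so a corrupt field decodes to an impossible size, and the read is rejected as a checksum error before any oversized allocation is attempted.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the on-disk constants (pre-large-block). */
#define SPA_MINBLOCKSHIFT	9			/* 512-byte sectors */
#define SPA_MAXBLOCKSIZE	(128ULL * 1024)		/* 128K maximum block */
#define ECKSUM			122	/* placeholder errno for a checksum error */

/*
 * The blkptr stores lsize as (sectors - 1); the logical size is
 * reconstructed by shifting.  A corrupt field therefore decodes to a
 * size far beyond SPA_MAXBLOCKSIZE rather than failing outright.
 */
static uint64_t
bp_get_lsize(uint64_t raw_lsize_field)
{
	return ((raw_lsize_field + 1) << SPA_MINBLOCKSHIFT);
}

/* Reject impossible sizes before any buffer allocation is attempted. */
static int
validate_lsize(uint64_t lsize)
{
	if (lsize == 0 || lsize > SPA_MAXBLOCKSIZE)
		return (ECKSUM);	/* treat the block as damaged */
	return (0);
}

int
main(void)
{
	uint64_t good = bp_get_lsize(255);	/* decodes to 128K: legal */
	uint64_t bad = bp_get_lsize(0xffff);	/* corrupt field: ~32M */

	printf("lsize %llu -> %d\n", (unsigned long long)good,
	    validate_lsize(good));
	printf("lsize %llu -> %d\n", (unsigned long long)bad,
	    validate_lsize(bad));
	return (0);
}
```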

@behlendorf
Contributor Author

@ryao @dweeezil can I get your thoughts on this?

@dweeezil
Contributor

@behlendorf I think it's a rather clever idea. How about doing the same thing if the lsize isn't a power of 2? There are likely other places throughout the code where we could do similar sanity testing.

@dweeezil
Contributor

@behlendorf EDIT: ... do the same thing if lsize isn't a multiple of 512.

EDIT2: I guess that's not likely to happen unless another memory error occurred after loading it.
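
For completeness, the alignment check floated above would be a one-liner; a hypothetical fragment (and, per the EDIT2 above, redundant in practice since lsize is produced by shifting):

```c
#include <stdint.h>

#define SPA_MINBLOCKSIZE	512	/* smallest legal block */

/* Hypothetical check: a valid lsize is a multiple of 512 bytes. */
static inline int
lsize_is_aligned(uint64_t lsize)
{
	return ((lsize & (SPA_MINBLOCKSIZE - 1)) == 0);
}
```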

@dweeezil
Contributor

@behlendorf I made a pool with a bunch of bogus-lsize'd blkptrs to see what would happen with this patch. In addition to hitting the new ECKSUM error in arc_read(), it also hit the ASSERT at the bottom of dbuf_read() because the dbuf's state was still DB_READ. Clearly we can't handle all types of damage but I'll look into it a bit further to see whether there's a way to recover in this case.

Regarding my initial comments about additional size checking, I had overlooked the fact that "size" had just been computed by shifting the on-disk value.

@behlendorf
Contributor Author

Clearly we can't handle all types of damage

Thanks for the quick feedback. I completely agree. And I wouldn't want to go overboard with this sort of thing. But I think there are a few key places where we can handle things more gracefully. There's no good reason this particular damage needs to crash your system.

ASSERT at the bottom of dbuf_read()

Good catch. I think we should be able to handle this just by doing a better job of simulating the checksum failure. Calling the passed callback with a dummy zio should kick in the existing error handling here. Let me propose an updated patch. If you could try it on your damaged pool that would be appreciated.

additional size checking

Yeah, unless you look carefully at the code it is not obvious that this is a computed value based on SPA_MINBLOCKSHIFT. We're also going to have to be careful to revisit this code when the large block support is merged.
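
A hedged sketch of the dummy-zio idea mentioned above (types and names are simplified stand-ins, not the actual patch): synthesizing a failed zio and running the normal completion callback lets consumers such as dbuf_read_done() take their existing error path instead of leaving the dbuf stuck in DB_READ.

```c
#include <stdio.h>

#define ECKSUM	122	/* placeholder errno standing in for ZFS's ECKSUM */

/* Minimal stand-ins for the kernel types involved. */
typedef struct zio { int io_error; } zio_t;
typedef struct arc_buf { char *data; } arc_buf_t;
typedef void arc_done_func_t(zio_t *zio, arc_buf_t *buf, void *priv);

/*
 * On an invalid blkptr, synthesize a zio that carries a checksum
 * error and invoke the caller's completion callback with it, exactly
 * as if the read had completed and failed verification.
 */
static int
fail_read_as_cksum_error(arc_done_func_t *done, void *priv)
{
	zio_t rzio = { .io_error = ECKSUM };	/* dummy zio */

	if (done != NULL)
		done(&rzio, NULL, priv);	/* no buffer: the read failed */
	return (ECKSUM);
}

/* Example consumer, modeled loosely on dbuf_read_done()'s error path. */
static void
example_done(zio_t *zio, arc_buf_t *buf, void *priv)
{
	(void) buf; (void) priv;
	if (zio != NULL && zio->io_error != 0)
		printf("read failed (%d); dbuf can leave DB_READ cleanly\n",
		    zio->io_error);
}

int
main(void)
{
	return (fail_read_as_cksum_error(example_done, NULL) == ECKSUM ?
	    0 : 1);
}
```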

@prakashsurya
Member

How do you guys feel about something like this: https://gist.github.com/anonymous/f19e25b4ad1108082ad6

@dweeezil
Contributor

I like the concept and it's exactly the type of additional sanity checking I had in mind. As an assist to non-ECC systems, it might be a good idea to call zfs_blkptr_verify() in the write path as well.

@prakashsurya
Member

That patch isn't yet in illumos, but I can try to expedite upstreaming it, if that'd help.

Having both this pull request and the zfs_blkptr_verify() patch might not be a bad idea, though.

@prakashsurya
Member

And yeah, I agree @dweeezil, I'm all for hardening non-ECC systems with minimal effort like that.

@prakashsurya
Member

My only concern with the write path is that it would add more overhead to a performance-critical code path. With that said, I don't think a few conditionals are something to worry about, and it's well worth the performance cost for data integrity.

Are there no sanity checks that already do that in the write path? I'm not at my terminal so I can't check myself.

@behlendorf
Contributor Author

@prakashsurya I like the idea of more rigorously sanity checking the contents of a block pointer. I was tempted to write a function just like this and use it in arc_read() to detect the damaged logical block size. So I think zfs_blkptr_verify() could nicely complement the patch proposed in this issue.

Reworking https://gist.github.com/anonymous/f19e25b4ad1108082ad6 so that it could be used in this way would be a nice bit of infrastructure. We could then leverage it in various places to more gracefully handle this case. I'd suggest moving it to blkptr.c and adding it to the headers so it could be called elsewhere.

It would also be nice if zfs_blkptr_verify() behaved the same way as dnode_verify(). Either only enable them when ZFS_DEBUG is set, or have both of them use zfs_panic_recover().

Compared to the cost of the other things which can happen in the write path, I don't have any real concerns about adding this to zio_write().
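
To make the shape of such a verifier concrete, here is a simplified sketch of the per-field range checking being discussed (the struct layout, names, and limits are illustrative stand-ins, not the gist's or illumos' actual code):

```c
#include <stdint.h>

/* Illustrative stand-ins for the real on-disk limits. */
#define SPA_MAXBLOCKSIZE	(128ULL * 1024)
#define ZIO_CHECKSUM_FUNCTIONS	10	/* illustrative count */
#define ZIO_COMPRESS_FUNCTIONS	15	/* illustrative count */

typedef struct blkptr {
	uint64_t	lsize;		/* decoded logical size */
	uint64_t	psize;		/* decoded physical size */
	uint8_t		checksum;	/* checksum algorithm id */
	uint8_t		compress;	/* compression algorithm id */
} blkptr_t;

/*
 * Returns 0 when every field is within its legal range.  A full
 * implementation would also validate the DVAs against the pool's
 * vdev topology and the block type against known object types, and
 * could route failures through zfs_panic_recover() as noted above.
 */
static int
blkptr_verify(const blkptr_t *bp)
{
	if (bp->lsize == 0 || bp->lsize > SPA_MAXBLOCKSIZE)
		return (-1);
	if (bp->psize > bp->lsize)	/* compression never grows a block */
		return (-1);
	if (bp->checksum >= ZIO_CHECKSUM_FUNCTIONS)
		return (-1);
	if (bp->compress >= ZIO_COMPRESS_FUNCTIONS)
		return (-1);
	return (0);
}
```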

@behlendorf
Contributor Author

@dweeezil @prakashsurya I'd like to merge this; can I add your signed-off-by? A little additional hardening may help quite a few people.

@dweeezil
Contributor

@behlendorf LGTM as a nice bit of hardening.

I'll be interested to see what becomes of the zfs_blkptr_verify() code shown in https://gist.github.com/anonymous/f19e25b4ad1108082ad6.

@behlendorf
Contributor Author

Me too, I'd definitely like to see additional hardening like zfs_blkptr_verify(). But for the moment this is one small step we can take; it's been merged as:

5f6d0b6 Handle block pointers with a corrupt logical size

@behlendorf behlendorf closed this Oct 23, 2014
@behlendorf behlendorf deleted the issue-2678 branch April 19, 2021 21:03