
Handle block pointers with a corrupt logical size #2685

Closed
wants to merge 1 commit

Conversation

behlendorf
Contributor

The general strategy used by ZFS to verify that blocks are valid
is to checksum everything. This has the advantage of being extremely
robust and generically applicable regardless of the contents of
the block. If a block's checksum is valid then its contents are
trusted by the higher layers.

This system works exceptionally well as long as bad data is never
written with a valid checksum. However, if this does somehow
occur due to a software bug or a memory bit-flip on a non-ECC
system it may result in a kernel panic.

One such place where this could occur is if somehow the logical
size stored in a block pointer exceeds the maximum block size.
This will result in an attempt to allocate a buffer greater than
the maximum block size causing a system panic.

To prevent this from happening the arc_read() function has been
updated to detect this specific case. If a block pointer with an
invalid logical size is passed it will treat the block as if it
contained a checksum error.

Signed-off-by: Brian Behlendorf [email protected]
Issue #2678
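
For illustration, here is a minimal user-space sketch of the failure mode and the proposed guard (the constants and helper names are simplified stand-ins for the real ZFS macros, not the merged patch): the logical size is decoded by shifting the on-disk field, so a corrupt field decodes to an impossible size, and the read is rejected as a checksum error before any oversized allocation is attempted.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the on-disk constants (pre-large-block). */
#define SPA_MINBLOCKSHIFT	9			/* 512-byte sectors */
#define SPA_MAXBLOCKSIZE	(128ULL * 1024)		/* 128K maximum block */
#define ECKSUM			122	/* placeholder errno for a checksum error */

/*
 * The blkptr stores lsize as (sectors - 1); the logical size is
 * reconstructed by shifting.  A corrupt field therefore decodes to a
 * size far beyond SPA_MAXBLOCKSIZE rather than failing outright.
 */
static uint64_t
bp_get_lsize(uint64_t raw_lsize_field)
{
	return ((raw_lsize_field + 1) << SPA_MINBLOCKSHIFT);
}

/* Reject impossible sizes before any buffer allocation is attempted. */
static int
validate_lsize(uint64_t lsize)
{
	if (lsize == 0 || lsize > SPA_MAXBLOCKSIZE)
		return (ECKSUM);	/* treat the block as damaged */
	return (0);
}

int
main(void)
{
	uint64_t good = bp_get_lsize(255);	/* decodes to 128K: legal */
	uint64_t bad = bp_get_lsize(0xffff);	/* corrupt field: ~32M */

	printf("lsize %llu -> %d\n", (unsigned long long)good,
	    validate_lsize(good));
	printf("lsize %llu -> %d\n", (unsigned long long)bad,
	    validate_lsize(bad));
	return (0);
}
```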

@behlendorf
Contributor Author

@ryao @dweeezil can I get your thoughts on this?

@dweeezil
Contributor

@behlendorf I think it's a rather clever idea. How about doing the same thing if the lsize isn't a power of 2? There are likely other places throughout the code where we could do similar sanity testing.

@dweeezil
Contributor

@behlendorf EDIT: ... do the same thing if lsize isn't a multiple of 512.

EDIT2: I guess that's not likely to happen unless another memory error occurred after loading it.
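
For completeness, the alignment check floated above would be a one-liner; a hypothetical fragment (and, per the EDIT2 above, redundant in practice since lsize is produced by shifting):

```c
#include <stdint.h>

#define SPA_MINBLOCKSIZE	512	/* smallest legal block */

/* Hypothetical check: a valid lsize is a multiple of 512 bytes. */
static inline int
lsize_is_aligned(uint64_t lsize)
{
	return ((lsize & (SPA_MINBLOCKSIZE - 1)) == 0);
}
```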

@dweeezil
Contributor

@behlendorf I made a pool with a bunch of bogus-lsize'd blkptrs to see what would happen with this patch. In addition to hitting the new ECKSUM error in arc_read(), it also hit the ASSERT at the bottom of dbuf_read() because the dbuf's state was still DB_READ. Clearly we can't handle all types of damage but I'll look into it a bit further to see whether there's a way to recover in this case.

Regarding my initial comments about additional size checking, I had overlooked the fact that "size" had just been computed by shifting the on-disk value.

@behlendorf
Contributor Author

Clearly we can't handle all types of damage

Thanks for the quick feedback. I completely agree. And I wouldn't want to go overboard with this sort of thing. But I think there are a few key places where we can handle things more gracefully. There's no good reason this particular damage needs to crash your system.

ASSERT at the bottom of dbuf_read()

Good catch. I think we should be able to handle this just by doing a better job of simulating the checksum failure. Calling the passed callback with a dummy zio should kick in the existing error handling here. Let me propose an updated patch. If you could try it on your damaged pool that would be appreciated.

additional size checking

Yeah, unless you look carefully at the code it is not obvious that this is a computed value based on SPA_MINBLOCKSHIFT. We're also going to have to be careful to revisit this code when the large block support is merged.
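
A hedged sketch of the dummy-zio idea mentioned above (types and names are simplified stand-ins, not the actual patch): synthesizing a failed zio and running the normal completion callback lets consumers such as dbuf_read_done() take their existing error path instead of leaving the dbuf stuck in DB_READ.

```c
#include <stdio.h>

#define ECKSUM	122	/* placeholder errno standing in for ZFS's ECKSUM */

/* Minimal stand-ins for the kernel types involved. */
typedef struct zio { int io_error; } zio_t;
typedef struct arc_buf { char *data; } arc_buf_t;
typedef void arc_done_func_t(zio_t *zio, arc_buf_t *buf, void *priv);

/*
 * On an invalid blkptr, synthesize a zio that carries a checksum
 * error and invoke the caller's completion callback with it, exactly
 * as if the read had completed and failed verification.
 */
static int
fail_read_as_cksum_error(arc_done_func_t *done, void *priv)
{
	zio_t rzio = { .io_error = ECKSUM };	/* dummy zio */

	if (done != NULL)
		done(&rzio, NULL, priv);	/* no buffer: the read failed */
	return (ECKSUM);
}

/* Example consumer, modeled loosely on dbuf_read_done()'s error path. */
static void
example_done(zio_t *zio, arc_buf_t *buf, void *priv)
{
	(void) buf; (void) priv;
	if (zio != NULL && zio->io_error != 0)
		printf("read failed (%d); dbuf can leave DB_READ cleanly\n",
		    zio->io_error);
}

int
main(void)
{
	return (fail_read_as_cksum_error(example_done, NULL) == ECKSUM ?
	    0 : 1);
}
```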

@prakashsurya
Member

How do you guys feel about something like this: https://gist.github.com/anonymous/f19e25b4ad1108082ad6

@dweeezil
Contributor

I like the concept and it's exactly the type of additional sanity checking I had in mind. As an assist to non-ECC systems, it might be a good idea to call zfs_blkptr_verify() in the write path as well.

@prakashsurya
Member

That patch isn't yet in illumos, but I can try to expedite upstreaming it, if that'd help.

Having both this pull request and the zfs_blkptr_verify() patch might not be a bad idea, though.

@prakashsurya
Member

And yeah, I agree @dweeezil, I'm all for hardening non-ECC systems with minimal effort like that.

@prakashsurya
Member

My only concern with the write path is that it would add more overhead to a performance-critical code path. With that said, I don't think a few conditionals are something to worry about, and it's well worth the performance cost for data integrity.

Are there no sanity checks that already do that in the write path? I'm not at my terminal so I can't check myself.

@behlendorf
Contributor Author

@prakashsurya I like the idea of more rigorously sanity checking the contents of a block pointer. I was tempted to write a function just like this and use it in arc_read() to detect the damaged logical block size. So I think zfs_blkptr_verify() could nicely complement the patch proposed in this issue.

Reworking https://gist.github.com/anonymous/f19e25b4ad1108082ad6 so that it could be used in this way would be a nice bit of infrastructure. We could then leverage it in various places to more gracefully handle this case. I'd suggest moving it to blkptr.c and adding it to the headers so it could be called elsewhere.

It would also be nice if zfs_blkptr_verify() behaved the same way as dnode_verify(). Either only enable them when ZFS_DEBUG is set, or have both of them use zfs_panic_recover().

Compared to the cost of the other things which can happen in the write path, I don't have any real concerns about adding this to zio_write().
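
To make the shape of such a verifier concrete, here is a simplified sketch of the per-field range checking being discussed (the struct layout, names, and limits are illustrative stand-ins, not the gist's or illumos' actual code):

```c
#include <stdint.h>

/* Illustrative stand-ins for the real on-disk limits. */
#define SPA_MAXBLOCKSIZE	(128ULL * 1024)
#define ZIO_CHECKSUM_FUNCTIONS	10	/* illustrative count */
#define ZIO_COMPRESS_FUNCTIONS	15	/* illustrative count */

typedef struct blkptr {
	uint64_t	lsize;		/* decoded logical size */
	uint64_t	psize;		/* decoded physical size */
	uint8_t		checksum;	/* checksum algorithm id */
	uint8_t		compress;	/* compression algorithm id */
} blkptr_t;

/*
 * Returns 0 when every field is within its legal range.  A full
 * implementation would also validate the DVAs against the pool's
 * vdev topology and the block type against known object types, and
 * could route failures through zfs_panic_recover() as noted above.
 */
static int
blkptr_verify(const blkptr_t *bp)
{
	if (bp->lsize == 0 || bp->lsize > SPA_MAXBLOCKSIZE)
		return (-1);
	if (bp->psize > bp->lsize)	/* compression never grows a block */
		return (-1);
	if (bp->checksum >= ZIO_CHECKSUM_FUNCTIONS)
		return (-1);
	if (bp->compress >= ZIO_COMPRESS_FUNCTIONS)
		return (-1);
	return (0);
}
```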

@behlendorf
Contributor Author

@dweeezil @prakashsurya I'd like to merge this; can I add your signed-off-by? A little additional hardening may help quite a few people.

@dweeezil
Contributor

@behlendorf LGTM as a nice bit of hardening.

I'll be interested to see what becomes of the zfs_blkptr_verify() code shown in https://gist.github.com/anonymous/f19e25b4ad1108082ad6.

@behlendorf
Contributor Author

Me too, I'd definitely like to see additional hardening like zfs_blkptr_verify(). But for the moment this is one small step we can take; it's been merged as:

5f6d0b6 Handle block pointers with a corrupt logical size

@behlendorf behlendorf closed this Oct 23, 2014
@behlendorf behlendorf deleted the issue-2678 branch April 19, 2021 21:03