Verify checksum of the ZFS module text and rodata before each transaction group commit #2832
Labels
Type: Feature
Feature request or new feature
Comments
ryao changed the title from "Verify the modules before each transaction group commit" to "Verify the ZFS module text before each transaction group commit" on Oct 26, 2014
ryao changed the title from "Verify the ZFS module text before each transaction group commit" to "Verify a checksum of the ZFS module text before each transaction group commit" on Oct 26, 2014
ryao changed the title from "Verify a checksum of the ZFS module text before each transaction group commit" to "Verify checksum of the ZFS module text before each transaction group commit" on Oct 26, 2014
ryao changed the title from "Verify checksum of the ZFS module text before each transaction group commit" to "Verify checksum of the ZFS module text and rodata before each transaction group commit" on Oct 26, 2014
behlendorf added the "Difficulty - Medium" and "Type: Feature" (Feature request or new feature) labels on Oct 27, 2014
Does this apply to Linux?
It is conceivable that bit flips, misdirected writes by malfunctioning DMA-capable hardware, misdirected writes by other parts of the kernel, strange bugs like #2805, and similar faults could alter program text (or the initial values for a checksum routine) in a way that creates "phantom disk format changes". A system that already has ZFS pools might catch the problem through an immediate failure that disappears once the module is reloaded (e.g. by a reboot). Should any significant damage have been committed to disk, that system could then roll back transaction groups to a point before the damage occurred.
However, that is not necessarily the case when things like #2094 are introduced through corruption of program text. A system that imports no existing pools and creates pools only after the in-core "phantom disk format change" has occurred would exhibit no problems until the module is reloaded. At that point, every pool created with the "phantom disk format change" would be completely lost.
We could try to detect and prevent this by verifying a checksum of the ZFS module text and rodata at module load and before each transaction group (txg) commit. The checksum should be generated when the module is built rather than when it is loaded, to avoid a race in which a bit flip damages the module before the checksum is generated. If we detect a mismatch, we should panic the system to protect data.
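As a rough illustration of the check proposed above, here is a minimal user-space sketch in C. Everything in it is an assumption made for illustration: the `guarded_region` descriptor, the `region_checksum`, `region_verify` and `verify_before_commit` names, and the use of 64-bit FNV-1a as a stand-in checksum. None of these are existing ZFS interfaces, and a kernel implementation would panic on a mismatch instead of returning an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one protected region (e.g. text or rodata). */
struct guarded_region {
	const uint8_t *start;	/* first byte of the region */
	size_t len;		/* length in bytes */
	uint64_t expected;	/* checksum assumed to be recorded at build time */
};

/* 64-bit FNV-1a, used here only as a simple stand-in checksum. */
static uint64_t
region_checksum(const uint8_t *p, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return (h);
}

/* Returns 0 if the region still matches its recorded checksum, else -1. */
static int
region_verify(const struct guarded_region *r)
{
	assert(r != NULL && r->start != NULL);
	return (region_checksum(r->start, r->len) == r->expected ? 0 : -1);
}

/*
 * Hypothetical hook, meant to run at module load and before each txg commit.
 * A kernel version would panic on mismatch; this sketch only reports it.
 */
static int
verify_before_commit(const struct guarded_region *regions, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (region_verify(&regions[i]) != 0)
			return (-1);	/* caller should refuse to commit */
	}
	return (0);
}
```

A caller would fill in one `guarded_region` per section, with `expected` taken from the build rather than computed at load, and treat any nonzero return from `verify_before_commit` as a reason to stop before writing anything to disk.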
Note that tracers such as DTrace and Ftrace alter the program text during normal operation, so we will need some way to avoid false positives caused by tracing. One option is to copy the program text to a special region in memory, zero each of the trace points (which are typically no-ops), and compute the checksum over that copy. That would let us verify our own logic without interfering with tracers. If the tracers themselves suffer bit flips (or have bugs) that cause undefined behavior, we will not be able to do anything about it. In all likelihood, though, the transaction group commit will hang under such circumstances and a system watchdog will panic the system rather than allowing it to continue.
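The copy-and-zero idea could look roughly like the following user-space sketch in C. The `patch_site` table, the `masked_checksum` and `fnv1a64` names, and the caller-supplied scratch buffer are all hypothetical, and FNV-1a again stands in for whatever checksum would actually be chosen. The sketch also assumes the build-time checksum is computed over the same masked image, so that the two values are comparable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of one tracer patch site within the text. */
struct patch_site {
	size_t offset;	/* byte offset of the site from the start of the text */
	size_t len;	/* number of bytes a tracer may rewrite */
};

/* 64-bit FNV-1a, used here only as a simple stand-in checksum. */
static uint64_t
fnv1a64(const uint8_t *p, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return (h);
}

/*
 * Copy the text into scratch (which must hold at least len bytes), zero every
 * patch site in the copy, and checksum the result. The live text is never
 * modified. Returns 0 on success, or -1 if a site falls outside the text.
 */
static int
masked_checksum(const uint8_t *text, size_t len,
    const struct patch_site *sites, size_t nsites,
    uint8_t *scratch, uint64_t *out)
{
	assert(text != NULL && scratch != NULL && out != NULL);

	memcpy(scratch, text, len);
	for (size_t i = 0; i < nsites; i++) {
		if (sites[i].offset > len || sites[i].len > len - sites[i].offset)
			return (-1);	/* malformed patch-site table */
		memset(scratch + sites[i].offset, 0, sites[i].len);
	}
	*out = fnv1a64(scratch, len);
	return (0);
}
```

With this shape, a tracer rewriting the bytes at a listed site leaves the masked checksum unchanged, while a change anywhere else in the text alters it. The cost is a full copy of the text per check, and the patch-site table itself becomes data that needs the same protection.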