Remove the bio_empty_barrier() check #1318
To determine whether the kernel is capable of handling empty barrier BIOs, we check for the presence of the bio_empty_barrier() macro, which was introduced in 2.6.24. If this macro is defined, then we can flush disk vdevs; if it isn't, then flushing is disabled.

Unfortunately, the bio_empty_barrier() macro was removed in 2.6.37, even though the kernel is still capable of handling empty barrier BIOs.

As a result, flushing is effectively disabled on kernels >= 2.6.37, meaning that starting from this kernel version, zfs doesn't use barriers to guarantee on-disk data consistency. This is quite bad and can lead to potential data corruption on power failures.

This patch fixes the issue by removing the configure check for bio_empty_barrier(), as we don't support kernels <= 2.6.24 anymore.

Thanks to Richard Kojedzinszky for catching this nasty bug.
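The failure mode described above can be modeled in a few lines of C. This is only an illustrative sketch, not the actual ZFS configure check or vdev code: `KVER`, `macro_is_defined`, `flush_enabled_before_patch` and `flush_enabled_after_patch` are hypothetical names, and the version bounds are taken from the description (macro introduced in 2.6.24, removed in 2.6.37).

```c
#include <stdbool.h>

/* Hypothetical helper: pack a kernel version a.b.c into one comparable
 * integer, in the spirit of the kernel's KERNEL_VERSION() macro. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Per the description: bio_empty_barrier() exists from 2.6.24 up to,
 * but not including, 2.6.37. */
bool macro_is_defined(unsigned kver)
{
    return kver >= KVER(2, 6, 24) && kver < KVER(2, 6, 37);
}

/* Before the patch: flushing is tied to the macro being present, so it
 * silently turns off once the macro disappears. */
bool flush_enabled_before_patch(unsigned kver)
{
    return macro_is_defined(kver);
}

/* After the patch: the check is gone, so every supported kernel flushes. */
bool flush_enabled_after_patch(unsigned kver)
{
    (void)kver; /* the kernel version no longer matters */
    return true;
}
```

Calling `flush_enabled_before_patch(KVER(2, 6, 37))` yields false even though that kernel can still handle empty barrier BIOs, which is exactly the regression this patch removes.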
@behlendorf: I feel very bad knowing that some kernel change that went unnoticed just happened to disable critical corruption protection code in ZoL. When a new kernel is released, we should definitely do a diff check on
@dechamps Bad, bad, bad. Thank you for jumping on this right away. I should have noticed the Both patches look good to me. I'm running them through the automated testing now, then I'll do some manual testing to make sure it's working as expected before I merge it.
Yep. As I said, I didn't have time to test the patch (I just made sure it builds successfully). I'm not entirely sure, but it seems Richard is having issues with msync() when this patch is in place. This resembles #907.
The automated testing went well and using This fix shouldn't have a huge impact of
@dechamps I doubt that missing write barriers caused Phoronix's criticisms. As far as I can tell, those criticisms involve benchmarks that measure read performance. With that said, thanks for catching this. This should be fixed ASAP. Note also that the uberblock history was devised to enable ZFS to recover on hardware that does not properly honor barriers, so kernels where barriers are currently broken still have that protection. I have done numerous unclean reboots without problems using ashift=12 with kernels >2.6.37, so that feature appears to be working properly. However, this might explain why corruption has been observed at ashift values greater than 13. At ashift=14, we are limited to an uberblock history of 8, which is probably not big enough to recover when barriers are broken.
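The figure of 8 uberblocks at ashift=14 follows from simple arithmetic. The sketch below assumes the ZFS on-disk layout of this era: a 128 KiB uberblock ring per vdev label and a minimum slot size of 1 KiB. These constants and the name `uberblock_history_len` are assumptions that do not appear in this thread, and newer releases may cap the slot size differently.

```c
/* Assumed layout constants (not stated in this thread). */
#define UB_RING_BYTES   (128u * 1024u) /* uberblock ring size per label */
#define UB_MIN_SHIFT    10u            /* smallest slot: 1 KiB */

/* Hypothetical helper: number of uberblock slots for a given ashift.
 * Each slot is at least one device sector (1 << ashift) in size. */
unsigned uberblock_history_len(unsigned ashift)
{
    unsigned shift = ashift > UB_MIN_SHIFT ? ashift : UB_MIN_SHIFT;
    return UB_RING_BYTES >> shift;
}
```

Under these assumptions ashift=12 leaves 32 slots of history, while ashift=14 leaves only 8, matching the number quoted above.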
Sure, ZFS is able to recover from broken flushes with regard to file system consistency. It still breaks application data consistency, however, as recently synced data won't make it to the on-disk ZIL in the event of a power failure. In other words, fsync() is broken, as Richard's test shows. This is a big issue for reliable transaction (e.g. database) systems.
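To make the application-side expectation concrete, here is a minimal sketch of how a transactional program typically relies on fsync(). The helper name `durable_append` is hypothetical and this is not Richard's test. The point is that the caller treats a successful return as "the data survives a power failure", which does not hold if the flush never reaches the drive's write cache.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to path and return 0 only after
 * fsync() has reported success; returns -1 on any error. */
int durable_append(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* The application's durability guarantee hinges on this call. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A database would call something like `durable_append("journal.log", record, record_len)` before acknowledging a commit. With flushing disabled, the call can still return 0 while the record sits only in the disk's volatile cache.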
Does this affect all users? Is there any workaround other than recompiling?
All users running kernel >= 2.6.37 and using write cache on disk vdevs (which is usually the case).
You could disable the write cache on your disk vdevs.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1318
NOTE: this branch needs 962020a in order to build correctly. See #1317. In addition, I did not check whether the resurrected flush code actually works correctly.
To determine whether the kernel is capable of handling empty barrier BIOs, we check for the presence of the bio_empty_barrier() macro, which was introduced in 2.6.24. If this macro is defined, then we can flush disk vdevs; if it isn't, then flushing is disabled.

Unfortunately, the bio_empty_barrier() macro was removed in 2.6.37, even though the kernel is still capable of handling empty barrier BIOs.

As a result, flushing is effectively disabled on kernels >= 2.6.37, meaning that starting from this kernel version, zfs doesn't use barriers to guarantee on-disk data consistency. In other words, it behaves as if zfs_nocacheflush was set. This is quite bad and can lead to potential data corruption on power failures.

This patch fixes the issue by removing the configure check for bio_empty_barrier(), as we don't support kernels <= 2.6.24 anymore.

Thanks to Richard Kojedzinszky for catching this nasty bug. Note that this could also explain why Phoronix was so skeptical in their benchmarks: