Disable direct IO on Btrfs based storage #9488
Comments
My understanding is that Btrfs used to be more prone to checksum errors with direct I/O mainly because direct I/O exercises substantially different parts of the Btrfs codebase that had some bugs. Since then more attention has been paid to the combination, bugs have been fixed, and it's pretty calm nowadays. OTOH, besides bugs in the Btrfs codebase itself, these checksum errors also happen(ed?) when a "misbehaving (...) application modifies a buffer in-flight when doing an O_DIRECT write". In our case the application is xen-blkback on a loop device, so I don't know if that's relevant? Personally I haven't experienced any such corruption. I've been running Btrfs with direct I/O enabled on my main Qubes OS notebook since R4.1, hacked into the old block script.
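To illustrate the constraint being described, here is a minimal sketch of a well-behaved O_DIRECT write; it is not taken from any Qubes component, and the file path and the 4096-byte alignment are assumptions for illustration. The comment marks exactly the window in which a "misbehaving application" must not touch the buffer:

```c
/* Sketch only (assumed example, not Qubes code): a well-behaved O_DIRECT write.
 * The point is that the buffer must stay untouched until write() returns --
 * modifying it in flight is what can produce Btrfs checksum mismatches. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int direct_write_example(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0)
        return -1;

    /* O_DIRECT requires buffer, offset and length aligned to the logical
     * block size (4096 assumed here for illustration). */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096)) {
        close(fd);
        return -1;
    }
    memset(buf, 'A', 4096);

    /* Do NOT modify buf between here and the return of write(). */
    ssize_t n = write(fd, buf, 4096);

    free(buf);
    close(fd);
    return n == 4096 ? 0 : -1;
}
```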
It's plausible that direct I/O leads to different allocation patterns, which could be worse or even pathological in some cases, maybe linked to specific SSD hardware (handling fragmentation especially badly) or maybe not. But if that's a common problem to the point where it makes direct I/O a bad trade-off, shouldn't we have seen way more reports of such a regression with R4.2 from users who had R4.1 working fine with Btrfs?
Huh, that's an important caveat (previously stated on the linux-btrfs mailing list) I didn't have on my radar, because I never use compression. Maybe the not-script should have a mechanism to disable direct I/O. That same mechanism could also be useful to override the 512 byte logical block size. Or I guess for disabling direct I/O it could also automatically look at the file to see if it's hosted on Btrfs and has compression enabled?
Thank you for your hint. I did indeed encounter this issue. After some testing, it turned out that using the Xen PV Block driver in Windows qubes would cause a CSUM error, but Linux qubes and those not using the Xen PV Block driver wouldn't have such an issue. I have attached the error logs for reference:
Cool! To compare, can you reveal your Btrfs configuration? My Btrfs mount uses compress=zstd:1. Additionally, my dom0 shutdown time is longer than with the LVM configuration on the same hardware. Journalctl shows that filesystem synchronization took a long time and systemd had to kill the corresponding processes in order to shut down. Maybe I should try not using compression? Or would the "autodefrag" option help alleviate this problem? Also, I'd like to know why the default behavior of Qubes OS is to use COW on disk image files, while Fedora and openSUSE disable COW for libvirtd's disk images.
Giving users more choices is indeed better. Many virtual machine management programs support adjusting the cache mode, so it's good to have this feature in Qubes OS.
Looks like the untrusted qube side can cause checksum errors on the trusted dom0 side. I was hoping that the dom0 backend prevents this. Can you easily reproduce the errors with a new Windows qube?
Sure: kernel-latest, default mount options plus I don't use Windows qubes though.
Try shutting down the qubes individually and/or take a look at the
There's currently an inherent amount of COW required by the Qubes OS storage API: #8767. So if a nocow file is reflinked into another, Btrfs still has to do a COW operation whenever a block is then modified for the first time in one of the files. nocow only avoids subsequent COW operations after that first write per block - but also totally disables data checksums (protection against bit rot) :(
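For reference, this is roughly what setting the nocow attribute looks like programmatically, i.e. the equivalent of chattr +C. A sketch only, not Qubes code; it assumes a newly created, still-empty file, since Btrfs only honors the flag before data is written:

```c
/* Sketch: mark a freshly created (still empty) file as nocow, the
 * programmatic counterpart of `chattr +C`. On Btrfs this disables data
 * checksums for that file, which is the trade-off discussed above. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <unistd.h>

int make_nocow(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    int flags = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
        flags |= FS_NOCOW_FL;
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) != 0) {
            close(fd);
            return -1;
        }
    }
    return fd;  /* caller writes data afterwards */
}
```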
Is this a security vulnerability from an upstream Xen PoV?
It seems that the
Maybe not that bad? As far as I know, a CSUM error will prevent further read access, making the qube unstable. (But not damage the filesystem.) However, after I turned off the Windows qube,
Yes. Here are steps to reproduce:
I can reproduce this as well in a Windows 10 qube with testsigning on:
And the latest xenbus and xenvbd drivers installed from here:
It would allow a malicious VM to DoS any backup procedure that (like the Qubes OS one) fails at the first read error and then doesn't back up the rest of the system (e.g. remaining VMs). Although the more alarming angle might be dom0 causing a VM to unintentionally corrupt its internal data. Even if that's not necessarily a security vulnerability.
IMHO we should indeed disable direct I/O in the not-script at the moment. @DemiMarie @marmarek It's difficult to figure out who's technically wrong here in this stack of blkfront-on-Windows, blkback, loop devices, and Btrfs. Not to mention various manpages/docs that could all be more explicit about interactions with direct I/O. But in practice, it looks like direct I/O can interfere with system backups, nullify compression, and break Windows VMs. And it can't be ruled out that technically misused (by whatever part of the stack) direct I/O could cause real data corruption beyond a benign(?) checksum mismatch, especially considering that file-reflink can be hosted by XFS/ZFS/bcachefs too or even any filesystem, in degraded mode. There's also the still not quite dead legacy 'file' driver that's compatible with arbitrary filesystems, edit: does it use the not-script?
@rustybird I can have the not-script check the filesystem type (via |
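Assuming the check is something like fstatfs(2) on the backing file, a minimal sketch (not the actual not-script code) could look like this; BTRFS_SUPER_MAGIC is the standard constant from <linux/magic.h>:

```c
/* Hedged sketch of one way a loop-device setup helper could detect that
 * the backing file lives on Btrfs before requesting direct I/O. */
#include <stdbool.h>
#include <sys/vfs.h>
#include <linux/magic.h>

static bool backing_file_on_btrfs(int backing_fd)
{
    struct statfs sfs;

    if (fstatfs(backing_fd, &sfs) != 0)
        return false;  /* on error, treat the filesystem as unknown */

    return sfs.f_type == BTRFS_SUPER_MAGIC;
}

/* Usage idea: only request LO_FLAGS_DIRECT_IO when
 * backing_file_on_btrfs(fd) returns false. */
```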
Do we know that it is always safe for XFS and ext4 though? That Btrfs commit said it's simply application misbehavior to modify the buffer during a direct I/O write. (Which isn't explicitly mentioned for O_DIRECT in open(2), but it makes sense. Come to think of it, blkback modifying the buffer during a write doesn't seem great even without direct I/O, right?) As non-data-checksumming filesystems, they have one less reason to look at the buffer twice. But maybe they do in some circumstances, e.g. depending on journalling options. It's silly but I like to treat filesystems as if they're developed by ruthless "language lawyers" who will see the slightest incorrect use as an opportunity to deploy an optimization destroying the maximum allowable amount of data. So now I'm a bit spooked about direct I/O until it's been diagnosed what exactly is going on here.
Modifying a buffer that is in-flight is a straightforward data race and undefined behavior. No application should ever do this. The only difference here is that when the junk data is read, BTRFS returns an error, while XFS/ext4 return garbage. Theoretically, what BTRFS is doing is better. However, most applications don’t handle I/O errors well, so the BTRFS behavior is a persistent denial of service.
Pending diagnosis of QubesOS/qubes-issues#9488
Pending diagnosis of QubesOS/qubes-issues#9488 (cherry picked from commit 8cdd664)
Will linux-utils v4.3.5 be available for Qubes 4.2, or do we have to wait for 4.3 to be released? I think I just experienced disk corruption (recoverable, fortunately) due to the usage of DIO, for the second time in my last month or so of Qubes usage.
The change was cherry-picked for the R4.2 branch three weeks ago, but the qubes-utils package version hasn't been bumped yet. @marmarek
This happened with Windows VMs both times?
Linux AppVMs. Technically I am not fully sure that this issue is caused by DIO (no meaningful logs), but the AppVMs are pretty standard and I have not made any significant modifications to Qubes other than installing it on BTRFS, so it could be due to that. Also, possibly related: installing the Xen driver (through QWT) on Windows 10 causes so many r/w errors that the VM bluescreens within a couple of minutes.
You might want to keep an eye on this even after updating to the dom0 qubes-utils-4.2.18 package, which disables direct I/O. Maybe Btrfs correctly alerted you to a hardware problem. (Just recently I experienced something similar: A faulty power supply had damaged my mainboard, and Btrfs was the first to complain via checksum errors.)
That part seems like it could be direct I/O related.
A week or two ago I updated by setting the dom0 repositories to testing; it seems to be fixed, and compression works. I'll test the Windows bugs once the new QWT is released. Any idea when I'll be able to set the updates back to stable or security-testing without risking a downgrade?
Qubes OS release
Qubes Release 4.2.3 (R4.2)
Brief summary
Direct I/O shouldn't be used on Btrfs, because it can lead to various problems.
It can cause CSUM errors, which are described in the Proxmox documentation, and there is also a related issue in the Fedora Bugzilla. What's more, there are related discussions on the Btrfs mailing list.
Even if direct I/O does not cause errors, this setting also prevents Btrfs's built-in compression from working properly.
Steps to reproduce
Run sudo losetup --list on a Btrfs based Qubes OS to see that DIO is listed as having a value of 1.
Expected behavior
Direct IO should be disabled.
Actual behavior
It's enabled on Btrfs-based installations.
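For completeness, the same check can also be done programmatically against the loop device instead of via losetup --list. A hedged sketch; /dev/loop0 is just an example device:

```c
/* Sketch: report whether a given loop device has direct I/O enabled,
 * by reading its status flags and testing LO_FLAGS_DIRECT_IO. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/loop.h>
#include <unistd.h>

int main(void)
{
    const char *dev = "/dev/loop0";  /* example device path */
    int fd = open(dev, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct loop_info64 info;
    if (ioctl(fd, LOOP_GET_STATUS64, &info) != 0) {
        perror("LOOP_GET_STATUS64");
        close(fd);
        return 1;
    }

    printf("%s: direct I/O %s\n", dev,
           (info.lo_flags & LO_FLAGS_DIRECT_IO) ? "enabled" : "disabled");
    close(fd);
    return 0;
}
```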
Note
The commit states that this behavior has been reverted, but I assume it's still in use via the not-script.c introduced in R4.2, which uses LO_FLAGS_DIRECT_IO. (I'm unfamiliar with the Qubes OS code, so this might be incorrect.)
Maybe it also leads to some regression on Btrfs, making qubes on Btrfs slow. I made this conjecture because the file "not-script.c" does not exist in R4.1. I'm not sure about this, because I can't simply disable DIO for testing purposes; if someone could tell me how to do it, I'd be happy to try.
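As for disabling DIO for testing: one way, sketched under the assumption that the loop device is already set up, is the LOOP_SET_DIRECT_IO ioctl with an argument of 0 (recent losetup exposes the same toggle as --direct-io, if I recall correctly). The device path here is just an example:

```c
/* Sketch: toggle direct I/O off on an existing loop device for testing.
 * An argument of 0 disables direct I/O, 1 enables it. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/loop.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/loop0", O_RDWR);  /* example device path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (ioctl(fd, LOOP_SET_DIRECT_IO, 0UL) != 0) {
        perror("LOOP_SET_DIRECT_IO");
        close(fd);
        return 1;
    }

    puts("direct I/O disabled on /dev/loop0");
    close(fd);
    return 0;
}
```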