
Disable direct IO on Btrfs based storage #9488

Closed
ImBearChild opened this issue Oct 2, 2024 · 17 comments
Labels
affects-4.2 This issue affects Qubes OS 4.2. C: storage diagnosed Technical diagnosis has been performed (see issue comments). P: major Priority: major. Between "default" and "critical" in severity. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@ImBearChild

How to file a helpful issue

Qubes OS release

Qubes Release 4.2.3 (R4.2)

Brief summary

Direct I/O should not be used on Btrfs, because it can lead to various problems.

It can cause CSUM errors, as described in the Proxmox documentation; there is also a related issue in the Fedora Bugzilla, as well as related discussions on the Btrfs mailing list.

Even if direct I/O does not cause errors, this setting also prevents the built-in compression functionality of Btrfs from working properly.

Steps to reproduce

Run sudo losetup --list on a Btrfs-based Qubes OS installation and observe that DIO is listed with a value of 1.

Expected behavior

Direct IO should be disabled.

Actual behavior

It's enabled on Btrfs-based installations.

Note

The commit states that this behavior has been reverted, but I assume direct I/O is still being used, since the not-script.c introduced in R4.2 uses LO_FLAGS_DIRECT_IO. (I'm unfamiliar with the Qubes OS code, so this might be incorrect.)

And maybe it leads to some regression on Btrfs, making qubes on Btrfs slow. I made this conjecture because the file "not-script.c" does not exist in R4.1. I'm not sure about this, because I can't simply disable DIO for testing purposes; if someone could tell me how to do that, I'd be happy to try.

@ImBearChild ImBearChild added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels Oct 2, 2024
@andrewdavidwong andrewdavidwong added needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. C: storage affects-4.2 This issue affects Qubes OS 4.2. labels Oct 2, 2024
@marmarek
Member

marmarek commented Oct 2, 2024

@DemiMarie @rustybird ?

@rustybird

rustybird commented Oct 2, 2024

My understanding is that Btrfs used to be more prone to checksum errors with direct I/O mainly because direct I/O exercises substantially different parts of the Btrfs codebase that had some bugs. Since then more attention has been paid to the combination, bugs have been fixed, and it's pretty calm nowadays.

OTOH, besides bugs in the Btrfs codebase itself, these checksum errors also happen(ed?) when a "misbehaving (...) application modifies a buffer in-flight when doing an O_DIRECT write". In our case the application is xen-blkback on a loop device, so I don't know if that's relevant?

Personally I haven't experienced any such corruption. I've been running Btrfs with direct I/O enabled on my main Qubes OS notebook since R4.1, hacked into the old block script.

And maybe it leads to some regression on Btrfs, making qubes on Btrfs slow.

It's plausible that direct I/O leads to different allocation patterns, which could be worse or even pathological in some cases, maybe linked to specific SSD hardware (handling fragmentation especially badly) or maybe not. But if that's a common problem to the point where it makes direct I/O a bad trade-off, shouldn't we have seen way more reports of such a regression with R4.2 from users who had R4.1 working fine with Btrfs?

Even if direct IO does not cause errors, this setting would also prevent the built-in compression functionality of Btrfs from working properly.

Huh, that's an important caveat (previously stated on the linux-btrfs mailing list) that I didn't have on my radar, because I never use compression.

Maybe the not-script should have a mechanism to disable direct I/O. That same mechanism could also be useful to override the 512 byte logical block size.

Or I guess for disabling direct I/O it could also automatically look at the file to see if it's hosted on Btrfs and has compression enabled?

@ImBearChild
Author

ImBearChild commented Oct 2, 2024

OTOH, besides bugs in the Btrfs codebase itself, these checksum errors also happen(ed?) when a "misbehaving (...) application modifies a buffer in-flight when doing an O_DIRECT write". In our case the application is xen-blkback on a loop device, so I don't know if that's relevant?

Thank you for your hint. I did indeed encounter this issue. After some testing, it turned out that using the Xen PV Block driver in Windows qubes would cause a CSUM error, but Linux qubes and those not using the Xen PV Block driver wouldn't have such an issue. I have attached the error logs for reference:

dom0 kernel: BTRFS warning (device dm-1): csum failed root 256 ino 255232 off 4297732096 csum 0xc73039cd expected csum 0xaf05393c mirror 1
dom0 kernel: BTRFS warning (device dm-1): csum failed root 256 ino 255232 off 4297728000 csum 0xc73039cd expected csum 0xbea01aaf mirror 1
dom0 kernel: BTRFS error (device dm-1): bdev /dev/mapper/luks-fedc3ffe-b16b-45ec-b968-2a929b8f5a81 errs: wr 0, rd 0, flush 0, corrupt 268806, gen 0
dom0 kernel: BTRFS error (device dm-1): bdev /dev/mapper/luks-fedc3ffe-b16b-45ec-b968-2a929b8f5a81 errs: wr 0, rd 0, flush 0, corrupt 268807, gen 0
dom0 kernel: BTRFS warning (device dm-1): direct IO failed ino 255232 op 0x0 offset 0x1002a2000 len 4096 err no 10
dom0 kernel: BTRFS warning (device dm-1): direct IO failed ino 255232 op 0x0 offset 0x1002a3000 len 4096 err no 10
dom0 kernel: I/O error, dev loop28, sector 8393992 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 2

I've been running Btrfs with direct I/O enabled on my main Qubes OS notebook since R4.1, hacked into the old block script.

Cool! For comparison, can you share your Btrfs configuration? My Btrfs mount uses compress=zstd:1. Additionally, my dom0 shutdown time is longer than with an LVM configuration on the same hardware. Journalctl shows that filesystem synchronization takes a long time and systemd has to kill the corresponding processes in order to shut down. Maybe I should try disabling compression? Or would the autodefrag option help alleviate this problem?

Also, I want to know why the default behavior of Qubes OS is to use COW on disk image files while Fedora and openSUSE disable COW for libvirtd's disk images.

Maybe the not-script should have a mechanism to disable direct I/O. That same mechanism could also be useful to override the 512 byte logical block size.

Giving users more choices would indeed be better. Many virtual machine management programs support adjusting the cache mode, so it would be good to have this feature in Qubes OS.

@rustybird

rustybird commented Oct 2, 2024

"misbehaving (...) application modifies a buffer in-flight when doing an O_DIRECT write". In our case the application is xen-blkback on a loop device

it turned out that using the Xen PV Block driver in Windows qubes would cause a CSUM error, but Linux qubes and those not using the Xen PV Block driver wouldn't have such an issue.

Looks like the untrusted qube side can cause checksum errors on the trusted dom0 side. I was hoping that the dom0 backend prevents this.

Can you easily reproduce the errors with a new Windows qube?

To compare, can you reveal your Btrfs configuration?

Sure: kernel-latest, default mount options plus noatime, and since R4.2 discard=async (the default on modern kernels, but the installer adds discard, which unfortunately now means discard=sync for Btrfs). Also since R4.2 I create the filesystem with --checksum=xxhash.

I don't use Windows qubes though.

Additionally, my dom0 shutdown time is longer than the LVM configuration on same hardware. Journalctl shows that file system synchronization took a long time and systemd had to kill corresponding processes in order for it to shut down.

Try shutting down the qubes individually and/or take a look at the qubesd journal to see which .img files are causing long delays and could benefit from manual defragmentation. (Definitely don't use autodefrag for file-reflink pools. On Btrfs, defragmentation duplicates data shared across snapshots/reflinks so it must be done selectively.)

Also, I want to know why the default behavior of Qubes OS is to use COW on disk image files while Fedora and openSUSE disable COW for libvirtd's disk images.

There's currently an inherent amount of COW required by the Qubes OS storage API: #8767

So if a nocow file is reflinked into another, Btrfs still has to do a COW operation whenever a block is then modified for the first time in one of the files. nocow only avoids subsequent COW operations after that first write per block - but also totally disables data checksums (protection against bit rot) :(

@DemiMarie

"misbehaving (...) application modifies a buffer in-flight when doing an O_DIRECT write". In our case the application is xen-blkback on a loop device

it turned out that using the Xen PV Block driver in Windows qubes would cause a CSUM error, but Linux qubes and those not using the Xen PV Block driver wouldn't have such an issue.

Looks like the untrusted qube side can cause checksum errors on the trusted dom0 side. I was hoping that the dom0 backend prevents this.

Is this a security vulnerability from an upstream Xen PoV?

@ImBearChild
Author

Try shutting down the qubes individually and/or take a look at the qubesd journal to see which .img files are causing long delays and could benefit from manual defragmentation.

It seems that the qubesd shutdown process is relatively quick on my machine; from the first "Reflinked file" message to "qubesdb-daemon: terminating," it takes only 8 seconds. However, systemd complains about "Syncing filesystems and block devices" taking too long (30 seconds), which causes the syncing process to be killed. Manual defragmentation does not seem to affect this "syncing" issue, but it has helped shorten my qubesd shutdown time.

dom0 systemd-shutdown[1]: Syncing filesystems and block devices.
dom0 systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL to PID 17446.

Is this a security vulnerability from an upstream Xen PoV?

Maybe not that bad? As far as I know, a CSUM error will prevent further read access, making the qube unstable (but it does not damage the filesystem). However, after I shut down the Windows qube, btrfs scrub and btrfs check showed no errors.

Can you easily reproduce the errors with a new Windows qube?

Yes. Here are steps to reproduce:

  1. Prepare a new Windows qube
  2. Install the Xen PV block driver from the Qubes tools installer and reboot
  3. See Btrfs CSUM errors in dmesg.

@apparatius

I can reproduce this as well in a Windows 10 qube with testsigning on:

bcdedit.exe /set testsigning on

And latest xenbus and xenvbd drivers installed from here:
https://xenbits.xenproject.org/pvdrivers/win/

@rustybird

rustybird commented Oct 3, 2024

Looks like the untrusted qube side can cause checksum errors on the trusted dom0 side. I was hoping that the dom0 backend prevents this.

Is this a security vulnerability from an upstream Xen PoV?

It would allow a malicious VM to DoS any backup procedure that (like the Qubes OS one) fails at the first read error and then doesn't back up the rest of the system (e.g. remaining VMs).

Although the more alarming angle might be dom0 causing a VM to unintentionally corrupt its internal data. Even if that's not necessarily a security vulnerability.

@rustybird

rustybird commented Oct 3, 2024

IMHO we should indeed disable direct I/O in the not-script at the moment. @DemiMarie @marmarek

It's difficult to figure out who's technically wrong here in this stack of blkfront-on-Windows, blkback, loop devices, and Btrfs. Not to mention various manpages/docs that could all be more explicit about interactions with direct I/O.

But in practice, it looks like direct I/O can interfere with system backups, nullify compression, and break Windows VMs.

And it can't be ruled out that technically misused (by whatever part of the stack) direct I/O could cause real data corruption beyond a benign(?) checksum mismatch, especially considering that file-reflink can be hosted by XFS/ZFS/bcachefs too, or even any filesystem in degraded mode. There's also the still not quite dead legacy 'file' driver that's compatible with arbitrary filesystems. (Edit: does it use the not-script?)

@DemiMarie

@rustybird I can have the not-script check the filesystem type (via fstatfs(2)) and enable direct I/O only on ext4 or XFS (where it is a free performance win). There might be other filesystems (bcachefs?) where it is safe. ZFS users should be using the ZFS storage driver, which doesn’t have this issue.

@andrewdavidwong andrewdavidwong added P: major Priority: major. Between "default" and "critical" in severity. and removed P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Oct 4, 2024
@rustybird

Do we know that it is always safe for XFS and ext4, though? That Btrfs commit said it's simply application misbehavior to modify the buffer during a direct I/O write. (Which isn't explicitly mentioned for O_DIRECT in open(2), but it makes sense. Come to think of it, blkback modifying the buffer during a write doesn't seem great even without direct I/O, right?) As non-data-checksumming filesystems, they have one less reason to look at the buffer twice. But maybe they do in some circumstances, e.g. depending on journalling options.

It's silly but I like to treat filesystems as if they're developed by ruthless "language lawyers" who will see the slightest incorrect use as an opportunity to deploy an optimization destroying the maximum allowable amount of data. So now I'm a bit spooked about direct I/O until it's been diagnosed what exactly is going on here.

@DemiMarie

Modifying a buffer that is in-flight is a straightforward data race and undefined behavior. No application should ever do this. The only difference here is that when the junk data is read, BTRFS returns an error, while XFS/ext4 return garbage.

Theoretically, what BTRFS is doing is better. However, most applications don’t handle I/O errors well, so the BTRFS behavior is a persistent denial of service.

rustybird added a commit to rustybird/qubes-linux-utils that referenced this issue Oct 8, 2024
marmarek pushed a commit to QubesOS/qubes-linux-utils that referenced this issue Nov 6, 2024
Pending diagnosis of QubesOS/qubes-issues#9488

(cherry picked from commit 8cdd664)
@Atrate

Atrate commented Nov 24, 2024

Will linux-utils v4.3.5 be available for Qubes 4.2, or do we have to wait for 4.3 to be released? I think I just experienced disk corruption (recoverable, fortunately) due to the use of DIO, for the second time in the last month of Qubes usage.

@rustybird

The change was cherry-picked for the R4.2 branch three weeks ago, but the qubes-utils package version hasn't been bumped yet. @marmarek

I think I just experienced disk corruption (recoverable, fortunately) due to the use of DIO, for the second time in the last month of Qubes usage.

This happened with Windows VMs both times?

@Atrate

Atrate commented Nov 25, 2024

The change was cherry-picked for the R4.2 branch three weeks ago, but the qubes-utils package version hasn't been bumped yet. @marmarek

I think I just experienced disk corruption (recoverable, fortunately) due to the use of DIO, for the second time in the last month of Qubes usage.

This happened with Windows VMs both times?

Linux AppVMs. Technically I am not fully sure that this issue is caused by DIO (no meaningful logs), but the AppVMs are pretty standard and I have not made any significant modifications to Qubes other than installing it on BTRFS, so it could be due to that.

Also, possibly related: installing the Xen driver (through QWT) on Windows 10 causes so many r/w errors that the VM bluescreens within a couple of minutes.

@rustybird

rustybird commented Nov 26, 2024

I think I just experienced disk corruption [...]

This happened with Windows VMs both times?

Linux AppVMs.

You might want to keep an eye on this even after updating to the dom0 qubes-utils-4.2.18 package, which disables direct I/O. Maybe Btrfs correctly alerted you to a hardware problem. (Just recently I experienced something similar: A faulty power supply had damaged my mainboard, and Btrfs was the first to complain via checksum errors.)

installing the Xen driver (through QWT) on Windows 10 causes so many r/w errors that the VM bluescreens in a couple of minutes.

That part seems like it could be direct I/O related.

@andrewdavidwong andrewdavidwong removed the needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. label Nov 29, 2024
@andrewdavidwong andrewdavidwong added the diagnosed Technical diagnosis has been performed (see issue comments). label Nov 29, 2024
@Atrate

Atrate commented Dec 10, 2024

A week or two ago I updated by switching the dom0 repositories to testing; the issue seems to be fixed, and compression works. I'll test the Windows bugs once the new QWT is released. Any idea when I'll be able to set updates back to stable or security-testing without risking a downgrade?
