Atheros 928x PCI passthrough not working #3609

awokd · 2018-02-19T13:17:49Z

Qubes OS version:

R4.0

Affected TemplateVMs:

Steps to reproduce the behavior:

Try to attach AR9280 to sys-net or other HVM. AR9287 also reported to have same behavior.

Expected behavior:

ath9k driver loads without crashing

Actual behavior:

ath9k driver crashes HVM with

(XEN) AMD-Vi: Setup I/O page table: device id = 0x200, type = 0x1, root table = 0x264921000, domain = 9, paging mode = 3
...
(XEN) svm.c:1540:d9v0 SVM violation gpa 0x000000f2020040, mfn 0xf0100, type 5
(XEN) domain_crash called from svm.c:1541

General notes:

Filing this here because passing through the same device on the same hardware to an HVM on Xen 4.8.2 and 4.8.3 on Fedora 26 works, as does using it in dom0 in that configuration and under stock Debian Stretch. Not sure if it affects a "broad range" of users as much as Intel wireless, though if there's a bug in handling this type of PCI device it could also affect other similar devices under Qubes. The fix could well be to buy a new device, but it might be helpful to understand why it doesn't work.

https://stackoverflow.com/questions/38387504/xen-guest-atheros-wifi-driver-load-causes-memory-paging-failure has a good description of the problem. He was encountering it under Xen 4.6 instead of Qubes, but I had the same issue (kernel crash instead of domU) when trying to pass it through to a PV under Qubes:

it seems the iomap of PCI BAR for the device returns a mapping of which first 0x1000 bytes are read only and that causes access violation when trying to write registers mapped to this area (all the regs with offset < 0x1000) - why this happens i still don't know. Register writes with offsets > 0x1000 are fine.

According to the datasheet, this device uses a PCI Express 1.0a Configuration space of 0x00-0x62, DMA accessed registers from 0x0000-0x0FFC, and other registers from 0x1000-0x98FC. For example, the offset 0x40 PCI Express Configuration space register is used for Power Management Capability, while offset 0x0040 DMA device register is used for MIB Control. It has a single 64K BAR and no defined I/O port.

It's that first page of DMA registers that is causing problems. From Xen's perspective, the VM is trying to do an IO write to a page flagged as memory mapped (if I understand the error right), so it crashes. I verified this by commenting out the first couple register writes that were to offsets <0x1000 in the ath9k driver and recompiling it. The crash then occurred later in the driver initialization, but at a different <0x1000 location. Multiple writes to >0x1000 locations during driver initialization were processed successfully.
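
As a sanity check on that reading, the faulting address in the crash line can be mapped back to a register offset. This is a minimal sketch, not part of the original report: it assumes the 64K BAR is naturally aligned (PCI BARs are aligned to their size), so the low 16 bits of the guest physical address are the offset into the BAR. The names `bar_offset` and `classify` are made up for illustration.

```python
BAR_SIZE = 0x10000      # single 64K BAR, per the datasheet summary above
DMA_REGS_END = 0x1000   # offsets below this are the DMA-accessed registers


def bar_offset(gpa: int, bar_size: int = BAR_SIZE) -> int:
    """Offset into a naturally aligned BAR (assumption: BAR aligned to its size)."""
    return gpa & (bar_size - 1)


def classify(offset: int) -> str:
    """Say which of the two register ranges described above an offset falls in."""
    if offset < DMA_REGS_END:
        return "DMA registers (first page)"
    return "other registers"


gpa = 0x000000F2020040  # from the "SVM violation gpa" line
off = bar_offset(gpa)
print(hex(off), classify(off))
```

Under that assumption the faulting write lands at offset 0x40, which is the MIB Control register mentioned above and sits inside the problematic first page.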

Related issues:

awokd · 2018-02-19T13:20:00Z

Currently attempting to get Qubes 4.0's xen-hvm-stubdom-linux running on Xen 4.8.3 to see if it's a stubdom issue.

schnurentwickler · 2018-02-19T15:19:33Z

With atheros I had my issues as well. atheros was not usable even after reboots if the computer was in standby mode. Only a shutdown and even WITH power supply attached at boot brought it back to work.
The power supply issue I could not solve, but the standby issue I managed with nohwcrypt as module option. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/568090
Maybe Xen does have heavier problems to load and assign a device with strange responses even for normal linux setups.
I could not get an atheros device to work in qubes 3.2. Should be noted in qubes first information page for a release to avoid atheros device modules.

awokd · 2018-02-19T17:00:16Z

It's not all Atheros devices; I have a 9565 that works with Qubes 4.0 (although I never tested suspend). But you are probably right and the list of not working ones is longer than just 928x. I know Intel has issues with sleep mode too.

marmarek · 2018-02-20T17:03:17Z

This is weird. The difference from a plain Fedora setup may be the use of a stubdomain at all. Running a linux-based stubdomain requires some libxl patching, but a mini-os based one should work out of the box on a non-qubes system.
Another thing we do differently is enabling the e820_host option in the guest configuration - you can disable it with qvm-features sys-net pci-e820-host ''. I doubt it will help, but those are the differences I'm aware of.

/cc @HW42

awokd · 2018-02-21T16:24:56Z

Tried qvm-features sys-net pci-e820-host '' but unfortunately, no effect.

Not sure if it's relevant, but the working version appears to be using MSI-X but it's MSI or legacy under Qubes. Basing this observation on the IRQ numbers only, not entirely positive how to decrypt the lspci -vvv output.

To summarize:

| Version | Result |
|---|---|
| Debian 9 | works |
| Xen 4.6 PV | fails (per Stackoverflow link) |
| Xen 4.8.3 dom0 | works |
| Xen 4.8.3 PV | spent 6 hours trying to get it to boot and xl console to connect, will try again later (not a tech support request but the learning curve sure is steep) |
| Xen 4.8.3 HVM | works except can't scan wireless networks |
| Xen 4.8.3 HVM traditional stubdomain | fault inside the stubdomain even with a very basic config and nothing passed through |
| Qubes 4.0 PV | fails similarly to Stackoverflow link |
| Qubes 4.0 HVM | fails with svm.c domain_crash |
| Qubes 4.0 HVM w/9565 | works |

HW42 · 2018-02-21T18:32:03Z

@awokd: Could you please post lspci -vvv -xxxx -s XX:XX.X (replace XX:XX.X with the device) from both dom0 as well as from inside the VM.

HW42 · 2018-02-21T18:40:18Z

FWIW: The ath9k card I have laying around (AR9287 according to lspci) works for me.

HW42 · 2018-02-21T18:42:16Z

@awokd: You wrote "Xen 4.8.3 HVM" works. Could you try to pass pci=nomsi to the VM kernel and see if it still works?

awokd · 2018-02-21T23:22:54Z

Attached the files- I'm able to boot qubes domu with the ath9k module blacklisted. My AR9280 is on a corebooted AMD and the other user that told me about the AR9287 not working was as well. Tried pci=nomsi on the xen domu and it had no effect- verified it on the boot log options line and the IRQ was still 36. Had also tried that before on the qubes domu with no change, still the svm.c crash.

qdom0.txt
qdomu.txt
xdom0.txt
xdomu.txt

awokd · 2018-02-21T23:50:08Z

I should clarify what I mean by "works" for the Xen HVM- the ath9k driver loads without crashing and I can poke at the card with iw commands and set and get data. Can't actually scan wireless networks but it looks like that's a common problem with multiple possible solutions, so I haven't spent much time on it yet.

HW42 · 2018-02-22T00:15:52Z

My AR9280 is on a corebooted AMD and the other user that told me about the AR9287 not working was as well.

AFAIK @h01ger also has problems with an ath9k card on a coreboot machine. That has an Intel CPU. So this sounds like a coreboot problem. Can you try this on a non-coreboot machine (or even better stock BIOS on the same machine)?

h01ger · 2018-02-22T00:21:48Z

On Wed, Feb 21, 2018 at 04:15:53PM -0800, HW42 wrote: AFAIK @h01ger also has problems with an ath9k card on a coreboot machine. That has an Intel CPU. So this sounds like a coreboot problem. Can you try this on a non-coreboot machine (or even better stock BIOS on the same machine)?

the^wone problem with thinkpads is, that they only allow intel wlan cards with the stock bios. IOW, you need coreboot to use those ath9k cards, and with pure debian they work well. I gave one ath9k card to marmarek, but I think he wasnt able to test it just yet. my ath9k card is also not inside a laptop right now, but I hope to change this soon.

…

-- cheers, Holger

HW42 · 2018-02-22T00:29:04Z

one problem with thinkpads is, that they only allow intel wlan cards with the stock bios.

Ugh.

Let's see what @awokd reports.

awokd · 2018-02-22T04:24:11Z

Yes, it's a Lenovo too with a whitelist firmware, so I couldn't run this card on it if I flash it back. But should the domU's lspci output differ between Qubes and Xen?
qdomu: Capabilities blocks 40, 50, 60, legacy INT(?)
xdomu: Capability block 40, MSI-X

awokd · 2018-02-22T05:12:48Z

This could be an edge case too, in which case I apologize for wasting everyone's time. But I've seen similar reports of MSI interrupts being flaky on some devices under Qubes over the past few months I've been working on this (not solidly, but still...). Maybe it's a duplicate issue?

PS I've edited the test results table above with additional results I forgot to include.

one example
and #3217

[ 2.361791] iwlwifi 0000:00:01.0: Xen PCI mapped GSI17 to IRQ27
[ 2.365431] iwlwifi 0000:00:01.0: pci frontend enable msi failed for dev 0:8
[ 2.365465] iwlwifi 0000:00:01.0: Xen PCI frontend error: -22!
[ 2.365694] iwlwifi 0000:00:01.0: pci_enable_msi failed - -22

and #3235

Oct 27 07:56:09 sys-net kernel: iwlwifi 0000:00:00.0: Xen PCI mapped GSI18 to IRQ26
Oct 27 07:56:09 sys-net kernel: iwlwifi 0000:00:00.0: pci frontend enable msi failed for dev 0:0
Oct 27 07:56:09 sys-net kernel: iwlwifi 0000:00:00.0: Xen PCI frontend error: -22!
Oct 27 07:56:09 sys-net kernel: iwlwifi 0000:00:00.0: pci_enable_msi failed - -22

HW42 · 2018-02-22T08:50:06Z

But should the domU's lspci output differ between Qubes and Xen?

That's expected since vanilla Xen doesn't use a stubdom by default (and we have a custom Linux based stubdom).

qdomu: Capabilities blocks 40, 50, 60, legacy INT(?)
xdomu: Capability block 40, MSI-X

Are you sure you didn't swap xdomu.txt and qdomu.txt? I would expect them the other way around.

Also I don't see MSI-X in either (in Qubes that's expected). Why do you think it's using MSI-X?

awokd · 2018-02-22T09:12:32Z

Yes, I'm sure I didn't swap them. Note the lack of Kernel driver in use: ath9k in the Qubes one.

Because it's on IRQ 36. My understanding is Legacy interrupt values go up to 16, MSI up to 32, and MSI-X up to 2048 (but maybe that is folklore).

HW42 · 2018-02-22T09:33:04Z

Anyway, I think it's rather not interrupt related but:

Region 0: Memory at f0100000 (64-bit, non-prefetchable) [disabled] [size=64K]

Note the disabled. Please post xl dmesg (ideally with loglvl=all. Dom0 dmesg also doesn't hurt but probably not needed)

marmarek · 2018-02-22T10:18:00Z

I gave one ath9k card to marmarek, but I think he wasnt able to test it just yet.

I've tried and the card isn't even visible on lspci in dom0. But it may be something with my laptop...

awokd · 2018-02-22T10:47:23Z

That [disabled] is interesting. I'd assumed it was an artefact of Qubes hiding PCI devices but when I tested Xen with xen-pciback.hide=(02:00.0) just now, it continued to be enabled.
Attaching the xl dmesg from both.

qdmesg.txt
xdmesg.txt

h01ger · 2018-02-22T11:09:00Z

On Thu, Feb 22, 2018 at 10:18:02AM +0000, Marek Marczykowski-Górecki wrote: > I gave one ath9k card to marmarek, but I think he wasnt able to test it just yet. I've tried and the card isn't even visible on lspci in dom0. But it may be something with my laptop...

what model is that? iirc it was an x230 with coreboot, right? that would be quite strange indeed...

…

-- cheers, Holger

marmarek · 2018-02-22T11:42:57Z

Yes. I'll try another card in that slot (the slot that is working with the intel wifi is too small for this one).

awokd · 2018-02-24T20:39:20Z

@HW42 : Noticed something else in that qdomu.txt file- it has the 50 and 60 MSI capabilities but only the standard PCI configuration space (the -xxxx dump only goes up to 0xff). In qdom0 it shows the PCIe extended config space in the dump. Attempting to follow the logic in xen-4.8.3/tools/qemu-xen/hw/pci/pcie.c was uninformative, so not sure if one has anything to do with the other (or if I'm even in the right area). Could this also be related to the [disabled] memory?
The "missing" configuration space also seem to line up with the range of memory registers the driver crashes on when it attempts to write.
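
For comparing the dom0 and domU dumps more mechanically, something like the following could help. It is only a sketch: it assumes the usual lspci hex dump layout (unindented lines of the form `offset: byte byte ...`), and the capability IDs it names are the standard PCI ones (0x01 Power Management, 0x05 MSI, 0x10 PCI Express, 0x11 MSI-X). The config space in the demo is synthetic and is not taken from the attached qdomu.txt or qdom0.txt.

```python
CAP_NAMES = {0x01: "Power Management", 0x05: "MSI",
             0x10: "PCI Express", 0x11: "MSI-X"}


def parse_hexdump(text: str) -> bytes:
    """Collect the config space bytes from lspci -x/-xxx/-xxxx style output."""
    data = bytearray()
    for line in text.splitlines():
        head, sep, rest = line.partition(": ")
        try:
            offset = int(head, 16)
            row = bytes(int(tok, 16) for tok in rest.split())
        except ValueError:
            continue  # not a hex dump row (device header, "Region 0: ...", etc.)
        if sep and row and offset == len(data):
            data += row
    return bytes(data)


def capabilities(cfg: bytes):
    """Walk the standard capability list; returns (pointer, capability id) pairs."""
    caps = []
    if len(cfg) < 0x40 or not (cfg[0x06] & 0x10):  # status bit 4: list present
        return caps
    ptr, seen = cfg[0x34] & 0xFC, set()
    while ptr and ptr not in seen and ptr + 1 < len(cfg):
        seen.add(ptr)
        caps.append((ptr, cfg[ptr]))
        ptr = cfg[ptr + 1] & 0xFC
    return caps


# Synthetic 256-byte config space with capabilities at 0x40, 0x50 and 0x60.
# The IDs are placeholders for the demo, not values read from the real card.
cfg = bytearray(256)
cfg[0x06] = 0x10
cfg[0x34] = 0x40
cfg[0x40:0x42] = bytes([0x01, 0x50])
cfg[0x50:0x52] = bytes([0x05, 0x60])
cfg[0x60:0x62] = bytes([0x10, 0x00])
dump = "\n".join(
    f"{off:02x}: " + " ".join(f"{b:02x}" for b in cfg[off:off + 16])
    for off in range(0, len(cfg), 16))

parsed = parse_hexdump("02:00.0 Network controller: (synthetic example)\n" + dump)
print("bytes dumped:", len(parsed), "extended config space:", len(parsed) > 256)
for ptr, cap_id in capabilities(parsed):
    print(hex(ptr), CAP_NAMES.get(cap_id, hex(cap_id)))
```

A dump that stops at 0xff yields 256 bytes here, while one that includes the PCIe extended config space yields up to 4096, so running this over both files would show the difference described above in one line each.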

marmarek · 2018-03-07T21:32:36Z

Ok, I've tried the card in another slot and it is visible. And it crashes sys-net in a very similar way: EPT violation (-w-/r-x). When I switch sys-net to PV, it also crashes, but with a more useful message, very similar to the one from stackoverflow:

[    4.324539] BUG: unable to handle kernel paging request at ffffc90001c70040
[    4.324585] IP: iowrite32+0x2b/0x30
[    4.324607] PGD 18818067 P4D 18818067 PUD 18817067 PMD 11beb067 PTE 80100000f1500075
[    4.324665] Oops: 0003 [#1] SMP NOPTI
[    4.324688] Modules linked in: ath9k(+) ath9k_common ath9k_hw mac80211 ath cfg80211 rfkill e1000e ptp pps_core intel_rapl x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel intel_rapl_perf pcspkr xen_pcifront xenfs xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn u2mfn(O) xen_blkfront
[    4.324842] CPU: 0 PID: 233 Comm: kworker/0:2 Tainted: G           O    4.14.18-1.pvops.qubes.x86_64 #1
[    4.324891] Workqueue: events work_for_cpu_fn
[    4.324918] task: ffff88001059db80 task.stack: ffffc900019d4000
[    4.324952] RIP: e030:iowrite32+0x2b/0x30
[    4.324973] RSP: e02b:ffffc900019d7cc0 EFLAGS: 00010296
[    4.325008] RAX: 0000000000000000 RBX: ffff880010f78028 RCX: 0000000000000005
[    4.325048] RDX: ffffc90001c70040 RSI: ffffc90001c70040 RDI: 0000000000000000
[    4.325077] RBP: ffff880010f78078 R08: 0000000000000000 R09: 00000000ffffff90
[    4.325090] R10: 000000000000003f R11: 0000000000000000 R12: ffffffffc03467d0
[    4.325104] R13: 0000000000000002 R14: 0000000000000100 R15: ffff880010f78028
[    4.325127] FS:  0000000000000000(0000) GS:ffff880013a00000(0000) knlGS:0000000000000000
[    4.325142] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.325153] CR2: ffff80000078a800 CR3: 0000000010a58000 CR4: 0000000000042660
[    4.325175] Call Trace:
[    4.325195]  ath9k_enable_mib_counters+0x4a/0x80 [ath9k_hw]
[    4.325212]  ath9k_hw_init+0x632/0xb00 [ath9k_hw]
[    4.325226]  ? __queue_work+0x420/0x420
[    4.325241]  ath9k_init_device+0x5fb/0xdb0 [ath9k]
[    4.325256]  ? request_threaded_irq+0xfa/0x160
[    4.325272]  ath_pci_probe+0x20e/0x3d0 [ath9k]
[    4.325287]  local_pci_probe+0x3f/0x90
[    4.325297]  ? __schedule+0x3d3/0x850
[    4.325307]  work_for_cpu_fn+0x10/0x20
[    4.325318]  process_one_work+0x181/0x390
[    4.325328]  worker_thread+0x1d7/0x3c0
[    4.325337]  kthread+0xfc/0x130
[    4.325347]  ? process_one_work+0x390/0x390
[    4.325357]  ? kthread_create_on_node+0x70/0x70
[    4.325368]  ret_from_fork+0x35/0x40
[    4.325378] Code: 48 81 fe ff ff 03 00 48 89 f2 77 1f 48 81 fe 00 00 01 00 76 07 0f b7 d6 89 f8 ef c3 48 c7 c6 5c 8d 0d 82 48 89 d7 e9 95 fe ff ff <89> 3e c3 66 90 48 81 ff ff ff 03 00 77 28 48 81 ff 00 00 01 00 
[    4.325431] RIP: iowrite32+0x2b/0x30 RSP: ffffc900019d7cc0
[    4.325441] CR2: ffffc90001c70040
[    4.325452] ---[ end trace 4c9dd820b875aec9 ]---
[    4.325460] Kernel panic - not syncing: Fatal exception
[    4.325472] Kernel Offset: disabled
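
The PTE and error code printed in this trace can be decoded by hand. The sketch below uses only the architectural x86-64 bit layout (PTE bit 0 present, bit 1 writable, bit 63 no-execute, frame number in bits 51:12; page fault error code bit 0 = page was present, bit 1 = write access). The function names are made up for illustration, and the decode says nothing about why the mapping ended up this way.

```python
PTE_FLAGS = {0: "present", 1: "writable", 2: "user", 3: "write_through",
             4: "cache_disable", 5: "accessed", 6: "dirty", 63: "no_execute"}


def decode_pte(pte: int) -> dict:
    """Split an x86-64 page table entry into its frame number and flag bits."""
    out = {name: bool((pte >> bit) & 1) for bit, name in PTE_FLAGS.items()}
    out["frame"] = (pte >> 12) & ((1 << 40) - 1)  # bits 51:12
    return out


def decode_fault_code(code: int) -> dict:
    """Decode the low bits of an x86 page fault error code."""
    return {"page_present": bool(code & 1),
            "write": bool(code & 2),
            "user_mode": bool(code & 4)}


pte = decode_pte(0x80100000F1500075)   # "PTE 80100000f1500075" from the trace
err = decode_fault_code(0x0003)        # "Oops: 0003"
print("frame", hex(pte["frame"]), "present", pte["present"],
      "writable", pte["writable"])
print(err)
```

The entry is present but its writable bit is clear, and the error code is a write to a present page, which fits the "first 0x1000 bytes are read only" description from the Stackoverflow post. The faulting address ffffc90001c70040 again ends in 0x040, and the caller is ath9k_enable_mib_counters.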

h01ger · 2018-03-10T18:58:33Z

I pointed @nbd168 at this and this is what he said:

I dont believe that the drivers writes into wrong memory areas
I rather think that the pci ranges are not set up correctly
which is why legitimate accesses are blocked
but I know too little about pci to know what exactly happens there
but the register writes on addr < 0x1000 are definitely valid
who/what is setting up those pci ranges?
i think the BARs which the pci driver reads from the config registers
so either the BARs are broken themselves, or they are interpreted differently

awokd · 2018-03-10T20:54:58Z

https://lists.gt.net/xen/devel/439033?page=last

Is this BAR the same BAR which has the MSI-X table in? For safety, Xen
has to trap and emulate updates to the MSI/MSI-X configuration. It is
possible that that logic has gone wrong.

Looks like that thread might be from the same Stackoverflow poster. Seems like his MSI-X interrupts might have been disabled as well. Can I force them somewhere in Qubes? Maybe it's an upstream bug that only shows up with legacy interrupts, but I still don't get why my device and others' are falling back to using legacy ints under Qubes HVM but not Xen.

marmarek · 2018-03-10T23:32:21Z

MSI/MSI-X is broken in PV mode (#3217). But on Qubes HVM, MSI should work...
Relevant changes (possibly breaking MSI for PV) were part of XSA-237. But it was only about explicit enabling MSI/MSI-X by a hypercall, not direct config space write. The point about some trap on config space seems plausible.
There is possibly related code in Xen sources in arch/x86/hvm/vmsi.c, especially functions listed in msixtbl_mmio_ops structure.
I don't have that card plugged in anywhere right now to verify that hypothesis or collect more info. If you have, try collecting lspci -vv output before inserting the module. And also look at the address at which write fails. If that matches MSI address from lspci output, that's probably it.
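
The address comparison suggested here can be scripted. The following is a rough sketch under a few assumptions: it parses an lspci `Region` line of the form quoted earlier in this thread, and it treats the `mfn` in the Xen crash line as the host page frame (so the host address is `mfn << 12` plus the page offset of the gpa). The helper names are hypothetical. The same `offset_in_region` check could be pointed at an MSI address range once one is read from lspci; none is filled in here because the thread does not give one.

```python
import re

REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) .*\[size=(\d+)([KMG]?)\]")
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_region(line: str):
    """Return (index, start, end) for an lspci memory region line, else None."""
    m = REGION_RE.search(line)
    if m is None:
        return None
    index, base, size, unit = m.groups()
    start = int(base, 16)
    return int(index), start, start + int(size) * UNITS[unit]


def offset_in_region(addr: int, region):
    """Offset of addr inside the region, or None if it falls outside."""
    _, start, end = region
    return addr - start if start <= addr < end else None


line = ("Region 0: Memory at f0100000 (64-bit, non-prefetchable) "
        "[disabled] [size=64K]")
region = parse_region(line)

gpa, mfn = 0x000000F2020040, 0xF0100   # from the "SVM violation" line
host_addr = (mfn << 12) | (gpa & 0xFFF)
print(hex(host_addr), hex(offset_in_region(host_addr, region)))
```

With the values posted so far, the faulting frame is the first page of the region at f0100000, at offset 0x40.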

awokd · 2018-03-14T22:21:51Z

Looking at the PCI bridge in front of the empty slot, it says the same thing under Xen and Qubes:

I/O behind bridge: 0000f000-00000fff [empty]
Memory behind bridge: fff00000-000fffff [empty]
Prefetchable memory behind bridge: fff00000-000fffff [empty]

Crash is at:

(XEN) svm.c:1540:d9v0 SVM violation gpa 0x000000f2020040, mfn 0xf0100, type 5

A different PCI bridge reports Cap [a0] with an MSI address, but the one associated with 02:00.0 reports no MSI capabilities (at least without a module installed.) Oddly, when I put in a different (Express v2) module, it gets an MSI-X interrupt assigned inside the Qubes HVM, the bridge still reports no MSI capabilities, but the device works perfectly.

I'll keep digging, thank you for the suggestions!

awokd · 2021-02-06T15:44:28Z

Ended up working around the issue by switching to a slightly newer model of Atheros. Suspect this older one has a draft implementation of PCIe which confuses Xen et al.

h4xor666 · 2021-07-11T03:04:03Z

Ended up working around the issue by switching to a slightly newer model of Atheros. Suspect this older one has a draft implementation of PCIe which confuses Xen et al.

Can I ask which one you got? I'm having literally the exact same issue.

awokd · 2021-07-11T21:03:00Z

Can I ask which one you got? I'm having literally the exact same issue.

AR5BHB116/AR9382

andrewdavidwong added the T: bug and C: other labels Feb 20, 2018

andrewdavidwong added this to the Release 4.0 milestone Feb 20, 2018

andrewdavidwong modified the milestones: Release 4.0, Release 4.0 updates Mar 31, 2018

awokd closed this as completed Feb 6, 2021
