PCI passthrough not working for HVM domains #1659

Closed
esheltone opened this issue Jan 19, 2016 · 90 comments

Comments

@esheltone

There have been multiple reports that PCI passthrough does not work for HVM domains using the qubes software:

https://groups.google.com/d/msg/qubes-users/cmPRMOkxkdA/gIV68O0-CQAJ (reporting passthrough not working via libvirt, but that passthrough still could be done using Xen xl)
https://groups.google.com/d/msg/qubes-users/ExMvykCyYiY/M3nHxweRFAAJ (confirmation by Marek that passthrough was not working on R3)
https://groups.google.com/d/msg/qubes-users/ppKj_YWqr94/l2gHv6uJAgAJ

This issue appears to have started with use of the HAL in Qubes R3. PCI passthrough continues to work fine for PV-based Qubes VMs, such as sys-net.

Marek guessed that it could be a qemu issue (see second linked post). However, in the first linked post, PCI passthrough was done to an HVM domain via 'xl' using "device_model_version = 'qemu-xen-traditional'", so this may rule out qemu as the culprit.

@marmarek added the T: bug, C: core, C: Xen, and P: major labels Jan 19, 2016
@marmarek added this to the Release 3.1 milestone Jan 19, 2016
@marmarek
Member

@rootkovska @mfc I've assigned priority "major", but maybe it deserves a higher one?

@rootkovska
Member

... or lower, rather? (as we currently don't support HVM-based net/usb VMs, so this affects very few users)

@esheltone
Author

I would say the affected users fall into two main categories: (1) users trying to get GPU passthrough working for their Windows HVMs, to be able to do real 3D graphics applications, etc., and (2) users wanting to pass through a USB controller, so they can use a webcam or other devices with Windows. I would like to be able to do things like pass through a storage adapter and network adapter directly to a FreeBSD HVM, but that is admittedly an even more specialized use case affecting almost no one else.

I think it would be good to clearly document that passthrough is not available for HVM domains and perhaps also remove the capability from Qubes Manager for HVM domains until this issue is resolved.

@esheltone
Author

Another use case: wanting to be able to have sound output in Windows (currently unavailable by any means): https://groups.google.com/forum/#!topic/qubes-users/BBWF9wguP-0

@ideologysec

Would be great to have PCI passthrough on a dual-GPU laptop, for gaming in Windows without sacrificing the isolation that Qubes provides. (Audio via device passthrough is nice, but what would really be great is a working Qubes Windows Tools audio driver, for better interaction with the rest of the system rather than just specialized applications.)

@wphowell

There is actually a 3rd category: disk controllers. It isn't possible to rip disks without the raw device being available to the VM.

@marmarek
Member

Some tests reveal that the problem is in the stubdomain: the very same config (xl create directly) with device_model_version = 'qemu-xen-traditional' does work, but with device_model_stubdomain_override=1 it does not.
Next thing to check: unpatched libxl to rule out our patches breaking it.
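
For readers who want to reproduce the comparison, here is a minimal sketch that renders the two xl config variants described above. The option names device_model_version, device_model_stubdomain_override and pci are regular xl.cfg keys; the domain name, memory size and PCI address are placeholders (assumptions, not taken from the actual test setup), and a real guest would also need a disk entry.

```python
def render_xl_config(name, pci_bdf, stubdomain):
    """Return xl.cfg text for a minimal HVM guest with one passed-through PCI device.

    name and pci_bdf are hypothetical placeholders chosen for illustration.
    """
    lines = [
        f"name = '{name}'",
        "builder = 'hvm'",
        "memory = 1024",  # assumed size, adjust as needed
        "device_model_version = 'qemu-xen-traditional'",
        f"pci = ['{pci_bdf}']",
    ]
    if stubdomain:
        # The variant reported as NOT working in this thread.
        lines.append("device_model_stubdomain_override = 1")
    return "\n".join(lines) + "\n"


# Placeholder device address; use a device that is assignable on your system.
working = render_xl_config("pcihvm", "00:1b.0", stubdomain=False)
broken = render_xl_config("pcihvm", "00:1b.0", stubdomain=True)
print(broken)
```

Writing each variant to a file and starting it with xl create gives the two cases compared in this comment.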

@marmarek
Member

Using libxl (xen packages 4.6.1) from Fedora 24 it does work, even with stubdomain...

@tirrorex

Though fedora 24 wasn't due until april?
When you say it is working, you mean flawless with every device just like kvm?

@marmarek
Member

Though fedora 24 wasn't due until april?

Yes, I've used packages from rawhide - to have the same version (F23 has Xen 4.5).

When you say it is working, you mean flawless with every device just like kvm?

I've just tried one sample device and it is properly discovered in the VM. With our libxl/stubdom packages it doesn't show at all.

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Feb 26, 2016
SystemTestsMixin.prepare_hvm_system_linux creates minimal Linux
installation necessary to launch simple shell script. It installs:
 - grub2
 - kernel from dom0 (the same as the running one)
 - dracut based initramfs, with provided script set as pre-pivot hook

Done in preparation for QubesOS/qubes-issues#1659 test
marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Feb 26, 2016
A simple test which checks if the device is visible there at all.
Device set with QUBES_TEST_PCIDEV env variable is used - it should be
some unimportant device which can be freely detached from dom0.

QubesOS/qubes-issues#1659
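
A rough sketch of how a test helper might pick up the device named in QUBES_TEST_PCIDEV. It assumes the variable holds a PCI address in the usual [domain:]bus:device.function notation; the commit message does not state the format, and the helper names here are hypothetical rather than taken from the Qubes test suite.

```python
import os
import re

# [dddd:]bb:dd.f in hexadecimal; the domain part is optional.
BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(text):
    """Parse a PCI address into (domain, bus, device, function)."""
    match = BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, dev, func = match.groups()
    dev_num = int(dev, 16)
    if dev_num > 0x1F:
        raise ValueError(f"device number out of range: {text!r}")
    return (int(domain or "0", 16), int(bus, 16), dev_num, int(func))


def pcidev_from_env(environ=os.environ):
    """Return the parsed test device, or None if the variable is unset.

    A test would typically skip itself when this returns None.
    """
    value = environ.get("QUBES_TEST_PCIDEV")
    if not value:
        return None
    return parse_bdf(value)
```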
@marmarek
Member

Made an automated test for this issue (annotated with "expected failure" for now). It should ease debugging (for example bisection).
Also, I'm unable to reproduce the success with the unmodified xen-4.6.0 toolstack (+qemu) compiled manually. Maybe the earlier success was due to some Fedora patch. Or it is some race condition (Fedora was running from a USB stick, while Qubes runs from a fast SSD). Or something totally different...

@esheltone
Author

The race condition idea seems unlikely, since the problem we are chasing is seen across all systems on Qubes.

There is a surprising number of patches in Fedora for Xen. However, at a glance, the only patches that seem to touch on code that would relate to this kind of problem are the patches for XSAs 154, 164, and 170 - assuming one of these patches is responsible.

@jaspertron

@marmarek, is this an accurate summary of your testing?

| | stubdomain | 'qemu-xen-traditional' |
|---|---|---|
| Xen 4.6.0 with Qubes patches | broken | working |
| Xen 4.6.0 (unmodified) | broken | working |
| Xen 4.6.1 with Fedora patches | working | working |

Also, how did you create the xl config file for testing? I tried doing
virsh -c xen:/// domxml-to-native xen-xl /etc/libvirt/libxl/my-test-hvm.xml
but that gives me

error: Disconnected from xen:/// due to I/O error
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor

@marmarek
Member

I would rather say "Fedora 24" instead of "Xen 4.6.1 with Fedora patches" - this may be broken by some other package than Xen (some library used or so). Additionally, I wasn't able to reproduce the success when running Qubes but launching domain using Fedora 24 binaries (from chroot). Which is another hint it isn't about just toolstack/qemu.

As for config file - something like this. And indeed it seems to be crashing libvirtd... It looks like the bug is triggered by lack of <graphics type='vnc'/> entry (which is intentional on Qubes). Anyway adding it produced some config file. Then, to enable stubdomain you need to add device_model_stubdomain_override=1. And probably set vnc = 0 ;)

I guess the whole problem may have something to do with disabled qemu in dom0 (in addition to stubdomain). This isn't fully consistent with test results, but there may be some other factors.
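
The workaround for the domxml-to-native crash (adding a graphics entry before conversion) could be scripted along these lines. This is only a sketch using Python's standard XML module; the sample XML is a made-up minimal domain, not a real Qubes-generated one, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def ensure_vnc_graphics(domain_xml):
    """Return the domain XML with <graphics type='vnc'/> added if none exists."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")
    if devices.find("graphics") is None:
        ET.SubElement(devices, "graphics", {"type": "vnc"})
    return ET.tostring(root, encoding="unicode")


# Made-up minimal input, for illustration only.
sample = "<domain type='xen'><name>my-test-hvm</name><devices/></domain>"
patched = ensure_vnc_graphics(sample)
print(patched)
```

The patched XML can then be saved to a temporary file and passed to virsh domxml-to-native in place of the original.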

@marmarek
Member

Found configs from those tests: https://gist.github.com/marmarek/794305496557cc679fced21e252e05b4
May contain some later changes though...

@jaspertron

It looks like the bug is triggered by lack of <graphics type='vnc'/> entry (which is intentional on Qubes). Anyway adding it produced some config file.

Thanks, that did the trick.
I'm getting the same results as you; it only works without a stubdomain.

Additionally, I wasn't able to reproduce the success when running Qubes but launching domain using Fedora 24 binaries (from chroot). Which is another hint it isn't about just toolstack/qemu.

Could the xen-pciback kernel module be to blame? Does Qubes make any modifications to it?

@marmarek
Member

Could the xen-pciback kernel module be to blame? Does Qubes make any modifications to it?

No, we don't have any modifications there.

It was working in Qubes R2, but there are a lot of differences:

  • Xen version (was 4.1) - all the parts: hypervisor, qemu, toolstack
  • Kernel version (was 3.12, probably irrelevant)
  • Libvirt usage vs xl directly (this should be excluded by above tests)

Next thing I'd check is qemu in stubdomain - simply get stubdomain binary from R2 and try it on R3.x. It is in /usr/lib/xen/boot/ioemu-stubdom.gz, which is shipped in xen-hvm rpm.

@jaspertron

Next thing I'd check is qemu in stubdomain - simply get stubdomain binary from R2 and try it on R3.x. It is in /usr/lib/xen/boot/ioemu-stubdom.gz, which is shipped in xen-hvm rpm.

Ok, I replaced /usr/lib/xen/boot/ioemu-stubdom.gz with the ioemu-stubdom.gz from Qubes-R2-x86_64-DVD.iso. Unfortunately it doesn't want to start:

[user@dom0 ~]$ sudo xl create pcihvm.xl 
Parsing config from pcihvm.xl
libxl: error: libxl_dm.c:1671:stubdom_xswait_cb: Stubdom 13 for 12 startup: startup timed out
libxl: error: libxl_create.c:1339:domcreate_devmodel_started: device model did not start: -9
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [10199] exited with error status 1
libxl: error: libxl_device.c:1084:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [10197] exited with error status 1
libxl: error: libxl_device.c:1084:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
libxl: error: libxl.c:1606:libxl__destroy_domid: non-existant domain 12
libxl: error: libxl.c:1564:domain_destroy_callback: unable to destroy guest with domid 12
libxl: error: libxl.c:1491:domain_destroy_cb: destruction of domain 12 failed

Here's some of the output from /var/log/xen/qemu-dm-pcihvm.log:

---snip---
Register xen platform.
Done register platform.
xs_watch(/local/domain/12/log-throttling, /local/domain/12/log-throttling)
platform_fixed_ioport: changed ro/rw state of ROM memory area. now is rw state.
qubes_gui/init: 660
qubes_gui/init: 669
qubes_gui/init: 672
qubes_gui/init: 681
xs_daemon_open -> 9, 0x1609f8
evtchn_open() -> 10
xc_evtchn_bind_unbound_port(0) = 0
xs_write(device/vchan/6000/ring-ref): EACCES
close(10)
libvchan_server_init: 
close(0)
GPF rip: 0xfc514, error_code=0
Thread: main
RIP: e030:[<00000000000fc514>] 
RSP: e02b:00000000005ef8a8  EFLAGS: 00010202
RAX: 2f302f6e69616d6f RBX: 0000002002c087c0 RCX: 0000000000001055
RDX: 000000000000000a RSI: 00000000005ef798 RDI: 2f302f6e69616d6f
RBP: 00000000005ef8a8 R08: 000000000000000a R09: 0000000000576000
R10: 000000000000104b R11: 0000000000000ffa R12: 0000000000000000
R13: 00000000001635f0 R14: 0000000000000000 R15: 0000000000163558
base is 0x5ef8a8 caller is 0xe2a78
base is 0x5ef908 caller is 0xe2635
base is 0x5ef918 caller is 0xddc61
base is 0x5ef938 caller is 0x1047ed
base is 0x5ef958 caller is 0xfad7d
base is 0x5ef968 caller is 0xf4510
base is 0x5ef998 caller is 0xf45ac
base is 0x5ef9b8 caller is 0xf6379
base is 0x5efa08 caller is 0xf4b2e
base is 0x5efa18 caller is 0xf4474
base is 0x5efa38 caller is 0x24790
base is 0x5efa58 caller is 0x24002
base is 0x5efa78 caller is 0x8ea6
base is 0x5efe08 caller is 0xd7129
base is 0x5effe8 caller is 0x33da

5ef890: a8 f8 5e 00 00 00 00 00 2b e0 00 00 00 00 00 00
5ef8a0: 01 00 00 00 00 00 00 00 08 f9 5e 00 00 00 00 00
5ef8b0: 78 2a 0e 00 00 00 00 00 01 00 00 00 00 00 00 00
5ef8c0: 1a 00 00 00 00 00 00 00 f8 f8 5e 00 00 00 00 00

5ef890: a8 f8 5e 00 00 00 00 00 2b e0 00 00 00 00 00 00
5ef8a0: 01 00 00 00 00 00 00 00 08 f9 5e 00 00 00 00 00
5ef8b0: 78 2a 0e 00 00 00 00 00 01 00 00 00 00 00 00 00
5ef8c0: 1a 00 00 00 00 00 00 00 f8 f8 5e 00 00 00 00 00

fc500: ca 48 85 f2 74 ea eb 0c 0f 1f 84 00 00 00 00 00
fc510: 48 83 c0 01 80 38 00 75 f7 48 29 f8 5d c3 66 90
fc520: 55 48 89 f8 48 89 f9 a8 07 48 89 e5 75 56 48 8b
fc530: 0f 49 ba ff fe fe fe fe fe fe fe 49 89 c8 4c 01

Any ideas?
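
One possible reading of the register dump above: RAX and RDI hold the same value, and that value is made up entirely of printable ASCII bytes, which may mean string data ended up where a pointer was expected. The faulting instruction bytes at fc514 (80 38 00, a byte compare against zero at the address in RAX) would fit a string-length style loop walking that bogus pointer. This is an interpretation, not a confirmed diagnosis. A small sketch to decode the register value:

```python
def register_as_ascii(value, width=8):
    """Interpret a register value as little-endian bytes and show printable ASCII.

    Non-printable bytes are shown as '.'.
    """
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


rax = 0x2F302F6E69616D6F  # RAX (and RDI) from the dump above
print(register_as_ascii(rax))  # prints "omain/0/"
```

The decoded text looks like a fragment of a xenstore path such as /local/domain/0/..., which would be consistent with the vchan/xenstore calls logged just before the fault.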

@marmarek
Member

vchan library is different in R2 than in R3.x... You can try to fake the old one, execute:

xenstore-write /local/domain/`xl domid pcihvm-dm`/device/vchan ''
xenstore-chmod /local/domain/`xl domid pcihvm-dm`/device/vchan n`xl domid pcihvm-dm`

But you need to be very fast with this, as it has to happen before the stubdom reaches this GUI initialization. Maybe xl create -p will help, but AFAIR it only keeps the target domain paused, not the stubdomain.
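
Since timing matters here, the two commands could be driven from a small polling script instead of being typed by hand. The sketch below assumes the stubdomain is named pcihvm-dm as in the commands above; the function names, timeout and polling interval are arbitrary choices for illustration, and nothing here guarantees the write lands before the stubdomain reaches GUI initialization.

```python
import subprocess
import time


def vchan_fake_commands(domid):
    """Build the two xenstore commands from the comment above for a given domid."""
    path = f"/local/domain/{domid}/device/vchan"
    return [
        ["xenstore-write", path, ""],
        ["xenstore-chmod", path, f"n{domid}"],
    ]


def xl_domid(name):
    """Return the domid reported by `xl domid NAME`, or None if it does not exist yet."""
    result = subprocess.run(["xl", "domid", name], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return int(result.stdout.strip())


def wait_for_domid(lookup, timeout=10.0, interval=0.05):
    """Poll lookup() until it returns a domid or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        domid = lookup()
        if domid is not None:
            return domid
        time.sleep(interval)
    raise TimeoutError("stubdomain did not appear in time")


def fake_old_vchan(name="pcihvm-dm"):
    """Wait for the stubdomain, then run both xenstore commands as soon as possible."""
    domid = wait_for_domid(lambda: xl_domid(name))
    for argv in vchan_fake_commands(domid):
        subprocess.run(argv, check=True)
```

Starting fake_old_vchan() in dom0 (as root) just before running xl create would narrow the race window, though it cannot remove it.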

@marmarek
Member

Automated announcement from builder-github

The package xen_4.6.3-24+deb8u1 has been pushed to the r3.2 testing repository for the Debian jessie template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing jessie-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@marmarek
Member

Automated announcement from builder-github

The package xen_4.6.3-24+deb9u1 has been pushed to the r3.2 testing repository for the Debian stretch template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@marmarek
Member

Automated announcement from builder-github

The package xen-4.6.3-24.fc20 has been pushed to the r3.1 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@marmarek
Member

marmarek commented Dec 4, 2016

Automated announcement from builder-github

The package xen-4.6.3-24.fc21 has been pushed to the r3.1 stable repository for the Fedora fc21 template.
To install this update, please use the standard update command:

sudo yum update

Changes included in this update

@marmarek
Member

marmarek commented Dec 4, 2016

Automated announcement from builder-github

The package xen_2001:4.6.3-24+deb8u1 has been pushed to the r3.2 stable repository for the Debian jessie template.
To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@marmarek
Member

marmarek commented Dec 4, 2016

Automated announcement from builder-github

The package xen-4.6.3-24.fc22 has been pushed to the r3.1 stable repository for the Fedora fc22 template.
To install this update, please use the standard update command:

sudo yum update

Changes included in this update

@marmarek
Member

marmarek commented Dec 4, 2016

Automated announcement from builder-github

The package xen-4.6.3-24.fc23 has been pushed to the r3.1 stable repository for the Fedora fc23 template.
To install this update, please use the standard update command:

sudo yum update

Changes included in this update

@marmarek
Member

marmarek commented Dec 5, 2016

Automated announcement from builder-github

The package xen_4.6.3-24+deb8u1 has been pushed to the r3.1 testing repository for the Debian jessie template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing jessie-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@marmarek
Member

marmarek commented Dec 5, 2016

Automated announcement from builder-github

The package xen_4.6.3-24+deb9u1 has been pushed to the r3.1 testing repository for the Debian stretch template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@marmarek
Member

marmarek commented Dec 5, 2016

Automated announcement from builder-github

The package xen_4.6.3-24+deb7u1 has been pushed to the r3.1 testing repository for the Debian wheezy template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing wheezy-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package xen_2001:4.6.3-24+deb9u1 has been pushed to the r3.2 stable repository for the Debian stretch template.
To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update


9 participants