Network adapter driver crash with kernel 5.x in HVM #6424

Closed
antarisherdeng4 opened this issue Feb 24, 2021 · 3 comments
Labels
affects-4.1 (This issue affects Qubes OS 4.1.)
C: kernel
eol-4.1 (Closed because Qubes 4.1 has reached end-of-life (EOL))
hardware support
P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.)
T: bug (Type: bug report. A problem or defect resulting in unintended behavior in something that exists.)

Comments

antarisherdeng4 commented Feb 24, 2021

Qubes OS version

R4.0

Affected component(s) or functionality

kernel 5.x driver for network adapter
Ethernet controller: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02)

Brief summary

I have an issue with a PCI passthrough network adapter in a Debian HVM AppVM running Qubes kernel 5.4 or 5.10. It works fine with kernel 4.19 but fails with the newer kernels.
I've run this HVM both with the dom0 kernel 5.10.16-1.fc25.qubes.x86_64 and with a kernel installed in the VM (5.10.0-0.bpo.3-amd64 #1 Debian 5.10.13-1~bpo10+1). The result is the same: the network driver crashes when I try to bring the interface up. The logs are below.
I've also tested this network adapter on another machine running Debian with kernel 5.10, and it works fine there:
Linux debian 5.10.0-0.bpo.3-amd64 #1 SMP Debian 5.10.13-1~bpo10+1 (2021-02-11) x86_64 GNU/Linux
So the problem appears to be specific to the Aquantia driver on kernel 5.x when running in an HVM.

How Reproducible

Happens when the network adapter is brought up with:
sudo ip link set ens6 up

To Reproduce

Bring the interface up with:
sudo ip link set ens6 up

Expected behavior

Network adapter works.

Actual behavior

The network adapter does not work. Bringing the interface up triggers a kernel warning in pci_irq_vector (logs below).

Screenshots

Additional context

Crash log with dom0 kernel 5.10.16-1.fc25.qubes.x86_64:

Feb 20 14:03:09 localhost kernel: [ 2945.902672] ------------[ cut here ]------------
Feb 20 14:03:09 localhost kernel: [ 2945.902690] WARNING: CPU: 1 PID: 892 at /home/user/rpmbuild/BUILD/kernel-latest-5.10.16/linux-5.10.16/drivers/pci/msi.c:1273 pci_irq_vector+0x5f/0x80
Feb 20 14:03:09 localhost kernel: [ 2945.902715] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack nft_counter nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink joydev drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel drm_kms_helper cec snd_pcm ghash_clmulni_intel snd_timer snd atlantic drm ehci_pci soundcore pcspkr serio_raw ehci_hcd macsec ata_generic pata_acpi i2c_piix4 floppy nfsd auth_rpcgss nfs_acl xen_netback lockd grace u2mfn(O) xen_gntdev xen_gntalloc xen_blkback nfs_ssc xen_evtchn parport_pc ppdev sunrpc xenfs xen_privcmd lp parport ip_tables overlay xen_blkfront
Feb 20 14:03:09 localhost kernel: [ 2945.902853] CPU: 1 PID: 892 Comm: ip Tainted: G           O      5.10.16-1.fc25.qubes.x86_64 #1
Feb 20 14:03:09 localhost kernel: [ 2945.902873] Hardware name: Xen HVM domU, BIOS 4.8.5-29.fc25 02/18/2021
Feb 20 14:03:09 localhost kernel: [ 2945.902888] RIP: 0010:pci_irq_vector+0x5f/0x80
Feb 20 14:03:09 localhost kernel: [ 2945.902900] Code: 48 39 fa 75 f1 0f 0b b8 ea ff ff ff c3 a8 10 75 0d 85 f6 75 21 8b 87 ac 03 00 00 01 f0 c3 48 8b 87 f0 02 00 00 3b 70 14 72 eb <0f> 0b b8 ea ff ff ff c3 8b 42 10 c3 0f 0b b8 ea ff ff ff c3 0f 1f
Feb 20 14:03:09 localhost kernel: [ 2945.902938] RSP: 0018:ffffbdc9808373b0 EFLAGS: 00010246
Feb 20 14:03:09 localhost kernel: [ 2945.902951] RAX: ffff967d4a2e0080 RBX: ffff967d41366000 RCX: ffffffffc05f9f40
Feb 20 14:03:09 localhost kernel: [ 2945.902967] RDX: ffff967d46e15000 RSI: 0000000000000001 RDI: ffff967d41366000
Feb 20 14:03:09 localhost kernel: [ 2945.902983] RBP: ffff967d46e15940 R08: ffff967d484f2000 R09: ffff967d484f2028
Feb 20 14:03:09 localhost kernel: [ 2945.902999] R10: 000000000000000c R11: ffffffff90744648 R12: 0000000000000001
Feb 20 14:03:09 localhost kernel: [ 2945.903015] R13: ffff967d484f2028 R14: ffff967d46e15000 R15: ffff967d484f2000
Feb 20 14:03:09 localhost kernel: [ 2945.903033] FS:  0000721b8e892e40(0000) GS:ffff967d7d700000(0000) knlGS:0000000000000000
Feb 20 14:03:09 localhost kernel: [ 2945.903049] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 20 14:03:09 localhost kernel: [ 2945.903063] CR2: 00006208f015cff0 CR3: 000000000ced6000 CR4: 00000000003506e0
Feb 20 14:03:09 localhost kernel: [ 2945.903080] Call Trace:
Feb 20 14:03:09 localhost kernel: [ 2945.903093]  aq_pci_func_alloc_irq+0x3e/0xd0 [atlantic]
Feb 20 14:03:09 localhost kernel: [ 2945.903109]  ? aq_vec_ring_alloc+0xe0/0xe0 [atlantic]
Feb 20 14:03:09 localhost kernel: [ 2945.903124]  ? aq_vec_ring_alloc+0xe0/0xe0 [atlantic]
Feb 20 14:03:09 localhost kernel: [ 2945.903137]  aq_nic_start+0x1be/0x360 [atlantic]
Feb 20 14:03:09 localhost kernel: [ 2945.903151]  aq_ndev_open+0x3d/0x60 [atlantic]
Feb 20 14:03:09 localhost kernel: [ 2945.903165]  __dev_open+0x101/0x190
Feb 20 14:03:09 localhost kernel: [ 2945.903175]  __dev_change_flags+0x1ae/0x1f0
Feb 20 14:03:09 localhost kernel: [ 2945.903185]  dev_change_flags+0x23/0x60
Feb 20 14:03:09 localhost kernel: [ 2945.903195]  do_setlink+0x3d7/0x1010
Feb 20 14:03:09 localhost kernel: [ 2945.903206]  __rtnl_newlink+0x5c7/0x910
Feb 20 14:03:09 localhost kernel: [ 2945.903218]  rtnl_newlink+0x47/0x70
Feb 20 14:03:09 localhost kernel: [ 2945.903230]  ? ns_capable_common+0x27/0x50
Feb 20 14:03:09 localhost kernel: [ 2945.903253]  rtnetlink_rcv_msg+0x166/0x380
Feb 20 14:03:09 localhost kernel: [ 2945.903263]  ? rtnl_calcit.isra.34+0x130/0x130
Feb 20 14:03:09 localhost kernel: [ 2945.903274]  netlink_rcv_skb+0xc4/0x100
Feb 20 14:03:09 localhost kernel: [ 2945.903284]  netlink_unicast+0x1bb/0x280
Feb 20 14:03:09 localhost kernel: [ 2945.903293]  netlink_sendmsg+0x323/0x460
Feb 20 14:03:09 localhost kernel: [ 2945.903302]  sock_sendmsg+0x5b/0x60
Feb 20 14:03:09 localhost kernel: [ 2945.903310]  ____sys_sendmsg+0x277/0x2a0
Feb 20 14:03:09 localhost kernel: [ 2945.903319]  ? copy_msghdr_from_user+0x6e/0xa0
Feb 20 14:03:09 localhost kernel: [ 2945.903330]  ___sys_sendmsg+0xa6/0xf0
Feb 20 14:03:09 localhost kernel: [ 2945.903340]  ? __sys_sendmsg+0x8a/0xd0
Feb 20 14:03:09 localhost kernel: [ 2945.903348]  __sys_sendmsg+0x8a/0xd0
Feb 20 14:03:09 localhost kernel: [ 2945.903358]  do_syscall_64+0x33/0x40
Feb 20 14:03:09 localhost kernel: [ 2945.903367]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 20 14:03:09 localhost kernel: [ 2945.903378] RIP: 0033:0x721b8ec43914
Feb 20 14:03:09 localhost kernel: [ 2945.903387] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
Feb 20 14:03:09 localhost kernel: [ 2945.903422] RSP: 002b:00007ffe22e55508 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
Feb 20 14:03:09 localhost kernel: [ 2945.903437] RAX: ffffffffffffffda RBX: 000000006031169e RCX: 0000721b8ec43914
Feb 20 14:03:09 localhost kernel: [ 2945.903451] RDX: 0000000000000000 RSI: 00007ffe22e55570 RDI: 0000000000000003
Feb 20 14:03:09 localhost kernel: [ 2945.903466] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007ffe22e568d1
Feb 20 14:03:09 localhost kernel: [ 2945.903480] R10: fffffffffffff484 R11: 0000000000000246 R12: 0000000000000001
Feb 20 14:03:09 localhost kernel: [ 2945.903495] R13: 000058a024ddc580 R14: 0000000000000000 R15: 00007ffe22e55d48
Feb 20 14:03:09 localhost kernel: [ 2945.903510] ---[ end trace 12ab25a489cd7283 ]---

Crash log with kernel installed in VM 5.10.0-0.bpo.3-amd64 #1 Debian 5.10.13-1~bpo10+1:

Feb 20 12:17:12 localhost kernel: [   25.729316] ------------[ cut here ]------------
Feb 20 12:17:12 localhost kernel: [   25.729321] WARNING: CPU: 0 PID: 944 at drivers/pci/msi.c:1273 pci_irq_vector+0x5f/0x80
Feb 20 12:17:12 localhost kernel: [   25.729322] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_state xt_conntrack nft_counter nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink crc32_pclmul ghash_clmulni_intel joydev bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper aesni_intel libaes crypto_simd cec cryptd glue_helper drm evdev pcspkr serio_raw button nfsd xen_netback u2mfn(OE) auth_rpcgss xen_gntdev xen_gntalloc xen_blkback nfs_acl xen_evtchn parport_pc lockd grace ppdev xenfs xen_privcmd lp sunrpc parport ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid dm_snapshot dm_bufio dm_mod ata_generic ata_piix crct10dif_pclmul crct10dif_common libata ehci_pci ehci_hcd scsi_mod atlantic usbcore macsec xen_blkfront crc32c_intel ptp pps_core psmouse usb_common i2c_piix4 floppy
Feb 20 12:17:12 localhost kernel: [   25.729359] CPU: 0 PID: 944 Comm: ip Tainted: G           OE     5.10.0-0.bpo.3-amd64 #1 Debian 5.10.13-1~bpo10+1
Feb 20 12:17:12 localhost kernel: [   25.729360] Hardware name: Xen HVM domU, BIOS 4.8.5-29.fc25 01/04/2021
Feb 20 12:17:12 localhost kernel: [   25.729361] RIP: 0010:pci_irq_vector+0x5f/0x80
Feb 20 12:17:12 localhost kernel: [   25.729363] Code: 48 39 d7 75 f1 0f 0b b8 ea ff ff ff c3 a8 10 75 0d 85 f6 75 21 8b 87 a4 03 00 00 01 f0 c3 48 8b 87 f0 02 00 00 39 70 14 77 eb <0f> 0b b8 ea ff ff ff c3 8b 42 10 c3 0f 0b b8 ea ff ff ff c3 66 66
Feb 20 12:17:12 localhost kernel: [   25.729364] RSP: 0018:ffffafd000c0b4c8 EFLAGS: 00010246
Feb 20 12:17:12 localhost kernel: [   25.729365] RAX: ffff9d5bf5c42000 RBX: ffff9d5bfdd97000 RCX: ffffffffc024bcc0
Feb 20 12:17:12 localhost kernel: [   25.729365] RDX: ffff9d5bf51e8000 RSI: 0000000000000001 RDI: ffff9d5bfdd97000
Feb 20 12:17:12 localhost kernel: [   25.729366] RBP: ffff9d5bf51e8940 R08: ffff9d5bf262a000 R09: ffff9d5bf262a028
Feb 20 12:17:12 localhost kernel: [   25.729366] R10: 000000000000000b R11: ffffffff88acb528 R12: ffff9d5bf51e8000
Feb 20 12:17:12 localhost kernel: [   25.729367] R13: 0000000000000001 R14: ffff9d5bf262a028 R15: ffff9d5bf262a000
Feb 20 12:17:12 localhost kernel: [   25.729370] FS:  00007f2f1a481e40(0000) GS:ffff9d5bfdc00000(0000) knlGS:0000000000000000
Feb 20 12:17:12 localhost kernel: [   25.729370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 20 12:17:12 localhost kernel: [   25.729371] CR2: 000055f3547677c0 CR3: 00000000027c8000 CR4: 00000000003506f0
Feb 20 12:17:12 localhost kernel: [   25.729373] Call Trace:
Feb 20 12:17:12 localhost kernel: [   25.729381]  aq_pci_func_alloc_irq+0x3e/0xd0 [atlantic]
Feb 20 12:17:12 localhost kernel: [   25.729385]  ? aq_vec_ring_alloc+0xe0/0xe0 [atlantic]
Feb 20 12:17:12 localhost kernel: [   25.729389]  ? aq_vec_ring_alloc+0xe0/0xe0 [atlantic]
Feb 20 12:17:12 localhost kernel: [   25.729392]  aq_nic_start+0x1a5/0x340 [atlantic]
Feb 20 12:17:12 localhost kernel: [   25.729396]  aq_ndev_open+0x3d/0x60 [atlantic]
Feb 20 12:17:12 localhost kernel: [   25.729398]  __dev_open+0xe8/0x180
Feb 20 12:17:12 localhost kernel: [   25.729400]  __dev_change_flags+0x1a7/0x210
Feb 20 12:17:12 localhost kernel: [   25.729402]  dev_change_flags+0x21/0x60
Feb 20 12:17:12 localhost kernel: [   25.729403]  do_setlink+0x328/0x10e0
Feb 20 12:17:12 localhost kernel: [   25.729405]  ? __nla_validate_parse+0x5f/0xb00
Feb 20 12:17:12 localhost kernel: [   25.729407]  __rtnl_newlink+0x541/0x8e0
Feb 20 12:17:12 localhost kernel: [   25.729410]  ? get_page_from_freelist+0x110b/0x1330
Feb 20 12:17:12 localhost kernel: [   25.729414]  ? _cond_resched+0x15/0x30
Feb 20 12:17:12 localhost kernel: [   25.729416]  ? kmem_cache_alloc_trace+0x319/0x430
Feb 20 12:17:12 localhost kernel: [   25.729417]  rtnl_newlink+0x43/0x60
Feb 20 12:17:12 localhost kernel: [   25.729419]  rtnetlink_rcv_msg+0x12c/0x380
Feb 20 12:17:12 localhost kernel: [   25.729420]  ? rtnl_calcit.isra.39+0x110/0x110
Feb 20 12:17:12 localhost kernel: [   25.729422]  netlink_rcv_skb+0x50/0x100
Feb 20 12:17:12 localhost kernel: [   25.729423]  netlink_unicast+0x1a5/0x280
Feb 20 12:17:12 localhost kernel: [   25.729425]  netlink_sendmsg+0x23d/0x470
Feb 20 12:17:12 localhost kernel: [   25.729426]  sock_sendmsg+0x5b/0x60
Feb 20 12:17:12 localhost kernel: [   25.729428]  ____sys_sendmsg+0x1ef/0x260
Feb 20 12:17:12 localhost kernel: [   25.729429]  ? copy_msghdr_from_user+0x5c/0x90
Feb 20 12:17:12 localhost kernel: [   25.729430]  ? mntput_no_expire+0x47/0x240
Feb 20 12:17:12 localhost kernel: [   25.729432]  ___sys_sendmsg+0x7c/0xc0
Feb 20 12:17:12 localhost kernel: [   25.729434]  ? tomoyo_path_number_perm+0x68/0x1e0
Feb 20 12:17:12 localhost kernel: [   25.729436]  ? __mod_memcg_lruvec_state+0x21/0x100
Feb 20 12:17:12 localhost kernel: [   25.729437]  ? kmem_cache_free+0x239/0x410
Feb 20 12:17:12 localhost kernel: [   25.729439]  ? var_wake_function+0x20/0x20
Feb 20 12:17:12 localhost kernel: [   25.729440]  ? fsnotify_grab_connector+0x46/0x80
Feb 20 12:17:12 localhost kernel: [   25.729441]  ? __mod_memcg_lruvec_state+0x21/0x100
Feb 20 12:17:12 localhost kernel: [   25.729442]  ? kmem_cache_free+0x239/0x410
Feb 20 12:17:12 localhost kernel: [   25.729443]  ? mntput_no_expire+0x47/0x240
Feb 20 12:17:12 localhost kernel: [   25.729444]  __sys_sendmsg+0x57/0xa0
Feb 20 12:17:12 localhost kernel: [   25.729446]  do_syscall_64+0x33/0x80
Feb 20 12:17:12 localhost kernel: [   25.729448]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 20 12:17:12 localhost kernel: [   25.729449] RIP: 0033:0x7f2f1a832914
Feb 20 12:17:12 localhost kernel: [   25.729451] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
Feb 20 12:17:12 localhost kernel: [   25.729451] RSP: 002b:00007fff0b0a1a28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
Feb 20 12:17:12 localhost kernel: [   25.729452] RAX: ffffffffffffffda RBX: 000000006030fdc8 RCX: 00007f2f1a832914
Feb 20 12:17:12 localhost kernel: [   25.729453] RDX: 0000000000000000 RSI: 00007fff0b0a1a90 RDI: 0000000000000003
Feb 20 12:17:12 localhost kernel: [   25.729453] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fff0b0a38d1
Feb 20 12:17:12 localhost kernel: [   25.729454] R10: fffffffffffff484 R11: 0000000000000246 R12: 0000000000000001
Feb 20 12:17:12 localhost kernel: [   25.729454] R13: 000055a644393580 R14: 0000000000000000 R15: 00007fff0b0a2268
Feb 20 12:17:12 localhost kernel: [   25.729456] ---[ end trace 2eba8960e48cd369 ]---
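
Both traces above share the same warning site (pci_irq_vector in drivers/pci/msi.c) and the same first driver frame (aq_pci_func_alloc_irq in the atlantic module). For anyone comparing their own logs against this report, the following is a minimal sketch that pulls those two lines out of a kernel log. The function name `summarize_warning` is hypothetical, and the sketch assumes the log uses the same line layout as the traces in this issue.

```shell
#!/bin/sh
# Summarize kernel WARNING reports from a log read on stdin.
# Prints the source location and function from each "WARNING:" line, and the
# first reliable call-trace frame that belongs to a module ("[module]" suffix).
# summarize_warning is a hypothetical helper name, not part of any tool.
summarize_warning() {
    awk '
        /WARNING: CPU:/ {
            # Text after " at " is "<file>:<line> <function+offset>".
            sub(/^.* at /, "")
            print "warning: " $0
            want_frame = 1
            next
        }
        # Frames marked with "?" are unreliable stack leftovers, so skip them.
        want_frame && /\[[a-z0-9_]+\]$/ && !/\] +\? / {
            # Drop everything up to and including the "[timestamp]" prefix.
            sub(/^.*\] +/, "")
            print "first module frame: " $0
            want_frame = 0
        }
    '
}

# Usage (run inside the affected VM):
#   sudo dmesg | summarize_warning
#   sudo journalctl -k | summarize_warning
```

If the output for another setup shows the same warning location and the same atlantic frame, it is likely the same problem as the one reported here.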

Solutions you've tried

I installed firmware-linux-nonfree as suggested in the PCI troubleshooting documentation:
https://www.qubes-os.org/doc/pci-troubleshooting/#network-adapter-does-not-work
sudo apt-get install -t buster-backports firmware-linux-nonfree
It didn't change anything.

Relevant documentation you've consulted

Related, non-duplicate issues

https://qubes-os.discourse.group/t/sys-net-not-functional-on-kernel-5-x/2544/23
As @marmarek stated in that thread (https://qubes-os.discourse.group/t/sys-net-not-functional-on-kernel-5-x/2544/26), this is probably related to missing multi-vector MSI support:
https://www.mail-archive.com/[email protected]/msg04757.html
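
One way to look at the multi-vector MSI question from inside the HVM is to compare how many MSI vectors the device advertises with how many are actually enabled. The following is a minimal sketch, assuming `lspci` from pciutils is installed in the VM. The helper name `msi_summary` is hypothetical, and `ens6` is the interface name used in this report.

```shell
#!/bin/sh
# Filter `lspci -vv` output (read on stdin) down to the MSI and MSI-X
# capability lines, stripping the "Capabilities: [xx] " prefix.
# msi_summary is a hypothetical helper name, not part of pciutils.
msi_summary() {
    sed -n 's/^[[:space:]]*Capabilities: \[[0-9a-f]*] \(MSI\(-X\)\{0,1\}: .*\)$/\1/p'
}

# Usage (run inside the affected VM; root is needed to read capabilities):
#   dev=$(basename "$(readlink -f /sys/class/net/ens6/device)")
#   sudo lspci -vv -s "$dev" | msi_summary
```

In the `MSI:` line, `Count=a/b` is printed by lspci as enabled vectors over supported vectors, and `Enable+` or `Enable-` shows whether that interrupt mode is currently active. A result with MSI enabled at a count of 1 and MSI-X disabled would be consistent with the missing multi-vector MSI explanation, though this check alone does not prove it.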

antarisherdeng4 added the P: default and T: bug labels Feb 24, 2021
andrewdavidwong added this to the Release 4.0 updates milestone Feb 24, 2021
andrewdavidwong added the needs diagnosis label Feb 24, 2021
@seonwoolee

Has there been any progress on this? The diagnosis seems to be missing multi-vector MSI support: https://www.mail-archive.com/[email protected]/msg04757.html

@andrewdavidwong
Member

Is this still a problem in 4.1?

andrewdavidwong added the affects-4.1 label Aug 8, 2023
andrewdavidwong removed this from the Release 4.1 updates milestone Aug 13, 2023
andrewdavidwong added the eol-4.1 label and removed the needs diagnosis label Dec 7, 2024
github-actions bot commented Dec 7, 2024

This issue is being closed because:

If anyone believes that this issue should be reopened, please leave a comment saying so.
(For example, if a bug still affects Qubes OS 4.2, then the comment "Affects 4.2" will suffice.)

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 7, 2024