USB devices not available after update to 12.3 on Yellow. Also preventing boot. #3347

Open
mmarc opened this issue May 8, 2024 · 17 comments
Labels
board/yellow Home Assistant Yellow bug

Comments

@mmarc

mmarc commented May 8, 2024

Describe the issue you are experiencing

Updated Home Assistant OS from 12.2 to 12.3 on my Yellow and it did not come back online afterwards.
After a manual reboot the Yellow is online again but all connected USB devices are missing.

When rebooting several times it seems there is a 50:50 chance it boots at all and if it boots, USB is missing.

Downgrade to 12.2 solves the issue.

What operating system image do you use?

yellow (Home Assistant Yellow)

What version of Home Assistant Operating System is installed?

12.3

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

Just update from 12.2 to 12.3

Anything in the Supervisor logs that might be useful for us?

Unfortunately too late since I already downgraded to 12.2 again.

Anything in the Host logs that might be useful for us?

Unfortunately too late since I already downgraded to 12.2 again.

System information

No response

Additional information

No response

@agners
Member

agners commented May 8, 2024

Hm, I have two Yellows here, and double-checked: both run HAOS 12.3 successfully, with USB devices detected.

What (type of) device is it which is missing?

The ha logs commands nowadays have a boot parameter which allows getting logs from a previous boot, e.g.

ha host logs --boot -1 --lines 10000

@agners agners added the board/yellow Home Assistant Yellow label May 8, 2024
@mmarc
Author

mmarc commented May 8, 2024

Installed the update again.
Host is pingable afterwards but connection on port 8123 is refused and SSH also not reachable.
Power-cycled the Yellow and it is now online, and this time the USB devices (BLE dongle and Homematic RF dongle) are available.

  OS Version:               Home Assistant OS 12.3
  Home Assistant Core:      2024.5.2

  Home Assistant URL:       http://homeassistant.local:8123
  Observer URL:             http://homeassistant.local:4357
~ # lsusb
Bus 001 Device 001: ID 1d6b:0002
Bus 001 Device 004: ID 2fe3:000b
Bus 001 Device 002: ID 1a40:0101
Bus 001 Device 003: ID 1b1f:c020

Previously (in the error case) it only showed a single USB device, which was

Bus 001 Device 001: ID 1d6b:0002

if I remember correctly.
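
As a rough illustration, the comparison between the working and the failing boot could be scripted along these lines. This is only a sketch: the function names are made up for the example, and the failing-boot listing is the single line quoted above from memory.

```python
import re

# Matches lines such as "Bus 001 Device 004: ID 2fe3:000b" (trailing text is ignored).
LSUSB_LINE = re.compile(r"Bus \d+ Device \d+: ID ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})")


def parse_lsusb(output: str) -> set:
    """Return the set of vendor:product IDs found in lsusb output."""
    ids = set()
    for line in output.splitlines():
        match = LSUSB_LINE.search(line)
        if match:
            ids.add(match.group(1).lower())
    return ids


def missing_devices(output: str, expected: set) -> set:
    """Return the expected IDs that do not appear in the given lsusb output."""
    return {dev.lower() for dev in expected} - parse_lsusb(output)


# lsusb output from the working boot, as posted in this thread.
GOOD_BOOT = """\
Bus 001 Device 001: ID 1d6b:0002
Bus 001 Device 004: ID 2fe3:000b
Bus 001 Device 002: ID 1a40:0101
Bus 001 Device 003: ID 1b1f:c020
"""

# lsusb output in the error case (recalled from memory in this thread).
BAD_BOOT = "Bus 001 Device 001: ID 1d6b:0002"

if __name__ == "__main__":
    expected = parse_lsusb(GOOD_BOOT)
    print(sorted(missing_devices(BAD_BOOT, expected)))
```

Using the working boot as the baseline avoids hard-coding which IDs belong to hubs and which to dongles.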

Attached the result of ha host logs --boot -1 --lines 20000; not sure if there is something visible in there:
boot.log.gz

@agners
Member

agners commented May 8, 2024

Host is pingable afterwards but connection on port 8123 is refused and SSH also not reachable.

Hm, sounds like Core did not get started then 🤔 ha supervisor logs might be helpful in this case.

Ideally the host dmesg would be helpful here, especially in the non working case. It seems that too much got logged already, the log is not 2000 lines, and the first entry is a cleanup entry from journald 😢

Seems 2fe3:000b is a Zephyr dev device? With this release we moved to Linux 6.6 for Yellow for the first time. We previously had quite some problems with some USB bus enumeration changes; however, from what I can tell, most of them are reverted for Yellow as well (references #2995 and #3224).

If you can reproduce the problem, can you use dmesg in the SSH/Terminal?

@mmarc
Author

mmarc commented May 9, 2024

2fe3:000b is a Nordic DK with HCI firmware for use with BLE.

@sairon
Member

sairon commented May 9, 2024

Coincidentally, I had an nRF DK with the HCI firmware lying around, so I tried booting my Yellow with that. However, out of ~30 boots so far, I encountered the issue only once, during the first couple of tries, and I can't trigger it again. The cause seems to be the same as in #2257: the USB hub is not enumerated because of an unhandled interrupt. Just like in raspberrypi/linux#5064, it is a dwc2 USB interrupt:

[    6.598626] dwc2 fe980000.usb: irq 41, io mem 0xfe980000
(...)
[    7.331480] irq 41: nobody cared (try booting with the "irqpoll" option)
[    7.338199] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C         6.6.28-haos-raspi #1
[    7.346551] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[    7.350598] usb 1-1: new high-speed USB device number 2 using dwc2
[    7.352982] Call trace:
[    7.361598]  dump_backtrace+0xa0/0x100
[    7.365349]  show_stack+0x20/0x38
[    7.368659]  dump_stack_lvl+0x48/0x60
[    7.372319]  dump_stack+0x18/0x28
[    7.375628]  __report_bad_irq+0x40/0xf0
[    7.379461]  note_interrupt+0x330/0x388
[    7.383292]  handle_irq_event+0xa4/0xc0
[    7.387126]  handle_fasteoi_irq+0xac/0x240
[    7.391219]  generic_handle_domain_irq+0x34/0x58
[    7.395834]  gic_handle_irq+0x4c/0xd8
[    7.399491]  call_on_irq_stack+0x24/0x58
[    7.403411]  do_interrupt_handler+0x88/0x98
[    7.407591]  el1_interrupt+0x34/0x68
[    7.411162]  el1h_64_irq_handler+0x18/0x28
[    7.415255]  el1h_64_irq+0x64/0x68
[    7.418651]  default_idle_call+0x5c/0x170
[    7.422657]  do_idle+0x204/0x238
[    7.425884]  cpu_startup_entry+0x40/0x50
[    7.429804]  rest_init+0xec/0xf8
[    7.433028]  arch_call_rest_init+0x18/0x20
[    7.437122]  start_kernel+0x528/0x670
[    7.440780]  __primary_switched+0xbc/0xd0
[    7.444787] handlers:
[    7.447053] [<0000000048434357>] dwc2_handle_common_intr [dwc2]
[    7.452996] [<00000000d0dace6f>] dwc2_hsotg_irq [dwc2]
[    7.458147] [<00000000a7d505ef>] usb_hcd_irq
[    7.462417] Disabling IRQ #41

Attaching the full dmesg for reference: yellow-usb-fail-dmesg.txt
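
For anyone checking their own dmesg for the same signature, a scan like the following might help. It is a minimal sketch that only looks for the three kinds of lines visible in the trace above (the "irq N, io mem" registration, "nobody cared", and "Disabling IRQ"); the function name and the result layout are assumptions made for the example.

```python
import re

# "[    6.598626] dwc2 fe980000.usb: irq 41, io mem 0xfe980000"
IRQ_OWNER = re.compile(r"\] (\S+) (\S+): irq (\d+), io mem")
# "[    7.331480] irq 41: nobody cared (try booting with the "irqpoll" option)"
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
# "[    7.462417] Disabling IRQ #41"
DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def find_unhandled_irqs(dmesg: str) -> dict:
    """Map each IRQ reported as unhandled to its registering driver and disabled state."""
    owners = {}
    report = {}
    for line in dmesg.splitlines():
        if m := IRQ_OWNER.search(line):
            # Remember which driver/device registered this IRQ number.
            owners[int(m.group(3))] = f"{m.group(1)} {m.group(2)}"
        if m := NOBODY_CARED.search(line):
            irq = int(m.group(1))
            report[irq] = {"owner": owners.get(irq), "disabled": False}
        if m := DISABLED.search(line):
            irq = int(m.group(1))
            entry = report.setdefault(irq, {"owner": owners.get(irq), "disabled": False})
            entry["disabled"] = True
    return report


# Three lines taken from the trace above.
SAMPLE = """\
[    6.598626] dwc2 fe980000.usb: irq 41, io mem 0xfe980000
[    7.331480] irq 41: nobody cared (try booting with the "irqpoll" option)
[    7.462417] Disabling IRQ #41
"""

if __name__ == "__main__":
    print(find_unhandled_irqs(SAMPLE))
```

An empty result only means this particular signature is absent; it does not rule out other USB problems.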

@mmarc can you try connecting your Yellow to a PC with USB-C connector switched to the USB-UART mode (see Linux/Mac or Windows instructions) and checking the boot log and dmesg directly there?

(Update 15 boots later - the issue occurred again with the same stack trace. rmmod dwc2 && modprobe dwc2 made the hub and attached device available again.)
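
The module reload mentioned in the update could be wrapped in a small helper, sketched below. This is only an illustration of that one observed workaround, not a fix: it needs root on the host, and the helper name and the injectable run parameter are assumptions added so the logic can be exercised without touching a real system.

```python
import subprocess


def reload_module(name: str, run=subprocess.run) -> None:
    """Unload and reload a kernel module, mirroring `rmmod NAME && modprobe NAME`."""
    # With check=True, a failing rmmod raises CalledProcessError, so modprobe
    # is only attempted after a successful unload (same short-circuit as &&).
    run(["rmmod", name], check=True)
    run(["modprobe", name], check=True)


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    recorded = []
    reload_module("dwc2", run=lambda cmd, check: recorded.append(cmd))
    print(recorded)
```

Calling reload_module("dwc2") without the run argument would execute the real commands.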

@bkvargyas

I have two Yellows, both with the same Z-Wave stick inserted. I have not updated the second one yet, but the first one had the same issues as described here. Luckily for me, I have remote PoE power-cycle capability, and was able to get it back online, with the Z-Wave stick, after a power cycle. I have 3 more Yellows in a box that I just need to assemble and test in the lab.

@jbytheway

jbytheway commented Jun 30, 2024

I just upgraded my Home Assistant Blue from 12.2 to 12.4 and am experiencing what may be the same issue. It's booted twice since the upgrade, both times the system came up fine, but my Zigbee USB stick (CC2652R1) was not detected.

Normally I'd expect my Zigbee stick to appear at /dev/serial/by-id/usb-1a86_USB_Serial-if00-port0, but now I'm seeing no /dev/serial at all, and no /dev/ttyUSB* devices either.
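
The check for device nodes described above can be sketched as a short script. The function name and the root parameter are assumptions for the example (root exists only so the lookup can be pointed at a test directory); the two glob patterns are the paths named in this comment.

```python
import glob
import os


def list_serial_devices(root: str = "/") -> dict:
    """List USB serial device nodes below root (normally "/")."""
    by_id = sorted(glob.glob(os.path.join(root, "dev/serial/by-id/*")))
    tty_usb = sorted(glob.glob(os.path.join(root, "dev/ttyUSB*")))
    return {"by_id": by_id, "tty_usb": tty_usb}


if __name__ == "__main__":
    found = list_serial_devices()
    if not found["by_id"] and not found["tty_usb"]:
        print("no USB serial devices found")
    else:
        for path in found["by_id"] + found["tty_usb"]:
            print(path)
```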

Here's what lsusb sees.

~ lsusb
Bus 002 Device 007: ID 05e3:0620
Bus 001 Device 001: ID 1d6b:0002
Bus 001 Device 007: ID 05e3:0610
Bus 002 Device 001: ID 1d6b:0003

I believe those are all just the internal USB hubs.

There are no obvious errors in dmesg. In particular I do not see an unhandled interrupt message like @sairon is reporting.

Also nothing obvious in the supervisor or host logs.

When I unplug and re-plug the Zigbee stick, there are no messages in dmesg about anything being connected or disconnected.

Let me know if there's anything else I can usefully provide.

@jbytheway

jbytheway commented Jun 30, 2024

Downgrading to 12.2 did not fix the issue for me (even though that version was working fine before). I'm still observing the same symptoms. So I'm now less confident I have the same issue.

I thought perhaps the Zigbee stick was just broken, but I plugged it into my desktop and it connects as /dev/ttyUSB1 just fine there. I also tried plugging it into different USB ports on the HA Blue box, but no luck.

I looked at dmesg output again. Here's the section where it detects the USB hubs:

[    1.557606] dwc3-meson-g12a ffe09000.usb: USB2 ports: 2
[    1.558017] dwc3-meson-g12a ffe09000.usb: USB3 ports: 1
[    1.565709] dwc2 ff400000.usb: supply vusb_d not found, using dummy regulator
[    1.570349] dwc2 ff400000.usb: supply vusb_a not found, using dummy regulator
[    1.577569] dwc2 ff400000.usb: EPs: 7, dedicated fifos, 712 entries in SPRAM
[    1.584812] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[    1.589930] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[    1.597605] xhci-hcd xhci-hcd.0.auto: hcc params 0x0228fe6c hci version 0x110 quirks 0x0000008000000010
[    1.606913] xhci-hcd xhci-hcd.0.auto: irq 26, io mem 0xff500000
[    1.612790] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[    1.618249] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[    1.625878] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
[    1.632665] hub 1-0:1.0: USB hub found
[    1.636112] hub 1-0:1.0: 2 ports detected
[    1.640211] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    1.648309] hub 2-0:1.0: USB hub found
[    1.651883] hub 2-0:1.0: 1 port detected
[    1.655990] dwc3-meson-g12a ffe09000.usb: switching to Device Mode

The only thing that seems maybe suspicious is the "We don't know the algorithms for LPM for this host, disabling LPM" message, but I doubt that's related.
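
To make the "hubs are found but no device is enumerated" observation easier to check, the relevant dmesg lines can be summarized with a sketch like this. The function name and return layout are assumptions for the example; the two patterns are based on the hub lines in the excerpt above and on the "new high-speed USB device" line from the earlier Yellow trace.

```python
import re

# "hub 1-0:1.0: 2 ports detected" / "hub 2-0:1.0: 1 port detected"
HUB_PORTS = re.compile(r"hub (\S+): (\d+) ports? detected")
# "usb 1-1: new high-speed USB device number 2 using dwc2"
NEW_DEVICE = re.compile(r"usb (\S+): new (.+?) USB device number (\d+) using (\S+)")


def summarize_enumeration(dmesg: str):
    """Return (hubs, devices) extracted from dmesg text."""
    hubs = {}      # hub name -> number of ports
    devices = []   # (port path, speed, device number, controller driver)
    for line in dmesg.splitlines():
        if m := HUB_PORTS.search(line):
            hubs[m.group(1)] = int(m.group(2))
        elif m := NEW_DEVICE.search(line):
            devices.append((m.group(1), m.group(2), int(m.group(3)), m.group(4)))
    return hubs, devices


# Hub-related lines from the excerpt above.
EXCERPT = """\
[    1.632665] hub 1-0:1.0: USB hub found
[    1.636112] hub 1-0:1.0: 2 ports detected
[    1.648309] hub 2-0:1.0: USB hub found
[    1.651883] hub 2-0:1.0: 1 port detected
"""

if __name__ == "__main__":
    hubs, devices = summarize_enumeration(EXCERPT)
    print(hubs, devices)
```

Hubs with ports but an empty device list would match the symptom described here, where nothing shows up even after re-plugging the stick.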

@jbytheway

After another reboot, everything is now working fine on 12.2.

@jbytheway

Today I tried updating to 13.1 in the hope that a more recent OS update would have fixed the issue. No luck. 13.1 is still not recognizing my USB device, with the same symptoms as 12.3.

@HomeAssistant-Steve

HomeAssistant-Steve commented Sep 1, 2024 via email

@jbytheway

I think my bug isn't really the same as the one originally reported here, so I opened #3573 instead of continuing to hijack this one.

@HomeAssistant-Steve

HomeAssistant-Steve commented Sep 1, 2024 via email

@lategoodbye

Please see raspberrypi/linux#6247 (comment) for a test kernel, which should hopefully fix the dwc2 IRQ issue.

@sairon
Member

sairon commented Nov 28, 2024

@lategoodbye It won't be trivial to test this kernel build in HAOS, but it seems the patches can be applied cleanly to the 6.6.y branch too. If only these changes are needed (and not any changes made between 6.6 and 6.12 in mainline), and if @mmarc can still reproduce it and test it, I can prepare a test build of HAOS for this purpose. Unfortunately, like I said before, I can't reproduce it reliably myself.

@mmarc
Author

mmarc commented Nov 28, 2024

@sairon I no longer have the device connected that seems to have caused the issue.
And since I removed it, I haven't experienced the issue any more.

@lategoodbye

lategoodbye commented Nov 29, 2024

@sairon I applied the fixes against a mainline tree because I wanted to submit them to linux-usb as soon as the Linux 6.13 merge window closes. But I also wanted to provide them to the vendor tree in order to get faster feedback. So yes, no further changes are required.

TLDR: I don't have a better scenario to reproduce this issue :-(

Regarding the issue itself: yes, reproducing it is very timing-critical. Even though I found a "good" configuration for my Raspberry Pi 3B+, the problem occurs with a probability of 50 % at startup. So I focused on the following two scenarios, which might be different issues:

  • suspend2idle on Raspberry Pi, which is currently not available in Linux 6.6
  • fast USB reconnects

7 participants