Coral USB Resets on Proxmox #2607
Comments
The only similar thing I was able to find was: google-coral/edgetpu#166 Related? |
On a hunch this was a usb autosuspend issue, I've followed the advice here and disabled it and rebooted.
And, going outside and waving my arms around, still no bounding boxes, no events, so I'm fresh out of ideas. |
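One way to confirm that an autosuspend change like the one mentioned above actually took effect after the reboot is to read the kernel's current value back. The following is a minimal sketch, assuming a Linux host that exposes the `usbcore` module parameter at `/sys/module/usbcore/parameters/autosuspend`. The helper name `autosuspend_disabled` is made up for illustration and is not part of Frigate or the Coral tooling.

```python
from pathlib import Path

# Sysfs location of the global usbcore autosuspend delay (in seconds).
AUTOSUSPEND_PARAM = Path("/sys/module/usbcore/parameters/autosuspend")


def autosuspend_disabled(param_path: Path = AUTOSUSPEND_PARAM) -> bool:
    """Return True when the autosuspend delay is negative, i.e. disabled."""
    value = int(param_path.read_text().strip())
    return value < 0


if __name__ == "__main__":
    if AUTOSUSPEND_PARAM.exists():
        print("autosuspend disabled:", autosuspend_disabled())
    else:
        print("usbcore autosuspend parameter not found on this system")
```

A negative value means the kernel will not autosuspend USB devices by default. Individual devices can still carry their own power settings, so this only checks the global default.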
related? |
I'm not sure, but it doesn't seem like it. In both of those cases, the users cannot use their Coral. I can, but it stops working.
Maybe? But I need something more solid to go on to decide if I need to spend money on a new hub, a new cable, or both. "Maybe it's using too much power" or "maybe you need a powered hub" aren't very satisfying answers. Is there any way to know more? |
Went ahead and wrote to the coral.ai help address, and they said it sounded like it might be heat related. Given that the Frigate dockerfile runs it with the max performance library, this might make sense on the surface, but I'm still a little suspicious, since this article didn't seem to indicate any heat problems, and I would think this would be more widespread otherwise. I will say that my unit seems quite hot compared to the observations in the article (measurements to follow). I ordered a passive copper heatsink that should cover the unit, and which comes with a binding pad to test this out. Hopefully the addition of a heatsink will at least provide a data point for or against temperature being the issue. If it does seem to be heat, that seems to suggest my unit is hotter than normal for unknown reasons. It would either be a defect in my unit or something off about my environment, and I would suspect the environment before the hardware. Very confusing. |
I have been running mine at max speed for years and never seen a heat related failure even in a warm server rack. |
As I said, I’m pretty skeptical that’s what’s happening, but I don’t have any viable, testable alternatives at the moment. All I know is that the device is “reset” and frigate stops working. I really want this to work, so I won’t stop digging on it until I have a viable solution, but at least the coral.ai help address gave me something to go on. |
Looks like my Coral USB is resetting for me as well: proxmox logs:
I only noticed this after I started optimizing my containers. It seems like Coral is not really being used. When I was running Frigate with Coral on another machine, I didn't notice 20-40% CPU usage for __ |
I bought a powered hub and a new cable, but now Frigate cannot see the Coral. Likely because I need to change the mounts, but this brings up a question I thought I'd ask here: is it possible there is some conflict with the way the qemu-based VM (hassos) and the Frigate LXC container are getting USB passed through? The Home Assistant VM has a single USB device mapped through in Proxmox (a zigbee/zwave combination stick), while the LXC container (privileged) is getting the entire bus:
|
For what it's worth, I have the Coral USB device exposed through Proxmox and I do not get any errors. I get errors, ironically, on other devices, but not the Coral. I also pass through an NVIDIA Tesla M4 without issue. |
I'm setting up the Coral USB as it's being shown by the |
I gave up and purchased a seeed odyssey, as recommended in the docs. I put ubuntu server on it bare metal, exclusively to run Frigate, and connected my coral to a powered usb3 hub with a high speed cable. I am still getting errors, the latest of which is this:
I'm starting to think either my Coral is defective, the Coral is a poor choice of hardware, or there is some underlying bug with the software or supporting libraries, but in any case it's nothing I can seem to do anything about. This is a very frustrating experience, and I don't even have any insight on how to improve it. Hope someone can help me out. |
I would suggest trying the getting started tutorial here: https://coral.ai/docs/accelerator/get-started If that doesn't work, you may have a defective device. This would be the first time I have heard of that happening. |
Your confidence is reassuring, since you have vastly more experience with this device than I do, however is there anything on the page you linked which the frigate image has not already done, save for me manually running a model? Are you suggesting perhaps that I try and get it to crash by running it through its paces on the command line? Perhaps a loop of image classification? These errors are definitely not constant, though they have happened repeatedly today alone. I've looked up the error, which seems to suggest the device is unavailable or the libraries are not installed correctly, neither of which makes sense if it works sometimes. One mention of it in the issues here suggested the device not getting enough power, which I thought I would have solved with the powered hub. I might even be inclined to ignore them if Frigate hadn't failed to notify me of a package delivery today, after perfectly reporting the first three vehicle alerts. |
Both of the package notification failures that I recall also happened at night. Is it possible that these crashes are transitory and unrelated and that there is poor model performance at night, when the camera switches to black and white? |
These errors are related to the communication with the device. If it only happens sometimes, it may be related to times of high utilization or high temperatures. I have never had that problem with my USB devices even when running at 100fps for weeks on end in a warm server cabinet. The metal part of the USB is a heat sink that should be connected with a thermal pad to the chip. Given you have tried this on multiple machines and one was bare metal, I still think this points to a defective device. There isn't anything I can think of that could cause black and white to make a difference. The part of frigate that uses the Coral runs in a dedicated process and only has one isolated job of processing raw image data in memory. |
I have two USB Corals and I'm seeing identical symptoms as you. My setup is Proxmox-->Ubuntu Server-->Docker |
@lensherm very weird indeed! Looks like once Frigate has access to the USB device it might be loading some drivers, and the vendor/product ID changes because of that? I'm just assuming things from my complete ignorance here. I'm really looking forward to finding a solution to this |
I think this stackoverflow posting describes what you're experiencing: |
That looks eerily similar to what I'm seeing. Will keep digging, when I have the time. For now, I have to pass through the "wrong" device to the VM, start it up, shut it down, pass in the updated correct device, start up the VM and all is fine with the world, until the next power down of the ProxMox machine. |
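The ID change described above matches how the USB Accelerator is commonly reported to enumerate: as `1a6e:089a` (Global Unichip Corp.) before the Edge TPU runtime has loaded its firmware, and as `18d1:9302` (Google Inc.) afterward. The sketch below assumes those two IDs and `lsusb`-style output. The function name `coral_state` is hypothetical.

```python
import re

# Assumed vendor:product IDs for the Coral USB Accelerator.
UNINITIALIZED_ID = "1a6e:089a"  # as enumerated before firmware is loaded
INITIALIZED_ID = "18d1:9302"    # as enumerated after firmware is loaded

# Matches the "ID vvvv:pppp" field that lsusb prints for each device.
ID_PATTERN = re.compile(r"\bID ([0-9a-f]{4}:[0-9a-f]{4})\b", re.IGNORECASE)


def coral_state(lsusb_output: str) -> str:
    """Classify lsusb output as 'initialized', 'uninitialized', or 'absent'."""
    ids = {m.group(1).lower() for m in ID_PATTERN.finditer(lsusb_output)}
    if INITIALIZED_ID in ids:
        return "initialized"
    if UNINITIALIZED_ID in ids:
        return "uninitialized"
    return "absent"
```

If passthrough is configured by vendor/product ID, only one of the two IDs would match at any given time, which may explain why the "wrong then right device" workaround above is needed after a power cycle.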
Similar setup, I am using a coral usb, proxmox, ubuntu 21 vm , and having frigate in docker. |
Someone from the Coral support team sent me this link. I still need to look at it, but wanted to share in case anyone is able to fix the problem with it: https://www.reddit.com/r/Proxmox/comments/nmsknx/proxmox_vm_ubuntu_2004_connect_google_coral_usb/ (look for the comment with the solution, not the main comment) |
@jcastro Okay, passing the pcie usb controller seems to be stable, it's some hours now that it's running fine. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I'm experiencing the same behavior on Proxmox 7. It does not matter how the USB device is passed through to the VM (by device vendor/ID, by USB port, or by passing the entire PCI device to the VM); the result is always the same, even after adding a powered USB 3 hub. The Coral USB will perform between 1-3 requests and after that it will start flashing/blinking white. dmesg will show the following:
any ideas? |
The coral blinking means it's working / processing and is normal behavior. As far as the error, not sure why that would happen. |
What I've noticed is that the Coral USB will blink while processing a request and then stop blinking. In my case, though, it is a steady blinking/flashing. It stops processing requests right after the error in my previous comment shows up in dmesg. If I unplug the Coral and plug it back in, it gets recognized again and detects objects once or twice, then the error shows up again in dmesg, it won't process any more requests, and it continues blinking/flashing indefinitely whether there is motion/an object or not. |
Shame this is still an issue over a year later. At least Frigate is now detecting the coral and rebooting detection - so it does continue to work but just has 30 blips of no detection throughout the day and the associated spam logs. I'm not sure if it's the version of Proxmox or just unavoidable, I've tried a separate VM to the HA addon but it's the same problem. M700 tiny so can't pass through the usb controller. Anyone ever find a solution? |
I don't have a problem using the Coral in a Proxmox LXC container with Docker for a few hours; however, after some time it apparently gets disconnected. I see almost the same behaviour on a VMware ESXi VM. Currently I am thinking about replacing it with an M.2 TPU, but I'm not sure whether that would solve my issue... |
@michalcharvat - the M.2 TPU will NOT work on ESXi. It's been documented in other posts around the web, and it matches my own experience: the M.2 Coral issue revolves around the fact that the TPU has to be flashed each time it's power cycled, and ESXi does not handle the PCIe device ID changing during this flash process (passthrough then has to be redone for the new device ID after the flash, which requires a reboot, which loops the entire process over again). @blakeblackshear, I would very much like to know your exact hardware setup (the one you describe as working great for years in a warm server cabinet, frequently at 100 FPS of Coral detection): what is the hardware, and how are you running Frigate? (Is Docker used? Any virtualization on top of Docker?) The only stable Frigate setup I have gotten is when I use an older USB-A to USB-C cable (from the various PCIe USB3 cards I'm testing). The cable only allows USB2 / 480 Mbit speeds, but with this "limit" Frigate and the Coral have been stable for WEEKS of uptime, although inference speed is 30-35 ms and inference FPS is limited to about 30 FPS (which is better than Coral reboots / an unstable Coral). I have several high-quality Supermicro servers (X9- and X11-based dual-CPU servers) plus an NVIDIA P1000 GPU and 4x different USB3 PCIe cards (along with 3x USB Corals and 1x PCIe-based Coral), all for testing / trying to get a stable setup. Thanks! |
The 0.13 docs have a new getting started guide that outlines exactly how to set up in a way similar to the way Blake runs (it was written by Blake) https://deploy-preview-6262--frigate-docs.netlify.app/guides/getting_started |
Thanks, that is helpful. I am more curious about the hardware he is using, though (I assume Debian 12 is bare metal, not a guest, and I'd like to know what kind of USB3 PCIe card or powered USB3 hub is in use). For now I will be redoing my tests with Debian Bookworm as the OS (guest on ESXi). |
The hardware is described in the docs as well https://docs.frigate.video/frigate/hardware#server |
@bob454522 my current setup is an M.2 TPU in the Wi-Fi slot of an old HP 705 G4 mini, inside Proxmox 7.4 in an LXC container. I don't remember why, but I was not able to run it in Proxmox 8. |
Some of the issues being discussed in this thread are not necessarily anything to do with Proxmox, but rather Debian or Linux itself. I initially ran Frigate in an LXC container on Proxmox 7, and saw these USB issues (both with and without the PCI object passed through).
I recently reinstalled Frigate on vanilla Debian 12.4, on a bare-metal Dell Optiplex 3080 Micro Form Factor, and moved my USB Coral over to it. I still get exactly the same messages I got in Proxmox. Obviously if people have issues to do with PCI passthrough or running in VMs, that's potentially different, but a few people here are possibly blaming Proxmox or LXC when vanilla Debian bare metal still has an issue. |
This is great to hear. I have been troubleshooting these exact issues for weeks now, to the point where I now have 3x USB Corals and 1x M.2 Coral in a PCIe converter. I also have 4x different PCIe USB cards (as passing through the PCIe USB controller is the only stable option, which does make sense). I also bought a USB cable tester (less of a tester and more of a pin-out / pin-continuity detector, so that I'm able to determine whether a given USB cable can support USB at 480 Mbit, 5 Gbit, or 10 Gbit). All of my hardware is server-grade Supermicro X9- or X11-based boards, with 256 GB+ of ECC RAM, enterprise SSDs, etc. (in server chassis, with dual Supermicro power supplies).
I'm using ESXi (and bare metal at times, only for short tests, as I only plan to run this as a virtual machine). A key finding I recently made: the setup is very stable if you FORCE USB2 speeds to the Coral (480 Mbit); using a non-USB3 cable accomplishes this. Of course the inference latency increases quite a bit, and thus the total detection FPS is reduced a lot, but the entire setup is very stable. I've even tested 2x Corals on one Frigate at 480 Mbit, and it's very stable, but not 2x the Coral performance. (I've also tested 2x Frigate containers, 1x Coral to each, both sharing the P1000, and that too is very stable for days.) I also recall reading another post here from a user who was having Coral reset issues and thought the USB cable Google includes was the issue: when he switched to a different cable, the entire system became stable and the Coral ran much cooler, but it was pointed out that he was now just running the Coral at USB2 and not USB3 speeds due to the cable. So it's really starting to seem like some kind of USB throughput issue, possibly involving the underlying OS, or perhaps how Docker interacts with the USB driver? (It also seems like Frigate is pushing a lot of bandwidth to the Coral, which is of course expected / normal.) At first it very much seemed like a USB power issue, but I have tested and controlled for that, and it does not seem to be a lack of power/current from the USB port to the Coral.
Forcing USB at 480 Mbit: very stable for days (I've even seen 1 week of uptime on the Frigate container);
If i use USB3 at 5gbit to the coral (
If I use a USB card capable of 10 Gbit (but still linked to the Coral's max of 5 Gbit), Frigate will run anywhere from 1 min to 4 min, then restart (just the Frigate container, not the OS / VM), with several of these in dmesg:
Next up, I will be testing with Debian 12.x as the guest OS (still using Docker); then, if that does not resolve it, I will test the same hardware on bare metal. I do have much higher-bandwidth services / hardware running on ESXi / vSphere using PCIe passthrough, unrelated to Frigate, with years of uptime, so I don't think ESXi passthrough is the problem / limit here (but it is possible!) |
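Since stability in the comment above seems to track the negotiated link speed rather than the cable's or card's rating, it may help to check what speed the Coral actually negotiated. The sketch below assumes a Linux host where each entry under `/sys/bus/usb/devices` exposes `idVendor`, `idProduct`, and `speed` (in Mbit/s) attributes. The Coral IDs `1a6e:089a` and `18d1:9302` and both function names are assumptions for illustration.

```python
from pathlib import Path

# Assumed Coral USB Accelerator IDs (before / after firmware load).
CORAL_IDS = {"1a6e:089a", "18d1:9302"}

# Human-readable labels for the values found in the sysfs "speed" attribute.
SPEED_LABELS = {
    "1.5": "USB 1.x low speed",
    "12": "USB 1.x full speed",
    "480": "USB 2.0 high speed",
    "5000": "USB 3.x SuperSpeed (5 Gbit/s)",
    "10000": "USB 3.x SuperSpeed+ (10 Gbit/s)",
}


def describe_link_speed(speed_mbps: str) -> str:
    """Map a sysfs speed value such as '480' to a readable label."""
    return SPEED_LABELS.get(speed_mbps.strip(), "unknown")


def find_device_speeds(ids=CORAL_IDS, root=Path("/sys/bus/usb/devices")):
    """Return {sysfs device name: speed label} for devices whose ID is in ids."""
    results = {}
    for dev in root.glob("*"):
        files = [dev / "idVendor", dev / "idProduct", dev / "speed"]
        if not all(f.is_file() for f in files):
            continue  # interface entries have no ID/speed attributes
        vendor, product, speed = (f.read_text().strip() for f in files)
        if f"{vendor}:{product}" in ids:
            results[dev.name] = describe_link_speed(speed)
    return results
```

A result such as `USB 2.0 high speed` for the Coral would indicate the link was forced down to 480 Mbit/s, as with the non-USB3 cable described above.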
To update my post above: I'm seeing the exact same USB errors on Debian 12 as I do on Ubuntu 20 and 22 (this is with the Coral linked at USB3, 5 Gbit, on a PCIe USB card capable of 10 Gbit; this is the card I'm using: https://a.co/d/cCuS298, with a 10 Gbit capable USB-C to USB-C cable to the Coral). See below:
(over the next week or so will test this same hardware on baremetal) |
bob454522, have you made any other discoveries? I'm beginning to think it is an underlying OS issue so is there another OS that is not Debian/Ubuntu that is more stable and can run docker? |
I've experienced similar issues in relation to these errors:
I had a look at disabling LPM on the device through the following grub args:
I ended up doing this on both Proxmox and the Docker VM just to be sure I no longer receive the Thought I'd post to rule out any issues with the kernel attempting to run some power saving on the USB device. |
@KevSex thank you, you've cleared my issue on my Dell 3080 Micro / Debian Bookworm with USB Coral! Simple case of editing /etc/default/grub, adding those usbcore. settings, |
@KevSex do you know if there is a bit of a "step by step" guide on how to change those parameters, or can you elaborate on what you did (and how you did it)? I KNOW I have seen an article somewhere that does a "step by step" around this, I just cannot find it anymore. So a bit of help would be highly appreciated. |
Edit your grub file
Look for line
Mine looks like:
Save and exit file
Update grub
Reboot |
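The steps above can be sketched as a small helper that merges extra kernel parameters into the `GRUB_CMDLINE_LINUX_DEFAULT` line of `/etc/default/grub`. The exact arguments from the original comment did not survive on this page, so the example in the usage note takes `usbcore.autosuspend=-1 usbcore.quirks=18d1:9302:k` from a later comment in this thread as an assumption (the `k` quirk flag tells the kernel not to use Link Power Management for that device). The function name `add_kernel_params` is hypothetical, and the sketch only handles a double-quoted value.

```python
import re

# Matches the double-quoted GRUB_CMDLINE_LINUX_DEFAULT assignment.
CMDLINE_PATTERN = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
)


def add_kernel_params(grub_text: str, params: list[str]) -> str:
    """Return grub_text with any missing params appended to the default cmdline."""

    def _merge(match: re.Match) -> str:
        existing = match.group(2).split()
        for param in params:
            if param not in existing:  # keep the edit idempotent
                existing.append(param)
        return match.group(1) + " ".join(existing) + match.group(3)

    new_text, count = CMDLINE_PATTERN.subn(_merge, grub_text, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text
```

For example, calling `add_kernel_params(text, ["usbcore.autosuspend=-1", "usbcore.quirks=18d1:9302:k"])` on a file containing `GRUB_CMDLINE_LINUX_DEFAULT="quiet"` yields `GRUB_CMDLINE_LINUX_DEFAULT="quiet usbcore.autosuspend=-1 usbcore.quirks=18d1:9302:k"`. The result still has to be written back as root, followed by the "Update grub" and "Reboot" steps listed above.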
Thanks for the fix @KevSex! |
I put both "usbcore.autosuspend=-1 usbcore.quirks=18d1:9302:k"
"[Fri Sep 27 16:20:55 2024] usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd"
"2024-09-27 20:20:24.520968126 [2024-09-27 16:20:24] frigate.watchdog INFO : Detection appears to be stuck. Restarting detection process..."
The environment is a Red Hat Enterprise Linux 9 physical machine running KVM, with HA as a VM and Frigate as a container inside. |
@KevSex thank you, I was having the issue in UNRAID and used the tool powertop to force no powertuning on the USB xHCI controller |
Describe the problem you are having
Went outside tonight to test an automation on person detected in front yard zone. Nothing. Opened debug view with all options turned on. No bounding boxes drawn of any kind. No detection and no events.
Logs only show this, from earlier:
And nothing until my websocket connection at 2022-01-08 00:03:17
Proxmox logs indicate the USB device has been "reset"
Frigate now appears to be in a state where it is running, but simply not detecting anything.
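To get a firmer handle on how often the device is being reset, the kernel log can be scanned for reset messages. The sketch below assumes lines of the form `usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd`, which is the format quoted in one of the comments on this issue. The function name `count_usb_resets` is hypothetical, and the log text is expected to come from something like saved `dmesg` output.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd
RESET_PATTERN = re.compile(
    r"usb (\S+): reset (.+?) USB device number (\d+) using (\S+)"
)


def count_usb_resets(log_text: str) -> Counter:
    """Count reset messages per USB port path (for example '2-2')."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RESET_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Comparing the counts per port before and after a change (new cable, powered hub, kernel parameter) gives a rough data point on whether the change made any difference.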
Some digging has offered only sparse suggestions of using a shorter, better quality cable, however this cable is only 2-3 inches long, and supposedly rated for 10Gbps data:
https://www.amazon.com/gp/product/B08NPSX7FF/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1
The machine is a Beelink Mini PC SEi10 with 16GB RAM, running the latest Proxmox with Frigate in LXC, and it functions perfectly well for most of the day.
I'm confused as to where to go from here. Is this an issue with LPM perhaps?
Version
0.9.4-26AE608
Frigate config file
Relevant log output
FFprobe output from your camera
Frigate stats
n/a
Operating system
Proxmox
Install method
Docker Compose
Coral version
USB
Network connection
Wired
Camera make and model
reolinks
Any other information that may be helpful
No response