[Support]: OpenVINO detector crashing, hanging machine #8338
Comments
I don't think I've seen that error before. Do you have any issues if you change the detector device to CPU?
I will experiment with CPU, but my hunch is this device isn't fast enough to handle it across 4 cameras. We'll see! I thought I had a fix with what I described in a comment here, but unfortunately that only lasted about a week before:
Which hung the system. Found a similar issue: #5799
I have seen this before too; I switched to OpenVINO on CPU and it's been running fine for months. I'll update when I have time to repro.
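For anyone who wants to try the same thing, switching to CPU is a one-line change in the detector block. A minimal sketch, assuming the default detector name and bundled model path from the docs (adjust to match your own config):

```yaml
detectors:
  ov:
    type: openvino
    device: CPU   # was GPU; everything else in the config can stay the same
    model:
      path: /openvino-model/ssdlite_mobilenet_v2.xml
```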
Ok, but that's just a workaround. This is caused by some Intel GPU bug, and trying the latest drivers/libraries (the ones tracked in current Debian are really old) might help the issue. I had the same problem, and the only real fix was to use a more recent machine (Intel N100).
I agree; I switched because I needed it to just work and didn't have time to chase the issue down. For reference, my hardware is an i5-8500T. I am using Debian bookworm currently, but I believe the issues I had were before the release, so they could be fixed now. Like I said, I'll update when I have the chance to try to reproduce the error.
Fingers crossed: I have been running in OpenVINO GPU mode since my last post with no crashes. Not running any special packages, just current Debian testing.
Never mind, it crashed again, taking the system with it (eventually; it ground to a halt over a few minutes and I had to pull the power). kern.log:
The log stops dead there. Obviously, since the machine crashed, I couldn't collect the GPU error log. The server runs headless, and the only tasks running on the GPU are the OpenVINO detector and the ffmpeg threads for each camera (6).
Just to hopefully close this loop - beyond one weird, unrelated storage error, my machine has been stable and working with CPU-only detectors and GPU-accelerated ffmpeg for two straight weeks now.
So I think I'm going to just leave it at that. I am vaguely curious whether CPU OpenVINO would also solve the problem for me, like you said @martini1992, but I'm wondering if it's worth the headache and further debugging of hangs, honestly, given this machine is remote to me. Are you switching back to that? Does it have a discernible benefit over regular old CPU detectors (one per camera)?
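For reference, what I landed on is roughly the following: plain CPU detectors plus VAAPI-accelerated ffmpeg, so the GPU only handles decoding. This is a sketch using the documented `cpu` detector type and the `preset-vaapi` hwaccel preset; the detector names and thread counts are just illustrative:

```yaml
detectors:
  # one cpu detector per camera (or fewer, if inference keeps up)
  cpu1:
    type: cpu
    num_threads: 3
  cpu2:
    type: cpu
    num_threads: 3

ffmpeg:
  # keep decoding on the GPU; detection stays off it entirely
  hwaccel_args: preset-vaapi
```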
P.S. Did you see this alternative solution from a related issue? #8470 (comment)
I hadn't seen that; I'll check it out when I have time. It seems to use slightly less CPU for me than the other detector, and I vaguely remember the inference speed was faster.
Hi @kevin-david, I commented in the other thread #8470. Moving my OpenVINO model to yolov8 seems to have fixed my issues on my i7-7700 Kaby Lake. I am using this config successfully now for the last few hours:

```yaml
detectors:
  ov:
    type: openvino
    device: GPU
    model:
      path: /config/yolov8n/yolov8n.xml

model:
  width: 416
  height: 416
  input_tensor: nchw
  input_pixel_format: bgr
  model_type: yolov8
  labelmap_path: /config/yolov8n/coco_80cl.txt
```
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Finally found the time to switch to yolov8n on GPU; I'll update if I have trouble.
Nope, went out for the day and it had crashed by the time I got home.
I just discovered that I am having this issue on an i5-6500 with a Debian bookworm / Linux kernel 6.5 build. Exactly the same error in the logs and the same (random) frequency. Almost happy to see that it is a known issue. I will change OpenVINO to CPU for now and await a fix.
@FeatherKing I've used this and it worked well, thanks for that! But after upgrading to Frigate 0.14 it broke, as "yolov8" is not a valid model type. For now I've switched to the config on the website. Would yolov8 be possible as well?
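For context, the "config on the website" is the default OpenVINO one, which looks roughly like this. This is a sketch based on the documented bundled ssdlite_mobilenet_v2 model, so double-check the paths and keys against the Frigate version you're running:

```yaml
detectors:
  ov:
    type: openvino
    device: GPU
    model:
      path: /openvino-model/ssdlite_mobilenet_v2.xml

model:
  width: 300
  height: 300
  input_tensor: nhwc
  input_pixel_format: bgr
  labelmap_path: /openvino-model/coco_91cl_bkgr.txt
```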
Same issue here, running on an i5-6500. For me it's the same with both the default config from the website and with yolonas, and I don't recall having such issues with 0.13. I am running Frigate under Docker on the latest TrueNAS SCALE; I tried a few tweaks from the internet, but none have worked so far. What makes me think it's not Frigate-specific but rather the host/OS is that the crash usually happened when I received a notification from one of the cams about a person/car in a zone, but yesterday it seems to have crashed while Jellyfin was being used, which runs as a container on the same host. Unless this was just a coincidence and there was a detection running on some other cam, just not within any of the configured notification zones...
Describe the problem you are having
I'm having the same issue as #7607 with an OpenVINO detector using the i915 driver on an older Skylake GPU. When I restart Frigate, it does not come back online either. I haven't rebooted the machine yet in case this state is interesting.
As far as I understand, the only driver is the one bundled with the installed kernel, which is...
Linux proxmox-surf 6.2.16-15-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-15 (2023-09-28T13:53Z)
Unfortunately I don't have a Coral device to use; are there any other diagnostics I can gather for this?
(side note, I don't recommend using old laptop hardware if you can help it - but I was hoping to make use of something sitting around - https://devopsx.com/intel-gpu-hang/)
It doesn't line up with the time, but I pulled the message from /sys/class/drm/card0/error in case it's interesting, because the GPU has definitely hung on this before. I can go play with more of the i915.* driver settings above, but I was hoping someone may have better ideas before I do that. My current set of kernel params is just disabling power management:

GRUB_CMDLINE_LINUX_DEFAULT="debug intel_idle.max_cstate=1 i915.enable_dc=0 ahci.mobile_lpm_policy=1 i915.mitigations=off mitigations=off"
I think this may have been the initial crash from the kernel logs:
I'm pretty sure this is a kernel/hardware issue, but figured I'd report here in case anyone else has seen this before and addressed it.
/sys/class/drm/card0/error output
Version
0.12.1-367D724
Frigate config file
Relevant log output
Operating system
Proxmox
Install method
Docker Compose
Coral version
CPU (no coral), using OpenVINO/Intel detector with Skylake architecture.
Network connection
Wired