[Support]: Nvidia - High Cpu Usage #3425

Closed
0xfgarcia opened this issue Jul 5, 2022 · 13 comments

Comments

@0xfgarcia

0xfgarcia commented Jul 5, 2022

Describe the problem you are having

I experience high CPU usage even while using an Nvidia GPU.

Version

0.11.0-1D45B0B

Frigate config file

### CAM Driveway
  03driveway:
    ffmpeg: 
      hwaccel_args:
           - -c:v 
           - h264_cuvid
      inputs:
        - path: rtsp://user:[email protected]/ch1/main
          roles:
            - detect
        - path: rtsp://user:[email protected]/ch1/main
          roles:
            - record
            - rtmp
    detect:
      width: 1280
      height: 960
      fps: 10

Relevant log output

no

FFprobe output from your camera

ffprobe version 4.3.2-Jellyfin Copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 10 (Debian 10.2.1-6)
  configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-shared --disable-libxcb --disable-sdl2 --disable-xlib --enable-gpl --enable-version3 --enable-static --enable-libfontconfig --enable-fontconfig --enable-gmp --enable-gnutls --enable-libass --enable-libbluray --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libdav1d --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --arch=amd64 --enable-opencl --enable-vaapi --enable-amf --enable-libmfx --enable-vdpau --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvenc --enable-nvdec --enable-ffnvcodec
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, rtsp, from 'rtsp://admin:[email protected]/ch1/main':
  Metadata:
    title           : Media Presentation
  Duration: N/A, start: 0.475667, bitrate: N/A
    Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive), 2560x1920, 12 fps, 12.50 tbr, 90k tbn, 24 tbc
root@50c9c246adb1:/opt/frigate#

Frigate stats

{"01streetdoor":{"camera_fps":5.1,"capture_pid":239,"detection_fps":0.0,"pid":223,"process_fps":5.1,"skipped_fps":0.0},"02frontdoor":{"camera_fps":5.1,"capture_pid":242,"detection_fps":0.0,"pid":227,"process_fps":5.1,"skipped_fps":0.0},"03driveway":{"camera_fps":5.1,"capture_pid":251,"detection_fps":0.0,"pid":228,"process_fps":5.1,"skipped_fps":0.0},"04garagedoor":{"camera_fps":5.0,"capture_pid":254,"detection_fps":0.0,"pid":230,"process_fps":5.0,"skipped_fps":0.0},"05garage":{"camera_fps":5.0,"capture_pid":256,"detection_fps":0.0,"pid":231,"process_fps":5.0,"skipped_fps":0.0},"06garage":{"camera_fps":5.0,"capture_pid":259,"detection_fps":0.0,"pid":232,"process_fps":5.0,"skipped_fps":0.0},"07boilerroom":{"camera_fps":5.1,"capture_pid":262,"detection_fps":0.0,"pid":233,"process_fps":5.1,"skipped_fps":0.0},"10TVRoom":{"camera_fps":5.0,"capture_pid":265,"detection_fps":0.0,"pid":234,"process_fps":5.0,"skipped_fps":0.0},"11Kitchen":{"camera_fps":5.1,"capture_pid":271,"detection_fps":0.0,"pid":236,"process_fps":5.1,"skipped_fps":0.0},"12LivingRoom":{"camera_fps":5.1,"capture_pid":281,"detection_fps":0.0,"pid":237,"process_fps":5.1,"skipped_fps":0.0},"15garden":{"camera_fps":5.1,"capture_pid":290,"detection_fps":1.5,"pid":238,"process_fps":5.1,"skipped_fps":0.0},"detection_fps":1.5,"detectors":{"cpu1":{"detection_start":0.0,"inference_speed":74.86,"pid":217},"cpu2":{"detection_start":0.0,"inference_speed":87.23,"pid":219}},"service":{"latest_version":"0.10.1","storage":{"/dev/shm":{"free":1048.2,"mount_type":"tmpfs","total":1073.7,"used":25.5},"/media/frigate/clips":{"free":14919870.8,"mount_type":"cifs","total":15873499.2,"used":953628.4},"/media/frigate/recordings":{"free":14919870.8,"mount_type":"cifs","total":15873499.2,"used":953628.4},"/tmp/cache":{"free":1987.9,"mount_type":"tmpfs","total":2000.0,"used":12.1}},"temperatures":{},"uptime":111,"version":"0.11.0-1d45b0b"}}

Operating system

Debian

Install method

Docker Compose

Coral version

Other

Network connection

Wired

Camera make and model

Hikvision IPC-D150H-M

Any other information that may be helpful

Host HTOP
Screenshot 2022-07-05 at 14 41 26

Host NVIDIA-SMI
Screenshot 2022-07-05 at 14 41 43

Container NVIDIA-SMI
Screenshot 2022-07-05 at 14 48 32

@NickM-27
Collaborator

NickM-27 commented Jul 5, 2022

What CPU do you have? Low CPU usage isn't necessarily guaranteed with a GPU. Is the usage constant, and how much motion do you see if you view the debug camera view with motion boxes enabled? GPUs only help with video decoding.

@blakeblackshear
Owner

The screenshots you posted of the ffmpeg processes show command args that don't match the config you posted. Can you post your actual yaml config?

@0xfgarcia
Author

0xfgarcia commented Jul 5, 2022

Xeon E-2224G

I have turned off Detect, Recording and Snapshots from the Frigate UI; CPU usage is similar. Few motion boxes.

CPU usage: +30%
GPU usage: 1%

Can Nvidia help with the RTMP output?

@blakeblackshear
Owner

RTMP is just a copy of the stream. It doesn't require any meaningful CPU work.
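
To illustrate the point about stream copying: the ffmpeg process list posted further down in this thread shows the RTMP output running with `-c copy -f flv`. A minimal sketch of how that looks as a camera-level override follows. It assumes Frigate 0.11 accepts `output_args` as a plain string at this level; the camera name is taken from the thread.

```yaml
# Sketch only: mirrors the flags visible in the thread's ffmpeg process list.
cameras:
  03driveway:
    ffmpeg:
      output_args:
        # "-c copy" passes the camera's H.264 packets through without
        # decoding or re-encoding them; "-f flv" selects the FLV muxer
        # used for the RTMP output.
        rtmp: -c copy -f flv
```

Because nothing is decoded or encoded on this output, moving it to an Nvidia encoder would likely add work rather than remove it.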

@0xfgarcia
Author

0xfgarcia commented Jul 5, 2022

Config File #1 Cam.

database:
  path: /db/frigate.db
mqtt:
  host: 192.168.1.70
  user: mqttuser
  password: mqttuser
  topic_prefix: frigate
ffmpeg:
  input_args:
    - -c:v
    - h264_cuvid
cameras:
### CAM Driveway
  03driveway:
    ffmpeg: 
      hwaccel_args:
           - -c:v 
           - h264_cuvid
      inputs:
        - path: rtsp://user:[email protected]/ch1/main
          roles:
            - detect
        - path: rtsp://user:[email protected]/ch1/main
          roles:
            - record
            - rtmp
    detect:
      width: 1280
      height: 960
      fps: 5
    objects:
      track:
        - person
        - car
        - cat
        - dog
        - bicycle
        - skateboard        
        - motorcycle
        - bus
    motion:
      mask:
        - 490,40,1280,43,1280,0,492,0
    zones:
        Rampa:
          coordinates: 1280,960,1280,766,1280,799,1280,0,1121,0,1060,427,526,189,524,84,449,0,280,0,0,0,0,960
          objects:
              - person
              - cat
              - dog
    live:
      quality: 5
      height: 960
    snapshots:
      enabled: True
      crop: False
      bounding_box: True
      clean_copy: True
      timestamp: True
      quality: 100
    record:
      enabled: true
      retain:
        days: 15
        mode: all
      events:
        objects:
          - person
          - cat
          - dog
        retain:
          default: 15
    mqtt:
      timestamp: False
      bounding_box: False
      crop: True
      quality: 100
      height: 500
detectors:
  #coral:
  #  type: edgetpu
  #  device: usb
  cpu1:
    type: cpu
    num_threads: 3
  cpu2:
    type: cpu
    num_threads: 3
birdseye:
  enabled: True
  width: 1920
  height: 1080
  quality: 5
  mode: continuous

@0xfgarcia
Author

ffmpeg process list:

root@omv:~# ps aux | grep ffmpeg

root 1509567 0.0 0.1 115684 17456 ? Ss 16:44 0:00 ffmpeg -f rawvideo -pix_fmt yuv420p -video_size 1280x960 -i pipe: -f mpegts -s 1280x960 -codec:v mpeg1video -q 5 -bf 0 pipe:

root 1509569 0.0 0.1 115684 17500 ? Ss 16:44 0:00 ffmpeg -f rawvideo -pix_fmt yuv420p -video_size 1280x720 -i pipe: -f mpegts -s 1280x720 -codec:v mpeg1video -q 8 -bf 0 pipe:

root 1509575 24.1 1.1 5328480 185468 ? Ssl 16:44 0:02 ffmpeg -hide_banner -loglevel warning -c:v h264_cuvid -avoid_negative_ts make_zero -fflags nobuffer -flags low_delay -strict experimental -fflags +genpts+discardcorrupt -use_wallclock_as_timestamps 1 -i rtsp://user:[email protected]/ch1/main -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c copy -an /tmp/cache/03driveway-%Y%m%d%H%M%S.mp4 -c copy -f flv rtmp://127.0.0.1/live/03driveway -r 5 -s 1280x960 -f rawvideo -pix_fmt yuv420p pipe:

@blakeblackshear
Owner

You have specified h264_cuvid in multiple locations. Let's see if fixing that helps.

database:
  path: /db/frigate.db
mqtt:
  host: 192.168.1.70
  user: mqttuser
  password: mqttuser
  topic_prefix: frigate
cameras:
### CAM Driveway
  03driveway:
    ffmpeg: 
      hwaccel_args:
        - -c:v 
        - h264_cuvid
      inputs:
        - path: rtsp://user:[email protected]/ch1/main
          roles:
            - detect
        - path: rtsp://user:[email protected]/ch1/main
          roles:
            - record
            - rtmp
    detect:
      width: 1280
      height: 960
      fps: 5
    objects:
      track:
        - person
        - car
        - cat
        - dog
        - bicycle
        - skateboard        
        - motorcycle
        - bus
    motion:
      mask:
        - 490,40,1280,43,1280,0,492,0
    zones:
        Rampa:
          coordinates: 1280,960,1280,766,1280,799,1280,0,1121,0,1060,427,526,189,524,84,449,0,280,0,0,0,0,960
          objects:
              - person
              - cat
              - dog
    live:
      quality: 5
      height: 960
    snapshots:
      enabled: True
      crop: False
      bounding_box: True
      clean_copy: True
      timestamp: True
      quality: 100
    record:
      enabled: true
      retain:
        days: 15
        mode: all
      events:
        objects:
          - person
          - cat
          - dog
        retain:
          default: 15
    mqtt:
      timestamp: False
      bounding_box: False
      crop: True
      quality: 100
      height: 500
detectors:
  #coral:
  #  type: edgetpu
  #  device: usb
  cpu1:
    type: cpu
    num_threads: 3
  cpu2:
    type: cpu
    num_threads: 3
birdseye:
  enabled: True
  width: 1920
  height: 1080
  quality: 5
  mode: continuous

@0xfgarcia
Author

0xfgarcia commented Jul 8, 2022

Using the following settings, CPU usage has dropped to 8-9% with a 2560x1440 camera. I don't know whether it can be improved or whether this is the expected result. Could the GPU encoder be used to offload other processes or further lower the load on the CPU?

Nvidia recommends these ffmpeg settings to keep all processing on the GPU: https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/

-hwaccel cuda -hwaccel_output_format cuda

cameras:
  coches:
    ffmpeg:
      hwaccel_args:
           -hwaccel cuda
      input_args:
           -avoid_negative_ts make_zero
           -fflags +genpts+discardcorrupt
           -rtsp_transport tcp
           -stimeout 5000000
           -use_wallclock_as_timestamps 1
           #-hwaccel cuda
           #-c:v h264_cuvid
      output_args:
        record:
           -f segment
           -segment_time 10
           -segment_format mp4
           -reset_timestamps 1
           -strftime 1
           -c copy
           -an
           #-c:v h264_nvenc
        rtmp:
           -c copy
           -f flv
           #-c:v h264_nvenc
        detect:
           -f rawvideo
           -pix_fmt yuv420p

Screenshot 2022-07-08 at 14 46 14

Screenshot 2022-07-08 at 15 20 08

@NickM-27
Collaborator

NickM-27 commented Jul 8, 2022

Looks like it's working well, and some CPU usage like that makes sense, since motion detection and other Frigate processing have to use the CPU.

I don't think the cuda output format would help, since the frames would need to be converted to yuv420p anyway, which would use the CPU.

@Nurovico

Nurovico commented Jul 8, 2022

Frigate Version: 0.11.0-1D45B0B

DELL PowerEdge T40 Xeon E-2224G 16 GiB Memory

Operating system: Debian

Install method: Docker Compose

Coral version: no Coral (I have been waiting on delivery for weeks)

Network connection: Wired

Camera make and model: Hikvision IPC-D150H-M [h264 2560x1920 Main stream 12 fps]

11 cameras running with 'simple' hwaccel_args, like this one:
ffmpeg:
  hwaccel_args:
    - -c:v
    - h264_cuvid

Test made in camera 03driveway

I am testing three scenarios to try to reduce CPU use. I bought an Nvidia graphics card exclusively for this.

  • TEST 01. No ...args

no args

03driveway:

ffmpeg: 
  inputs:
    - path: rtsp://user:[email protected]/ch1/main
      roles:
        - detect  
        - record
        - rtmp
detect:
  width: 2560
  height: 1920
  fps: 5
motion:
  mask:
    - 996,108,2560,129,2560,0,984,0
zones:
    rampa:
      coordinates: 2560,1896,2560,220,2230,175,2113,877,1046,385,1039,114,967,131,729,100,280,0,0,0,0,1920
      objects:
          - person
          - cat
          - dog
live:
  quality: 5
  height: 1920
snapshots:
  enabled: True
  crop: False
  bounding_box: True
  clean_copy: True
  timestamp: True
  quality: 100
record:
  enabled: true
  retain:
    days: 15
    mode: all
  events:
    objects:
      - person
      - cat
      - dog
    retain:
      default: 15
mqtt:
  timestamp: False
  bounding_box: False
  crop: True
  quality: 100
  height: 500
  • TEST 02. 'Simple' ...args

with simple args

03driveway:

ffmpeg: 
    hwaccel_args:
       - -c:v 
       - h264_cuvid
    inputs:
    - path: rtsp://user:[email protected]/ch1/main
      roles:
        - detect  
        - record
        - rtmp
detect:
  width: 2560
  height: 1920
  fps: 5
motion:
  mask:
    - 996,108,2560,129,2560,0,984,0
zones:
    rampa:
      coordinates: 2560,1896,2560,220,2230,175,2113,877,1046,385,1039,114,967,131,729,100,280,0,0,0,0,1920
      objects:
          - person
          - cat
          - dog
live:
  quality: 5
  height: 1920
snapshots:
  enabled: True
  crop: False
  bounding_box: True
  clean_copy: True
  timestamp: True
  quality: 100
record:
  enabled: true
  retain:
    days: 15
    mode: all
  events:
    objects:
      - person
      - cat
      - dog
    retain:
      default: 15
mqtt:
  timestamp: False
  bounding_box: False
  crop: True
  quality: 100
  height: 500
  • TEST 03. Using complex ...args

with complex args

03driveway:

 ffmpeg: 
  hwaccel_args:
      -hwaccel cuda
  input_args:
       -avoid_negative_ts make_zero
       -fflags +genpts+discardcorrupt
       -rtsp_transport tcp
       -stimeout 5000000
       -use_wallclock_as_timestamps 1
       -hwaccel cuda
       #-c:v h264_cuvid
  output_args:
    record:
       -f segment
       -segment_time 10
       -segment_format mp4
       -reset_timestamps 1
       -strftime 1
       -c copy
       -an
       #-c:v h264_nvenc
    rtmp:
       -c copy
       -f flv
       #-c:v h264_nvenc
    detect:
       -f rawvideo
       -pix_fmt yuv420p
  inputs:
    - path: rtsp://user:[email protected]/ch1/main
      roles:
        - detect  
        - record
        - rtmp
detect:
  width: 2560
  height: 1920
  fps: 5
motion:
  mask:
    - 996,108,2560,129,2560,0,984,0
zones:
    rampa:
      coordinates: 2560,1896,2560,220,2230,175,2113,877,1046,385,1039,114,967,131,729,100,280,0,0,0,0,1920
      objects:
          - person
          - cat
          - dog
live:
  quality: 5
  height: 1920
snapshots:
  enabled: True
  crop: False
  bounding_box: True
  clean_copy: True
  timestamp: True
  quality: 100
record:
  enabled: true
  retain:
    days: 15
    mode: all
  events:
    objects:
      - person
      - cat
      - dog
    retain:
      default: 15
mqtt:
  timestamp: False
  bounding_box: False
  crop: True
  quality: 100
  height: 500

The Nvidia card never goes beyond 10% except when starting Frigate (then it is up to 28% for a second).

smi

I am not a pro, so I need extra help. I see there is no encoder use on the Nvidia card.

One of my questions is if this is the most I can get from the graphics card.

I am using the main stream for recording, because I saw that Frigate takes the live view resolution from the record settings, which does not allow me to use a sub stream for detection.

Are these the right ffmpeg settings? I do not know if this is the same as what Nvidia recommends, or if that is not compatible with Frigate.

Am I wrong about all of this?

Thank you in advance for your help

@NickM-27
Collaborator

NickM-27 commented Jul 8, 2022

  hwaccel_args:
      -hwaccel cuda
  input_args:
       -avoid_negative_ts make_zero
       -fflags +genpts+discardcorrupt
       -rtsp_transport tcp
       -stimeout 5000000
       -use_wallclock_as_timestamps 1
       -hwaccel cuda

Please do not double up your hwaccel args; they should only be put under hwaccel_args.
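
As a rough sketch of that advice, the quoted block can be rewritten with the hardware flag in one place only. The camera name and input flags are copied from the TEST 03 post above. The sketch assumes Frigate accepts each args field as a single string (a list of strings, as in the "simple" examples, should work as well).

```yaml
# Sketch only: TEST 03 with "-hwaccel cuda" stated once.
cameras:
  03driveway:
    ffmpeg:
      # Hardware decode flags go here and nowhere else.
      hwaccel_args: -hwaccel cuda
      # Same input flags as the quoted block, minus the repeated "-hwaccel cuda".
      input_args: -avoid_negative_ts make_zero -fflags +genpts+discardcorrupt -rtsp_transport tcp -stimeout 5000000 -use_wallclock_as_timestamps 1
```

Frigate places `hwaccel_args` and `input_args` ahead of `-i` when it builds the ffmpeg command, as the process list earlier in the thread shows for `-c:v h264_cuvid`, so a flag listed in both ends up being passed twice.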

I see there is no encoder use on the Nvidia card.

Frigate does not use the GPU for encoding, only for decoding the incoming stream from the camera.

One of my questions is if this is the most I can get from the graphics card.

As far as standard Frigate is concerned, I believe it is. Things seem to be working well, so I don't know what else to look at. I would keep the "simple" args and leave it at that, as they seem to work best. Those are the args recommended by the docs.

Depending on what GPU you have, you may be able to use TensorRT to run faster object detection via this thread. Keep in mind, though, that this is not a part of Frigate directly (yet, anyway) and has no guarantee of stability.

@Nurovico

Nurovico commented Jul 8, 2022

Thanks Nick,

I'm sorry, I copied the wrong settings. I just updated the post, so these are the actual camera config settings from the three tests I made.

About the resolution for the record settings: is it possible to have one source for detection and another for Frigate's camera viewer?

@NickM-27
Collaborator

NickM-27 commented Jul 8, 2022

About the resolution for the record settings: is it possible to have one source for detection and another for Frigate's camera viewer?

The detect stream is the only stream that is decoded, and a stream needs to be decoded before it can be shown in the frontend, so there is no way to use the higher quality stream in the frontend but not for detection itself.
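
For reference, the split that is possible looks roughly like the sketch below: a lower-resolution stream for `detect` (which the live view then follows) and the main stream for `record` and `rtmp`. The `/ch1/sub` path and the 640x480 size are hypothetical placeholders, not values confirmed for this camera.

```yaml
# Sketch only: the sub stream path and detect size are assumptions.
cameras:
  03driveway:
    ffmpeg:
      hwaccel_args:
        - -c:v
        - h264_cuvid
      inputs:
        # Hypothetical sub stream: decoded for detection, and therefore
        # also the picture shown in the live view.
        - path: rtsp://user:[email protected]/ch1/sub
          roles:
            - detect
        # Main stream: copied to recordings and RTMP without decoding.
        - path: rtsp://user:[email protected]/ch1/main
          roles:
            - record
            - rtmp
    detect:
      # Assumed sub stream resolution; set to what the camera really sends.
      width: 640
      height: 480
      fps: 5
```

With this layout, recordings keep the full main-stream resolution, while decoding cost and live view quality both follow the smaller detect stream.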
