
Increased power usage using hardware acceleration #74

Closed
thulle opened this issue Apr 17, 2022 · 8 comments
Labels
nvidia-issue This is an issue with the NVIDIA GPU driver

Comments


thulle commented Apr 17, 2022

This is not really a bug, more of a question, and probably something that should be addressed at the driver level.

I just enabled hardware acceleration in Firefox using this driver, and while it works great I'm seeing a significant increase in power usage.
Using an AMD 3700X and an NVIDIA RTX 3080 with the 470.103.01 drivers, watching one 1080p H.264 stream on Twitch without hardware acceleration, I see about 120 W total consumption as measured at the PSU, of which ~46 W is used by the GPU:

# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk pviol tviol    fb  bar1 sbecc dbecc   pci rxpci txpci
# Idx     W     C     C     %     %     %     %   MHz   MHz     %  bool    MB    MB  errs  errs  errs  MB/s  MB/s
    0    46    45     -    28    19     0     0   810   510     0     0  1178     8     -     -     0   318   154
    0    46    45     -    27    17     0     0   810   510     0     0  1178     8     -     -     0   170   144
    0    46    45     -    28    19     0     0   810   525     0     0  1178     8     -     -     0   170   152
    0    46    45     -    23    17     0     0   810   525     0     0  1178     8     -     -     0   169   142
    0    46    45     -    25    17     0     0   810   510     0     0  1178     8     -     -     0   169   144
    0    46    45     -    24    17     0     0   810   510     0     0  1178     8     -     -     0   178   155

Activating hardware acceleration, power usage jumps to around 176-180 W, with the entire increase going to the GPU:

# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk pviol tviol    fb  bar1 sbecc dbecc   pci rxpci txpci
# Idx     W     C     C     %     %     %     %   MHz   MHz     %  bool    MB    MB  errs  errs  errs  MB/s  MB/s
    0   111    48     -    11     3     0     5  9251  1785     0     0  1612    11     -     -     0   131   255
    0   111    48     -    10     2     0     5  9251  1785     0     0  1612    11     -     -     0   132   242
    0   110    48     -    11     3     0     5  9251  1785     0     0  1612    11     -     -     0    19   127
    0   111    48     -    11     3     0     5  9251  1785     0     0  1612    11     -     -     0     7   130
    0   109    47     -     6     1     0     5  9251  1785     0     0  1610    11     -     -     0     0   140
    0   108    47     -     5     1     0     5  9251  1785     0     0  1610    11     -     -     0     3   127
    0   108    47     -     5     1     0     5  9251  1785     0     0  1610    11     -     -     0    10   127

Even though decoder usage is only 5%, the card never drops below a 9251 MHz memory clock and a 1785 MHz GPU clock. Is this something that can be addressed at this level, or does it have to be done in the NVIDIA drivers?
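The tables above appear to come from `nvidia-smi dmon` (the column set matches `nvidia-smi dmon -s pucvmet`). A minimal sketch of parsing such captured output to compare average power draw between runs; the live invocation is only shown in a comment, so the parser can be tried on saved logs:

```python
# Sketch: parse captured `nvidia-smi dmon` output and report average power.
# Assumes the column order shown above (gpu, pwr, gtemp, ...); capture with e.g.
#   nvidia-smi dmon -s pucvmet -c 10 > dmon.log
# and feed the file contents to average_power().

def average_power(dmon_text: str) -> float:
    """Average the 'pwr' column (second field) over all sample rows."""
    watts = []
    for line in dmon_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip the two '#'-prefixed header lines
        watts.append(float(fields[1]))  # pwr is the second column
    return sum(watts) / len(watts)

# Two rows taken from the unaccelerated run above:
sample = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    46    45     -    28    19     0     0   810   510
    0    46    45     -    27    17     0     0   810   510
"""
print(average_power(sample))  # 46.0
```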

philipl (Contributor) commented Apr 17, 2022

It's a driver issue. As soon as you initialise CUDA, the driver forces the GPU into a higher power state (at least P2, IIRC), and the NVDEC implementation forces you to use CUDA to interact with it, so this always happens. It means that using NVDEC will never save you power unless you were going to use CUDA anyway.

And it's particularly annoying because the actual hardware doesn't have this requirement: VDPAU uses the same hardware but doesn't use CUDA, so the power level can stay at a minimum. I'm sure the Windows implementation doesn't do this either. To make it even more insulting, an application using GL or Vulkan to do post-processing and all sorts of other fancy things also doesn't force the power level higher than is necessary for the work being done.

So the right place to complain is NVIDIA's forums. It's been brought up before, and we've never seen anything get done. Very frustrating.
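One way to observe the behaviour described above is to watch the reported performance state before and after a CUDA context is created (P0/P2 are high-performance states; P8 and below are idle states). A sketch that parses the CSV output of an `nvidia-smi` query; the live query is only shown in a comment, so the parsing can be tried on a captured line:

```python
# Sketch: read the current performance state from nvidia-smi CSV output.
# On a machine with the NVIDIA driver installed you would run:
#   nvidia-smi --query-gpu=pstate,power.draw --format=csv,noheader
# Here we only parse a captured line of that output.

def parse_pstate(csv_line: str) -> tuple[str, float]:
    """Split a line like 'P2, 111.00 W' into ('P2', 111.0)."""
    pstate, power = (field.strip() for field in csv_line.split(","))
    return pstate, float(power.rstrip(" W"))

# Example line as it might look with the decoder active
# (power value taken from the dmon table in this issue):
state, watts = parse_pstate("P2, 111.00 W")
print(state, watts)  # P2 111.0
```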

philipl (Contributor) commented Apr 17, 2022

Amusingly, there are people with heavy compute workloads where the force-P2 behaviour actually slows things down because it wants to run at P0.

https://old.reddit.com/r/RenderToken/comments/aco1zv/updated_tutorial_on_nvidias_force_p2_power_state/

Unfortunately, the techniques documented there only work on Windows or on non-consumer hardware.

elFarto (Owner) commented Apr 17, 2022

I hadn't realised the power usage was quite so bad with the newer GPUs. A 1080p Twitch stream on my old 1060 takes about 30W. But as Philip said, there's nothing we can do about it.

thulle (Author) commented Apr 17, 2022

Thanks for the detailed answers!

In search of a workaround I tried limiting the power usage, but the lowest I could set it to was 100 W, so not much gain there.
Limiting the GPU and memory clocks worked, though: setting them to 210/405 MHz made the card switch down to performance level 0 and 46-47 W power consumption. But then I started to drop a frame in the 1080p stream every other second or so, and starting an 8K stream made both streams grind to a halt while the whole desktop started to feel sluggish.

By limiting the GPU clock to 600 MHz and the memory clock to 810 MHz it stays at PL1; I can still decode the 1080p stream and an additional 2160p60 (4K) stream simultaneously, while staying at 50 W power draw with dec% topping out at about 75%. If I try to watch more than three 1080p60 streams I have to bump the memory clock to the next level, 5001 MHz, which is PL2 but tops out at around 75-80 W.
To decode a 4320p60 (8K) VP9 stream I have to remove all limits on the memory clock, which ends up at the same power draw as when unlimited. I currently have no monitor capable of that resolution, though, and I rarely have that many simultaneous 1080p streams, so capping at PL1 for regular desktop activities is a good workaround for me.

Pasting the commands in case other users of these high-power-budget cards end up here.

To list the supported GPU clocks:

nvidia-smi -q -d SUPPORTED_CLOCKS | grep Gra | sort -nu -k3

To list the supported memory clocks:

nvidia-smi -q -d SUPPORTED_CLOCKS | grep Mem | sort -n -k3

Lock the GPU clock to at most 600 MHz while still allowing it to clock down to 210 MHz when CUDA isn't in use:

nvidia-smi -lgc 210,600

Reset the GPU clock limit:

nvidia-smi -rgc

Lock the memory clock to at most 810 MHz while still allowing it to clock down to 405 MHz when CUDA isn't in use:

nvidia-smi -lmc 405,810

Reset the memory clock limit:

nvidia-smi -rmc
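The commands above can be wrapped in a small toggle script. A sketch in Python: the limits (210-600 MHz GPU, 405-810 MHz memory) are the values from this comment, and the actual `nvidia-smi` invocations run only when `dry_run` is disabled on a machine with the driver installed (they also need root):

```python
# Sketch: build and optionally run the nvidia-smi commands that lock or
# reset GPU/memory clocks, using the limits suggested in this thread.
import subprocess

def clock_commands(limit: bool) -> list[list[str]]:
    """Return the nvidia-smi invocations for limited or default clocks."""
    if limit:
        return [
            ["nvidia-smi", "-lgc", "210,600"],  # GPU clock: 210-600 MHz
            ["nvidia-smi", "-lmc", "405,810"],  # memory clock: 405-810 MHz
        ]
    return [
        ["nvidia-smi", "-rgc"],  # reset GPU clock limit
        ["nvidia-smi", "-rmc"],  # reset memory clock limit
    ]

def apply(limit: bool, dry_run: bool = True) -> None:
    for cmd in clock_commands(limit):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires root + NVIDIA driver

apply(limit=True)  # dry run: prints the two lock commands without running them
```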

@elFarto elFarto added the nvidia-issue This is an issue with the NVIDIA GPU driver label May 7, 2022
elFarto (Owner) commented Jun 5, 2022

Closing this issue, as there's nothing we can do to resolve it. Thanks for the workaround, thulle, though I'm not sure how useful it will be for most people; fiddling with the memory clocks is not really a user-friendly solution.

detiam commented May 19, 2023

The issue is still present today, so for anyone who ends up here: I made a simple widget for KDE
https://www.pling.com/p/2037407/
that makes it easy to toggle the clock speed between the default and the limited values suggested by @thulle.
Now I don't have to hear the fans spin up just from watching 1080p video.

nbryant42 commented Jun 17, 2023

I compiled FFmpeg 6 with --enable-vulkan --enable-nvdec --enable-shared --enable-gnutls --disable-static --enable-libplacebo.

Then I ran mpv on an H.264 MP4 via:

mpv -v --gpu-api=vulkan --vo=gpu-next --hwdec=vulkan --msg-level=all=debug --gpu-debug --vd-lavc-check-hw-profile=no

Monitoring with nvtop, clock speeds are much lower with Vulkan Video. On a 3060 Ti desktop board it's roughly 23 W with Vulkan Video vs ~54 W with this VAAPI driver.
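For anyone who wants this as a default rather than passing flags every time, an equivalent mpv.conf fragment assembled from the command line above might look like this (option names as in mpv's manual; the debug/logging flags are omitted):

```ini
# ~/.config/mpv/mpv.conf: prefer Vulkan Video decoding over NVDEC
gpu-api=vulkan
vo=gpu-next
hwdec=vulkan
# Used in the command above to relax hardware-profile checks during decoding:
vd-lavc-check-hw-profile=no
```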

schmorp commented Jan 5, 2024

Another "workaround" to reduce power usage is to use hwdec=vdpau-copy. Although this should result in more work, when decoding a full-HD video to 4K with gpu-next, Vulkan and ravu+cfl post-processing, my GPU clock stays at 400 MHz and system power usage is around 90 W. Doing the same with nvdec forces the clocks to 2400 MHz and a system power usage of around 115 W (on an RTX 4060). The advantage of this over limiting clock speeds is that the GPU still auto-scales up when needed, e.g. when playing 60 Hz 4K video, which requires a lot more processing.
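For anyone who wants to try this, the switch is a single mpv option (presumably it helps because VDPAU doesn't initialise CUDA, as philipl noted above):

```ini
# mpv.conf: decode via VDPAU with copy-back; no CUDA context is created,
# so the driver can keep the clocks low
hwdec=vdpau-copy
```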
