
Wrong framerate when fullscreen with error-diffusion dithering #8174

Closed
v-fox opened this issue Oct 13, 2020 · 10 comments

v-fox commented Oct 13, 2020

Important Information

0.32.0+git.20201008T111710.16b44d93f7
openSUSE Tumbleweed 20201009
https://build.opensuse.org/package/show/home:X0F:HSF/mpv
kwin-lowlatency
Radeon RX 580 Series (POLARIS10, DRM 3.38.0, 5.8.14-1958.gcea47bb-HSF, LLVM 10.0.1) (0x67df)

Reproduction steps

Use an AMD GPU with Mesa drivers that have ACO enabled.

Expected behavior

Correct monitor framerate used at all times.

Actual behavior

Instead of the native 60 fps or overclocked 73 fps, mpv tries to run at around 55 fps with massive drops when fullscreen, while windowed rendering is correct. On the current Mesa release ACO is enabled by default. Weirdly, it happens only when mpv is launched from a file manager (tested with Dolphin and Krusader); it works fine when launched from a console in Yakuake. That may be because of weirdness in how the RADV_DEBUG variable is exported with or without llvm (non-ACO?) and aco (now implied when empty).
No setting in mpv itself seems to change this.

Log file

This is a log from SMPlayer running mpv, but bare mpv, even with OpenGL instead of Vulkan, does the same: mpv_aco.log

Sample files

Any video.

v-fox added the os:linux label on Oct 13, 2020
haasn (Member) commented Oct 21, 2020

A brief glance at the log suggests that for some reason, mpv gets spammed with window resize events. I have no idea why. Post a log with naked mpv instead of smplayer, please, to reduce the noise.

v-fox (Author) commented Oct 21, 2020

> A brief glance at the log suggests that for some reason, mpv gets spammed with window resize events. I have no idea why. Post a log with naked mpv instead of smplayer, please, to reduce the noise.

Log from bad framerate example: mpv_naked-aco.log
Log from good framerate example with RADV_DEBUG="${RADV_DEBUG},llvm": mpv_naked-noaco.log

v-fox (Author) commented Nov 13, 2020

After updating to recent Mesa snapshots, such as 20.3~branchpoint+125~git20201112T223223 built against LLVM 11, even disabling ACO stopped helping; it happens with OpenGL too. By process of elimination I figured out that the "wobbly" frame rate is triggered by

dither-depth=auto
dither=error-diffusion
zimg-dither=error-diffusion
error-diffusion=<X>

when a format conversion is performed. Heavier algorithms induce a bigger drop in rate, as if the display's own refresh rate were changing.
Try 'burkes', 'sierra-2' and 'sierra-3' to reproduce; these previously worked fine on the RX580.
The size of the video and the hardware load do not seem to matter.
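
For reference, a minimal sketch of classic Floyd-Steinberg error diffusion on an 8-bit grayscale frame (illustrative only; mpv/libplacebo implement their own GPU kernels such as burkes and sierra). The strictly serial error propagation, where each pixel depends on its already-dithered neighbours, is what makes these algorithms so much heavier than ordered dithering:

import numpy as np

def floyd_steinberg(img, levels=2):
    # img: 2-D uint8 array; returns the frame quantized to `levels` shades
    out = img.astype(np.float64) / 255.0
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            old = out[y, x]
            new = round(old * (levels - 1)) / (levels - 1)
            out[y, x] = new
            err = old - new           # push the quantization error onto
            if x + 1 < w:             # not-yet-visited neighbours
                out[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    out[y + 1, x - 1] += err * 3 / 16
                out[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    out[y + 1, x + 1] += err * 1 / 16
    return (out * 255).astype(np.uint8)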

The point about ACO still stands, though: when it's active even more frames are lost, so something broke in mpv or Mesa.

Ironically, the same algorithms work fine inside a vapoursynth script: #8140 (comment)

v-fox changed the title from "Wrong framerate when fullscreen under Mesa with ACO" to "Wrong framerate when fullscreen with error-diffusion dithering" on Nov 13, 2020
kasper93 (Contributor) commented Dec 4, 2024

error-diffusion is heavy and done on every redraw, so the higher the display frame rate, the more work has to be done. Your GPU is not fast enough to keep up with the load it's given.

v-fox (Author) commented Dec 4, 2024

> error-diffusion is heavy and done on every redraw, so the higher the display frame rate, the more work has to be done. Your GPU is not fast enough to keep up with the load it's given.

How come vapoursynth's CPU-only version works perfectly at <5% load, so that I can still use most of the CPU time for mvtools' frame interpolation to 60-75 fps in real time?

GPU offload is supposed to be faster, not a hundred times slower. And this is an RX580, on par with the Xbox One X in processing power, not some cut-down integrated Intel GPU block.
Although, maybe the algorithm I've been using lately is lighter than those same ED ones (fmtconv's dither modes 3-6). It's now number 8, "void and cluster halftone dithering".
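
For context, a minimal sketch of that vapoursynth pre-processing approach, assuming the fmtconv plugin is installed and that the script runs under mpv's vapoursynth filter, which injects the source clip as video_in (the dmode numbering follows fmtconv's docs, where modes 3-6 are error-diffusion kernels and 8 is void-and-cluster):

import vapoursynth as vs
core = vs.core
clip = video_in                                   # source clip provided by mpv
# one 10-bit -> 8-bit conversion per *source* frame, dithered once by fmtconv
clip = core.fmtc.bitdepth(clip, bits=8, dmode=8)  # 8 = void-and-cluster halftone
clip.set_output()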

But previously, with mpv's ED, I also had some performance regressions and frame drops from weird A/V desynchronization and unstable attempts to auto-sync back. Not sure about now; I mostly rely on a vapoursynth script for post-processing.

llyyr (Contributor) commented Dec 5, 2024

Could you update the log file with one generated by the current mpv version?

kasper93 (Contributor) commented Dec 5, 2024

> How come vapoursynth's CPU-only version works perfectly at <5% load

Like I said, ED is run on every redraw, at the target spatial and temporal resolution. I don't know what algorithm your CPU filter is using, but you are dithering the video at source resolution and framerate, before rendering, which makes no sense for dithering to the target display.

> GPU offload is supposed to be faster, not a hundred times slower. And this is an RX580

The RX580 is slow. Either way, you can see detailed timings on stats page 2 (Shift+i, then 2). Depending on your scaling and rendering target, your GPU might already be busy.

v-fox (Author) commented Dec 8, 2024

> Like I said, ED is run on every redraw, at the target spatial and temporal resolution. I don't know what algorithm your CPU filter is using, …

But you do know, because I linked you the list of fmtconv's supported algorithms, the one you have cited yourself. I'm dithering at format conversion and debanding before interpolation, so motion-vector search isn't run on artefacts, especially when downsampling 10-bit non-HDR HEVC encodes to 8-bit, which would otherwise greatly increase CPU load for nothing (frame interpolation really does not like higher bit depth, input frame rate or resolution).

But the point is not about why and when to dither. It's about VS's fmtconv managing to do ED dithering on CPU with little load (if you actually open the link, you'll see the list: Sierra-2-4A, Stucki, Atkinson, classic Floyd-Steinberg, Ostromoukhov), while your implementation somehow can't cope on an entire 6 TFLOPS that runs a whole generation of recent console games. The Steam Deck and most laptops with the best integrated GPUs are around four times slower, by the way.

> …but you are dithering the video at source resolution and framerate, before rendering, which makes no sense for dithering to the target display.

No, it's what you're doing that makes no sense. You're trying to do all processing at the biggest resolution, frame rate and bit depth in the pipeline. Dithering isn't even the heaviest potential filter; it's just a "supporting" one for something more interesting. Heavy filters should be run in real time at the lowest acceptable settings, otherwise nothing short of a supercomputer cluster will suffice. When you start saying things like "12 CPU cores and 6 TFLOPS of GPU are too slow for some non-HDR 1080p yuv420 crap to reach 75 fps" for your player, which doesn't even have decent frame interpolation of its own, then you're doing something wrong.
It doesn't need to be mathematically perfect while eating 500 W of power on >$1000 of current-gen hardware; it just needs not to have ugly smearing at the end.

> Could you update the log file with one generated by the current mpv version?

I'm long since tired of pointless bickering about impractical filters that I haven't been using for years. With all the drm-next/libplacebo hubbub I hoped that a new age of mpv filters would come and VS scripts could be replaced with built-in functionality: frame interpolation as good as mvtools, up to IFRNet; fancier scaling than lanczos (like the old, popular FSRCNN); and no more annoying patterns from temporal and ordered dithering, by switching to ED. Without random seconds-long desyncs and a constant stream of frame drops at 1080p below 100 fps. But none of this seems to be happening in my lifetime.
Instead it wastes GPU time on things like creating tens of useless shader stages for tscale (or whatever it is doing when spamming vo logs with tscale enabled) that barely do anything, and crashes randomly during seeking, something like the fifth resurfacing of the same regression, unless you use "bad hacks" like the abandoned #12917 in place of the perpetually missing "better solutions upstream" (I don't know if it's finally fixed, but the last time I built without the patch, it was not).

If "RX580 is slow" while not doing any fancy (or even none at all) resolution upscaling or frame interpolation in mpv internally then this conversation is only a waste of time and nerves. "Ordered" is not that bad in comparison to saturating entire GPU and still have frame-drops and desync at 1080p@sub-100 output. I don't want to even look into it any more.

However, for all that yapping, a quick test of all ED algorithms in mpv on some random 10-bit non-HDR HEVC file now showed surprisingly good results: only stucki and jarvis-judice-ninke were dropping frames from the target 72 down to 60-68 fps (resulting in ugly jittering). Going fullscreen doesn't seem to change that either.
But maybe that file is just far from a worst-case scenario and bad for testing.

I've returned from a wayland session back to x11 and used these settings, among many others:

vf-add=vapoursynth=~/.config/mpv/vapoursynth/motioninterpolation.vpy:1:66
x11-bypass-compositor=never
vo=gpu-next
gpu-api=vulkan,auto
vulkan-swap-mode=fifo-relaxed
swapchain-depth=4
vulkan-queue-count=8
video-sync=display-resample-vdrop
video-sync-max-video-change=5
video-sync-max-audio-change=0.075

And my ~/.drirc has:

<?xml version="1.0"?>
<driconf>
    <device>
	<application name="Default">
	    <option name="vblank_mode" value="2" />
	    <option name="adaptive_sync" value="true" />
	    <option name="precise_trig" value="true" />
	    <option name="allow_draw_out_of_order" value="true" />
…
	    <option name="block_on_depleted_buffers" value="true" />
…
	    <option name="vk_x11_strict_image_count" value="true" />
	    <option name="vk_x11_ensure_min_image_count" value="true" />
	    <option name="vk_x11_override_min_image_count" value="2" />
	    <option name="vk_require_etc2" value="true" />
	    <option name="vk_require_astc" value="true" />
	    <option name="vk_xwayland_wait_ready" value="true" />
	</application>
    </device>
</driconf>

Using a rolling distro, latest Mesa release and mpv snapshot.

> Either way, you can see detailed timings on stats page 2 (Shift+i, then 2). Depending on your scaling and rendering target, your GPU might already be busy.

ED dithering always takes around 90-95% in "redraw", just as ewa_lanczos scaling does in "fresh". But the weird part is that when ED causes frame skips/jitter, the GPU itself isn't even loaded. If you add some fat upscaling shader (like FSRCNN 2x2) you can see it struggle, with fans spinning to the max and the power draw and "activity" counter rising in the driver stats. Here the power draw is only about half and activity is very uneven, jumping between nothing and 100%, in both the smooth (burkes) and the unstable (jarvis-judice-ninke) case. So it drops frames while not actually loading the GPU any further.

The main issue isn't even the inexplicable frame drops; it's that the "Display -> Refresh Rate -> (estimated)" field seems to reflect how many frames are constantly mistimed and dropped, rather than anything actually to do with the display and vsync. And with no obvious reason why.

And I had to add script-opt=stats-font_size=16 to mpv.conf, because recent changes made the Ctrl+I stats text illegibly minuscule and it does not follow osd-font-size.

kasper93 (Contributor) commented Dec 8, 2024

> ED dithering always takes around 90-95% in "redraw", just as ewa_lanczos scaling does in "fresh".

Like I said multiple times, ED is done in the present/redraw pass. The time budget there is directly tied to the refresh rate of the output. Say you have a 60 Hz display: with display-resample you get ~16 ms to present a frame, even if it has already been rendered and is merely repeated, while at the source frame rate (say 24 fps) you have ~42 ms. See the difference? Your CPU filter is likely not faster, but it is doing an inherently different thing. Dithering in pre-processing is not the same as dithering the output frame.
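
A back-of-the-envelope sketch of that budget (plain arithmetic, nothing mpv-specific; the 72 Hz entry stands in for the ~72 Hz overclocked display mentioned in this thread):

for rate_hz in (24, 60, 72):
    print(f"{rate_hz:>3} Hz -> {1000 / rate_hz:.1f} ms per frame")
# 24 Hz -> 41.7 ms; 60 Hz -> 16.7 ms; 72 Hz -> 13.9 ms.
# At display rate the present/redraw pass gets roughly a third of the time
# it would have at a 24 fps source rate, and it runs even for repeated frames.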

> No, it's what you're doing that makes no sense. You're trying to do all processing at the biggest resolution, frame rate and bit depth in the pipeline.

For output dithering to be effective you want to do it on the final result, after color correction, scaling and so on; otherwise you will smear the dither noise, especially in the ED case, which is directly tied to the source frame when generating the noise. Like I said above, pre-processing dithering is not the same as output dithering.
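
A tiny illustration of why (a minimal numpy sketch, not mpv's actual pipeline): quantizing before scaling brings intermediate values right back, so the early dither is smeared and the output would need dithering again anyway:

import numpy as np
ramp = np.linspace(0.0, 1.0, 8)               # smooth gradient in float
dithered = np.round(ramp * 3) / 3             # quantize early to four levels
upscaled = np.interp(np.linspace(0, 7, 16), np.arange(8), dithered)
print(np.unique(upscaled).size)               # > 4: in-between values are back,
# so the scaler has smeared the quantization (and any dither noise with it)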

The ED pass in libplacebo could be moved to rendering, which would give it more time to finish, but it is not a priority currently, because ED is not that useful for video anyway.

> And I had to add script-opt=stats-font_size=16 to mpv.conf, because recent changes made the Ctrl+I stats text illegibly minuscule and it does not follow osd-font-size.

The default is script-opt=stats-font_size=20, so you are making it smaller.

Dudemanguy (Member) commented:

I briefly glanced through the thread, but the problem seems to be simply a performance issue (error-diffusion is too intensive). So at least on mpv's side, there's no real bug here.

Dudemanguy closed this as not planned on Jan 26, 2025