Picom spams sgi_video_sync_scheduler_callback with latest nvidia driver #1265

Closed
ShadiestGoat opened this issue May 21, 2024 · 34 comments

Comments

@ShadiestGoat

Platform

Arch (Linux 6.9.1-arch1-1)

GPU, drivers, and screen setup

  • NVIDIA GeForce GTX 1650 Ti Mobile
  • Single Monitor
  • Laptop (w/ optimus-manager)
  • Also integrated GPU: AMD Radeon RX Vega 6 (Ryzen 4000/5000 Mobile Series)
  • nvidia-dkms 550.78-1
  • xf86-video-amdgpu v23.0.0-2
  • mesa 1:24.0.7-3
glxinfo -B

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 4096 MB
    Total available memory: 4096 MB
    Currently available dedicated video memory: 3521 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce GTX 1650 Ti/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 550.78
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6.0 NVIDIA 550.78
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)

OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 550.78
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

Environment

Bspwm

picom version

vgit-9a839

Diagnostics

Version: vgit-9a839

Extensions:

  • Shape: Yes
  • RandR: Yes
  • Present: Present

Misc:

  • Use Overlay: Yes
  • Config file specified: None
  • Config file used: /home/shady/.config/picom/picom.conf

Drivers (inaccurate):

NVIDIA, modesetting

Backend: glx

  • Driver vendors:
  • GLX: NVIDIA Corporation
  • GL: NVIDIA Corporation
  • GL renderer: NVIDIA GeForce GTX 1650 Ti/PCIe/SSE2

Backend: egl

  • Driver vendors:
  • EGL: NVIDIA
  • GL: NVIDIA Corporation
  • GL renderer: NVIDIA GeForce GTX 1650 Ti/PCIe/SSE2

Configuration:

Configuration file
shadow = false;

corner-radius = 8;
rounded-corners-exclude = [
  "class_g = 'Polybar'"
];
round-borders = 8;

fading = true;
no-fading-openclose = false;
fade-in-step = 0.1;
fade-out-step = 0.1;
fade-delta = 9;

# nice kawase blur
blur: {
  method = "dual_kawase";
  strength = 3;
  background = false;
  background-frame = false;
  background-fixed = false;
}

backend = "glx";
vsync = true

mark-wmwin-focused = true;
mark-ovredir-focused = true;
detect-client-opacity = true;
detect-client-leader = true;

blur-background-exclude = [
  "class_g ?= 'zoom'",
  "name = 'rect-overlay'",
  "_GTK_FRAME_EXTENTS@:c",
  "class_g = 'LibreWolf'",
  "window_type *= 'menu'",
#  "(class_g = 'Firefox' || class_g = 'Thunderbird') && (window_type = 'utility' || window_type = 'popup_menu') && argb",
#  "window_type = 'menu'",
#  "window_type = 'dropdown_menu'",
#  "window_type = 'popup_menu'",
#  "window_type = 'tooltip'",
];

Steps of reproduction

  1. Start picom
  2. Get output

Expected behavior

No warning spam

Current Behavior

Spams:

[ 05/21/2024 22:12:02.008 c2_parse_target WARN ] Type specifier is deprecated. Type "c" specified on target "_GTK_FRAME_EXTENTS" will be ignored, you can remove it.
[ 05/21/2024 22:12:02.334 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 0. Possible NVIDIA bug?
[ 05/21/2024 22:12:02.334 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler
[ 05/21/2024 22:12:02.408 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 28826. Possible NVIDIA bug?
[ 05/21/2024 22:12:02.408 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler
[ 05/21/2024 22:12:02.457 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 28826. Possible NVIDIA bug?
[ 05/21/2024 22:12:02.457 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler
[ 05/21/2024 22:12:02.505 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 28826. Possible NVIDIA bug?
[ 05/21/2024 22:12:02.505 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler
[ 05/21/2024 22:12:02.552 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 28826. Possible NVIDIA bug?
[ 05/21/2024 22:12:02.552 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler
[ 05/21/2024 22:12:02.597 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 28826. Possible NVIDIA bug?
[ 05/21/2024 22:12:02.597 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler
[ 05/21/2024 22:12:02.647 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 28826. Possible NVIDIA bug?
[ 05/21/2024 22:12:02.647 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler
[ 05/21/2024 22:12:02.694 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 28826. Possible NVIDIA bug?
[ 05/21/2024 22:12:02.694 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler

Additionally, things like glxgears & vkcube slow to a crawl when picom is active, though I haven't tested various configurations to see if they fix all my problems yet!

@ShadiestGoat
Author

Update: this doesn't happen with vsync = false. The same goes for the massive performance hit mentioned at the end of the report.

@noctuid

noctuid commented Jun 1, 2024

Welp, I guess I'll set vsync false for now. I'm also seeing this, though my system was often completely freezing, not just slowing down.

noctuid added a commit to noctuid/dotfiles that referenced this issue Jun 8, 2024
yshui/picom#1265

After a somewhat recent update, picom with vsync enabled causes
slowdown/freezing, and I'm having other nvidia issues (e.g. various issues with
mpv failing to open).

Because I've disabled vsync, screen tearing is very bad by default, so I'm
enabling force full composition pipeline now to prevent it.
@wmkmn

wmkmn commented Jul 23, 2024

I started seeing the same issue after upgrading to Nvidia 555.58.02. (Not sure what version I was running before that).

The release notes for that version of the drivers mention: "Updated glXWaitVideoSyncSGI() to be more efficient. This reduces frame stutter in some KDE configurations with GSP offload." That sounds like it might be related.

I tried disabling GSP using the nvidia NVreg_EnableGpuFirmware=0 kernel option. Unfortunately that did not seem to make a difference.

I also tried running picom using the --no-frame-pacing option. That seems to have helped quite a bit. I only got the Duplicate vblank event found message once at startup. No more after that.

@absolutelynothelix
Collaborator

it's known to spam in certain cases e.g. when the monitor is turned off. does it spam when you just use the pc normally?

> I also tried running picom using the --no-frame-pacing option. That seems to have helped quite a bit. I only got the Duplicate vblank event found message once at startup. No more after that.

afaik --no-frame-pacing should turn off the vblank scheduler so having this message, even only once, is kinda weird.

@absolutelynothelix
Collaborator

absolutelynothelix commented Jul 23, 2024

if you'd like to keep frame pacing enabled (which is recommended ig as it reduces latency) and like experiments, you can try setting the PICOM_DEBUG environment variable to force_vblank_sched=present, e.g. PICOM_DEBUG=force_vblank_sched=present picom ... with that set, picom will use the x present extension vblank scheduler instead of the glx_sgi_video_sync glx extension one. although yshui claims it to be unreliable (see this thread for example), i didn't notice much difference myself.

@absolutelynothelix
Collaborator

@ShadiestGoat, @noctuid, a less harmful fix for this should be no-frame-pacing.

@wmkmn

wmkmn commented Jul 23, 2024

> afaik --no-frame-pacing should turn off the vblank scheduler so having this message, even only once, is kinda weird.

Makes sense. I probably misattributed that log message to the wrong picom invocation then.

@wmkmn

wmkmn commented Jul 23, 2024

I also tried running picom with force_vblank_sched=present (and frame-pacing enabled again) and that seems to work well for me. No vsync messages in the logs and rendering is smooth, without apparent frame drops.

Thanks for the suggestion @absolutelynothelix.

@yshui
Owner

yshui commented Jul 23, 2024

Huh, maybe we should just make present vsync the default?

@absolutelynothelix
Collaborator

@yshui, iirc my very scientific very helix benchmarks showed that it's the smart frame pacing that sucks on nvidia no matter what vblank scheduler is used. without it both vblank schedulers work more or less the same at least for me. but maybe more testing from nvidia users is needed.

@yshui
Owner

yshui commented Aug 6, 2024

@awused oh btw, can you try PICOM_DEBUG=force_vblank_sched=present too?

@awused

awused commented Aug 6, 2024

Sure, though I did notice that, based on the timestamps, #1306 happened while the computer was almost completely idle and no one was home. I would have been connected over ssh but not doing anything graphically intensive, at worst just running a few compilations.

@yshui
Owner

yshui commented Aug 6, 2024

This problem is usually triggered by monitors turning off.

@pijulius
Contributor

pijulius commented Aug 6, 2024

hi @yshui can confirm this happening to me too:
[ 08/06/2024 14:04:05.212 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 0. Possible NVIDIA bug?
[ 08/06/2024 14:04:05.212 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler
[ 08/06/2024 14:04:05.889 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 0. Possible NVIDIA bug?
[ 08/06/2024 14:04:05.889 sgi_video_sync_scheduler_callback WARN ] Resetting the vblank scheduler

and indeed it happens when the monitor is turned off using DPMS and then turned on again. Unfortunately this freezes picom, so I need to kill it to have a working display again.

This started to happen just 2 days ago, so I assume it was introduced by one of the latest changes. I really hope it can be fixed; if there is anything I can do to help you track it down, please let me know and I will try to see what I can find.

Thanks again for all your hard work!

@pijulius
Contributor

pijulius commented Aug 6, 2024

OOO, seems like this fixes the problem:

PICOM_DEBUG=force_vblank_sched=present

but it makes the animations look really bad, like they are skipping frames (PS: this is on a 120hz monitor). I also tried this:

--no-frame-pacing

and this seems to fix it too, and animations still work just fine, so I will keep running it like this and let you know if I still face the problem.

@absolutelynothelix
Collaborator

jesus christ, how about we just pretend that nvidia doesn't exist?

@yshui
Owner

yshui commented Aug 6, 2024

guess the idea of using present by default is out the window then :/

need to find a better way to mitigate the bad nvidia behavior...

@yshui
Owner

yshui commented Aug 7, 2024

OK, I am going to test some different strategies and see what works.

I created a branch called nvidia-pain, please try that and tell me what it looks like. It's probably not going to work, but it would be useful to know how it reacts.

@pijulius
Contributor

hi @yshui Thanks for the quick fix, can confirm that nvidia-pain does fix the problem and all works well BUT: the problem for me wasn't caused by your code but by the new nVidia driver.

So with the latest NVIDIA-Linux-x86_64-550.107.02 I got the following errors:

  1. The above error when waking up from suspend
  2. vsync=true makes all animations sluggish (looks really bad)
  3. vsync=false makes animations look better, but cpu usage goes up when the monitor is off and also when e.g. i3lock is used

ALL these errors go away if I revert to NVIDIA-Linux-x86_64-550.100, so I'm not sure what nvidia did, but the new driver seems to cause a lot of problems unrelated to your code.

@yshui
Owner

yshui commented Aug 11, 2024

@pijulius

> wasn't caused by your code but by the new nVidia driver.

yeah it is the new nvidia driver we are trying to fix (or rather, workaround) here.

> I got the following errors:

Are these results from when you run without PICOM_DEBUG=force_vblank_sched=present?

@pijulius
Contributor

@yshui

> Are these results from when you run without PICOM_DEBUG=force_vblank_sched=present?

yes, simply running picom-nvidia-pain without any arguments at all. If I run normal picom with the same config on the 550.100 driver there are no problems at all.

please note: it's not just the above errors. For example, full screen animations such as i3lock's are slow as hell, when unlocking the keypress events show up with a noticeable delay, and flameshot is way slower both when launching and when selecting an area on the screen, so it seems like something global is going on.

@pijulius
Contributor

hi @yshui this
next...nvidia-pain

seems to be good to go in, as I do get better results with this version than with the old one. Unfortunately I noticed some inconsistency between the old and new nvidia drivers: the new one does work from time to time and the old one does break from time to time, so I can't replicate this reliably. Still, I do get better results with nvidia-pain: so far it has always woken up with this branch, while it did not with the old versions.

thank you for the quick fix!

@awused

awused commented Sep 18, 2024

After updating to nvidia 560, the issue I had in #1306 seemed to go away, even without the nvidia-pain branch or any special parameters. But since nvidia 560 is broken in other ways and I downgraded to 555 that issue has come back so I'm back to using special parameters.

I'm not sure if whatever changes that happened in nvidia-pain actually need to be merged into the main branch. nvidia has just had a bad couple of drivers and it seems like it'll probably work itself out. That or 560's breakage could have been masking the breakage in here or #1306. Hard to say with confidence until the next driver revision.

@bubbleguuum

Continuing discussion from #1367.

Any plan for a workaround that detects sgi_video_sync_scheduler_callback being called repeatedly in a busy loop when the screen is off, and avoids the high CPU usage it causes?

@yshui
Owner

yshui commented Oct 17, 2024

@bubbleguuum can you test the nvidia-pain branch mentioned earlier in this issue? remember to remove PICOM_DEBUG=force_vblank_sched=present

@bubbleguuum

bubbleguuum commented Oct 17, 2024

With the nvidia-pain version, when the screen is off the console still spams:

[ 10/17/2024 15:35:11.574 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 41168. Possible NVIDIA bug?
[ 10/17/2024 15:35:11.590 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 41169. Possible NVIDIA bug?
[ 10/17/2024 15:35:11.607 sgi_video_sync_scheduler_callback WARN ] Duplicate vblank event found with msc 41170. Possible NVIDIA bug?

However (with screen off), picom CPU usage (monitored via ssh terminal running on another machine) is low, being most of the time at 0% and occasionally reported at 10%.

When I wake up the screen, the log traces above keep being spammed with CPU usage being constant 10%.

That's different from the v12.3 version, where the log spam stops immediately and CPU usage goes back to normal when I wake up the screen.

@yshui
Owner

yshui commented Oct 17, 2024

ok, at least it fixes the 100% cpu usage. i think i know what is going on with the log spam.

@yshui
Owner

yshui commented Oct 17, 2024

@bubbleguuum i've updated nvidia-pain, can you test again?

@bubbleguuum

I confirm it fixes the problem: no more log spam and normal CPU usage when waking up the screen.

@bubbleguuum

Also, commenting out the line below, which is what gets spammed when the screen is off, makes picom use 0% CPU all the time when the screen is off (otherwise it reports about 10% briefly once every 20-30 seconds).

log_warn("Duplicate vblank event found with msc %d. Possible NVIDIA bug? "
         "Number of duplicates so far: %d",
         msc, sched->vblank_inserted);

@yshui
Owner

yshui commented Oct 17, 2024

i feel 10% cpu every 20-30 seconds is a fair price to pay if we get to complain about nvidia 😈

@bubbleguuum

bubbleguuum commented Oct 17, 2024

I would only display it once, but it's your call. I could see it spamming log files. Thanks for the fix anyway!

@yshui
Owner

yshui commented Oct 17, 2024

that's very fair, i will do that.
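
The "only display it once" idea discussed above could look roughly like the sketch below. This is not picom's code: `dup_warn_state` and `warn_duplicate_once` are hypothetical names invented for illustration, and plain `fprintf` stands in for picom's `log_warn`. The counter keeps tracking duplicates, but only the first one produces a log line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical state, not taken from picom: counts duplicate vblank
 * events so that only the first one is logged. */
struct dup_warn_state {
    unsigned long duplicates; /* total duplicates seen so far */
};

/* Record one duplicate vblank event.
 * Returns true if a warning was printed, false if it was suppressed. */
static bool warn_duplicate_once(struct dup_warn_state *st, unsigned long msc) {
    assert(st != NULL);
    st->duplicates++;
    if (st->duplicates > 1) {
        return false; /* already warned once, stay quiet */
    }
    fprintf(stderr,
            "Duplicate vblank event found with msc %lu. Possible NVIDIA bug?\n",
            msc);
    return true;
}
```

With a zero-initialized `struct dup_warn_state`, the first call prints the warning and every later call only increments the counter, so a screen that stays off for hours adds a single line to the log.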

@yshui yshui closed this as completed in e65ebd7 Oct 17, 2024
@yshui
Owner

yshui commented Oct 17, 2024

closed, thanks for testing the fix

yshui added a commit that referenced this issue Oct 17, 2024
We used to tear down the whole vblank thread and restart it every time we
got a duplicate msc. This used to work OK, but newer NVIDIA drivers
broke this. And to recap, simply waiting for vblank again upon reception
of duplicate MSCs _does not_ work either, and will just get us stuck in an
infinite loop.

After some experimentation, I found that rendering a new frame gets us
out of the infinite duplicate msc loop. So that's what we do now, i.e.
inserting a synthetic vblank to trigger a new frame. Some care is taken
to make sure synthetic vblanks' msc numbers don't conflict with real
ones.

Fixes #1265

Signed-off-by: Yuxuan Shui <[email protected]>
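
The idea in the commit message above can be illustrated with a small sketch. It is not the code from the commit: `sketch_sched`, `sketch_event` and `sketch_on_vblank` are hypothetical names, and offsetting every reported msc by the number of synthetic vblanks inserted so far is only an assumption about one way to keep synthetic and real msc numbers from colliding. When the driver reports an msc that does not advance, the sketch emits a synthetic vblank (which would trigger a new frame) instead of waiting again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scheduler state, for illustration only. */
struct sketch_sched {
    uint64_t last_real_msc; /* last msc value reported by the driver */
    uint64_t inserted;      /* synthetic vblanks inserted so far */
    bool has_last;          /* false until the first event arrives */
};

/* Event handed on to the renderer. */
struct sketch_event {
    uint64_t msc;   /* msc as seen by the renderer, strictly increasing */
    bool synthetic; /* true if produced in response to a duplicate */
};

/* Handle one vblank notification from the driver. A non-advancing msc is
 * treated as a duplicate and answered with a synthetic vblank. Reported
 * msc values are shifted by the number of synthetic vblanks (an assumed
 * scheme), so they never repeat. */
static struct sketch_event sketch_on_vblank(struct sketch_sched *s,
                                            uint64_t real_msc) {
    assert(s != NULL);
    struct sketch_event ev;
    if (s->has_last && real_msc <= s->last_real_msc) {
        s->inserted++; /* duplicate: insert a synthetic vblank */
        ev.synthetic = true;
    } else {
        s->last_real_msc = real_msc; /* genuine new vblank */
        s->has_last = true;
        ev.synthetic = false;
    }
    ev.msc = s->last_real_msc + s->inserted;
    return ev;
}
```

Feeding this sketch the driver values 100, 100, 100, 101 yields reported msc values 100, 101, 102, 103, with the second and third events marked synthetic.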