nVidia "Threaded Optimization" Driver Setting Interfering with Godot Preload Capabilities #45660

Open
Pixelmusement opened this issue Feb 2, 2021 · 7 comments · Fixed by #71472

@Pixelmusement

Godot version: 3.2.3.stable.official (Steam Release)

OS/device including version:
Windows 10 Pro 19041.423
CPU: AMD FX-8350 4 GHz
GPU: GeForce GTX 1070 - 8 GB VRAM
RAM: 16 GB
Using an HDD, not SSD, for Storage

Issue description:
Just a couple of days ago I reported an issue I was having with the particle system which didn't seem related to shader caching: #45613. Keep in mind while I explain the following issue that the particle issue I reported may or may not be related; it's actually really hard to tell.

Anyways, after some investigation I discovered I was getting hiccups when instancing literally ANYTHING which was going to render a shader of any kind; it was just much more noticeable with particles for whatever reason. These hiccups only occurred the first time an instanced object became visible in a scene, so naturally the thought would be a caching issue and that stuff would need to be preloaded... except... I AM preloading things, and if I run my projects in a window instead of full screen, these hiccups disappear. Some mysterious difference between running my projects in a window versus full screen was causing preloading/caching to fail.

After MANY hours of investigation, I have discovered something insidious: nVidia GeForce drivers have a setting called "Threaded Optimization", normally set to "Auto", which is used to spread the CPU usage of hardware-accelerated games over multiple cores. 99% of the time, changing this has zero effect on the performance of a game, though it often changes how the CPU usage for a game is spread across a multi-core processor.

...however, if I force "Threaded Optimization" into an "Off" state in my driver settings, the hiccups I was having with first-time instancing of supposedly preloaded objects while running my Godot projects full screen disappear!

So... there seems to be some sort of incompatibility with preloading assets in Godot, while running a Godot app full screen, on a GeForce card, with the card's "Threaded Optimization" driver setting set to "On" or "Auto" (keeping in mind that it's set to "Auto" by default).

...I don't have a clue how this is supposed to be addressed by the Godot devs, since the problem is a driver setting. If this was just a single game you'd just contact nVidia and say, "Hey, could you include a profile which disables Threaded Optimization for our game? Thanks!" ...but for a game ENGINE, which makes multiple games, I really don't know what the solution here is, other than to instruct players how to disable this setting manually. :/

Steps to reproduce:

  1. This must be done on a system with a GeForce graphics chipset.
  2. Go into the nVidia driver settings and set the global profile's "Threaded Optimization" setting to "Auto" or "On". It should already be "Auto" by default.
  3. Ensure that Godot Engine itself and any projects made with it are NOT in the list of settings for individual programs in the nVidia driver settings.
  4. Open any Godot project which has the ability to instance objects on command and run it full screen.
  5. The very first time a shader appears on screen when instancing an object, there will be a very slight hiccup in the framerate. Subsequent instances of the object will not produce this skip in the framerate.
  6. Go back into the editor and switch the project to run in a window. The hiccups will no longer occur.
  7. Go back into the editor and go back to running full screen, but also go into the nVidia driver settings again and set "Threaded Optimization" to "Off". Run the project and the hiccups remain gone.
  8. Turn Threaded Optimization back to "On" or "Auto" and the hiccups return.

Minimal reproduction project:
Literally any project which instances objects with unique shaders applied to them and is run full screen will trigger this issue.

@Calinou
Member

Calinou commented Feb 2, 2021

The issue will likely go away in 4.0 thanks to the Vulkan renderer (and shaders being compiled on scene load rather than during gameplay), but I'm not sure if it can be fixed before 4.0 unless ANGLE is used.

@Calinou
Member

Calinou commented Jun 19, 2022

@Pixelmusement Can you test 3.5.rc4 and see if the problem persists in GLES3 in 3D? Asynchronous shader compilation and caching is now enabled by default for 3D in GLES3.

@Pixelmusement
Author

Pixelmusement commented Jun 20, 2022

> @Pixelmusement Can you test 3.5.rc4 and see if the problem persists in GLES3 in 3D? Asynchronous shader compilation and caching is now enabled by default for 3D in GLES3.

...umm... I'm not sure I can...? So, I was still having this issue when I last tested a few days ago with Godot 3.4.3, but upon loading up 3.4.3 just now there are absolutely ZERO preloading or shader caching issues; everything is running smooth as silk no matter how many settings I change in Godot, my GPU drivers, restarts of the software, anything... so... I'm a bit confused as to what's even going on...??? o_O

I wouldn't think it would have anything to do with anything else being cached because I was restarting Godot many times when I was last testing things out and was constantly getting lag spikes upon first-rendering of things every time I restarted Godot itself, but that's not happening now, even if I disable all preloading in the test I have set up... weird...

So yeah, since nothing's wrong in 3.4.3 at the moment I can't really compare with 3.5.rc4... :|

EDIT: However, I'm testing out 3.5.rc4 anyways and have noticed that if you go back to synchronous shader compiling, it's CONSTANTLY recompiling shaders for newly instanced scenes when coupled with the multi-threaded thread model (i.e., I get a lag spike every single time I fire a simple projectile, despite it being preloaded). Single-threaded, this doesn't happen... it might not be the shaders, but everything's preloaded so I'm not sure what else it would be.

EDIT 2: Oh geeze... things are even WORSE with async shader compiling... Not only do things not appear right for the first couple seconds but I'm STILL getting lag spikes when things first come into view but now it's MULTIPLE lag spikes in rapid succession. I have noticed something very pertinent though: When I first run a Godot program full screen, sometimes the screen flashes black for a moment and sometimes it doesn't. Any time it flashes black, I get the lag spikes. Any time it does NOT flash black, no lag spikes. I don't have a clue what's going on with that...

EDIT 3: OK, by forcing the shader fallbacks on I can at least confirm a lot of what I'm experiencing at the moment in rc4 is NOT a shader issue, since I would assume the fallback shaders don't need to be compiled every time something newly comes into view. Still investigating.

EDIT 4: I don't think shaders are the issue anymore, but it's hard to tell because everything I try to load or preload doesn't ACTUALLY get loaded or preloaded until it first appears on-screen, then lag spikes, and now it remains cached until I quit Godot entirely, so I've been having to constantly quit and restart Godot to test this out and nothing I try is making loading or preloading work properly. Heck, I even tried using ResourceLoader.has_cached() to confirm that something IS loaded, it SAYS it is, then the instant it appears on screen, lag spike. This is getting ridiculous... >_<

@Pixelmusement
Author

Yeah, I've spent over two hours trying to figure out what's going on here. Best I can tell, the async shader system seems to be working fine but preloading and loading just don't seem to be doing anything so I can't really tell, plus having to save-quit-restart after every single test isn't helping. :(

@Pixelmusement
Author

Pixelmusement commented Jun 20, 2022

OK, I figured out a way to narrow this down. I took the basic FPS arena I've made and had two orbs appear within it at specific time intervals, each of which is a separate object and mesh. I tested every combination of sync/async shader modes with different materials applied and came up with the following results, keeping in mind that Async produced the same results as Async + Cache, and that forcing the fallback shaders had no effect in any case:

Synchronous - Arena Material A - Orb 1 Material B - Orb 2 Material C:
! Lag Spike on Orb 1 Appear
! Lag Spike on Orb 2 Appear

Asynchronous - Arena Material A - Orb 1 Material B - Orb 2 Material C:
! Lag Spike on Orb 1 Appear
! Lag Spike on Orb 2 Appear

Synchronous - Arena Material A - Orb 1 Material B - Orb 2 Material B:
! Lag Spike on Orb 1 Appear
$ No Lag on Orb 2 Appear

Asynchronous - Arena Material A - Orb 1 Material B - Orb 2 Material B:
! Lag Spike on Orb 1 Appear
$ No Lag on Orb 2 Appear

Synchronous - Arena Material A - Orb 1 Material A - Orb 2 Material B:
! Lag Spike on Orb 1 Appear
! Lag Spike on Orb 2 Appear

Asynchronous - Arena Material A - Orb 1 Material A - Orb 2 Material B:
$ No Lag on Orb 1 Appear
! Lag Spike on Orb 2 Appear

There's clearly some weirdness going on, given that there are lag spikes if the first orb which appears uses the same material as the surrounding arena in synchronous mode but not in async mode, yet it's definitely shader related, given that the second orb only lag spikes if its material doesn't match the first orb's material.
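To make that reasoning concrete, here is a minimal Python sketch that checks the six results above against a naive cache model. The model is an assumption made for illustration and is not how Godot is known to behave: it supposes a lag spike happens only the first time a given material is drawn, with the arena's material already counted as drawn. The names `Run`, `predict` and `mismatches` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Run(NamedTuple):
    mode: str                     # "sync" or "async" shader compilation
    arena: str                    # material on the arena
    orbs: Tuple[str, str]         # materials on orb 1 and orb 2
    observed: Tuple[bool, bool]   # True = lag spike when that orb appeared


# The six results reported above, transcribed as data.
RUNS = [
    Run("sync",  "A", ("B", "C"), (True, True)),
    Run("async", "A", ("B", "C"), (True, True)),
    Run("sync",  "A", ("B", "B"), (True, False)),
    Run("async", "A", ("B", "B"), (True, False)),
    Run("sync",  "A", ("A", "B"), (True, True)),
    Run("async", "A", ("A", "B"), (False, True)),
]


def predict(arena: str, orbs: Tuple[str, str]) -> Tuple[bool, bool]:
    """Assumed model: spike only the first time a material is drawn."""
    seen = {arena}
    spikes = []
    for material in orbs:
        spikes.append(material not in seen)
        seen.add(material)
    return (spikes[0], spikes[1])


def mismatches(runs: List[Run]) -> List[Run]:
    """Runs whose observed spikes differ from the naive prediction."""
    return [r for r in runs if predict(r.arena, r.orbs) != r.observed]


if __name__ == "__main__":
    for r in mismatches(RUNS):
        print(r.mode, r.arena, r.orbs, "observed", r.observed,
              "predicted", predict(r.arena, r.orbs))
```

Under this assumed model, the only run that disagrees is the synchronous one where Orb 1 shares Material A with the arena, which is the oddity called out above.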

@Pixelmusement
Author

I'm continuing to test this and am noticing that shaders are cached separately for different scenes, sync or async. I had the thought to make a "Loading" scene but that doesn't seem to be doing the trick. :(

Also, I noticed there's a way to have the debug output show how many shaders are presently being compiled asynchronously, but is there a way to check that count in GDScript?

@Pixelmusement
Author

OK... umm... This is a little embarrassing, but it turns out one of the meshes I was using for testing all of this was behaving... oddly, to put it mildly, leading to lag spikes for inconsistent and inexplicable reasons, which had me testing for hours trying to figure out what the cause was. If I replace that mesh with another, built-in or my own, everything starts working as expected. If I then put the old mesh back, even though it LOOKS PERFECTLY FINE, weird lag spikes again.

So, where I'm at with testing here is that async shader compilation does work and does help keep things cached for when new objects use the same shaders, but there's still significant lag when loading shaders in for the first time.

To compensate, I've put together a "Loading" scene which basically contains simple meshes with all of my materials on them (I'm planning to add particles to it in the future) and which slots into the actual scene to be run. It slots in rather than being a separate scene because switching scenes seems to ignore the cache, which appears to be scene-specific, though I haven't yet tested that fully now that I've found out one of my debugging meshes was broken. The loader simply pauses everything, waits, then unpauses everything, sets camera control to the player, and removes itself from memory. This works both with sync and async shaders.

THAT SAID, there does seem to be a lag spike issue with this VERY specific combination of events:

  • Running the Multi-Threaded thread model
  • Running with Synchronous shaders
  • Running full screen
  • Instancing a scene object with a particle emitter

This specific combination is causing tiny lag spikes every time the scene object with the particles is instanced. If ANY one of those factors is not true (i.e., running single-threaded, using async shaders, or running in a window), these lag spikes do not happen.
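For anyone trying to bisect this, the following Python sketch enumerates every on/off combination of the four factors and flags the single one reported to spike. It only encodes the report above as a rule (all four factors must hold at once); the factor names are hypothetical labels, and it is an assumption that every other combination is spike-free, since only flipping the factors around the failing case is described as tested.

```python
from itertools import product

# Hypothetical labels for the four factors listed above.
FACTORS = ("multi_threaded", "sync_shaders", "full_screen", "particle_emitter")


def spikes(config):
    """Rule taken from the report: spikes only when all four factors hold."""
    return all(config[name] for name in FACTORS)


def all_configs():
    """Every on/off combination of the four factors (2**4 = 16 configs)."""
    return [dict(zip(FACTORS, values))
            for values in product((False, True), repeat=len(FACTORS))]


if __name__ == "__main__":
    for config in all_configs():
        label = "SPIKE" if spikes(config) else "ok"
        enabled = [name for name in FACTORS if config[name]]
        print(label, enabled)
```

Running each of the listed configurations by hand and comparing against the flagged one would confirm or refute the rule.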
