[4.4 beta 4] iOS: Metal shader compilation warnings and unexpected compilation amount #103006

Closed
georgwacker opened this issue Feb 18, 2025 · 33 comments · Fixed by #103185

Comments

@georgwacker
Contributor

Tested versions

  • tested Metal iOS 4.4 beta 4
  • comparing MoltenVK iOS 4.3-stable

System information

Godot v4.4.beta4 - macOS Sonoma (14.7.2) - Multi-window, 2 monitors - Vulkan (Mobile)

Issue description

So far, I've been using MoltenVK on iOS with version 4.3-stable.

With manual particle preloading, I'm getting 4 compilation warnings from MoltenVK, one for each particle system.

Switching to 4.4 beta 4, using Metal with pipeline caching and no manual preloading, shows 78 compilation warnings on the first launch, which is very slow (logs below).

I even got these warnings when running in Release mode via Xcode.

  • Should these warnings be suppressed?
  • Is this number of compilations out of the ordinary? I only have 4 particle systems and the game is fully 2D with Control nodes (no sprites), so where are the 78 shader compilations coming from?
Log (truncated)
Godot Engine v4.4.beta4.official.93d270693 - https://godotengine.org
Metal 3.2 - Forward Mobile - Using Device #0: Apple - Apple A15 GPU (Apple8)
fopen failed for data file: errno = 2 (No such file or directory)
Errors found! Invalidating cache...
fopen failed for data file: errno = 2 (No such file or directory)
Errors found! Invalidating cache...
Warning: Compilation succeeded with: 

program_source:68:11: warning: unused variable 'DTid' [-Wunused-variable]
    uint3 DTid = gl_GlobalInvocationID;
          ^
Warning: Compilation succeeded with: 

program_source:68:11: warning: unused variable 'DTid' [-Wunused-variable]
    uint3 DTid = gl_GlobalInvocationID;
          ^
Warning: Compilation succeeded with: 

program_source:189:12: warning: unused variable 'instance_custom' [-Wunused-variable]
    float4 instance_custom = float4(0.0);
           ^
program_source:237:11: warning: unused variable 'bones' [-Wunused-variable]
    uint4 bones = uint4(0u);
          ^
program_source:238:12: warning: unused variable 'bone_weights' [-Wunused-variable]
    float4 bone_weights = float4(0.0);
           ^
program_source:240:11: warning: unused variable 'point_size' [-Wunused-variable]
    float point_size = 1.0;
          ^
program_source:107:15: warning: unused variable 'pso_sc_packed_0' [-Wunused-const-variable]
constant uint pso_sc_packed_0 = is_function_constant_defined(pso_sc_packed_0_tmp) ? pso_sc_packed_0_tmp : 0u;
              ^
Warning: Compilation succeeded with: 

program_source:212:15: warning: unused variable 'cVdotH' [-Wunused-variable]
        float cVdotH = fast::max(dot(view, half_vec), 0.0);
              ^
program_source:213:15: warning: unused variable 'cLdotH' [-Wunused-variable]
        float cLdotH = fast::max(dot(light_vec, half_vec), 0.0);
              ^
program_source:447:12: warning: unused variable 'screen_uv' [-Wunused-variable]
    float2 screen_uv = float2(0.0);
           ^
program_source:450:11: warning: unused variable 'normal_map_depth' [-Wunused-variable]
    float normal_map_depth = 1.0;
          ^
Warning: Compilation succeeded with: 

program_source:190:12: warning: unused variable 'instance_custom' [-Wunused-variable]
    float4 instance_custom = float4(0.0);
           ^
program_source:238:11: warning: unused variable 'bones' [-Wunused-variable]
    uint4 bones = uint4(0u);
          ^
program_source:239:12: warning: unused variable 'bone_weights' [-Wunused-variable]
    float4 bone_weights = float4(0.0);
           ^
program_source:107:15: warning: unused variable 'pso_sc_packed_0' [-Wunused-const-variable]
constant uint pso_sc_packed_0 = is_function_constant_defined(pso_sc_packed_0_tmp) ? pso_sc_packed_0_tmp : 0u;
              ^
Warning: Compilation succeeded with: 

program_source:196:12: warning: unused variable 'instance_custom' [-Wunused-variable]
    float4 instance_custom = float4(0.0);
           ^
program_source:226:11: warning: unused variable 'bones' [-Wunused-variable]
    uint4 bones = uint4(0u);
          ^
program_source:228:11: warning: unused variable 'point_size' [-Wunused-variable]
    float point_size = 1.0;
          ^
program_source:111:15: warning: unused variable 'pso_sc_packed_0' [-Wunused-const-variable]
constant uint pso_sc_packed_0 = is_function_constant_defined(pso_sc_packed_0_tmp) ? pso_sc_packed_0_tmp : 0u;
              ^
Warning: Compilation succeeded with: 

program_source:255:15: warning: unused variable 'cVdotH' [-Wunused-variable]
        float cVdotH = fast::max(dot(view, half_vec), 0.0);
              ^
program_source:256:15: warning: unused variable 'cLdotH' [-Wunused-variable]
        float cLdotH = fast::max(dot(light_vec, half_vec), 0.0);
              ^
program_source:570:12: warning: unused variable 'screen_uv' [-Wunused-variable]
    float2 screen_uv = float2(0.0);
           ^
program_source:573:11: warning: unused variable 'normal_map_depth' [-Wunused-variable]
    float normal_map_depth = 1.0;
          ^
Warning: Compilation succeeded with: 

program_source:196:11: warning: unused variable 'bones' [-Wunused-variable]
    uint4 bones = in.bone_attrib;
          ^
program_source:197:12: warning: unused variable 'bone_weights' [-Wunused-variable]
    float4 bone_weights = in.weight_attrib;
           ^
program_source:251:11: warning: unused variable 'point_size' [-Wunused-variable]
    float point_size = 1.0;
          ^
program_source:77:15: warning: unused variable 'pso_sc_packed_0' [-Wunused-const-variable]
constant uint pso_sc_packed_0 = is_function_constant_defined(pso_sc_packed_0_tmp) ? pso_sc_packed_0_tmp : 0u;
              ^
Warning: Compilation succeeded with: 

program_source:197:11: warning: unused variable 'bones' [-Wunused-variable]
    uint4 bones = in.bone_attrib;
          ^
program_source:198:12: warning: unused variable 'bone_weights' [-Wunused-variable]
    float4 bone_weights = in.weight_attrib;
           ^
program_source:77:15: warning: unused variable 'pso_sc_packed_0' [-Wunused-const-variable]
constant uint pso_sc_packed_0 = is_function_constant_defined(pso_sc_packed_0_tmp) ? pso_sc_packed_0_tmp : 0u;
              ^
Warning: Compilation succeeded with: 

program_source:182:15: warning: unused variable 'cVdotH' [-Wunused-variable]
        float cVdotH = fast::max(dot(view, half_vec), 0.0);
              ^
program_source:183:15: warning: unused variable 'cLdotH' [-Wunused-variable]
        float cLdotH = fast::max(dot(light_vec, half_vec), 0.0);
              ^
program_source:457:12: warning: unused variable 'screen_uv' [-Wunused-variable]
    float2 screen_uv = float2(0.0);
           ^
program_source:460:11: warning: unused variable 'normal_map_depth' [-Wunused-variable]
    float normal_map_depth = 1.0;
          ^
Warning: Compilation succeeded with: 

program_source:195:12: warning: unused variable 'instance_custom' [-Wunused-variable]
    float4 instance_custom = float4(0.0);
           ^
program_source:225:11: warning: unused variable 'bones' [-Wunused-variable]
    uint4 bones = uint4(0u);
          ^
program_source:227:11: warning: unused variable 'point_size' [-Wunused-variable]
    float point_size = 1.0;
          ^
program_source:111:15: warning: unused variable 'pso_sc_packed_0' [-Wunused-const-variable]
constant uint pso_sc_packed_0 = is_function_constant_defined(pso_sc_packed_0_tmp) ? pso_sc_packed_0_tmp : 0u;
              ^
[... truncated]

metal_4.4b4_log.txt

Steps to reproduce

Minimal reproduction project (MRP)

@Calinou
Member

Calinou commented Feb 18, 2025

Can you compare with 4.4beta4 on MoltenVK? You can switch back in the Project Settings using Rendering Driver overrides for macOS and iOS.

This is likely a consequence of the new ubershaders in 4.4, although I'm surprised they are compiled when 3D rendering is not used. I guess they are still needed when you use GPUParticles?

@georgwacker
Contributor Author

In the project settings, "rendering/rendering_device/driver.ios" is already set to vulkan and is the only option for me, probably because I'm running the editor under x86 rather than ARM. The iOS export seems to run on Metal automatically, regardless.

Is that the correct settings path for overriding it in theory?

I'm using 4 x GPUParticles2D, all in one scene, but only one is set to emitting by an initializer. Maybe that is causing all these permutations to be compiled?

@Calinou
Member

Calinou commented Feb 18, 2025

> Is that the correct settings path for overriding it in theory?

Yes. We should probably change the setting hint to allow setting Metal even in x86, so that you can export Metal even if you don't run it yourself.

cc @stuartcarnie

@stuartcarnie
Contributor

Good to know, thanks.

The warnings are OK; I will look at whether we can suppress them in release builds. A SPIR-V optimiser would reduce them significantly, as we could enable a few passes, like dead code elimination.
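To illustrate why a dead-code-elimination pass would remove most of these warnings, here is a minimal mark-and-sweep sketch over a toy, straight-line IR. The `Stmt` type and `eliminate_dead_code` function are hypothetical; a real SPIR-V optimiser works on SSA IDs and control flow, not variable names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One statement in a toy straight-line IR (hypothetical, for illustration):
// it writes at most one variable, reads some variables, and may have an
// observable side effect such as writing a shader output.
struct Stmt {
    std::string def;                // variable written ("" if none)
    std::vector<std::string> uses;  // variables read
    bool side_effect = false;       // true for output writes, stores, etc.
};

// Mark-and-sweep dead code elimination. Walking backwards, a statement is
// kept if it has a side effect or writes a variable that a later kept
// statement reads; everything else is dead and removed.
std::vector<Stmt> eliminate_dead_code(const std::vector<Stmt> &program) {
    std::set<std::string> live;  // variables read by kept statements below
    std::vector<bool> keep(program.size(), false);
    for (size_t i = program.size(); i-- > 0;) {
        const Stmt &s = program[i];
        bool defines_live = !s.def.empty() && live.count(s.def) > 0;
        if (s.side_effect || defines_live) {
            keep[i] = true;
            live.erase(s.def);  // this write satisfies the later reads
            for (const std::string &u : s.uses) {
                live.insert(u);  // its inputs are now live
            }
        }
    }
    std::vector<Stmt> result;
    for (size_t i = 0; i < program.size(); ++i) {
        if (keep[i]) {
            result.push_back(program[i]);
        }
    }
    return result;
}
```

Under this model, a statement such as `uint4 bones = uint4(0u);` whose result is never read would be dropped before MSL generation, leaving nothing for the Metal compiler to warn about.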

@bruvzg
Member

bruvzg commented Feb 18, 2025

Multiple unused variable warnings were always present with MoltenVK (on macOS as well if you enable verbose logging), it should be fine.

> We should probably change the setting hint to allow setting Metal even in x86, so that you can export Metal even if you don't run it yourself.

It's a bit more complex: it will use Metal on iOS even if you set it to Vulkan on an x86_64 Mac. Since Vulkan is the default value on x86_64, it's not saved in the config (and the default on iOS is Metal). We should probably always show all available values for both macOS and iOS, always have the same default, and instead automatically fall back to Vulkan on x86_64 (#102341 already does the fallback, but a warning print should probably be added to avoid confusion).

@Calinou
Member

Calinou commented Feb 18, 2025

> and instead automatically fall back to Vulkan on x86_64 (#102341 already does the fallback, but a warning print should probably be added to avoid confusion).

This would always print a warning on every startup on x86_64 hardware by default, so I'm not sure.

What we can do though is amend the rendering driver startup line with a notice about the fallback being applied. Something like this:

Godot Engine v4.4.beta.custom_build.e0cf7853b (2025-02-18 21:17:43 UTC) - https://godotengine.org
OpenGL API 3.3.0 NVIDIA 565.77 - Compatibility - Using Device: NVIDIA - NVIDIA GeForce RTX 4090 (fallback from Vulkan)

@akien-mga
Member

akien-mga commented Feb 18, 2025

> We should probably change the setting hint to allow setting Metal even in x86, so that you can export Metal even if you don't run it yourself.
>
> It's a bit more complex: it will use Metal on iOS even if you set it to Vulkan on an x86_64 Mac. Since Vulkan is the default value on x86_64, it's not saved in the config (and the default on iOS is Metal). We should probably always show all available values for both macOS and iOS, always have the same default, and instead automatically fall back to Vulkan on x86_64 (#102341 already does the fallback, but a warning print should probably be added to avoid confusion).

I confirm we shouldn't make the default value or available hints depend on the editor host; as we see here, that prevents proper configuration.

I would make the default "auto" for macOS, which would be Metal on arm and Vulkan on x86_64. For iOS, the default should be Metal and Vulkan should be available to select as option (so no need for "auto" there I believe).
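The proposed defaults could be expressed as a small resolution function. This is a sketch under assumptions: `Platform`, `Arch` and `resolve_driver` are hypothetical names rather than Godot's API, and an explicit `metal` setting on an x86_64 Mac is assumed to fall back to Vulkan, as described for #102341.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration; not Godot's API.
enum class Platform { MacOS, IOS };
enum class Arch { ARM64, X86_64 };

// Resolve a driver setting ("auto", "metal" or "vulkan") to the driver used
// on the target device, following the defaults proposed above.
std::string resolve_driver(Platform platform, Arch device_arch, const std::string &setting) {
    if (setting == "vulkan") {
        return "vulkan";  // an explicit Vulkan choice is always honoured
    }
    if (platform == Platform::IOS) {
        return "metal";  // iOS default; the editor host's CPU is irrelevant
    }
    // macOS "auto" (or "metal"): Metal on Apple silicon, Vulkan on x86_64.
    return device_arch == Arch::ARM64 ? "metal" : "vulkan";
}
```

Note that the editor host's architecture never enters the iOS branch, which is the property argued for above.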

For 4.5, I think we should really implement a hint so that rendering method and drivers always get written to project.godot even when using default values. I thought we had a proposal for that but I couldn't find it (GH search isn't super helpful).

@bruvzg
Member

bruvzg commented Feb 19, 2025

> I would make the default "auto" for macOS, which would be Metal on arm and Vulkan on x86_64. For iOS, the default should be Metal and Vulkan should be available to select as option (so no need for "auto" there I believe).

This is probably better, and with "auto" we do not need any warning.

@bruvzg
Member

bruvzg commented Feb 19, 2025

#103026

@georgwacker
Contributor Author

I've tested the override by manually editing the project file with the editor closed, but the game still starts up with Metal, so it doesn't seem to respect the override currently.

[rendering]   
rendering_device/driver.ios="vulkan"

Regarding the ubershader compilations, I wonder how the pipeline and specializations could be manually tweaked in the future. It is nice that particle preloading basically happens automatically in 4.4, but the 78 "compilation succeeded" messages suggest that a lot of unnecessary features get precompiled that will never be used by a mostly Control-node-based game.

For comparison, in 4.3 I only ever get the shader compilations for the particle systems. I need to do some proper timed startup tests next.

Out of interest, I've compiled a custom iOS export template with disable_3d=yes, but that didn't reduce the shader compilations.

@georgwacker
Contributor Author

I've done some profiling with Instruments, testing the first run (app was removed before each test):

4.4 b4: 16s until menu (ubershader pipeline, 78 compilations, no manual preloading)
4.3: 4s until menu (no manual particle preloading)
4.3: 4s until menu, followed by 4.3s hang in menu for manual preloading (4 compilations)

So it seems the baseline is 4s to get to the menu, but the additional ubershader compilation in 4.4b4 takes 12s compared to the 4.3s of the manual preload.

@georgwacker
Contributor Author

I've forced DisplayServerIOS to Vulkan in a custom build of 4.4b4, and I'm getting 4.21s until menu with only 4 compilations (one for each particle system, done automatically).

So it seems all those extra compilations only happen on Metal?

@clayjohn
Member

@georgwacker How are you measuring shader compiles?

I am a bit confused since the way we compile particle shaders hasn't changed between 4.3 and 4.4. The ubershader system applies to the shaders we use for drawing 3D meshes.

So are you measuring all shader compiles somehow? And if so, how? Later 4.4 releases can track shader compiles in the monitors, but that didn't exist in 4.3.

@georgwacker
Contributor Author

georgwacker commented Feb 20, 2025

Ah, I wasn't aware that the ubershader system is not used for particle shaders. But it must be related to the new pipeline cache system?

I'm testing cold bootup time to menu with no shader cache in Instruments and looking at Xcode logs for "compilation succeeded" messages. Below is the 16s "severe hang" before reaching the menu.

[Image: Instruments trace of the 16s hang before reaching the menu]

When running under Metal, it shows 78 compilations vs. the 4 compilations under Vulkan, so I presumed the additional time is due to those additional compilations. But the slow bootup can be something else related to Metal, perhaps?

Edit: The Points of Interest in the trace are all create_pipeline calls.
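A rough way to reproduce a count like the 78 above is to count the Metal compiler's warning headers in a captured Xcode console log. This hypothetical helper assumes each header starts a line with `Warning: Compilation succeeded with:`. Compilations that produce no warnings presumably print nothing, so the count is likely a lower bound, and it cannot tell fresh compiles from cached ones.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Count "Warning: Compilation succeeded with:" headers in a captured Xcode
// console log. Compilations without warnings presumably print no such
// header, so treat the result as a lower bound on the compilation count.
int count_compile_warnings(std::istream &log) {
    const std::string marker = "Warning: Compilation succeeded with:";
    int count = 0;
    std::string line;
    while (std::getline(log, line)) {
        if (line.rfind(marker, 0) == 0) {  // the line starts with the marker
            ++count;
        }
    }
    return count;
}
```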

@stuartcarnie
Contributor

What version of MoltenVK are you using to build your application?

I have not verified this, but can you try running Metal and Vulkan with the Metal compilation cache completely disabled by setting the MTL_SHADER_CACHE_SIZE environment variable to 0. This undocumented feature was referenced in this comment.

There shouldn't be any reason for Metal to compile more shaders than Vulkan, as compilation is driven by Godot's rendering driver. I would also expect MoltenVK to compile a lot more than 4 shaders on cold startup.

@stuartcarnie
Contributor

stuartcarnie commented Feb 20, 2025

As noted in #96052, from a cold startup, Metal should be faster than Vulkan, which was also confirmed by another user. However, this was only validated on macOS, where it is easy to clear the Metal shader cache, as noted in the Testing section of the PR description. I don't know how easy that is on iOS; after several runs, and without rebooting the entire device, your results may be affected by previous runs. I'm hopeful that the MTL_SHADER_CACHE_SIZE environment variable will help.

I will run those tests again on macOS using master, to make sure there hasn't been any regressions.

@clayjohn
Member

@georgwacker thank you for your response.

Indeed, I think Stuart is on the right track. It sounds like some sort of system caching works in MoltenVK but not in our Metal backend. The actual number of pipeline compile requests should be the same for both.

@stuartcarnie do you know if the compilation count reported in Xcode covers only pipelines compiled from scratch (as opposed to loaded from cache)?

@stuartcarnie
Contributor

@georgwacker try setting this environment variable when you run your iOS app from a cold start:

GODOT_MTL_SHADER_LOAD_STRATEGY=lazy

I'll elaborate in a follow-up comment, but it should make a significant difference.

@stuartcarnie
Contributor

I have determined the difference, which I identified in #96052:

Note the summary section for the shader_compile statistics, which indicates that 936 unique shaders were compiled.

It turns out that only 250 shaders (26%) of the requested shaders are used by the editor.

The same goes at runtime: Godot will request that the driver compile a shader, but may never use it in a pipeline, at least not immediately. I expect the many shader variants account for much of this.

More specifically, Godot asks the RenderingDeviceDriver (Metal, Vulkan or D3D12) to compile the shader via shader_create_from_bytecode:

RDD::ShaderID RenderingDeviceDriverMetal::shader_create_from_bytecode(const Vector<uint8_t> &p_shader_binary, ShaderDescription &r_shader_desc, String &r_name, const Vector<ImmutableSampler> &p_immutable_samplers) {

which for the Vulkan driver won't do much, but for Metal, it will ask for a new MTLLibrary. This request results in a compilation of that library, which will show up in the Metal Shader Compiler graph:

[Image: Metal Shader Compiler graph in Instruments]

and associated log as Create Metal Library (Godot (PID)).

I implemented an alternative library loading strategy that compiles the MTLLibrary and shaders when Godot creates the render or compute pipeline. That can be triggered by specifying the following environment variable:

GODOT_MTL_SHADER_LOAD_STRATEGY=lazy

This behaviour more closely matches MoltenVK's implementation, which also delays MTLLibrary creation until the pipeline is created. The downside of this approach is that much of the render and compute pipeline creation in Godot is single-threaded, so we have to wait for the MTLLibrary and then the associated vertex, fragment or compute shader to compile when requesting the shader pipeline. This is the job of the Metal shader compiler services you see running in Activity Monitor.
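The difference between the two strategies can be modelled in a few lines. In this sketch, `LoadStrategy`, `parse_load_strategy` and `ShaderLibraries` are hypothetical; the only names taken from the comment above are the `GODOT_MTL_SHADER_LOAD_STRATEGY` variable and its `lazy` value, and how the driver actually reads them is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>

// Hypothetical model of the two MTLLibrary creation strategies.
enum class LoadStrategy { Immediate, Lazy };

// Parse a value such as one read from GODOT_MTL_SHADER_LOAD_STRATEGY
// (e.g. via std::getenv); anything other than "lazy" keeps the default.
LoadStrategy parse_load_strategy(const char *value) {
    if (value != nullptr && std::strcmp(value, "lazy") == 0) {
        return LoadStrategy::Lazy;
    }
    return LoadStrategy::Immediate;
}

class ShaderLibraries {
public:
    explicit ShaderLibraries(LoadStrategy p_strategy) : strategy(p_strategy) {}

    // Called for every shader variant Godot creates
    // (the role played by shader_create_from_bytecode).
    void create_shader(const std::string &name) {
        if (strategy == LoadStrategy::Immediate) {
            compile_library(name);
        }
    }

    // Called when a render or compute pipeline actually uses the shader.
    void create_pipeline(const std::string &name) {
        compile_library(name);  // no-op if already compiled
    }

    std::size_t libraries_compiled() const { return compiled.size(); }

private:
    // Stand-in for creating an MTLLibrary; records each library once.
    void compile_library(const std::string &name) { compiled.insert(name); }

    LoadStrategy strategy;
    std::set<std::string> compiled;
};
```

Creating ten variants and then two pipelines compiles ten libraries in immediate mode but only two in lazy mode; the trade-off is that, in lazy mode, the compile time lands on whichever thread creates the pipeline.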

Note

On a desktop machine, there are significantly more Metal shader compiler services available for concurrent compilation, whereas there are only 2 on iOS devices, from what I learned from Apple. I found that lazy had a negative impact on Godot editor startup, as we don't concurrently create pipelines, so we lose a lot of parallelisation.

We tell Metal to maximise compilation services with the following API (macOS only):

#if TARGET_OS_OSX
if (@available(macOS 13.3, *)) {
    [id<MTLDeviceEx>(metal_device) setShouldMaximizeConcurrentCompilation:YES];
}
#endif
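Why the number of compiler services matters can be estimated with greedy list scheduling, where each job goes to whichever service frees up first. The job durations below are made-up inputs, not measured Metal timings, and `estimate_wall_time` is a hypothetical helper.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Estimate wall-clock time to compile a batch of shaders when at most
// `services` compilations run concurrently. Jobs are assigned in order to
// the compiler service that becomes free first (greedy list scheduling).
double estimate_wall_time(const std::vector<double> &job_seconds, int services) {
    if (services < 1) {
        services = 1;
    }
    // Min-heap of the times at which each service becomes free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (int i = 0; i < services; ++i) {
        free_at.push(0.0);
    }
    double finish = 0.0;
    for (double duration : job_seconds) {
        double start = free_at.top();
        free_at.pop();
        double end = start + duration;
        free_at.push(end);
        if (end > finish) {
            finish = end;
        }
    }
    return finish;
}
```

For example, eight 0.5 s jobs take 2 s with 2 services but 0.5 s with 8, which illustrates why losing pipeline-creation concurrency hurts more on iOS than on desktop.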

I further validated this strategy with the Bistro demo, by analysing the Metal compilations from cold start for the metal driver, the vulkan driver and the metal driver with the lazy strategy enabled. This data was pulled from the Metal Shader Compiler graph in Instruments.

Pay attention to the Create MTLibrary counts.

Metal Cold Start

The default MTLLibrary creation behaviour.

┌────────────────────────────────────────┬──────────────┐
│                 Source                 │ count_star() │
│                varchar                 │    int64     │
├────────────────────────────────────────┼──────────────┤
│ Compile Compute shader (Godot (5032))  │          165 │
│ Compile Fragment shader (Godot (5032)) │          170 │
│ Compile Vertex shader (Godot (5032))   │          143 │
│ Create MTLibrary (Godot (5032))        │         1487 │
└────────────────────────────────────────┴──────────────┘

Metal Cold Start (lazy)

With the environment variable set to lazy. Notice the significant drop in MTLLibrary creations (1487 to 1010).

┌────────────────────────────────────────┬──────────────┐
│                 Source                 │ count_star() │
│                varchar                 │    int64     │
├────────────────────────────────────────┼──────────────┤
│ Compile Compute shader (Godot (7482))  │          165 │
│ Compile Fragment shader (Godot (7482)) │          170 │
│ Compile Vertex shader (Godot (7482))   │          145 │
│ Create MTLibrary (Godot (7482))        │         1010 │
└────────────────────────────────────────┴──────────────┘

Vulkan Cold Start

┌────────────────────────────────────────┬──────────────┐
│                 Source                 │ count_star() │
│                varchar                 │    int64     │
├────────────────────────────────────────┼──────────────┤
│ Compile Compute shader (Godot (5203))  │          166 │
│ Compile Fragment shader (Godot (5203)) │          167 │
│ Compile Vertex shader (Godot (5203))   │          141 │
│ Create MTLibrary (Godot (5203))        │          941 │
└────────────────────────────────────────┴──────────────┘
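The relative changes in the Create MTLibrary rows can be computed directly from the three tables: lazy loading removes roughly 32% of library creations (1487 to 1010), which still leaves about 7% more than the Vulkan run (941). The helper name is illustrative.

```cpp
#include <cassert>
#include <cmath>

// Relative change from `before` to `after`, as a percentage of `before`.
double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}
```

Here `percent_change(1487, 1010)` is about -32.1 and `percent_change(941, 1010)` about 7.3.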

Solution

@clayjohn

I can expose the compilation behaviour as a driver-specific project setting so users can override it. For iOS it would default to lazy, and on desktop it can stay as the current behaviour. Users can change the setting if iOS increases concurrency, or if they find that macOS starts faster using the alternative approach for their specific project.

@stuartcarnie
Contributor

As an aside, the Metal driver uses considerably fewer resources:

[Image: resource usage of the Metal driver, Bistro demo]

than MoltenVK for the Bistro demo:

[Image: resource usage of MoltenVK, Bistro demo]

That doesn't necessarily mean it's better, but it is significant.

@clayjohn
Member

@stuartcarnie I think we need to re-evaluate some of our decisions in light of the Ubershader stuff.

I forgot about iOS' limit of 2 concurrent pipeline compiles. It really complicates the async compilation approach. We rely on being able to throw a bunch of stuff at the driver and then just use the results when ready.

But, ultimately we can now distinguish between Ubershader compiling and optimized pipelines compiling.

Ubershaders will be loaded at load time or on the first frame. We need to do more compilations than in 4.3, but it should be fine (Metal and Vulkan should behave the same). When we compile the ubershaders, we need to compile all the variants they need (i.e. the pipeline variants should be loaded ASAP). Then, at run time, the optimized variants should be scheduled to compile with as little overhead as possible.

I'm not sure I fully understand this lazy compile strategy. But if it allows us to defer the cost of creating pipelines, then that sounds like the right approach. Ideally, any cost from creating the optimized pipelines should be deferred and should be constrained to a background thread.

I don't think we need to expose a setting for this. I think we can design a solution specifically for iOS, since it is a unique platform. What we have now works great for macOS, so let's figure out the minimal set of changes needed for iOS and then try it out.
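One way to read the ubershader approach described above: draw with the ubershader pipeline until the specialised pipeline finishes compiling in the background, then switch over. This is a minimal sketch using `std::shared_future`; `PipelineID` and `PipelineSlot` are hypothetical and not part of Godot's renderer.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <utility>

// Hypothetical pipeline handle, for illustration only.
using PipelineID = int;

// Holds the ubershader pipeline plus a future for the specialised pipeline
// that is being compiled elsewhere (e.g. on a background thread).
class PipelineSlot {
public:
    PipelineSlot(PipelineID p_ubershader, std::shared_future<PipelineID> p_optimized)
        : ubershader(p_ubershader), optimized(std::move(p_optimized)) {}

    // Returns the specialised pipeline once it is ready, otherwise the
    // ubershader. A zero timeout polls without blocking the render thread.
    PipelineID current() const {
        if (optimized.valid() &&
            optimized.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            return optimized.get();
        }
        return ubershader;
    }

private:
    PipelineID ubershader;
    std::shared_future<PipelineID> optimized;
};
```

The future could be fed by `std::async` or by a worker pool; the render thread only ever polls it.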

@kisg
Contributor

kisg commented Feb 21, 2025

Would the shader baking PR improve on this? #102552

@georgwacker
Contributor Author

georgwacker commented Feb 21, 2025

> @georgwacker try setting this environment variable when you run your iOS app from a cold start:
>
> GODOT_MTL_SHADER_LOAD_STRATEGY=lazy
>
> I'll elaborate in a follow-up comment, but it should make a significant difference.

With lazy loading on metal 4.4b4 official I'm getting 8.4s into menu on cold boot, which is much better than the 16s with default strategy.

With my custom build forcing vulkan 4.4b4 it's still only 3.5s on the cold boot, though. Custom build running metal takes 5.2s, so slightly better.

For these tests, I've been using the MoltenVK bundled with the iOS export template from 4.3, which shows as 1.2.283.

@stuartcarnie
Contributor

@georgwacker those numbers are more in line with expectations. MoltenVK has a little advantage here, as Metal has to convert all the SPIR-V to MSL during the calls to shader_compile_binary_from_spirv, so that is still roughly 78 shaders according to your numbers, whereas MoltenVK delays this until it creates a pipeline. If you were to reboot your phone and try again (don't clear the Godot caches), you might find that Metal and MoltenVK are closer, and Metal possibly even faster.

As @kisg noted, the shader baker PR will resolve this problem.

@stuartcarnie
Contributor

Further to @kisg's question about #102552, I am planning to leverage the Metal compiler tools, when available on Windows and macOS, so that baking shaders will not only generate the Metal source, but take it a step further and generate Metal libraries compiled to AIR, so Metal will have a significant advantage over MoltenVK here.

@stuartcarnie
Contributor

> I forgot about iOS' limit of 2 concurrent pipeline compiles. It really complicates the async compilation approach. We rely on being able to throw a bunch of stuff at the driver and then just use the results when ready.

We can still do that, as Metal supports continuations / callbacks for compilation, so we use the results when ready. I use that feature already in the driver for the non-lazy (immediate) shader compiler mode.
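Metal offers completion-handler variants of its compile calls (for example `newRenderPipelineStateWithDescriptor:completionHandler:` on `MTLDevice`), so work can be queued and its results used when ready even with only 2 compiler services. This sketch models that pattern in portable C++ with a fixed pool of workers; `CompileQueue` is hypothetical and does not call Metal.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Completion-handler-style compile queue with a fixed number of concurrent
// "compiler services" (hypothetical; no Metal calls). Jobs return a
// placeholder pipeline ID; handlers run on a worker thread.
class CompileQueue {
public:
    using Job = std::function<int()>;
    using Done = std::function<void(int)>;

    explicit CompileQueue(int services) {
        if (services < 1) {
            services = 1;
        }
        for (int i = 0; i < services; ++i) {
            workers.emplace_back([this] { run(); });
        }
    }

    // Finishes any queued jobs, then stops and joins the workers.
    ~CompileQueue() {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            stopping = true;
        }
        cv.notify_all();
        for (std::thread &w : workers) {
            w.join();
        }
    }

    // Returns immediately; `done` receives the result when the job finishes.
    void submit(Job job, Done done) {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            jobs.emplace(std::move(job), std::move(done));
        }
        cv.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::pair<Job, Done> item;
            {
                std::unique_lock<std::mutex> lock(queue_mutex);
                cv.wait(lock, [this] { return stopping || !jobs.empty(); });
                if (jobs.empty()) {
                    return;  // stopping and nothing left to compile
                }
                item = std::move(jobs.front());
                jobs.pop();
            }
            item.second(item.first());  // compile outside the lock, then notify
        }
    }

    std::mutex queue_mutex;
    std::condition_variable cv;
    std::queue<std::pair<Job, Done>> jobs;
    std::vector<std::thread> workers;
    bool stopping = false;
};
```

The caller never blocks on `submit`; with 2 workers, at most two jobs compile at once, and every handler still fires.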

@migueldeicaza
Contributor

I have tried this patch, and here are some findings from launching the Godot editor on iOS, using 4.4 in both cases.

  • 182 shaders compiled instead of 657 on startup.
  • Metal:
    • Cold start: 41 seconds
    • Warm start: 10 seconds
  • Molten:
    • Cold start: 32 seconds
    • Warm start: 24 seconds

Without the lazy patch, the cold start for Metal is about 80 seconds.

@stuartcarnie
Contributor

> I have tried this patch, and here are some findings from launching the Godot editor on iOS, using 4.4 in both cases.

@migueldeicaza that's great and in line with my expectations.

The difference in cold-start time is as I described here:

> MoltenVK has a little advantage here, as Metal has to convert all the SPIR-V to MSL during the calls to shader_compile_binary_from_spirv, so that is still roughly 78 shaders according to your numbers, whereas MoltenVK delays this until it creates a pipeline.

More specifically, both the Metal and Vulkan (MoltenVK on Apple platforms) implementations eventually have to take the SPIR-V shader and convert it to Metal Shading Language (MSL) code that the Metal APIs can understand. Godot asks the RenderingDeviceDriver (RDD) (Metal or Vulkan) to do this when it calls shader_compile_binary_from_spirv, if a cached version of this function's output does not exist. This naturally happens on first start.

For the Metal RDD, we do the following steps when this happens:

  • parse the SPIR-V
  • analyse and reflect the parsed SPIR-V
  • convert to MSL and compress the output so that it can be cached by Godot

The output of this function is saved by Godot in the .godot/shader_cache folder, so it isn't repeated on future starts.

For the Vulkan driver, SPIR-V is its native language, so it doesn't do much at all.

Godot executes shader_compile_binary_from_spirv for all shaders and their variants, which in your case is 657. So the Metal RDD does a lot of additional cold-start work.

When Godot wants to use specific shaders, it creates pipelines for these. It is only then that MoltenVK does the same things as Metal:

  • parse the SPIR-V
  • analyse and reflect the parsed SPIR-V
  • convert to MSL and hand off to the Metal APIs

Which happens about 182 times for MoltenVK, so a significant reduction.

As for warm start times: because Godot caches the output of shader_compile_binary_from_spirv, the Metal RDD no longer performs the parsing, code generation, etc., so it is much faster. This holds across reboots of the phone, as Godot saves this cached data. MoltenVK performs that work every time the pipeline is created, if it doesn't have a cached version in memory. I'm not sure whether MoltenVK's own pipeline cache stores this output; however, judging from your warm start numbers, it would seem not.

The overall savings from Metal are pretty significant, as users typically only hit the cold start once, as long as Godot keeps its own shader cache around.
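The cold versus warm start behaviour described above can be modelled with two caches: a persistent one standing in for `.godot/shader_cache`, and a per-process one for MoltenVK. The counts 657 and 182 come from migueldeicaza's measurements; the function names are hypothetical, and the assumption that MoltenVK keeps no persistent copy of its MSL follows the reading of the warm-start numbers above, not a confirmed fact.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Maps a shader variant key to its converted MSL (a placeholder string here).
using MslCache = std::unordered_map<std::string, std::string>;

// Converts `key` only if `cache` lacks it; returns true if work was done.
// Stands in for parse + reflect + SPIR-V to MSL conversion.
bool convert_if_missing(const std::string &key, MslCache &cache) {
    if (cache.count(key) != 0) {
        return false;
    }
    cache[key] = "msl:" + key;
    return true;
}

// Metal RDD model: every variant is converted up front, but the results
// persist across launches (like .godot/shader_cache).
int metal_launch(const std::vector<std::string> &variants, MslCache &disk_cache) {
    int conversions = 0;
    for (const std::string &v : variants) {
        conversions += convert_if_missing(v, disk_cache) ? 1 : 0;
    }
    return conversions;
}

// MoltenVK model (assumed): only shaders used by pipelines are converted,
// into a memory cache that starts empty on every launch.
int moltenvk_launch(const std::vector<std::string> &pipeline_shaders) {
    MslCache memory_cache;
    int conversions = 0;
    for (const std::string &s : pipeline_shaders) {
        conversions += convert_if_missing(s, memory_cache) ? 1 : 0;
    }
    return conversions;
}
```

With 657 variants and 182 pipeline shaders, this model gives Metal 657 conversions cold and 0 warm, and MoltenVK 182 on every launch.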

@clayjohn
Member

Also worth noting that, once we merge the shader baking PR, your end users should never hit the cold path.

@stuartcarnie
Contributor

> Also worth noting that, once we merge the shader baking PR, your end users should never hit the cold path.

Can't wait for that one! By that point, the Metal backend will be way ahead of MoltenVK in cold and warm start.

I plan to support baking the shaders as MSL or AIR, if the Metal toolchain is available, which is possible on Windows or macOS.

@migueldeicaza
Contributor

I am using a poor man's baker now, and that has helped tremendously - before that the startup was getting too slow.

One thing I am wondering is whether there would be value in not creating a library per function, as Godot does now, and instead bundling a bunch of functions in one go. I ended up with some 2,000 shaders to avoid the slow startup on the museum sample (it was always slow, but the suggestion above made me think 'what if it doesn't have to be?').

This is the patch I am using. I don't think it would work for the general-purpose baker, but it should help folks in the short term:

https://gist.github.com/migueldeicaza/3fc307a04eba45197a66c2f86b86273f

I will experiment to see whether using precompiled shaders with lazy mode helps a bit; my patch above just circumvents lazy mode altogether.

@stuartcarnie
Contributor

@migueldeicaza are you saving the dumped files and then manually generating the .metallib files?

@migueldeicaza
Contributor

migueldeicaza commented Mar 2, 2025

Correct, I just save them to the filesystem (the patch above uses an environment variable to say "Please dump the files"). Then I manually run:

# Compile each dumped Metal source file to AIR (Apple's intermediate representation)
for x in *.metal; do
    n=$(basename "$x" .metal)
    xcrun -l -sdk iphoneos metal -c "$x" -o "${n}.ir" > "${n}.log" 2>&1
done

# Package each AIR file into a .metallib library
for x in *.ir; do
    n=$(basename "$x" .ir)
    xcrun -l -sdk iphoneos metallib "$x" -o "${n}.metallib" > "${n}.logar" 2>&1
done

And copy those to the iOS bundle.

Projects
Status: Very Bad

9 participants