Android editor crash when interacting with some GUI elements #94741

Open
matheusmdx opened this issue Jul 25, 2024 · 27 comments · May be fixed by #92611

Comments

@matheusmdx
Contributor

Tested versions

Reproducible: Latest master v4.3.rc.custom_build.e343dbbcc, 4.3 betas 1, 2 and 3.

System information

Godot v4.3.rc (e343dbb) - Android - Vulkan (Mobile) - integrated Adreno (TM) 506 - (8 Threads)

Issue description

The Android editor crashes with signal 11 after simple interactions. This only happens with the Vulkan Mobile renderer; the Compatibility renderer works normally:

Example 1
unknown_2024.07.25-10.49.mp4
Example 2
unknown_2024.07.25-10.51.mp4
Example 3
unknown_2024.07.25-10.53.mp4
Backtrace
07-25 10:55:47.602  9029  9066 F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 9066 (VkThread), pid 9029 (e.editor.v4.dev)
07-25 10:55:48.067  9124  9124 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-25 10:55:48.067  9124  9124 F DEBUG   : Build fingerprint: 'xiaomi/onc/onc:10/QKQ1.191008.001/V11.0.2.0.QFLMIXM:user/release-keys'
07-25 10:55:48.067  9124  9124 F DEBUG   : Revision: '0'
07-25 10:55:48.067  9124  9124 F DEBUG   : ABI: 'arm64'
07-25 10:55:48.084  9124  9124 F DEBUG   : Timestamp: 2024-07-25 10:55:48-0300
07-25 10:55:48.084  9124  9124 F DEBUG   : pid: 9029, tid: 9066, name: VkThread  >>> org.godotengine.editor.v4.dev <<<
07-25 10:55:48.084  9124  9124 F DEBUG   : uid: 10420
07-25 10:55:48.085  9124  9124 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
07-25 10:55:48.085  9124  9124 F DEBUG   : Cause: null pointer dereference
07-25 10:55:48.085  9124  9124 F DEBUG   :     x0  0000000000000000  x1  0000000000000003  x2  0000007bc22b8800  x3  0000007ba0f10320
07-25 10:55:48.085  9124  9124 F DEBUG   :     x4  0000007bc22f1c00  x5  0000000000000000  x6  0000000000000000  x7  0000000000000001
07-25 10:55:48.085  9124  9124 F DEBUG   :     x8  0000000000000002  x9  0000000000000001  x10 0000000000000001  x11 0000000000000004
07-25 10:55:48.085  9124  9124 F DEBUG   :     x12 0000007bc3125370  x13 0000000000000000  x14 0000000000000000  x15 0000000000000002
07-25 10:55:48.085  9124  9124 F DEBUG   :     x16 0000007bc2c61ef8  x17 0000007bc312b000  x18 0000007bc838a000  x19 0000007bc312b000
07-25 10:55:48.085  9124  9124 F DEBUG   :     x20 0000007bc22b8800  x21 0000007bdbe35710  x22 0000000000000000  x23 0000007bdbe35410
07-25 10:55:48.085  9124  9124 F DEBUG   :     x24 0000000000000001  x25 0000007bb7b50e00  x26 0000000000000028  x27 0000007bc304c020
07-25 10:55:48.085  9124  9124 F DEBUG   :     x28 0000000000000000  x29 0000007bdbe38570
07-25 10:55:48.085  9124  9124 F DEBUG   :     sp  0000007bdbe34af0  lr  0000007bdbe35410  pc  0000007bc2951004
07-25 10:55:48.463  9124  9124 F DEBUG   :
07-25 10:55:48.463  9124  9124 F DEBUG   : backtrace:
07-25 10:55:48.464  9124  9124 F DEBUG   :       #00 pc 00000000000f5004  /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #01 pc 00000000000a0468  /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #02 pc 0000000003d8bdfc  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #03 pc 0000000006b29d64  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #04 pc 0000000006b2b088  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #05 pc 0000000006b32058  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #06 pc 0000000006a79120  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #07 pc 0000000006a78f34  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #08 pc 0000000006c767b0  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #09 pc 0000000006b34c0c  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #10 pc 0000000006b36ef8  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #11 pc 0000000002dde450  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #12 pc 0000000002d7bb0c  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #13 pc 0000000002d9d55c  /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464  9124  9124 F DEBUG   :       #14 pc 0000000000140350  /apex/com.android.runtime/lib64/libart.so (art_quick_generic_jni_trampoline+144) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #15 pc 00000000020399dc  /memfd:/jit-cache (deleted) (org.godotengine.godot.vulkan.VkRenderer.onVkDrawFrame+44)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #16 pc 00000000020385f4  /memfd:/jit-cache (deleted) (org.godotengine.godot.vulkan.VkThread.run+948)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #17 pc 000000000013763c  /apex/com.android.runtime/lib64/libart.so (art_quick_osr_stub+60) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #18 pc 00000000003380f8  /apex/com.android.runtime/lib64/libart.so (art::jit::Jit::MaybeDoOnStackReplacement(art::Thread*, art::ArtMethod*, unsigned int, int, art::JValue*)+1688) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #19 pc 00000000005ac56c  /apex/com.android.runtime/lib64/libart.so (MterpMaybeDoOnStackReplacement+212) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #20 pc 0000000000136350  /apex/com.android.runtime/lib64/libart.so (MterpHelpers+240) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #21 pc 0000000000029f46  [anon:dalvik-classes2.dex extracted in memory from /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/base.apk!classes2.dex] (org.godotengine.godot.vulkan.VkThread.run+282)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #22 pc 00000000002b4b14  /apex/com.android.runtime/lib64/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadERKNS_20CodeItemDataAccessorERNS_11ShadowFrameENS_6JValueEbb.llvm.16703252159117058578+240) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #23 pc 0000000000592d18  /apex/com.android.runtime/lib64/libart.so (artQuickToInterpreterBridge+1032) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #24 pc 0000000000140468  /apex/com.android.runtime/lib64/libart.so (art_quick_to_interpreter_bridge+88) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #25 pc 0000000000137334  /apex/com.android.runtime/lib64/libart.so (art_quick_invoke_stub+548) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #26 pc 0000000000145fec  /apex/com.android.runtime/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+244) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #27 pc 00000000004b171c  /apex/com.android.runtime/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #28 pc 00000000004b2830  /apex/com.android.runtime/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue const*)+416) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #29 pc 00000000004f31ec  /apex/com.android.runtime/lib64/libart.so (art::Thread::CreateCallback(void*)+1176) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #30 pc 00000000000e6a00  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36) (BuildId: f58dc2e5c0832afee4aa38168e971c9d)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #31 pc 0000000000084c6c  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: f58dc2e5c0832afee4aa38168e971c9d)

I bisected the issue to #84976:

Captura 2024-07-23 15-14-04-774883


Another detail: the editor doesn't crash if you create a Node2D or a Control node:

unknown_2024.07.25-11.13_1.mp4

Investigating further, I found that any action that creates the editor_layout.cfg file (creating a node and saving the scene, or creating a node and reloading the project via Project > Reload current project) stops the project from crashing. If I delete editor_layout.cfg again, or create the node without doing either of those two actions and then kill the editor, the crashes come back.

Steps to reproduce

See the videos above

Minimal reproduction project (MRP)

Not needed; any new project can reproduce it.

@akien-mga
Member

I'm tentatively putting this in the 4.3 milestone / blockers that we track as it's a regression from a 4.3 PR. But I don't consider it high priority at this stage so close to the release, as it's a crash in the Adreno GPU driver and thus probably a driver bug. Figuring out a workaround might take a while as it might be difficult to reproduce.

Still, CC @DarioSamo who wrote the ARG PR, and @Alex2782 @m4gr3d @clayjohn who are used to debugging Android driver bugs.

@DarioSamo
Contributor

DarioSamo commented Jul 25, 2024

We have a list of devices that were affected by a similar regression. Perhaps this device just needs to be added to the detection here.

I say try forcing this particular workaround to true and see if it fixes the issue for you first.

void RenderingContextDriverVulkan::_check_driver_workarounds(const VkPhysicalDeviceProperties &p_device_properties, Device &r_device) {
	// Workaround for the Adreno 6XX family of devices.
	//
	// There's a known issue with the Vulkan driver in this family of devices where it'll crash if a dynamic state for drawing is
	// used in a command buffer before a dispatch call is issued. As both dynamic scissor and viewport are basic requirements for
	// the engine to not bake this state into the PSO, the only known way to fix this issue is to reset the command buffer entirely.
	//
	// As the render graph has no built in limitations of whether it'll issue compute work before anything needs to draw on the
	// frame, and there's no guarantee that compute work will never be dependent on rasterization in the future, this workaround
	// will end recording on the current command buffer any time a compute list is encountered after a draw list was executed.
	// A new command buffer will be created afterwards and the appropriate synchronization primitives will be inserted.
	//
	// Executing this workaround has the added cost of synchronization between all the command buffers that are created as well as
	// all the individual submissions. This performance hit is accepted for the sake of being able to support these devices without
	// limiting the design of the renderer.
	//
	// This bug was fixed in driver version 512.503.0, so we only enabled it on devices older than this.
	//
	r_device.workarounds.avoid_compute_after_draw =
			r_device.vendor == VENDOR_QUALCOMM &&
			p_device_properties.deviceID >= 0x6000000 && // Adreno 6xx
			p_device_properties.driverVersion < VK_MAKE_VERSION(512, 503, 0) &&
			r_device.name.find("Turnip") < 0;
}
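
The splitting rule described in the comment above (end the current command buffer whenever a compute list follows a draw list) can be illustrated with a small standalone sketch. This is not the engine's implementation: `CommandKind` and `split_on_compute_after_draw` are hypothetical names invented for this example, and the real render graph also inserts synchronization between the resulting buffers, which is omitted here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical command kinds; the real render graph tracks many more.
enum class CommandKind { Draw, Compute };

// Splits a recorded command sequence into separate "command buffers" so that
// no buffer ever contains a compute command recorded after a draw command.
std::vector<std::vector<CommandKind>> split_on_compute_after_draw(
		const std::vector<CommandKind> &commands) {
	std::vector<std::vector<CommandKind>> buffers;
	if (commands.empty()) {
		return buffers;
	}
	buffers.emplace_back();
	bool draw_seen = false; // Whether the current buffer already holds a draw.
	for (CommandKind kind : commands) {
		if (kind == CommandKind::Compute && draw_seen) {
			// Compute after draw: end the current buffer and start a new one.
			buffers.emplace_back();
			draw_seen = false;
		}
		if (kind == CommandKind::Draw) {
			draw_seen = true;
		}
		buffers.back().push_back(kind);
	}
	return buffers;
}
```

With the sequence Draw, Compute, Draw, Compute this yields three buffers, while Compute, Draw stays in a single buffer, which matches the old behavior where compute was always dispatched before drawing.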

@Alex2782

This comment was marked as outdated.

@Alex2782
Contributor

@matheusmdx To me it looks like an emulator, or is it something like a 'RemoteViewer' ?

From the logs: Build fingerprint: 'xiaomi/onc/onc:10...' (Mi A2 Lite? It also has an Adreno 506 on Firebase Test Lab)

image

@matheusmdx
Contributor Author

We have a list of devices that were affected by a similar regression. Perhaps this device just needs to be added to the detection here.

I say try forcing this particular workaround to true and see if it fixes the issue for you first.

void RenderingContextDriverVulkan::_check_driver_workarounds(const VkPhysicalDeviceProperties &p_device_properties, Device &r_device) {
	// Workaround for the Adreno 6XX family of devices.
	//
	// There's a known issue with the Vulkan driver in this family of devices where it'll crash if a dynamic state for drawing is
	// used in a command buffer before a dispatch call is issued. As both dynamic scissor and viewport are basic requirements for
	// the engine to not bake this state into the PSO, the only known way to fix this issue is to reset the command buffer entirely.
	//
	// As the render graph has no built in limitations of whether it'll issue compute work before anything needs to draw on the
	// frame, and there's no guarantee that compute work will never be dependent on rasterization in the future, this workaround
	// will end recording on the current command buffer any time a compute list is encountered after a draw list was executed.
	// A new command buffer will be created afterwards and the appropriate synchronization primitives will be inserted.
	//
	// Executing this workaround has the added cost of synchronization between all the command buffers that are created as well as
	// all the individual submissions. This performance hit is accepted for the sake of being able to support these devices without
	// limiting the design of the renderer.
	//
	// This bug was fixed in driver version 512.503.0, so we only enabled it on devices older than this.
	//
	r_device.workarounds.avoid_compute_after_draw =
			r_device.vendor == VENDOR_QUALCOMM &&
			p_device_properties.deviceID >= 0x6000000 && // Adreno 6xx
			p_device_properties.driverVersion < VK_MAKE_VERSION(512, 503, 0) &&
			r_device.name.find("Turnip") < 0;
}

I changed it to a hardcoded true, but the crash persists.



i have a PR for Adreno 5xx uniform crash #92611

	//TODO: check 'driverVersion'?
	// no crash on Fujitsu F-01L - Adreno (TM) 506, Vulkan 1.0.61, driverVersion = 54185879
	r_device.workarounds.force_material_uniform_set =
			r_device.vendor == VENDOR_QUALCOMM &&
			p_device_properties.deviceID >= 0x5000000 && // Adreno 5xx
			p_device_properties.deviceID <= 0x5999999;

It took at least 3 weeks to isolate this crash via Firebase Test Lab and print_line. I don't own Adreno 5xx yet to debug it properly via Android Studio.

I cherry-picked this PR, but unfortunately it didn't work either.



@matheusmdx To me it looks like an emulator, or is it something like a 'RemoteViewer' ?

This is a remote view using scrcpy: https://github.com/Genymobile/scrcpy. I used it to make testing faster, but I also tested without it to make sure it didn't interfere with the tests.

I can help test any possible solution, just give me the instructions/APK. I can also try to get a better backtrace if there is a way to get a more complete one.

@akien-mga
Member

You can get a backtrace with debug symbols by passing debug_symbols=yes to SCons (or dev_build=yes, not sure if it expects dev stuff) and building the apk with ./gradlew generateDevTemplate in platform/android/java.

Btw @m4gr3d we really need to document this on https://docs.godotengine.org/en/latest/contributing/development/compiling/compiling_for_android.html
I see the page was updated to instruct using the new generate_apk=yes SCons option, but I suppose this only handles the stripped case and not dev templates?

We should probably still document the various Gradle tasks for advanced users.

@DarioSamo
Contributor

I changed it to a hardcoded true, but the crash persists.

Seems we're pretty much in the situation of having to debug and find yet another workaround for a particular Adreno device because the Render Graph changed the order of operations then. Unfortunately the regression will point you to that particular commit, but it's just a situation of being lucky enough to not run into the driver bug before and now we are.

You can try messing around with these two macros and see if you get any different results, as these will make the render graph basically regress into the behavior of the previous version:

// When true, the command graph will attempt to reorder the rendering commands submitted by the user based on the dependencies detected from
// the commands automatically. This should improve rendering performance in most scenarios at the cost of some extra CPU overhead.
//
// This behavior can be disabled if it's suspected that the graph is not detecting dependencies correctly and more control over the order of
// the commands is desired (e.g. debugging).
#define RENDER_GRAPH_REORDER 1
// Synchronization barriers are issued between the graph's levels only with the necessary amount of detail to achieve the correct result. If
// it's suspected that the graph is not doing this correctly, full barriers can be issued instead that will block all types of operations
// between the synchronization levels. This setting will have a very negative impact on performance when enabled, so it's only intended for
// debugging purposes.
#define RENDER_GRAPH_FULL_BARRIERS 0

@Alex2782

This comment was marked as outdated.

@matheusmdx
Contributor Author

matheusmdx commented Jul 26, 2024

You can get a backtrace with debug symbols by passing debug_symbols=yes to SCons (or dev_build=yes, not sure if it expects dev stuff) and building the apk with ./gradlew generateDevTemplate in platform/android/java.

I tried this, but ./gradlew generateDevTemplate just skips all the tasks and doesn't generate anything. The backtrace I put in this issue was generated using an APK built with dev_build=yes and ./gradlew generateGodotEditor. I also tried using debug_symbols=yes instead, but that didn't change the backtrace.



You can try messing around with these two macros and see if you get any different results, as these will make the render graph basically regress into the behavior of the previous version:

// When true, the command graph will attempt to reorder the rendering commands submitted by the user based on the dependencies detected from
// the commands automatically. This should improve rendering performance in most scenarios at the cost of some extra CPU overhead.
//
// This behavior can be disabled if it's suspected that the graph is not detecting dependencies correctly and more control over the order of
// the commands is desired (e.g. debugging).
#define RENDER_GRAPH_REORDER 1
// Synchronization barriers are issued between the graph's levels only with the necessary amount of detail to achieve the correct result. If
// it's suspected that the graph is not doing this correctly, full barriers can be issued instead that will block all types of operations
// between the synchronization levels. This setting will have a very negative impact on performance when enabled, so it's only intended for
// debugging purposes.
#define RENDER_GRAPH_FULL_BARRIERS 0

Changing RENDER_GRAPH_REORDER to 0 stops the crash; changing RENDER_GRAPH_FULL_BARRIERS to 1 didn't change anything.



However, you could also set PRINT_RENDER_GRAPH to 1, in which case the function _print_render_commands is used.

Here is what was printed. I just opened the editor and, once it had loaded, triggered the crash: print render graph.txt

@Alex2782
Contributor

Thanks for testing!

Changing RENDER_GRAPH_REORDER to 0 stops the crash

I will revise PR #92611.
Firebase Test Lab has up to 7 Adreno 5xx devices; on one of them I could not reproduce the uniform crash.

	//TODO: check 'driverVersion'?
	// no crash on Fujitsu F-01L - Adreno (TM) 506, Vulkan 1.0.61, driverVersion = 54185879
	r_device.workarounds.force_material_uniform_set =
			r_device.vendor == VENDOR_QUALCOMM &&
			p_device_properties.deviceID >= 0x5000000 && // Adreno 5xx
			p_device_properties.deviceID <= 0x5999999;

I'm not sure exactly which driver versions they are.

			p_device_properties.deviceID >= 0x6000000 && // Adreno 6xx
			p_device_properties.driverVersion < VK_MAKE_VERSION(512, 503, 0) 
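
To see what the driverVersion comparison does with the value reported for the Fujitsu F-01L (54185879), here is a small standalone sketch that packs and unpacks versions using the same bit layout as Vulkan's VK_MAKE_VERSION (major shifted by 22 bits, minor by 12). The helper names are made up for this example so that it compiles without the Vulkan headers. Note that driverVersion is vendor-defined, so whether older Qualcomm drivers actually pack it this way is an assumption; the decoded fields below only show how the raw number compares numerically.

```cpp
#include <cassert>
#include <cstdint>

// Same bit layout as VK_MAKE_VERSION: major << 22 | minor << 12 | patch.
constexpr uint32_t make_version(uint32_t major, uint32_t minor, uint32_t patch) {
	return (major << 22) | (minor << 12) | patch;
}

// Field extraction matching VK_VERSION_MAJOR / VK_VERSION_MINOR / VK_VERSION_PATCH.
constexpr uint32_t version_major(uint32_t v) { return v >> 22; }
constexpr uint32_t version_minor(uint32_t v) { return (v >> 12) & 0x3FFu; }
constexpr uint32_t version_patch(uint32_t v) { return v & 0xFFFu; }

// Threshold used by the Adreno 6xx workaround quoted in this thread.
constexpr uint32_t kFixedDriverVersion = make_version(512, 503, 0);

// Raw driverVersion reported for the Fujitsu F-01L in the PR comment.
constexpr uint32_t kFujitsuDriverVersion = 54185879u;

// Under this packing the raw value decodes to 12.940.3991 and is numerically
// below the 512.503.0 threshold.
static_assert(version_major(kFujitsuDriverVersion) == 12, "major");
static_assert(version_minor(kFujitsuDriverVersion) == 940, "minor");
static_assert(version_patch(kFujitsuDriverVersion) == 3991, "patch");
static_assert(kFujitsuDriverVersion < kFixedDriverVersion, "older than threshold");
```

If the Adreno 5xx drivers use a different encoding, a version threshold for them would have to be derived from observed raw values on crashing and non-crashing devices rather than from decoded fields.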

@DarioSamo
Contributor

DarioSamo commented Jul 26, 2024

Changing RENDER_GRAPH_REORDER to 0 stops the crash; changing RENDER_GRAPH_FULL_BARRIERS to 1 didn't change anything.

Pretty much fits the exact same situation of the other Adreno crash that the workaround was introduced for, where basically Godot was lucky enough to not crash on this particular hardware, but reordering the operations triggers the error in the driver.

The problem is that not reordering the graph on this particular hardware would just be hiding the issue, because as soon as the renderer changes its behavior, it could reintroduce the error. We're much better off investigating what exact sequence of events makes the driver crash here so the render graph can insert workarounds as needed, which would guarantee the renderer never breaks on this hardware in the future.

Without this particular hardware however, we're left pretty much guessing at this point. I'm afraid you'll have to dig deeper into it, probably by simplifying the project as much as possible and looking at the output of the render graph, and potentially modifying what looks like could be the problem. When dealing with a driver bug, we don't really have much left to review on our side as it's basically dealing with a black box where some behavior that is known to be correct just doesn't work.

One possible hint I'll give is that the previous crash was related to the relation between compute and drawing, and the old version was guaranteed to dispatch compute first before doing any drawing on the frame. Reordering can cause drawing to happen before compute, and that's what triggered the crash. You said you verified the workaround didn't fix it for you, but so far it's sounding like the exact same issue. I think it's probably worth double checking.

@Alex2782
Contributor

Alex2782 commented Jul 27, 2024

I'm afraid you'll have to dig deeper into it, probably by simplifying the project as much as possible and looking at the output of the render graph,

#79760 #82602 #85097 #86037

Maybe the same issues. There are MRPs to reproduce it outside the editor.
(my test project: ShaderTest.zip)

@matheusmdx: If you have time, please try it out. RENDER_GRAPH_REORDER = 0 -> no crash?


4.x Release Blockers and Status: Bad
My suggestion would be to simply apply it to all Adreno 5xx, slightly (?) worse performance is still better than crashes.

Some stats / PlayStore Device Catalog

PlayStore Device Catalog contains 1150 Adreno 5xx devices out of a total of 17339 (this is a share of approx. 6.63%)

Android API level                    Count
Level 33 (2y old, Android 13)            2
Level 32 (2y old, Android 12L)           1
Level 31 (3y old, Android 12)            7
Level 30 (4y old, Android 11)           75
Level 29 (5y old, Android 10)          288
Level 28 (6y old, Android 9)           275
Level 27 + 26 (7y old, Android 8)      427
Level 25 + 24 (8y old, Android 7)      191

The older Android devices are, the more unusable they become; older Android versions also have older drivers and an older Vulkan API (1.0.x on Android 8 and 7). I think it is only worthwhile to invest more effort in fixing exotic bugs on devices from Android 9 onwards, with Vulkan API 1.1.x. That is up to 650 devices with an Adreno 5xx GPU (approx. 3.75%).

For the Play Store, installation figures would also have to be taken into account. For example, over 70% of our Play Store customers already use Android 13+ devices, and that is for a "normal" app for ordering a cab. Android 9 is at only 2%!

275 / 17339 * 100 = 1.59%, and 1.59% * 2% = 0.0318% (?) of our customers could still have an Adreno 5xx with Android 9.
😃
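
The estimate above can be reproduced with a few lines of code. This is only a sketch of the arithmetic using the figures quoted in this comment; `percent_of` and `combined_percent` are helper names made up for this example, and multiplying the two shares assumes that the device catalog share and the Android 9 install share are independent, which is a rough simplification.

```cpp
#include <cassert>
#include <cmath>

// Percentage that `part` represents of `total`.
constexpr double percent_of(double part, double total) {
	return part / total * 100.0;
}

// Combines a percentage with a second, assumed-independent fraction (0..1).
constexpr double combined_percent(double percent, double fraction) {
	return percent * fraction;
}

// Figures quoted in the comment above.
constexpr double kCatalogDevices = 17339.0;
constexpr double kAdreno5xxDevices = 1150.0;
constexpr double kAdreno5xxAndroid9 = 275.0;
constexpr double kAndroid9InstallShare = 0.02; // "Android 9 is at only 2%"
```

`percent_of(kAdreno5xxDevices, kCatalogDevices)` gives roughly 6.63, and `combined_percent(percent_of(kAdreno5xxAndroid9, kCatalogDevices), kAndroid9InstallShare)` gives roughly 0.0317, which differs from the 0.0318 above only because that figure was computed from the rounded 1.59%.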

@Alex2782
Contributor

@zhmt: Xiaomi Redmi Note 11 Pro 5G, Snapdragon 695, Adreno 619 ?

We have driver (vulkan.msm8953.so) crashes on old Adreno 5xx devices:

07-25 10:55:48.463  9124  9124 F DEBUG   : backtrace:
07-25 10:55:48.464  9124  9124 F DEBUG   :       #00 pc 00000000000f5004  /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #01 pc 00000000000a0468  /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)

According to your logs, it looks like a regular Godot issue in libgodot_android.so. -> .NET / C#?
If no one has reported it yet, it would probably be more helpful to create a new issue.

2024-07-27 17:09:38.241  8698-8698  DEBUG       A  backtrace:
2024-07-27 17:09:38.241  8698-8698  DEBUG       A        #00 pc 0000000000000000  <unknown>
2024-07-27 17:09:38.241  8698-8698  DEBUG       A        #01 pc 000000000143a5a4  /data/app/~~mUlsDVbQH3XRclE-_me3Hw==/com.example.aaa-NxX8oExdQNHj9Oz-VbgQgw==/lib/arm64/libgodot_android.so

@zhmt

zhmt commented Jul 27, 2024

@Alex2782
I searched the issues and I don't think anyone else has reported it. I opened a new issue.

@matheusmdx
Contributor Author

@DarioSamo @Alex2782 I'll test the other MRPs and try to find something. What should I look for in the print render graph results? For example, what result is normal and what indicates a bug?


RENDER_GRAPH_REORDER = 0 -> no crash?

Yep, that stops the crash

@matheusmdx
Contributor Author

Also @akien-mga, any idea why the debug symbols don't work? Getting a full backtrace would help a lot.

@Alex2782
Contributor

What should I look for in the print render graph results?

My recommendation is to render fewer frames:

func _ready():
	Engine.max_fps = 5
	print("======== READY ========")

https://github.com/Alex2782/godot/blob/debug_vulkan_shader/servers/rendering/rendering_device_graph.cpp#L642

TYPE_DRAW_INDEXED = vkCmdDrawIndexed

07-25 10:55:48.463  9124  9124 F DEBUG   : backtrace:
07-25 10:55:48.464  9124  9124 F DEBUG   :       #00 pc 00000000000f5004  /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #01 pc 00000000000a0468  /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)

Analyze what happens before the crash, but also what happens between several TYPE_DRAW_INDEXED and _run_draw_list_command executions. The Uniform (MaterialShader) crash can already be reproduced with 2 CanvasNodes. I think it was like this:

  • Crash: Node1 with UniformShader, Node2 without
  • OK: Node1 without, Node2 with UniformShader
  • OK: Node1 with UniformShader, Node2 with UniformShader

#82602

image

@akien-mga
Member

Also @akien-mga, any idea why the debug symbols don't work? Getting a full backtrace would help a lot.

Since you're building the editor, I think there's no pre-defined task for not-stripping it.

You could add the doNotStrip line from generateDevTemplate to

task generateGodotEditor {

Or copy it and make a generateDevEditor task for that.

CC @m4gr3d

@m4gr3d
Contributor

m4gr3d commented Jul 28, 2024

@matheusmdx you can generate a dev build of the editor without stripping using the following build command:

./gradlew generateGodotEditor -PgenerateNativeLibs=true -PdoNotStrip=true

For reference:

  • -PgenerateNativeLibs=true generates all the shared libraries needed for the build. It's the gradle equivalent of the generate_apk=yes scons parameter. Note that it'll generate the dev, debug and release shared libraries so the build may take some time. If the shared libraries are already generated, you may omit this parameter.
  • -PdoNotStrip=true disables stripping

For debugging the Android editor, I'd recommend using Android Studio. There's support for setting breakpoints both in java and c++ allowing you to walk through the code line by line to identify the source of the crash.

@m4gr3d
Contributor

m4gr3d commented Jul 28, 2024

You can get a backtrace with debug symbols by passing debug_symbols=yes to SCons (or dev_build=yes, not sure if it expects dev stuff) and building the apk with ./gradlew generateDevTemplate in platform/android/java.

Btw @m4gr3d we really need to document this on https://docs.godotengine.org/en/latest/contributing/development/compiling/compiling_for_android.html
I see the page was updated to instruct using the new generate_apk=yes SCons option, but I suppose this only handles the stripped case and not dev templates?

We should probably still document the various Gradle tasks for advanced users.

Good idea, I'll update the documentation with instructions for advanced users.

Note that we'll need to merge #92859 to address a regression with how stripping is set in the latest versions of gradle.

@matheusmdx
Contributor Author

@m4gr3d I was able to build the APK with your instructions. Now I just need some help with debugging in Android Studio; I tried "Attach Debugger to Android Process", but that didn't work.


Also, here is the backtrace with a dev build:

07-28 13:58:51.800  7298  7342 F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 7342 (VkThread), pid 7298 (e.editor.v4.dev)
07-28 13:58:52.240  8578  8578 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-28 13:58:52.240  8578  8578 F DEBUG   : Build fingerprint: 'xiaomi/onc/onc:10/QKQ1.191008.001/V11.0.2.0.QFLMIXM:user/release-keys'
07-28 13:58:52.240  8578  8578 F DEBUG   : Revision: '0'
07-28 13:58:52.240  8578  8578 F DEBUG   : ABI: 'arm64'
07-28 13:58:52.278  8578  8578 F DEBUG   : Timestamp: 2024-07-28 13:58:52-0300
07-28 13:58:52.278  8578  8578 F DEBUG   : pid: 7298, tid: 7342, name: VkThread  >>> org.godotengine.editor.v4.dev <<<
07-28 13:58:52.278  8578  8578 F DEBUG   : uid: 10420
07-28 13:58:52.278  8578  8578 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
07-28 13:58:52.278  8578  8578 F DEBUG   : Cause: null pointer dereference
07-28 13:58:52.278  8578  8578 F DEBUG   :     x0  0000000000000000  x1  0000000000000003  x2  0000006ffb2de000  x3  0000006fdb618220
07-28 13:58:52.278  8578  8578 F DEBUG   :     x4  0000006ffb2da400  x5  0000000000000000  x6  0000000000000000  x7  0000000000000001
07-28 13:58:52.278  8578  8578 F DEBUG   :     x8  0000000000000002  x9  0000000000000001  x10 0000000000000001  x11 0000000000000004
07-28 13:58:52.278  8578  8578 F DEBUG   :     x12 0000006ffd25c610  x13 0000000000000000  x14 0000000000000000  x15 0000000000000002
07-28 13:58:52.278  8578  8578 F DEBUG   :     x16 0000006ffbe8e6f8  x17 0000007018497800  x18 00000070024d8000  x19 0000007018497800
07-28 13:58:52.278  8578  8578 F DEBUG   :     x20 0000006ffb2de000  x21 00000070165f1640  x22 0000000000000000  x23 00000070165f1340
07-28 13:58:52.278  8578  8578 F DEBUG   :     x24 0000000000000001  x25 0000006fdb978d00  x26 0000000000000028  x27 0000006ffd183ce0
07-28 13:58:52.278  8578  8578 F DEBUG   :     x28 0000000000000000  x29 00000070165f44a0
07-28 13:58:52.278  8578  8578 F DEBUG   :     sp  00000070165f0a20  lr  00000070165f1340  pc  0000006ffc038004
07-28 13:58:52.433  8578  8578 F DEBUG   :
07-28 13:58:52.433  8578  8578 F DEBUG   : backtrace:
07-28 13:58:52.433  8578  8578 F DEBUG   :       #00 pc 00000000000f5004  /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-28 13:58:52.433  8578  8578 F DEBUG   :       #01 pc 00000000000a0468  /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-28 13:58:52.433  8578  8578 F DEBUG   :       #02 pc 00000000046c13e4  /data/app/org.godotengine.editor.v4.dev-L_4bEWJcXagmyz1K8N05Rw==/lib/arm64/libgodot_android.so (RenderingDeviceDriverVulkan::render_pass_create(VectorView<RenderingDeviceDriver::Attachment>, VectorView<RenderingDeviceDriver::Subpass>, VectorView<RenderingDeviceDriver::SubpassDependency>, unsigned int)+3552)
07-28 13:58:52.433  8578  8578 F DEBUG   :       #03 pc 000000000743a554  /data/app/org.godotengine.editor.v4.dev-L_4bEWJcXagmyz1K8N05Rw==/lib/arm64/libgodot_android.so (_ZN20RenderingDeviceGraph30_add_buffer_barrier_to_commandEN21RenderingDeviceDriver8BufferIDE8BitFieldINS0_17BarrierAccessBitsEES4_RiS5_+4)

@Alex2782
Contributor

Alex2782 commented Jul 28, 2024

Have you already configured it as described here?
https://docs.godotengine.org/en/latest/contributing/development/configuring_an_ide/android_studio.html

I also had some crashes where the debugger did not work properly and no breakpoints were placed. Android Studio with a dev build of the editor consumes an incredible amount of memory: at least 16 GB is necessary, and 32 GB is better.

Because the error occurs in the driver, debugging is less useful:
/vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed

@matheusmdx
Contributor Author

#79760 #82602 #85097 #86037

These may be the same issue. There are MRPs to reproduce it outside the editor. (my test project: ShaderTest.zip)

I tried the MRPs from these issues, but all of them crash the editor for me while loading. Here are the logs they print before the crash:

79760 CircleJumpGodot4.txt
82602 UniformTestProject.txt
85097 vk-uniform-a11.txt
86037 Spritesheet.CPU.Particles.txt
Shader Test.txt

New project + click on renderer switch.txt


I'll do more tests this week, changing some code to see what happens. If anyone wants me to test something else, feel free to tell me.

I was able to use Android Studio, so I can get better backtraces and check parameters if necessary:

Captura 2024-07-28 21-01-10-161068
Captura 2024-07-28 21-02-26-473925
Captura 2024-07-28 21-03-45-226407

@akien-mga akien-mga modified the milestones: 4.3, 4.4 Jul 31, 2024
@Alex2782
Contributor

Alex2782 commented Jul 31, 2024

@matheusmdx Thanks! Please try #92611 again.

The PR should now be ready: it sets RENDER_GRAPH_REORDER = 0 if the device is an Adreno 5xx.
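For illustration only, a device check of this kind could look roughly like the sketch below. This is not the code from #92611: the helper name `should_disable_render_graph_reorder` is hypothetical, and it assumes that Adreno 5xx devices report a name of the form "Adreno (TM) 506", as in the system information at the top of this issue.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, NOT the implementation from #92611.
// Assumption: Adreno 5xx GPUs report a device name containing
// "Adreno (TM) 5" followed by exactly two more digits (e.g. "Adreno (TM) 506").
bool should_disable_render_graph_reorder(const std::string &device_name) {
	const std::string prefix = "Adreno (TM) 5";
	const std::size_t pos = device_name.find(prefix);
	if (pos == std::string::npos) {
		return false; // Not an Adreno 5-series name at all.
	}

	// The two characters after the leading '5' must both be digits.
	const std::size_t digits_start = pos + prefix.size();
	if (digits_start + 2 > device_name.size()) {
		return false;
	}
	for (std::size_t i = 0; i < 2; i++) {
		if (!std::isdigit(static_cast<unsigned char>(device_name[digits_start + i]))) {
			return false;
		}
	}

	// Reject a fourth digit so that a four-digit model number does not match.
	const std::size_t next = digits_start + 2;
	if (next < device_name.size() && std::isdigit(static_cast<unsigned char>(device_name[next]))) {
		return false;
	}
	return true;
}
```

With this sketch, "Adreno (TM) 506" would disable reordering, while "Adreno (TM) 640" or "Apple M1" would leave it enabled. Matching on the device name is only one possible approach; a real check might also compare the vendor ID or driver version.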

Outdated: Some information should appear in the logs showing whether the workaround has been activated and which driver version is in use.

Example on macOS. Which driverVersion is displayed on your Redmi 7?

======== Workarounds ========
avoid_compute_after_draw:  false
avoid_render_graph_reorder:  false
-----------------------------
name:  Apple M1
vendor:  4203
deviceID:  235209711
driverVersion:  0.0.2.2012
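The four dot-separated fields in the driverVersion line look like the standard Vulkan variant.major.minor.patch layout. A minimal sketch of that decoding follows, assuming the packed 32-bit value uses the same bit layout as the `VK_API_VERSION_*` macros. Whether #92611 prints the value this way is an assumption, the function name `decode_vulkan_version` is made up for this example, and driver vendors are free to use their own encoding for `driverVersion`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decodes a packed 32-bit version using the standard Vulkan bit layout:
// variant = bits 31..29, major = bits 28..22, minor = bits 21..12, patch = bits 11..0.
// Assumption: the driver follows this layout; vendors may encode driverVersion differently.
std::string decode_vulkan_version(uint32_t packed) {
	const uint32_t variant = packed >> 29U;
	const uint32_t major = (packed >> 22U) & 0x7FU;
	const uint32_t minor = (packed >> 12U) & 0x3FFU;
	const uint32_t patch = packed & 0xFFFU;
	return std::to_string(variant) + "." + std::to_string(major) + "." +
			std::to_string(minor) + "." + std::to_string(patch);
}
```

Under that layout, a packed value of 10204 (2 << 12 | 2012) decodes to "0.0.2.2012", which matches the shape of the line in the log above.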

big impact on performance

I have not yet been able to confirm this with a 2D benchmark: the result is sometimes lower, sometimes higher, within +/- 3%, on a macOS M1 dev build:
#92611 (comment)


The crash happens in the graphics driver, which is like a black box if no sources have been released for it.
image


uniform ShaderTest.zip

I have tested on Firebase Test Lab: the 'uniform' shader is still not fixed with RENDER_GRAPH_REORDER = 0.

@matheusmdx
Contributor Author

matheusmdx commented Aug 5, 2024

@Alex2782 Sorry for the late reply, I was a bit busy last week. This PR doesn't stop the crash. I took a look with Android Studio, and it seems that render_graph_reorder is still true after RenderingDevice initialization:

image

image

@Alex2782
Contributor

Alex2782 commented Aug 6, 2024

Thank you! I'll try to revise it in the next few days.


@matheusmdx: render_graph_reorder initialization should now be correct: compare

@matheusmdx
Contributor Author

I didn't receive a notification for your comment edit, but anyway, @Alex2782, I can confirm that it now fixes the crash.
