Android ANR caused by graphics conflict with Chromium? #3549

Open
ajwfrost opened this issue Oct 28, 2024 · 13 comments

Comments

@ajwfrost
Collaborator

See https://github.com/distriqt/ANE-Adverts/issues/589#issuecomment-2442319673

There's an ANR showing up in the log (attached). Initial thoughts are below:

libc.so.__futex_wait_ex.log


The ANR looks like it's GPU-related:

"main" (tid 1) = Android UI:

  #03  pc 0x0000000000119c13  /system/lib/libhwui.so (android::uirenderer::renderthread::DrawFrameTask::drawFrame+254)
  at android.graphics.HardwareRenderer.nSyncAndDrawFrame (Native method)

"Thread 4" (tid 25) = AIR runtime:

  #11  pc 0x0000000000016233  /system/lib/libEGL.so (android::eglMakeCurrentImpl+330)
  #12  pc 0x0000000000093495  /system/lib/libandroid_runtime.so (android::jni_eglMakeCurrent+160)
  at com.google.android.gles_jni.EGLImpl.eglMakeCurrent (Native method)
  at com.adobe.air.FlashEGL10.MakeGLCurrent (FlashEGL10.java:684)
  at com.adobe.air.customHandler.callTimeoutFunction (Native method)

Which might mean there's a conflict between these two threads ...

Although there's also a Chromium instance in tid 71 which also appears to be waiting:

  #05  pc 0x0000000000017121  /system/lib/libEGL.so (void* android::eglCreateImageTmpl<int, void* >+252)
  #06  pc 0x0000000000017019  /system/lib/libEGL.so (android::eglCreateImageKHRImpl+20)
  #07  pc 0x00000000035b9e81  /data/app/~~UNG6tJ4eOyhRL2v1opXwVQ==/com.google.android.trichromelibrary_666810030-RO1dahlTOceM5BJYt0TDMw==/base.apk (BuildId: 3663fd185c8564e6952860b602a348e3e6b01aa5)

plus I'm curious what triggered "JavaBridge" tid=73 Native - this is also in a function that perhaps is blocking.

And then I've just scrolled down further and found there is a "RenderThread" (tid 100) for the Chromium instance which then seems to be somehow calling back into the AIR runtime..?!

  #13  pc 0x0000000000edfb45  /data/app/~~UNG6tJ4eOyhRL2v1opXwVQ==/com.google.android.trichromelibrary_666810030-RO1dahlTOceM5BJYt0TDMw==/base.apk (BuildId: 3663fd185c8564e6952860b602a348e3e6b01aa5)
  #14  pc 0x00000000000025ed  /system/lib/libwebviewchromium_plat_support.so (android::::draw_gl+284)
  #15  pc 0x0000000000130cdd  /system/lib/libhwui.so (android::uirenderer::WebViewFunctor::drawGl+120)
  #16  pc 0x000000000010f135  /system/lib/libhwui.so (android::uirenderer::skiapipeline::GLFunctorDrawable::onDraw+1636)
  #17  pc 0x0000000000182227  /system/lib/libhwui.so (SkDrawable::draw+58)
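
When comparing dumps like the ones above, it can help to list every thread that is currently inside a graphics call. The following is a minimal sketch, not a tool used in this investigation: it assumes each thread starts with a header line beginning with the quoted thread name, and the marker strings are simply taken from the frames quoted in this issue. The third thread in `SAMPLE` is hypothetical and only there to show a non-matching case.

```python
import re

# Assumption: a thread header line starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')

# Marker strings copied from the frames quoted in this issue.
GRAPHICS_MARKERS = ("libEGL.so", "eglMakeCurrent", "eglCreateImage",
                    "draw_gl", "DrawFrameTask")


def threads_in_graphics_calls(dump_text):
    """Map each thread name to the frames that contain a graphics marker."""
    hits = {}
    current = None
    for line in dump_text.splitlines():
        stripped = line.strip()
        header = THREAD_HEADER.match(stripped)
        if header:
            current = header.group("name")
            continue
        if current and any(marker in stripped for marker in GRAPHICS_MARKERS):
            hits.setdefault(current, []).append(stripped)
    return hits


# Frames for the first two threads are copied from this issue;
# the last thread is a hypothetical, non-graphics example.
SAMPLE = '''"main" tid=1 Native
  #03  pc 0x0000000000119c13  /system/lib/libhwui.so (android::uirenderer::renderthread::DrawFrameTask::drawFrame+254)
"Thread 4" tid=25 Native
  #11  pc 0x0000000000016233  /system/lib/libEGL.so (android::eglMakeCurrentImpl+330)
  at com.adobe.air.FlashEGL10.MakeGLCurrent (FlashEGL10.java:684)
"Hypothetical idle thread" tid=99 Waiting
  at java.lang.Object.wait (Native method)
'''

if __name__ == "__main__":
    for name, frames in threads_in_graphics_calls(SAMPLE).items():
        print(name, len(frames))
```

Running this over two dumps taken with and without the background thread setting would show quickly whether the same set of threads is involved each time.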

@hadisn are you able to reproduce the ANR yourselves? Do you know if there's anything in particular that's causing the problem e.g. a particular screen when you then tap in a particular place? I think we'll need to look more into this one - what version of the AIR SDK were you using for this?

thanks

@hadisn

hadisn commented Oct 29, 2024

Hi @ajwfrost, I was not able to reproduce it and can't pin down what is causing the problem.
I can tell you that it happens in GPU and direct render modes (not sure about CPU).
I am using AIR 51.1.2.1 and distriqt Adverts ANE v15.3.0.
It looks like it happens only on Android 14 (I will try to test on Android 14 more).

Here is logcat from Android 14 test device (Pixel 8 pro):
Logcat.txt

Problem described in google play console:
"The main thread is blocked, waiting for the rendering subsystem or the GPU to complete a requested operation. This is usually caused by the slowness of the rendering subsystem, the GPU, or its driver."

Also they point to this link: https://developer.android.com/topic/performance/anrs/find-unresponsive-thread#lock-contention

Regards

@jigtrap

jigtrap commented Oct 29, 2024

Hi @ajwfrost , @hadisn

My findings are similar to what hadisn is reporting.

I am seeing ANRs in my app with this setup:

Rendermode=direct
runtimeInBackgroundThread=true
AIR SDK: 51.1.2.1
Adverts v15.3.0

Findings:
Filtering all Android versions except Android 14, the futex_wait ANR appears with affected sessions: 1.4%
Filtering only Android 14, the futex_wait ANR appears with affected sessions: 29.4%

As we can see, there is a big difference.

Hope it can be solved soon.

Thanks in advance
Aldo

@ajwfrost
Collaborator Author

ajwfrost commented Nov 6, 2024

Hi

Quick update here is that we looked again at the stack dumps and where the different threads are, and the issue does seem to be related to the GPU and its usage across different threads. The main thread is trying to render; meanwhile the AIR thread is trying to do a 'make current', and the Chromium webview seems to have one thread trying to render and another thread waiting on something...

About the only thing we can control here is to merge the main/UI thread with the runtime thread i.e. remove that "runtimeInBackgroundThread" setting. Would it be possible to try that, and see if it impacts the ANR rate? If you're still getting ANRs with that, it would be good if we can get another dump file with the thread stacks to see whereabouts things are hanging. Given it seems to be related to the Android version (14) per the above, I'm wondering if there's a change in the Android WebView component that could be behind this..

thanks

Andrew

@bobrokrol

Screenshot_20241107_160415_Chrome

I also have a spike in these ANRs, mostly on MT6855 and MT6765 chipsets.

stacktrace.log.txt
I'm using StageWebView in the app.

I have blocked certain devices on Google Play to avoid crossing the bad behavior threshold.

I don't like disabling
runtimeInBackgroundThread=true
as this parameter is more stable now than it used to be.
Previously it led to a lot of crashes and ANRs.
Right now it works fine apart from this bunch of ANRs on certain chips.
There is a huge difference in "Excessive slow frames" with the option disabled / enabled:
14% vs 3% (which is close to the peer median)

@hadisn

hadisn commented Nov 8, 2024

Hi @ajwfrost, thank you for the update. I agree with @bobrokrol: runtimeInBackgroundThread=true solved a lot of problems except these few, and I believe this will also be fixed.

@hadisn

hadisn commented Nov 11, 2024

Hi @ajwfrost, I see that AIR 51.1.2.2 has been released. Can you tell us whether we should enable or disable runtimeInBackgroundThread with the latest SDK version? In the release notes I can see that you may have solved a problem related to this, but in your comment here you suggest removing runtimeInBackgroundThread, so I am not sure what to do :)

Thank you

@ajwfrost
Collaborator Author

Hi

The updates in 51.1.2.2 were around some of the other API calls that seemed to result in crashes when using the background thread model - i.e. an actual crash due to state error, rather than just a hang / ANR like you're seeing here.

So I don't expect this version to change anything regarding the conflict we have here with Chromium. My hope had been that it would be possible to see whether switching back to using the UI thread for AIR would then show whether we have a fundamental problem with Chromium interactions, or whether that was just a side-effect of having the extra thread. But if it causes increased ANRs in other areas without the background mode, then it might be tricky (or counter-productive) to check this.

So currently, we're at the same position: we seem to have an odd conflict when using Chromium (but only on certain chipsets?) and we don't know whether or not it's related to the background runtime mode.

thanks

@hadisn

hadisn commented Nov 11, 2024

I will upload a version with runtimeInBackgroundThread disabled to Google Play and let you know what happens.

@hadisn

hadisn commented Nov 13, 2024

Hi @ajwfrost, two days ago I uploaded a version with runtimeInBackgroundThread disabled and I already see a lot of ANRs on Android 14:

"The main thread is blocked, waiting on a native synchronization routine, such as a mutex."

com.google.android.gles_jni.EGLImpl.eglMakeCurrent.log

@ajwfrost
Collaborator Author

Okay thanks -- and it still looks like we have the same problem:

  • Application main thread (tid 1) - i.e. AIR - is stuck at com.google.android.gles_jni.EGLImpl.eglMakeCurrent
  • Chromium GPU thread (tid 70) is stuck at android::eglCreateImageKHRImpl
  • Chromium Render thread (tid 44) is stuck at android::::draw_gl

Some interesting information about how Chromium works on Android:
https://docs.google.com/document/d/1MLPEmMugdVvfeMeQQN_NMolqs4zZekfKjZeNAQJJnMo/edit?usp=sharing
So it sounds like they will always have a single GPU thread, and a separate Render thread, but we appear to be having two separate EGL contexts - one from AIR and one from Chromium WebView - both within the same activity.

This might be the issue: from what I'm reading, it might be that Android needs only a single EGL context for a window/surface. It is (I think) possible to have multiple EGL contexts by having multiple Activities.
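
The suspected pattern, where one thread sits inside a graphics call while another waits for the same resource, can be modelled without Android at all. The sketch below is only an analogy under stated assumptions: `gpu_lock` is a hypothetical stand-in for whatever the driver serialises on, and `try_make_current` is an illustrative name, not an EGL or AIR API. It shows why a waiter with no timeout would hang for as long as the holder keeps the resource.

```python
import threading

# Hypothetical stand-in for a driver-level resource shared by two contexts.
gpu_lock = threading.Lock()


def try_make_current(timeout_s):
    """Try to take the shared resource; return False if it stays busy."""
    acquired = gpu_lock.acquire(timeout=timeout_s)
    if acquired:
        gpu_lock.release()
    return acquired


def simulate():
    """Return (result while another thread holds the lock, result after release)."""
    holder_ready = threading.Event()
    release_holder = threading.Event()

    def other_context_thread():
        # Holds the resource until told to let go, like a thread stuck mid-call.
        with gpu_lock:
            holder_ready.set()
            release_holder.wait()

    holder = threading.Thread(target=other_context_thread)
    holder.start()
    holder_ready.wait()

    while_held = try_make_current(0.05)   # times out: resource is busy
    release_holder.set()
    holder.join()
    after_release = try_make_current(0.05)  # succeeds: resource is free
    return while_held, after_release


if __name__ == "__main__":
    print(simulate())
```

In the real dumps the waiting calls have no timeout, so if a contention of this kind is what is happening, the main thread would stay blocked until Android reports the ANR.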

So just to check on the use case here:

  • Do you definitely need direct (or gpu) rendering mode? As I would think switching to "cpu" mode should solve this?
  • Do you display the web-based content as just a small part of the UI, or is this a case of a 'full screen' display of the webview component? As I'm wondering if there are better ways to then display the WebView i.e. could we actually push this into a separate activity? Do you need some information back from it, or other such integration between the HTML/JS world and the ActionScript stuff?

thanks

@hadisn

hadisn commented Nov 13, 2024

  • I think I tried cpu rendering mode before and the app was laggy; I have never used cpu rendering mode in production...
  • I do not display, and have no need for, any web-based content, but I think AdMob does that...
  • I do not communicate with HTML, JS or anything like that...

@Mintonist

Mintonist commented Nov 14, 2024

I think different apps may have different cases, and I don't think we can stop using direct/gpu mode. About a separate activity for the WebView: could it be a manifest flag or a code parameter, so that everyone can choose and know the limitations? By the way, what limitations would a separate WebView activity have for us?

@ajwfrost
Collaborator Author

ajwfrost commented Dec 3, 2024

Judging by the use cases here, it doesn't sound like we can solve this by pushing the use of a WebView into a separate activity. Particularly because there are third party components and other considerations - especially around advertising - that may still need to stick within this AIR-based activity.

So instead we are wondering whether the whole Android rendering mechanism may need to change, so that we use this differently. Currently, we have a SurfaceView and get the holder in order to manually create an EGL surface object. That all works fine .. but it's not what Android recommend now when setting up a new application, instead they have a GLSurfaceView component that specifies a separate rendering thread for the GPU activities - which is I believe the thread that we're seeing here from the Chromium library, with the GLES/EGL conflicts.

All Android devices since 4.0.3 support OpenGL ES 2.0 so there's no reason why we can't actually drop the "cpu" render mode support on Android and switch that over to "direct" mode i.e. rendering of normal display list remains using the CPU/SIMD operations, but composition happens using OpenGL ES.

I think that this may help in this scenario, as well as possibly improving our "runtime in background thread" concept (because the implication would be that the AIR runtime would need to sit in this rendering thread too).
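
The idea of having everything GPU-related run on one dedicated rendering thread can be sketched in a platform-neutral way. This is a rough model only, loosely analogous to how `GLSurfaceView.queueEvent` hands work to its GL thread; the `RenderThread` class and the task labels are hypothetical and do not reflect how the AIR runtime is or will be implemented.

```python
import queue
import threading


class RenderThread:
    """Runs every queued task on one dedicated thread, in submission order."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, name="render", daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: stop the loop
                break
            task()

    def queue_event(self, task):
        """Schedule a callable to run on the render thread."""
        self._tasks.put(task)

    def stop(self):
        """Finish all queued tasks, then shut the thread down."""
        self._tasks.put(None)
        self._thread.join()


def demo():
    seen = []
    render = RenderThread()
    # Hypothetical task labels, submitted from the calling thread.
    for label in ("air_update", "air_draw", "air_update"):
        render.queue_event(
            lambda label=label: seen.append((label, threading.current_thread().name))
        )
    render.stop()
    return seen


if __name__ == "__main__":
    for label, thread_name in demo():
        print(label, "ran on", thread_name)
```

Because only one thread ever touches the (modelled) graphics state, there is no second thread to contend with, which is the property the proposed change is hoping to gain.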

The downside - this could be quite a bit of work with a high risk of regression. We'll have a bit of a discussion internally about it, I suspect the benefits will outweigh the challenges though...

thanks
