-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Crash if the connection is no longer available once an offline downloaded has started #6210
Comments
@zugaldia The error originates in
The underlying issue seems to be that too many local references are being held (there is a hard limit. 16 min as defined by the jni standard, 512 on my nexus 5x). Maybe the objects and thus the local references are not being cleared properly since the changes to detect off-line situations. I see the following in the log that worries me:
The attempt to download is never canceled, so the requests keep coming. So there seem to be two problems:
Let me know if I can do more. |
@zugaldia I was reading through some related code and it seems that there is also a more pro-active way to set the connectivity state. On ios I see the following
This seems to re-set connectivty through mbgl::NetworkStatus. If that's the general mechanism to set connectivity state, than offline downloads should also adhere to this right? cc @1ec5 |
Yes, it is a general connectivity state API. See also #4234. It sounds like we should move this logic to MGLOfflineStorage so that it runs even if no map view has been initialized by the time the user comes online or starts an offline download. I don't know offhand whether OfflineRegion respects NetworkStatus, but I would expect it to. |
I added a connectivity listener to set the connectivity state on core. Had a slight overflow issue which made the requests continuously retry instead of waiting for a connection. However, requests are nicely parked now until connectivity is restored. Offline still crashes though, with what still seems like memory issues:
|
"Solved" the memory issues by making the local references in HttpRequest and associated failure handlers short-lived. Still need a cleaner solution for local object references that work with both the low and high level jni types though: local_object.hpp A couple of previously hidden issues now pop up as well. Going offline works, coming back online crashes on one of the following:
or
or
|
I have the same problem, what would be the most feasible solution, thanks
|
@ivovandongen So why would a [oops, mean to post to #6293] |
@jfirebaugh I was hoping you could tell me :) It seems that PopLocalFrame doesn't clear references immediately, but I can't say when exactly. DeleteLocalRef does. In the case of (failing) HttpRequests it seems that no references are cleared until at least the native constructor has returned, but maybe even after that. The failure creates even more references (going through the java constructor -> onFailure -> etc) and then it crashes. I tried the same approach on conversion for the queryRenderedFeature calls. And push/pop local frame did not work as advertised. Sadly. I verified the creation of http requests and there are never more than the set limit (20). |
ATM there are two remaining issues when switching back and forth between online/offline. URL/ETag gets corrupted Somehow the url/etag passed to the constructor of HTTPRequest get corrupted:
The url/etag look something like this at that time (one/both): �}^�, `��� Full dump:
Crash when stopping Timer More difficult to reproduce, only happened twice in the last couple of days. Will probably be easier to reproduced once we solve the other issues. The crash happens in Timer#stop
|
What thread does OkHttp run Callbacks on? It looks like the current implementation of Android I'm not quite sure how that would lead to crashes with the stack traces above, but it seems suspicious. |
@jfirebaugh Thanks for the tips
I thought that the callback ( Would a good next step be to check the assertions like you mentioned here? #5827 (comment) |
It does, but only the part that happens after the call to |
@jfirebaugh Thanks, yet again, for looking into this. I've looked at the threading in Going by your comments #5827 (comment) though, I found that requests indeed do get scheduled multiple times under some conditions as the pre-conditions are not checked at the proper time. I added runtime checks to |
Nice. Did you determine what the circumstances are where |
No, not quite. Somehow the request get scheduled twice after a online/offline switch. This happens irregardless of setting the NetworkState proactively (first commit), but was previously masked by other crashes. So either Little side-question; do you think it's worth having two collections keeping track of the pending requests? It might be more efficient than a std::set, but it also adds some complexity and possible errors. |
@ivovandongen I want to track down the root cause here. Can you keep digging?
Good question -- the reason to use two collection is that we need a combination of things that no one C++ collection type provides:
Some languages have insertion-ordered maps as a built-in collection (e.g. Ruby, JS); it's a handy datastructure. In C++ the most straightforward way to get the same behavior is to use two coordinated collections. There are alternatives such as boost multi-index but they tend to be very heavy-weight. |
@jfirebaugh Found the issue. Like I thought, the request may be scheduled multiple times. As it turns out, it may happen when connection is restored because I think the fix I proposed (commit) still makes sense as at the moment of connectivity restore / schedule there might be a timer already set to call Doing the check in I traced the methods with and without setting the connection state pro-actively (commit) With pro-active connection state:
With passive connection state:
|
@jfirebaugh Does this look like 👍 to merge #6293? I'd love to have this for |
@ivovandongen Thanks so much for tracking it down! To summarize, a request will be improperly rescheduled in the event that I think we want to make the fix at a higher level than
In this scenario, we end up with more frequent requests than desired. The bug here is really in step 2: a pending request should not be rescheduled. Let's adjust the guard condition so that this doesn't happen. |
@jfirebaugh Thanks again! Didn't consider that scenario yet. I've changed the guard in Also, I added a small check on |
@jfirebaugh Now that I think of it some more. There is one situation we don't cover with the guards in
Now, the guard will fail as the request is not scheduled nor active. So another timer is started, resulting in two subsequent calls to |
@jfirebaugh To prevent both the situation you described and the one just above, I've added an additional guard in the timer completion callback. If you could review the changes in #6293, that would be great. |
Two things:
|
PR looks good! Thanks for getting to the bottom of this. I'm confident this change will fix #5827 as well. |
Platform: Android 6.0.1 (including Wear)
Mapbox SDK version: 4.2.0-beta.2 and
master
Steps to trigger behavior
Expected behavior
Download pauses, the map remains responsive.
Actual behavior
The app crashes with the following stacktrace:
@ivovandongen I'd appreciate if you could dig up the actual point where the crash is produced like you did in #6123 (comment), the generic
terminating with uncaught exception of type jni::PendingJavaException
doesn't say much./cc: @cammace
The text was updated successfully, but these errors were encountered: