-
-
Notifications
You must be signed in to change notification settings - Fork 735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDK locking #646
Comments
Going deeper with logs I think the culprit is this line. /**
* We want to use more threads than default in {@code bolts.Executors} since most of the time
* the threads will be asleep waiting for data.
*/ This is fine, but in the line I mentioned, after the request is executed in this (I can’t see why |
I amended that line, and everything works. It appears that when you call So it’s important that the child This also means that this big |
Great findings @natario1. Yeah, it really seems that having that As an aside, I think it would be great to have another module in this project for benchmarking. |
@natario1 Looks like there are some similar cases of this with the database thread executor, but there, we see the comment that they want to seemingly purposefully jump off of the DB executor service: Only thing I could think is that by jumping off and onto the other executor, it would free another resource for the db pool? But still, this also seems like it would lead to thrashing, as the many db threads would be fighting over jumping onto the lesser |
@Jawnnypoo more of a habit, since
So I agree with the need of more comprehensive tests. Meanwhile I think #648 is a urgent fix even if we don’t get why it happens. As far as I know the db executor should not be a problem, it is single threaded so jumping into |
@rogerhu can we leave this open for a while? The PR fixes e.g. my function I am still experiencing the bug in my app Looks that if LDS is enabled, |
Just some thoughts if anyone is interested. Many hours, big post :-) I don’t think this is going to be fixed easily in parse, and I also think that #648 should be reverted (though it does no big harm). The issue seems clear to me now and goes down to When the SDK hangsWhen
The
|
@natario1 Really awesome work digging into this. This is important stuff that really deserves time and attention. The huge downside to this problem in my opinion is the fact that the Parse SDK relies on bolts. Bolts itself seems to be a dead project, with no commits in over a year. I regretfully doubt that if you were to bring this to anyone's attention on the bolts project, even with a pull request, would even get looked at. I think the current potential immediate solution would be to open an issue on bolts, giving the description here. If it manages to get the fix, that is the best case scenario. If not, the second option is to pull in bolts as a local module for the Parse SDK and make the needed fixes. This would be somewhat problematic in that we would have to change the package name of the bolts framework so as to not collide with any projects that are currently using bolts alongside the Parse SDK. The solution I think would be best, but would result in quite a bit of work and rewrite, as well as a 2.0 release would be to drop bolts entirely in favor of a more tried and true and supported reactive framework, RxJava 2. I think most Android developer are much more likely to be familiar with RxJava than bolts, and even if not, there are tons of resources available to do so. Using RxJava, we would be able to leave the threading to the user, using whatever Scheduler they desire to do Parse operations. Instead of having to have both async and synchronous operations for everything, we would provide the appropriate Observable, Single, Completable, etc. to the user and they would make the decision. Any thoughts on these options, or other ideas? |
Let's see how Bolt can be fixed if possible and we can ping Facebook for a review. Moving to RxJava is obviously a lot of work and a steep learning curve to get right. |
I am going to open a issue there, let’s try. If you happen to know who to ping please ping him/her there. If this doesn’t work out, we might have to fork but that’s a breaking change? e.g. one might compile |
Not sure we want this, but there is actually room for some tweaking:
|
Any progress on this issue? Thanks! |
Opening an issue on Bolts doesn't seem to work. Are there any plans for this? Thanks! |
Hi @felipesouto we will eventually figure out how to get out of this, but it doesn't seem something we can fix in a few days at the moment. If this is critical for your application, you can
I spent a lot of time on this and think it's a critical issue. But it would be great if we could solve it through Bolts. In the long run ( If anyone has ideas or plans... |
I will throw in my advocation for RxJava again. It is the most highly adopted reactive framework for Java and would allow people to have more control over the SDK than Bolts does. There is some truth to the fact that it has a learning curve, but you could argue the same about Bolts if you are using it for not the simplest of cases. |
Any thoughts on this? I would love to move forward Rx'ifying Parse and removing Bolts if others are on board. |
If there is a move to make, i’d Rather not pick a side, bolts vs Rx. But have a flexible architecture that rx bindings are easy enough to implementz |
RxJava 2 right? In general there is a lot of underlying plumbing/tests you will have to rework.. |
Also keep in mind that many users have their app written around bolts, deprecating bolts binding without providing an alternative that enables them is a no go in the official repository. |
Any change in the async that requires deprecation is a no go if we don’t have a stable way forward. I understand people like Rx better than bolts, but 100% of the users of parse use bolts both in iOS and Android. |
I had some time to investigate an issue that looks serious.
I think the SDK hangs with a deadlock somewhere when doing too many operations concurrently, where too many depends on the device but is a pretty low number (even down to 3).
Working case
The concurrent policy is managed by
bolts
in this class. You can see thatCORE_POOL_SIZE
is the number of processors + 1.When you schedule too much operation and exceed the pool size, tasks are scheduled. You can see this in action calling this function:
This starts 50 operations concurrently. Each operation will just wait for 3 seconds. This works. It’s also nice to look at the logs because you get to see how the bolts executor works.
My device has 6 processors, so the core pool size is 7. I can see in logs that the first 7 tasks are started, then there’s a delay because the pool is full, then some tasks end and new tasks are added and executed. With a reasonable delay, all 50 tasks are executed.
Hanging case
If you replace my fake tasks with parse tasks, the SDK hangs completely and indefinitely after the pool is full. For example, here I get 10 objects through query, and then call
fetch()
on each of these at the same time.This is the output though:
Just 7. And nothing more, forever. No objects is fetched, not even those belonging to the first tasks. Other tasks are not scheduled. After this operation, the SDK is locked permanently.
It’s not uncommon to practically end up in this situation. E.g. displaying a list that fetches child objects onBind. And with a low end device / slow connection, this is much more frequent.
Can you reproduce this @rogerhu @Jawnnypoo @hermanliang ? I have no idea what’s happening and why.
The text was updated successfully, but these errors were encountered: