This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

[camera]Fix crash due to calling engine APIs from background thread #4608

Merged 1 commit on Dec 20, 2021


Contributor

@hellohuanlin hellohuanlin commented Dec 10, 2021

The engine APIs used in the camera plugin must be called on the platform thread. Some of these APIs have explicit asserts (e.g. MethodChannel) and some carry warnings in their documentation (e.g. TextureRegistry). However, the camera plugin currently calls these APIs from a background thread, which causes the crash.

There's already a FLTThreadSafeFlutterResult wrapper that invokes FlutterResult on the main thread. This PR creates similar "thread safe wrappers" for EventChannel, MethodChannel and TextureRegistry.
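
As a rough illustration of the wrapper idea (not the code in this PR), here is a minimal Swift sketch. `FakeChannel` and `MainThreadChannel` are hypothetical names invented for the example; the real wrappers are Objective-C classes around the Flutter engine types.

```swift
import Foundation

// Hypothetical stand-in for an engine channel that must only be used on the main thread.
final class FakeChannel {
    private(set) var invoked: [String] = []
    func invokeMethod(_ name: String) {
        precondition(Thread.isMainThread, "must be called on the main thread")
        invoked.append(name)
    }
}

// Wrapper that forwards every call to the main thread.
final class MainThreadChannel {
    private let channel: FakeChannel
    init(channel: FakeChannel) { self.channel = channel }

    func invokeMethod(_ name: String) {
        if Thread.isMainThread {
            // Already on the main thread: call through directly.
            channel.invokeMethod(name)
        } else {
            // Otherwise hop to the main queue; the call completes later.
            DispatchQueue.main.async { self.channel.invokeMethod(name) }
        }
    }
}

let fake = FakeChannel()
let wrapper = MainThreadChannel(channel: fake)
wrapper.invokeMethod("foo") // top-level script code runs on the main thread
```

Callers on a background thread go through the same `invokeMethod`, so the thread check lives in one place instead of at every call site.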

This PR should fix the crash, but we should start looking into simplifying the threading logic in this plugin when we get time. I also wrote up some ideas here.

Note that there's a separate crash due to a race condition, which is fixed here.

No version change:
I want to update the pubspec.yaml file in next PR so that we can batch the change.

Issues:

flutter/flutter#94723
flutter/flutter#52578

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the relevant style guides and ran the auto-formatter. (Unlike the flutter/flutter repo, the flutter/plugins repo does use dart format.)
  • I signed the CLA.
  • The title of the PR starts with the name of the plugin surrounded by square brackets, e.g. [shared_preferences]
  • I listed at least one issue that this PR fixes in the description above.
  • I updated pubspec.yaml with an appropriate new version according to the pub versioning philosophy, or this PR is exempt from version changes.
  • I updated CHANGELOG.md to add a description of the change, following repository CHANGELOG style.
  • I updated/added relevant documentation (doc comments with ///).
  • I added new tests to check the change I am making, or this PR is test-exempt.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@hellohuanlin hellohuanlin changed the title [camera]fix threading issue with thread safe types to ensure dispatch… [camera]Dispatch to platform thread before calling engine API Dec 15, 2021
@hellohuanlin hellohuanlin changed the title [camera]Dispatch to platform thread before calling engine API [camera]Dispatch to platform thread before calling engine API to prevent crash Dec 15, 2021
@hellohuanlin hellohuanlin changed the title [camera]Dispatch to platform thread before calling engine API to prevent crash [camera]Fix crash due to calling engine APIs from background thread Dec 16, 2021
@hellohuanlin hellohuanlin marked this pull request as ready for review December 16, 2021 00:25
@hellohuanlin hellohuanlin requested review from jmagman and cyanglaz and removed request for bparrishMines December 16, 2021 00:25

- (void)setUp {
[super setUp];
id mock = OCMClassMock([FlutterEventChannel class]);
Contributor

nits:

Suggested change
-id mock = OCMClassMock([FlutterEventChannel class]);
+id mockEventChannel = OCMClassMock([FlutterEventChannel class]);

/**
* Registers a `FlutterTexture` for usage in Flutter and returns an id that can be used to reference
* that texture when calling into Flutter with channels. Textures must be registered on the
* platform thread. On success returns the pointer to the registered texture, else returns 0.
Contributor

We should say "main thread" here: the plugin is not part of the engine, so "platform thread" might not make much sense to plugin developers or users.

Ditto in other places.

__block int64_t textureId;
// Use dispatch_sync to keep FlutterTextureRegistry's synchronous API, so that we don't introduce new potential race conditions.
// We do not break priority inversion here since it's the background thread waiting for main thread.
dispatch_sync(dispatch_get_main_queue(), ^{
Contributor

@cyanglaz cyanglaz Dec 16, 2021

dispatch_sync makes this method wait for the main thread, which is not obvious to the caller.
Maybe we could introduce a completion handler and use dispatch_async instead, so the wait is more obvious?

Contributor Author

I guess what I wanted to say in my comment is that if we do not wait for the main thread to complete (i.e. use dispatch_async), we effectively change the semantics of the wrapped function (from a sync API to an async one), which may introduce a potential race condition. But it looks like my comment was not clearly phrased.

Contributor Author

@hellohuanlin hellohuanlin Dec 16, 2021

Just in case of miscommunication, consider this dummy code:

var _foo = 0

func setFooTo2() {
  _foo = 2
}

func methodA() {
  setFooTo2()
}

func methodB() {
  let bar = 1 / _foo
}

methodA() 
// since `setFooTo2()` is synchronous API, the subsequent code can safely assume `_foo` is updated
methodB()

However, if we create a wrapper and change the API to an asynchronous one, we can't make that assumption anymore:


func methodA() {
  setFooTo2Asynchronously(completion: {...}) // async API
}

func methodB() { 
  let bar = 1 / _foo
}

methodA() 
methodB()

So back to our example: after calling this texture API, there could be code that assumes the call has already taken effect.
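
To make the difference concrete, here is a small runnable Swift sketch of the two kinds of queue hop. It is only an illustration under assumptions: `platformQueue` is a private serial queue standing in for the main queue so that the script can run on its own, and `FakeRegistry`, `registerSync` and `registerAsync` are hypothetical names, not the plugin's API.

```swift
import Dispatch

// Stand-in for the main queue (an assumption, so the sketch runs as a plain script).
let platformQueue = DispatchQueue(label: "example.platform")

// Hypothetical registry whose state is only touched on platformQueue.
final class FakeRegistry {
    private var lastId: Int64 = 0
    func register() -> Int64 {
        lastId += 1
        return lastId
    }
}

let registry = FakeRegistry()

// Sync hop: blocks the caller until the registration has happened.
func registerSync() -> Int64 {
    return platformQueue.sync { registry.register() }
}

// Async hop: returns immediately; the id only exists inside the completion.
func registerAsync(completion: @escaping (Int64) -> Void) {
    platformQueue.async { completion(registry.register()) }
}

let first = registerSync() // safe to use on the very next line

final class Box { var value: Int64 = 0 }
let second = Box()
let done = DispatchSemaphore(value: 0)
registerAsync { id in
    second.value = id
    done.signal()
}
// Code placed here cannot assume second.value has been set yet.
done.wait() // only after this point is the async result guaranteed
```

With the sync hop the caller keeps the "value is ready on return" guarantee; with the async hop every caller that relied on that guarantee has to be found and moved into the completion.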

Since we have 10+ registered methods, the number of call orderings we would need to check for races grows exponentially, which can easily get out of hand.

// For example, we may crash in this order: 
method2()
method8()
method7()

// or in this: 
method1()
method9()

// or in this 
method4()
method3()

That's why it's important to keep the sync API here; changing it to async may introduce new hard-to-find races. Why hard-to-find? Consider this bug. That race condition happens with a combination of the following methods, with the additional requirement that create() is called right after dispose():

dispose() 
create()  
startVideoRecording()

This is just one execution order out of potentially thousands of combinations.

If we were to rewrite this whole plugin from scratch, I would probably be fine with either approach, but since we are "patching" the code, I'd prefer to stick with the sync behavior.

Hope this explanation makes sense (though it's kind of hard to phrase it in a few sentences in my comment).

Contributor

I think your comment is fine.

The race condition could happen on textureId. If we use a completion block, the textureId would only be available inside the block. So the caller should be aware that the textureId is only accessible after the block is executed.

Your point makes sense too: the caller in this plugin always needs to wait for the textureId before continuing anyway. Maybe we can rename the method to make it clearer to the caller that it will block the current thread and wait on the main thread? Maybe something like:

/**
 * Registers a `FlutterTexture` for usage in Flutter and returns an id that can be used to reference
 * that texture when calling into Flutter with channels. Textures must be registered on the
 * platform thread. On success returns the pointer to the registered texture, else returns 0.
 *
 * The registration happens on Main Thread. If the calling thread is not Main Thread, the current thread will be blocked until the registration is successful.
 */
- (int64_t)registerTextureSync:(NSObject<FlutterTexture> *)texture;

Contributor Author

if we use a completion block, the textureId would only be available inside the block. So the caller should be aware that the textureId is only accessible after the block is executed.

We can certainly use a completion block for code in the same function, but we can't use it across separate functions (e.g. dispose and create are separate functions). The issue here is that it's hard to tell whether the developers made implicit assumptions about the synchronous behavior when implementing this plugin. If we were to rewrite this plugin from the beginning, I would be fine with either approach.

I am cool with the new name as you suggested.

Member

I still think this one should be done with a completion block instead of blocking within queue hops. Something like:

// Declaration
typedef void (^FLTTextureRegistration)(int64_t textureId);
- (void)registerTexture:(NSObject<FlutterTexture> *)texture completionBlock:(FLTTextureRegistration)completionBlock;

// Implementation
- (void)registerTexture:(NSObject<FlutterTexture> *)texture completionBlock:(FLTTextureRegistration)completionBlock {
  if (!NSThread.isMainThread) {
    [NSOperationQueue.mainQueue addOperationWithBlock:^{
      completionBlock([self->_registry registerTexture:texture]);
    }];
  } else {
    completionBlock([_registry registerTexture:texture]);
  }
}

// Call site
[self.registry registerTexture:cam completionBlock:^(int64_t textureId) {
  [result sendSuccessWithData:@{
    @"cameraId" : @(textureId),
  }];
}];

Member

I understand that in the create/dispose example the data should be sent before create returns, but in that case the caller in handleMethodCallAsync can make it synchronous and wait to return in the completion block, instead of enforcing synchronicity in the -registerTexture: API.

@jmagman
Member

jmagman commented Dec 16, 2021

Failed to stop: Error getting access token for service account: 400 Bad Request

You should rebase onto #4592 to pick up the new credentials.

[self waitForExpectations:@[ _mainThreadExpectation ] timeout:1];
}

@end

This comment was marked as resolved.

@@ -4,7 +4,7 @@ description: A Flutter plugin for controlling the camera. Supports previewing
Dart.
repository: https://github.com/flutter/plugins/tree/master/packages/camera/camera
issue_tracker: https://github.com/flutter/flutter/issues?q=is%3Aissue+is%3Aopen+label%3A%22p%3A+camera%22
-version: 0.9.4+5
+version: 0.9.4+6

This comment was marked as resolved.

Contributor Author

@hellohuanlin hellohuanlin Dec 20, 2021

@jmagman I removed this change, but now I get an error saying the changelog and yaml versions do not match. Maybe I should not update the changelog in this PR?

Contributor Author

Synced with @cyanglaz offline: we can use ## NEXT in the changelog.

@@ -1,3 +1,7 @@
## 0.9.4+6
Contributor

Suggested change
-## 0.9.4+6
+## NEXT

If you want to publish this change in a later PR,

return self;
}

- (int64_t)registerTexture:(NSObject<FlutterTexture> *)texture {
Contributor

Reminder to rename.

Contributor

@cyanglaz cyanglaz left a comment

LGTM

@hellohuanlin hellohuanlin merged commit 3972212 into flutter:master Dec 20, 2021
@stuartmorgan
Contributor

No version change:
I want to update the pubspec.yaml file in next PR so that we can batch the change.

I know it's too late now, but I only just saw this: Why? We explicitly don't want to batch changes in general (see the linked docs), and you also have no control in general over what it will be batched with—there's no guarantee that the next camera change to land will be the one you have in mind.

What was the advantage of doing this?

@jmagman
Member

jmagman commented Dec 20, 2021

I hadn't reviewed this yet!

@jmagman
Member

jmagman commented Dec 20, 2021

@hellohuanlin let's revert this at #4629 until Stuart and I can review it.

@hellohuanlin
Contributor Author

No version change:
I want to update the pubspec.yaml file in next PR so that we can batch the change.

I know it's too late now, but I only just saw this: Why? We explicitly don't want to batch changes in general (see the linked docs), and you also have no control in general over what it will be batched with—there's no guarantee that the next camera change to land will be the one you have in mind.

What was the advantage of doing this?

@stuartmorgan I wanted to combine the two fixes to avoid publishing too frequently, so that developers don't have to bump the version twice. But from your comment it seems that is fine, so I am OK with bumping the version.

@hellohuanlin
Contributor Author

hellohuanlin commented Dec 20, 2021

I hadn't reviewed this yet!

@jmagman Ah sorry. I did not intend to leave you out. I thought I just needed 1 stamp iirc. The land button turned green though, maybe some bug with the CI landing requirement check?

@jmagman
Member

jmagman commented Dec 20, 2021

Ah sorry. I did not intend to leave you out. I thought I just needed 1 stamp iirc. The land button turned green though, maybe some bug with the CI landing requirement check?

By policy you generally do only need 1 LGTM from a flutter-hacker, I had just mentioned that I would get a chance to review on Monday. 🙂 You should probably add @stuartmorgan as a reviewer to non-trivial plugin changes, particularly when he has a lot of context for particular patterns (or failures to fix those patterns, in this case).

Member

@jmagman jmagman left a comment

Adding stylistic nits since I know you're putting this out for review again.

Comment on lines +13 to +14
FLTThreadSafeEventChannel *_channel;
XCTestExpectation *_mainThreadExpectation;
Member

Generally prefer properties to ivars (unless you have a good reason, like not exposing mutable collections), and then once a property, use dot notation.

Contributor Author

@hellohuanlin hellohuanlin Dec 20, 2021

Very interesting. My preference was the opposite (use ivars unless I need the extra functionality of a property). But it's not a strong preference, so I'm fine changing it to use properties.

Comment on lines +40 to +42
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
[self->_channel invokeMethod:@"foo" arguments:nil];
});
Member

Prefer higher-level Foundation APIs to GCD C APIs.

Suggested change
-dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
-  [self->_channel invokeMethod:@"foo" arguments:nil];
-});
+NSOperationQueue *backgroundQueue = [[NSOperationQueue alloc] init];
+[backgroundQueue addOperationWithBlock:^{
+  [self.channel invokeMethod:@"foo" arguments:nil];
+}];

I know GCD C APIs are used all over this plugin, but if it were written from scratch I would want to use operation queues instead, and this new code doesn't need to interact with the older code using C APIs.
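
For readers more familiar with Swift, the operation-queue style suggested above looks roughly like the sketch below. `Recorder` is a hypothetical helper used only to observe where the block ran; it is not part of the plugin or its tests.

```swift
import Foundation

// Hypothetical helper that records whether the work ran off the main thread.
final class Recorder { var ranOffMainThread = false }
let recorder = Recorder()

// Foundation-level equivalent of dispatching to a global background queue.
let backgroundQueue = OperationQueue()
backgroundQueue.addOperation {
    // Runs on a background thread managed by the operation queue.
    recorder.ranOffMainThread = !Thread.isMainThread
}
// Block until the operation has finished (for the sake of the script only).
backgroundQueue.waitUntilAllOperationsAreFinished()
```

In a test, the blocking wait would normally be replaced by an expectation, as the tests in this PR do.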

Contributor Author

I am cool with either, though are we concerned about inconsistency in the same plugin?

Comment on lines +20 to +26
if (!NSThread.isMainThread) {
  dispatch_async(dispatch_get_main_queue(), ^{
    [self->_channel setStreamHandler:handler];
  });
} else {
  [_channel setStreamHandler:handler];
}
Member

Prefer higher-level Foundation APIs to GCD C APIs.

Suggested change
-if (!NSThread.isMainThread) {
-  dispatch_async(dispatch_get_main_queue(), ^{
-    [self->_channel setStreamHandler:handler];
-  });
-} else {
-  [_channel setStreamHandler:handler];
-}
+if (!NSThread.isMainThread) {
+  [NSOperationQueue.mainQueue addOperationWithBlock:^{
+    [self->_channel setStreamHandler:handler];
+  }];
+} else {
+  [_channel setStreamHandler:handler];
+}

Since this doesn't set the stream handler synchronously, might be better to have a callback so _isStreamingImages = YES isn't set before the handler is set up?
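
One way to picture the callback idea is the Swift sketch below. It is a sketch under assumptions, not the plugin's code: `platformQueue` is a private serial queue standing in for the main queue, and `FakeEventChannel`, `QueueConfinedEventChannel` and `StreamState` are hypothetical names made up for the example.

```swift
import Dispatch

// Stand-in for the main queue (an assumption, so the sketch runs as a plain script).
let platformQueue = DispatchQueue(label: "example.platform")

// Hypothetical event channel; not the real FlutterEventChannel API.
final class FakeEventChannel {
    private(set) var handlerName: String?
    func setStreamHandler(_ name: String) { handlerName = name }
}

// Wrapper that installs the handler on platformQueue and reports when it is done.
final class QueueConfinedEventChannel {
    private let channel: FakeEventChannel
    init(channel: FakeEventChannel) { self.channel = channel }

    func setStreamHandler(_ name: String, completion: @escaping () -> Void) {
        platformQueue.async {
            self.channel.setStreamHandler(name)
            completion() // runs only after the handler is installed
        }
    }
}

final class StreamState { var isStreamingImages = false }

let channel = FakeEventChannel()
let wrapper = QueueConfinedEventChannel(channel: channel)
let state = StreamState()
let done = DispatchSemaphore(value: 0)

wrapper.setStreamHandler("imageStream") {
    // The flag flips inside the callback, never before the handler exists.
    state.isStreamingImages = true
    done.signal()
}
done.wait() // for the script only; real code would continue from the callback
```

The ordering guarantee comes from where the flag is set, not from blocking: the semaphore is there only so the script does not exit early.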

Contributor Author

Good call

Comment on lines +13 to +14
FLTThreadSafeMethodChannel *_channel;
XCTestExpectation *_mainThreadExpectation;
Member

Prefer properties to ivars.

Comment on lines +13 to +16
FLTThreadSafeTextureRegistry *_registry;
XCTestExpectation *_registerTextureExpectation;
XCTestExpectation *_unregisterTextureExpectation;
XCTestExpectation *_textureFrameAvailableExpectation;
Member

Prefer properties to ivars.

}

- (void)testShouldDispatchToMainThreadIfCalledFromBackgroundThread {
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
Member

NSOperationQueue

/**
* Wrapper for FlutterMethodChannel that always invokes messages on the main thread
*/
@interface FLTThreadSafeMethodChannel : NSObject
Member

It's not really that it's "thread-safe" so much as it executes operations on the main queue.
How about FLTMainQueueMethodChannel?

/**
* Wrapper for FlutterTextureRegistry that always sends events on the main thread
*/
@interface FLTThreadSafeTextureRegistry : NSObject
Member

Same, it's not just "thread-safe", it's main queue-confined. FLTMainQueueTextureRegistry? Although I guess this matches FLTThreadSafeFlutterResult...

Contributor Author

Yes I was following the convention for consistency. But I can change both if you prefer.


jmagman added a commit that referenced this pull request Dec 20, 2021
jmagman added a commit that referenced this pull request Dec 20, 2021
jmagman added a commit that referenced this pull request Dec 20, 2021
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Dec 20, 2021
@stuartmorgan
Contributor

so that developers don't have to bump the version twice

Developers don't have to bump the version twice; they can update it however often they want to, regardless of how often we release. Releasing more often gives people more options, not fewer. (Also, manually updating versions of individual plugins is not generally how most projects get new bugfix versions of plugins anyway, because of the way the constraint system works.)

KyleFin pushed a commit to KyleFin/plugins that referenced this pull request Dec 21, 2021