Detox v18/v19 crashes sometimes on iOS in DetoxSync #3000

Closed
1 of 4 tasks
mikehardy opened this issue Sep 30, 2021 · 27 comments · Fixed by #3135

@mikehardy
Contributor

mikehardy commented Sep 30, 2021

Description

When updating from v17.latest of Detox to v18.latest I started seeing flakiness in the react-native-firebase e2e tests.

This does not happen all the time, only sometimes. It looks like a race that some object-destruction code sometimes loses?

  • I have tested this issue on the latest Detox release and it still reproduces

Reproduction

Provide the steps necessary to reproduce the issue. If you are seeing a regression, try to provide the last known version where the issue did not reproduce.

  1. The react-native-firebase test suite is fully open source - instructions here https://github.com/invertase/react-native-firebase/blob/master/tests/README.md
  2. git clone git@github.com:invertase/react-native-firebase.git
  3. yarn && yarn tests:ios:pod:install && yarn tests:ios:build
  4. In a few terminals: yarn tests:packager:jet and yarn tests:emulator:start and yarn tests:ios:test

Sometimes (not every time, maybe 10% of the time? 20% of the time?) it will crash with the stack below.

Expected behavior

It rolls through all our tests. For the purposes of this issue, it runs at all :-) - if it is going to crash, it will do so with no tests passing; if it starts running and a single test passes, they will all roll through. It usually crashes at startup but sometimes will make it through a handful of tests first.

Screenshots

If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • Detox: v18.22.0
  • React Native: 0.66.0-rc.4 (about to be 0.66.0 with no changes)
  • Node: v14.17.6
  • Device: apple simulator
  • Xcode: 12.5.1 (so I can avoid the iOS 15 issues tracked in "Support iOS 15" #2895, which are in progress separately)
  • iOS: 14.5 (to avoid the iOS 15 issues in "Support iOS 15" #2895)
  • macOS: macOS 11.6 (Apple Silicon M1 or Intel Mac - I've seen it on both)

Logs

It either works through all tests, or sometime shortly after app launch it crashes like this:


detox[98942] ERROR: [WS_ERROR] The app has crashed, see the details below:

The app has crashed, see the details below:

Signal 11 was raised
(
        0   Detox                               0x00000001104c5a15 +[NSThread(DetoxUtils) dtx_demangledCallStackSymbols] + 37
        1   Detox                               0x00000001104c8e10 __DTXHandleCrash + 464
        2   Detox                               0x00000001104c9555 __DTXHandleSignal + 59
        3   libsystem_platform.dylib            0x00007fff60335d7d _sigtramp + 29
        4   ???                                 0x00007f8fdf920050 0x0 + 140255907938384
        5   CoreFoundation                      0x0000000114170833 -[__NSDictionaryM dealloc] + 128
        6   libobjc.A.dylib                     0x0000000113f3b604 objc_object::sidetable_release(bool, bool) + 174
        7   libobjc.A.dylib                     0x0000000113f378ad _object_remove_assocations + 562
        8   libobjc.A.dylib                     0x0000000113f34c02 objc_destructInstance + 84
        9   libobjc.A.dylib                     0x0000000113f3ad88 -[NSObject dealloc] + 21
        10  libobjc.A.dylib                     0x0000000113f3b604 objc_object::sidetable_release(bool, bool) + 174
        11  CFNetwork                           0x0000000111038610 _CFNetworkHTTPConnectionCacheSetLimit + 162696
        12  CFNetwork                           0x0000000110e81183 CFNetwork + 24963
        13  DetoxSync                           0x0000000146cf0220 ____detox_sync_dispatch_wrapper_block_invoke + 23
        14  libdispatch.dylib                   0x00000001148c370d _dispatch_call_block_and_release + 12
        15  libdispatch.dylib                   0x00000001148c48df _dispatch_client_callout + 8
        16  libdispatch.dylib                   0x00000001148cae15 _dispatch_lane_serial_drain + 715
        17  libdispatch.dylib                   0x00000001148cb9c3 _dispatch_lane_invoke + 455
        18  libdispatch.dylib                   0x00000001148d5f81 _dispatch_workloop_worker_thread + 772
        19  libsystem_pthread.dylib             0x00007fff6034045d _pthread_wqthread + 314
        20  libsystem_pthread.dylib             0x00007fff6033f42f start_wqthread + 15
)

If you are experiencing a timeout in your test

If you are seeing a Detox build problem (e.g. during npm install, not detox build)

  • I am providing the npm install log below:

Device and verbose Detox logs

  • I have run my tests using the --loglevel trace argument and am providing the verbose log below:

Reproducible Demo

In case of a bug or a crash please add an example forking from the DetoxTemplate (follow the guidelines there) which reproduce the issue and ready to clone.
Add to the DetoxTemplate (After you fork it), the minimal things which required to reproduce the issue (3rd party libraries / e2e tests).

@uloco
Contributor

uloco commented Oct 1, 2021

probably related #2641

@mikehardy mikehardy reopened this Oct 1, 2021
@mikehardy
Contributor Author

Sorry for the close/reopen, an incorrect mobile tap on my part.
That thread looks unrelated except for firebase.
Our tests are not parallel at all and the stack is different.

@d4vidi
Collaborator

d4vidi commented Oct 3, 2021

@mikehardy this is an excellent report, which has real potential to help us figure out issues with DetoxSync.

@mikehardy
Contributor Author

Here's a GitHub Actions run showing it happening in case that's useful https://github.com/invertase/react-native-firebase/runs/3783157278?check_suite_focus=true#step:21:13

I have determined that it is not always immediately on startup, but it's usually pretty close to startup. Reproduction rate still seems to be about 10-20% of my local intel-mac e2e runs in the react-native-firebase suite.

Not sure how to mitigate it as it doesn't always happen on startup, and I'm not sure how to catch an "app crashed" error so that I can attempt to just relaunch the app and see if it will go through afterwards / do "app crashed retries"
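
One possible shape for the "app crashed retries" idea is sketched below. It is only a sketch under assumptions: it assumes the crash reaches the test as a rejected promise whose message contains the text "The app has crashed" (the wording in the log above), and `retryOnAppCrash` is a hypothetical helper name, not part of Detox.

```javascript
// Sketch only: retries an async step when the error looks like a Detox app-crash report.
// CRASH_MARKER matches the wording seen in the crash log; treating it as a stable
// marker is an assumption.
const CRASH_MARKER = 'The app has crashed';

async function retryOnAppCrash(step, relaunch, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await step();
    } catch (error) {
      const message = String(error && error.message);
      // Anything that is not a crash report is rethrown untouched.
      if (!message.includes(CRASH_MARKER)) throw error;
      lastError = error;
      // Relaunch only if another attempt will follow.
      if (attempt < maxAttempts) await relaunch();
    }
  }
  throw lastError;
}
```

In a Detox suite, `relaunch` might wrap `device.launchApp({ newInstance: true })`, and `step` would be the test body. Whether a relaunch actually gets past the crash is untested here.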

@rodperottoni

Hey Mike. Just confirming that this issue is not related to Firebase Performance at all?
I had the same issue here until I decided to remove Firebase Performance from my project and all the crashes are gone.

@mikehardy
Contributor Author

I haven't tried; the whole point of my using Detox is my role maintaining react-native-firebase and its full e2e test harness 😁😅

@d4vidi d4vidi added this to the Q3/2021 milestone Oct 18, 2021
@shamilovtim

shamilovtim commented Oct 27, 2021

We're having this issue despite using .e2e.ts files to mock out firebase perf and firebase analytics. So I don't think it's Firebase related.
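
For readers unfamiliar with the approach, a no-op stub of this kind might look roughly like the sketch below. Everything in it is hypothetical: the file name `perf.e2e.js`, the wrapper interface (`startTrace`, `setCollectionEnabled`), and the assumption that the app calls Firebase Performance only through such a wrapper module. How the bundler is told to prefer `.e2e` files depends on the project setup and is not shown.

```javascript
// perf.e2e.js (hypothetical): stands in for an app-level wrapper module that would
// normally call Firebase Performance. Every method does nothing.
const noopTrace = {
  start: async () => {},
  stop: async () => {},
  putAttribute: () => {},
};

const perfStub = {
  // Returns the same inert trace object regardless of the trace name.
  startTrace: async (_name) => noopTrace,
  setCollectionEnabled: async (_enabled) => {},
};

// In a real project this object would be the module's export.
```

The app code keeps calling the wrapper as usual; only the e2e build resolves to the stub.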

@shamilovtim

One thing: this only seems to happen on CircleCI for us. Could it be related to the performance of macOS containers vs. bare metal?

@mikehardy
Contributor Author

Perhaps. It happens to us a lot on Github Actions which are not that fast, but it also happens to me a lot both in an OSX VM which sounds like it would be slow but is on about the fastest laptop you can purchase so is actually really fast, plus I see it on an M1 machine (bare metal, pretty quick) and MacBook Airs which are slow but bare metal. So for me doesn't seem speed-related. Surprised you don't ever see it in your local environments vs my experience then 🤔

@Pipeman

Pipeman commented Oct 28, 2021

I can vouch that it is not speed related. We had Detox running on physical M1 Mac Minis used as GitHub action runners for a few weeks and had this SIGNAL 11 issue repeatedly. In the end we had to revert back to Intel Mac Minis to be used only for running Detox.
What we noticed was that this error presented itself almost systematically during the most intensive hours of the day (usually the afternoon, when most people were creating PRs and therefore running Detox tests), whereas during less busy hours the issue might or might not appear; it was more random. We decreased the number of runners to 1, but the issue was still there. Note that these machines were dedicated to these operations and had no other load whatsoever.

@shamilovtim

shamilovtim commented Oct 28, 2021

Here is my comment from the other thread:
#2802 (comment)

I did extensive load tests on my own machine with 100% CPU usage on bare metal and could not reproduce the issue. However, the moment Detox runs under a VM (which is 16GB mem, 8*vCPU and running at 5% load), the issue appears. What I can conclude is that this is some sort of incompatibility between Detox and a hypervisor. Detox + virtualization, not performance, seems to cause the error.

@shamilovtim

Quoting @mikehardy: "Perhaps. It happens to us a lot on Github Actions which are not that fast, but it also happens to me a lot both in an OSX VM which sounds like it would be slow but is on about the fastest laptop you can purchase so is actually really fast, plus I see it on an M1 machine (bare metal, pretty quick) and MacBook Airs which are slow but bare metal. So for me doesn't seem speed-related. Surprised you don't ever see it in your local environments vs my experience then 🤔"

The reason you probably see this issue on bare metal is because you've left in your Firebase dependency, where the two swizzlers (Detox and Firebase) seem to have some sort of underlying incompatibility. Once you mock out Firebase (we removed it with an e2e.ts NOOP stub), the issue disappears on bare metal. But if you run Detox in a virtualized environment, Firebase or not, it will crash with this error. Thus:

  1. It happens with firebase
  2. It happens while running under a VM

@mikehardy
Contributor Author

Interesting - well, I need it to work with firebase performance integrated of course :-), even if a hypervisor is only a very strong "want" vs a need. It appears that I need to run a test on bare metal without firebase performance to see if I can at least reproduce clean runs. Then I suppose to actually make this more easily debuggable for someone with appropriate skills (that being, solid Objective-C skills which I lack unfortunately) I'll need to fork here and add a test with firebase performance in it. Thanks for the info @shamilovtim

@shamilovtim

Yeah, for us it's a hard blocker that Detox has to run virtualized. Circle or Github aren't going to give us bare metal containers, so it's a blocker either way. Being able to have Firebase running during Detox would also be a plus.

@amq

amq commented Nov 4, 2021

In my case, this problem is reproducible on every try.

x86 on macos-11 on github actions

    Signal 6 was raised
    (
    	0   Detox                               0x00000001068a4d45 +[NSThread(DetoxUtils) dtx_demangledCallStackSymbols] + 37
    	1   Detox                               0x00000001068a79e0 __DTXHandleCrash + 464
    	2   Detox                               0x00000001068a8125 __DTXHandleSignal + 59
    	3   libsystem_platform.dylib            0x000000010e774d7d _sigtramp + 29
    	4   ???                                 0x0000000106834b80 0x0 + 4404235136
    	5   libsystem_c.dylib                   0x000000010e488cb5 abort + 120
    	6   libc++abi.dylib                     0x000000010e188692 abort_message + 241
    	7   libc++abi.dylib                     0x000000010e179dfd demangling_unexpected_handler() + 0
    	8   libobjc.A.dylib                     0x000000010b1c7ace _objc_terminate() + 96
    	9   Detox                               0x00000001068a861f __dtx_terminate() + 157
    	10  libc++abi.dylib                     0x000000010e187aa7 std::__terminate(void (*)()) + 8
    	11  libc++abi.dylib                     0x000000010e187a49 std::terminate() + 41
    	12  libdispatch.dylib                   0x000000010e2ca8f3 _dispatch_client_callout + 28
    	13  libdispatch.dylib                   0x000000010e2d0e15 _dispatch_lane_serial_drain + 715
    	14  libdispatch.dylib                   0x000000010e2d198c _dispatch_lane_invoke + 400
    	15  libdispatch.dylib                   0x000000010e2dbf81 _dispatch_workloop_worker_thread + 772
    	16  libsystem_pthread.dylib             0x000000010e79845d _pthread_wqthread + 314
    	17  libsystem_pthread.dylib             0x000000010e79742f start_wqthread + 15
    )

      43 |     // on ios keyboard is presented so it needs to be dismissed
      44 |     if (device.getPlatform() === 'ios') {
    > 45 |       await element(by.text('Phone number *')).tap();
         |                                                ^
      46 |     }
      47 |
      48 |     // todo: fix

      at _callee2$ (signup.e2e.js:45:48)
      at tryCatch (../node_modules/regenerator-runtime/runtime.js:63:40)
      at Generator.invoke [as _invoke] (../node_modules/regenerator-runtime/runtime.js:293:22)
      at Generator.next (../node_modules/regenerator-runtime/runtime.js:118:21)
      at tryCatch (../node_modules/regenerator-runtime/runtime.js:63:40)
      at invoke (../node_modules/regenerator-runtime/runtime.js:154:20)
      at ../node_modules/regenerator-runtime/runtime.js:164:13

m1 on macos-12 on mba

    The app has crashed, see the details below:

    Signal 6 was raised
    (
        0   Detox                               0x000000010ead0d45 +[NSThread(DetoxUtils) dtx_demangledCallStackSymbols] + 37
        1   Detox                               0x000000010ead39e0 __DTXHandleCrash + 464
        2   Detox                               0x000000010ead4125 __DTXHandleSignal + 59
        3   libsystem_platform.dylib            0x0000000115542e2d _sigtramp + 29
        4   ???                                 0x0000000000000000 0x0 + 0
        5   libsystem_c.dylib                   0x0000000112cf6684 abort + 123
        6   libc++abi.dylib                     0x00000001108df5c2 abort_message + 241
        7   libc++abi.dylib                     0x00000001108d076d demangling_unexpected_handler() + 0
        8   libobjc.A.dylib                     0x0000000110703c1b _objc_terminate() + 96
        9   Detox                               0x000000010ead461f __dtx_terminate() + 157
        10  libc++abi.dylib                     0x00000001108de9e7 std::__terminate(void (*)()) + 8
        11  libc++abi.dylib                     0x00000001108de998 std::terminate() + 56
        12  libdispatch.dylib                   0x000000011396ea6a _dispatch_client_callout + 28
        13  libdispatch.dylib                   0x000000011397508b _dispatch_lane_serial_drain + 718
        14  libdispatch.dylib                   0x0000000113975c31 _dispatch_lane_invoke + 400
        15  libdispatch.dylib                   0x00000001139806de _dispatch_workloop_worker_thread + 772
        16  libsystem_pthread.dylib             0x0000000113c3a08f _pthread_wqthread + 326
        17  libsystem_pthread.dylib             0x0000000113c3901b start_wqthread + 15
    )

      30 |
      31 |     await element(by.text('Not a member yet? Register')).tap();
    > 32 |     await element(by.id('register-email-input')).replaceText(
         |                                                  ^
      33 |       `${random}@abc.fake`,
      34 |     );
      35 |

      at _callee2$ (signup.e2e.js:32:50)
      at tryCatch (../node_modules/regenerator-runtime/runtime.js:63:40)
      at Generator.invoke [as _invoke] (../node_modules/regenerator-runtime/runtime.js:293:22)
      at Generator.next (../node_modules/regenerator-runtime/runtime.js:118:21)
      at tryCatch (../node_modules/regenerator-runtime/runtime.js:63:40)
      at invoke (../node_modules/regenerator-runtime/runtime.js:154:20)
      at ../node_modules/regenerator-runtime/runtime.js:164:13

@mikehardy
Contributor Author

@amq I see no instance of the string DetoxSync in your stack traces in any location. I believe your crash is unrelated, and is likely only showing up with Detox at all because Detox registers itself as a general crash handler in the system so all crashes go through it. Your crash does not actually look Detox related at all.
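
The check described here (does any frame in the crash report belong to the DetoxSync image?) can be automated when triaging many CI logs. The sketch below assumes frames are laid out as in the traces in this thread: index, image name, address, symbol. The function names are made up for illustration.

```javascript
// Matches frames such as:
//   "13  DetoxSync   0x0000000146cf0220 ____detox_sync_dispatch_wrapper_block_invoke + 23"
const FRAME = /^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(.+)$/;

// Turns a raw crash log into a list of { index, image, address, symbol } objects.
// Lines that do not look like stack frames are ignored.
function parseFrames(crashLog) {
  return crashLog
    .split('\n')
    .map((line) => FRAME.exec(line))
    .filter(Boolean)
    .map(([, index, image, address, symbol]) => ({
      index: Number(index),
      image,
      address,
      symbol: symbol.trim(),
    }));
}

// True only if some frame's image is exactly "DetoxSync". Frames from the "Detox"
// image alone do not count, since Detox's crash handler appears in every report.
function involvesDetoxSync(crashLog) {
  return parseFrames(crashLog).some((frame) => frame.image === 'DetoxSync');
}
```

A report for which `involvesDetoxSync` returns false may still be worth investigating; it is just less likely to belong in this issue.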

@cohesivejones

Detox: v18.22.1
React Native: 0.64.33
Node: v14.15.4
Device: apple simulator
Xcode: 12.5.1
iOS: 14.5

This occurs for our team in CircleCI only:

DetoxRuntimeError: The pending request #-1000 ("isReady") has been rejected due to the following error:

    The app has crashed, see the details below:

    Signal 11 was raised
    (
    	0   Detox                               0x0000000110dc66c5 +[NSThread(DetoxUtils) dtx_demangledCallStackSymbols] + 37
    	1   Detox                               0x0000000110dc9250 __DTXHandleCrash + 464
    	2   Detox                               0x0000000110dc9991 __DTXHandleSignal + 59
    	3   libsystem_platform.dylib            0x000000011cc55d7d _sigtramp + 29
    	4   ???                                 0x0000000000000000 0x0 + 0
    	5   DetoxSync                           0x0000000152985ca8 +[DTXRunLoopSyncResource _existingSyncResourceWithRunLoop:clear:] + 120
    	6   DetoxSync                           0x0000000152980734 +[DTXSyncManager _untrackCFRunLoop:] + 45
    	7   DetoxSync                           0x00000001529806d5 +[DTXSyncManager untrackCFRunLoop:] + 75
    	8   DetoxSync                           0x000000015297ca0b swz_runRunLoopThread + 116
    	9   Foundation                          0x00000001117458a9 __NSThread__start__ + 1068
    	10  libsystem_pthread.dylib             0x000000011cc7c8fc _pthread_start + 224
    	11  libsystem_pthread.dylib             0x000000011cc78443 thread_start + 15
    )
 

@ball-hayden ball-hayden mentioned this issue Nov 14, 2021
1 task
@mikehardy mikehardy changed the title from "Detox v18 crashes sometimes on iOS in DetoxSync" to "Detox v18/v19 crashes sometimes on iOS in DetoxSync" Nov 16, 2021
@shamilovtim

Found this disclaimer in the Firebase performance SDK docs:
[Screenshot: disclaimer from the Firebase Performance SDK docs]

This supports the theory that the swizzling is conflicting.

@asafkorem
Contributor

I just opened a related issue on firebase-ios-sdk repo: firebase/firebase-ios-sdk#9083.

Also, this seems to be an issue we may be able to solve from DetoxSync's end. I'll investigate this a bit more, but the general direction is to dispose of the runtime-generated class that DetoxSync created before Firebase tries to dispose of its own generated class (the superclass of DetoxSync's), which will prevent the crash caused by calling objc_disposeClassPair() on a class before disposing of its subclasses. See Apple docs:

Do not call this function if instances of the cls class or any subclass exist.

At the moment, the quickest solution I can offer is to disable Firebase Performance when Detox Tests are running.
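
The ordering constraint described above (a subclass must be disposed of before its superclass) can be pictured with a toy model. This is not DetoxSync or Firebase code, and the class names in the usage note are invented; it only shows how a safe disposal order falls out of the class hierarchy.

```javascript
// Toy model: `parentOf` is a Map from each runtime-generated class name to the name
// of its superclass. Classes that are not keys of the map are treated as permanent
// (never disposed).
function safeDisposalOrder(parentOf) {
  // Depth = number of generated ancestors, counting the class itself.
  const depth = (name) => (parentOf.has(name) ? 1 + depth(parentOf.get(name)) : 0);
  // Deepest classes first, so every subclass precedes its superclass.
  return [...parentOf.keys()].sort((a, b) => depth(b) - depth(a));
}
```

With a hypothetical hierarchy where `DetoxSyncGenerated` subclasses `FirebaseGenerated`, which subclasses a permanent class, the function lists `DetoxSyncGenerated` first. Disposing in the opposite order is the situation the Apple documentation warns against.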

@asafkorem asafkorem mentioned this issue Dec 12, 2021
1 task
@mikehardy
Contributor Author

mitigation via new command line environment variable on the app launch appears to work

SIMCTL_CHILD_NSZombiesEnabled=1

firebase/firebase-ios-sdk#9083 (comment)

mikehardy added a commit to invertase/react-native-firebase that referenced this issue Dec 13, 2021
root cause hypothesis is double-swizzling w/DetoxSync and performance
mitigation hypothesis is to use new GUL swizzler feature to disable some object destruction
firebase/firebase-ios-sdk#9083
wix/Detox#3000
@shamilovtim

@mikehardy I'm assuming this will work with yarn detox test -c? I noticed you pass the exact path to the binary in node_modules.

@mikehardy
Contributor Author

@shamilovtim unknown and I don't want to speculate, sorry - all I know is that I've got at least some mitigation with the change. I think I still see crashes under some circumstances, but I think I've got an improvement if I'm running the experiments I think I'm running and I'm interpreting results correctly. Sincere apologies for being so weasel-y / vague about it. This stuff is subtle and I'm not sure of much at the moment, takes time

@shamilovtim

@mikehardy I see, thanks for your efforts on this. Sorry for off topic but is there a reason you execute detox as node_modules/.bin/detox rather than yarn detox or npm detox?

@mikehardy
Contributor Author

Probably not? Per my bio I attempt to reverse entropy as a full-time thing, but there always seems to be more of it ;-). Most likely that is unnecessary, at least I'm not aware of any reason why we reach into .bin, and I'm aware that in a yarn v2+ future that's non-functional

asafkorem added a commit to asafkorem/Detox that referenced this issue Dec 15, 2021
This workaround solves the issue described here:
firebase/firebase-ios-sdk#9083

And some of the crashes that was mentioned here:
- wix#3000
- wix#3123
- wix#2641
- wix#2802
@awinograd
Contributor

Was previously having the same stack trace as #3000 (comment)

Tried @mikehardy's solution (but beware of the typo in the env variable!). Adding SIMCTL_CHILD_NSZombieEnabled=YES to my test script resolved the crash for me (to be determined whether it's consistent or just reduces the crash rate, as indicated in another thread I'm having trouble finding right now)
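
For anyone launching Detox from a Node script rather than a shell one-liner, the variable can be added to the child environment as in the sketch below. `simctl` passes variables carrying the `SIMCTL_CHILD_` prefix on to the launched app. The helper name is hypothetical, and the configuration name in the commented usage is a placeholder.

```javascript
// Returns a copy of `env` with the zombie-objects mitigation added.
// The input object is not modified.
function withZombiesEnabled(env) {
  return { ...env, SIMCTL_CHILD_NSZombieEnabled: 'YES' };
}

// Hypothetical usage (not executed here):
// const { spawnSync } = require('child_process');
// spawnSync('yarn', ['detox', 'test', '-c', '<your-configuration>'], {
//   env: withZombiesEnabled(process.env),
//   stdio: 'inherit',
// });
```

The shell equivalent is simply prefixing the test command with `SIMCTL_CHILD_NSZombieEnabled=YES`.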

Thanks @mikehardy for your investigation & work to uncover this fix!

@mikehardy
Contributor Author

@asafkorem and the folks in the firebase iOS repo deserve the credit but I'm glad it works for you. I'm 100% green in CI with this mitigation

asafkorem added a commit that referenced this issue Dec 20, 2021
This workaround solves the issue described here:
firebase/firebase-ios-sdk#9083

And some of the crashes that was mentioned here:
- #3000
- #3123
- #2641
- #2802
@asafkorem asafkorem linked a pull request Dec 22, 2021 that will close this issue
@asafkorem
Contributor

#3135 was merged. It solves (with a workaround) the DetoxSync crashes that occur when Firebase/Performance is integrated, and it was tested on the example apps that reproduced this issue.

We will release a version with this change soon.

Once you have updated to the latest/next version, if any of you are still having Signal 11 crashes in DetoxSync, please open a new bug report with the required details.

Huge thanks @mikehardy for your support and the assistance in solving this problem.


10 participants