Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution hangs when exceptions are thrown #649

Closed
RainNapper opened this issue Mar 3, 2020 · 11 comments · Fixed by #657
Closed

Execution hangs when exceptions are thrown #649

RainNapper opened this issue Mar 3, 2020 · 11 comments · Fixed by #657
Assignees
Milestone

Comments

@RainNapper
Copy link
Contributor

When passing in a test filter that is improperly formed, Flank will throw an exception but will not terminate. When flank is run in CI, this causes unnecessary timeouts while the scripts are waiting for the command to finish.

Example error
java.lang.IllegalArgumentException: Invalid argument: com.package.Class#testMethod at ftl.filter.TestFilters.parseSingleFilter(TestFilters.kt:100) at ftl.filter.TestFilters.access$parseSingleFilter(TestFilters.kt:31) at ftl.filter.TestFilters$fromTestTargets$parsedFilters$2.invoke(TestFilters.kt:85) at ftl.filter.TestFilters$fromTestTargets$parsedFilters$2.invoke(TestFilters.kt:31) at kotlin.sequences.TransformingSequence$iterator$1.next(Sequences.kt:172) at kotlin.sequences.SequencesKt___SequencesKt.toCollection(_Sequences.kt:702) at kotlin.sequences.SequencesKt___SequencesKt.toMutableList(_Sequences.kt:732) at kotlin.sequences.SequencesKt___SequencesKt.toList(_Sequences.kt:723) at ftl.filter.TestFilters.fromTestTargets(TestFilters.kt:86) at ftl.args.AndroidTestShard.getTestMethods(AndroidTestShard.kt:32) at ftl.args.AndroidTestShard.getTestShardChunks(AndroidTestShard.kt:25) at ftl.run.AndroidTestRunner$runTests$2.invokeSuspend(AndroidTestRunner.kt:38) at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) at kotlinx.coroutines.ResumeModeKt.resumeUninterceptedMode(ResumeMode.kt:45) at kotlinx.coroutines.internal.ScopeCoroutine.afterCompletionInternal(Scopes.kt:32) at kotlinx.coroutines.JobSupport.completeStateFinalization(JobSupport.kt:311) at kotlinx.coroutines.JobSupport.tryFinalizeFinishingState(JobSupport.kt:237) at kotlinx.coroutines.JobSupport.tryMakeCompletingSlowPath(JobSupport.kt:851) at kotlinx.coroutines.JobSupport.tryMakeCompleting(JobSupport.kt:813) at kotlinx.coroutines.JobSupport.makeCompletingOnce$kotlinx_coroutines_core(JobSupport.kt:789) at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:111) at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46) at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241) at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.kt:116) at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:79) at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:54) at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source) at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:36) at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source) at ftl.cli.firebase.test.android.AndroidRunCommand.run(AndroidRunCommand.kt:49) at picocli.CommandLine.executeUserObject(CommandLine.java:1687) at picocli.CommandLine.access$900(CommandLine.java:146) at picocli.CommandLine$RunLast.handle(CommandLine.java:2059) at picocli.CommandLine$RunLast.handle(CommandLine.java:2026) at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1893) at picocli.CommandLine.execute(CommandLine.java:1822) at ftl.Main$Companion.main(Main.kt:41) at ftl.Main.main(Main.kt)

@pawelpasterz
Copy link
Contributor

@RainNapper Thanks for reporting this bug, have you encountered similar issue with other cases?

@RainNapper
Copy link
Contributor Author

Brought up some more details in slack message here: https://firebase-community.slack.com/archives/C72V6UW8M/p1583276521008200

Copy pasting here:

I filed #649
It might be tangentially related to #532

But I think I narrowed it down to couple things:

  1. In Main.kt there's CommandLine(Main()).execute(*args). For non-zero exit statuses, I believe it should be replaced with exitProcess(CommandLine(Main()).execute(*args)) . It returns an exit code of 1, but the process never terminates. I can't figure out how to build the .jar locally, but I repro'd in Android Studio via Debug.kt throwing exceptions.
  2. picocli.CommandLine#execute(...) doesn't catch Throwable. It only catches Exception. In my luckily dysfunctional local env, running Debug.kt would throw a VerifyError . Because Error is not caught, excute(...) never even returns.
    Regarding the picocli issue, I was able to "fix" the issue in Flank by just wrapping the AndroidRunCommand#run with a try { ... } catch (t: Throwable) { throw RuntimeException(t) } so it got caught by picocli's filters. This is obviously not an ideal fix, but it's worth noting as how I "confirmed" it was the issue with picocli.

If you're curious, here's the first couple lines of the VerifyError , but unrelated to this bug.

Exception Details:
  Location:
    com/fasterxml/jackson/databind/deser/std/StdDeserializer._parseDate(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/util/Date; @77: astore
  Reason:
    Type 'com/fasterxml/jackson/core/JsonParseException' (current frame, stack[0]) is not assignable to 'com/fasterxml/jackson/core/exc/StreamReadException' (stack map, stack[0])
  Current Frame:
    bci: @69
    flags: { }
    locals: { 'com/fasterxml/jackson/databind/deser/std/StdDeserializer', 'com/fasterxml/jackson/core/JsonParser', 'com/fasterxml/jackson/databind/DeserializationContext' }
    stack: { 'com/fasterxml/jackson/core/JsonParseException' }```

@remkop
Copy link

remkop commented Mar 4, 2020

Hello, @bootstraponline raised a ticket on the picocli issue tracker and asked me to help with investigation on this issue, mentioning that "It seems like the root cause is due to picocli". I'm not convinced yet that this is the case, but I'm happy to help.

First, @RainNapper's point (1):

It (the CommandLine.execute method) returns an exit code of 1, but the process never terminates.

Does Flank or kotlinx.coroutines spin off any non-daemon threads? That would explain the JVM not exiting even though the CommandLine.execute method, and subsequently the Main.main method, returned and the main thread exited.
If that is the case, and the desired behaviour is to terminate the JVM process, I support @RainNapper's suggestion to explicitly call System.exit when the CommandLine.execute method returns.

The non-daemon thread hypothesis may also explain @RainNapper's point (2):

picocli.CommandLine#execute(...) doesn't catch Throwable. It only catches Exception. In my luckily dysfunctional local env, running Debug.kt would throw a VerifyError . Because Error is not caught, excute(...) never even returns.

It is true that picocli only catches Exceptions and not Errors. But this does not mean that the CommandLine.execute method does not return, it means that the Error is percolated all the way up the stack to the Main.main method, never being caught, and terminates the main thread. Usually that would mean the JVM would exit, but if a non-daemon thread is still running, that non-daemon thread will keep the JVM going even after the main thread died.

Putting those together, perhaps the Main.kt's main method should be changed to the below:

// Main.kt
companion object {
    @JvmStatic
    fun main(args: Array<String>) {
        try {
            exitProcess(CommandLine(Main()).execute(*args))
        } (catch t: Throwable) {
            t.printStackTrace()
            exitProcess(picocli.CommandLine.ExitCode.SOFTWARE)
        }
    }
}

@RainNapper
Copy link
Contributor Author

That looks good to me. The only thing I did before concluding that CommandLine.execute doesn't execute is put a print statement after it and verify it didn't print. I had (incorrectly) assumed that if Main.main caught the exception, then the everything would exit. But I am not familiar with the JVM / coroutine interactions that would create the same result.

What you said sounds like a good explanation to me, and thus the changes look good too.

Thanks for the input! Very much appreciated.

@bootstraponline
Copy link
Contributor

@remkop Thank you for the insight! That makes sense.

@remkop
Copy link

remkop commented Mar 4, 2020

Glad I was able to help. Remember: your picocli GitHub 🌠 stars are the only thing that fill the yawning void in my soul! 😉
(Translation: please star picocli on GitHub if you like the project.) ;-)

@remkop
Copy link

remkop commented Mar 4, 2020

Additionally, assuming that my hypothesis holds water and the problem is caused by non-daemon threads preventing the JVM from exiting, if Flank creates those threads, Flank could also consider making those threads daemon threads. (I would still recommend the above changes in case something else creates a non-daemon thread.)

@bootstraponline bootstraponline added this to the May 2020 milestone Mar 6, 2020
@pawelpasterz
Copy link
Contributor

Hey, I managed to find root cause of this issue.
TLDR; it's not picocli but bugsnag :)

While debugging I noticed that flank has non-daemon thread (as @remkop predicted). Digging deeper lead me to this line in FtlConstants

val bugsnag = Bugsnag(if (useMock) null else (...) )

Bugsnag uses its own thread pool to handle sending statistics, reports etc. This executor is not closed properly and in this case prevents jvm from shutting down. I verified this by closing bugsnag manually by simple call

bugsnag.close()

and in all cases flank terminated successfully. When I tracked down it is bugsnag I found there is already issue posted Bugsnag#151

I tried proposed solution but I think simple wrapping with exitProcess(...) is enough (verified)

@pawelpasterz pawelpasterz mentioned this issue Mar 8, 2020
3 tasks
@bootstraponline
Copy link
Contributor

@pawelpasterz Thanks for figuring this out!

@remkop
Copy link

remkop commented Mar 8, 2020

@pawelpasterz The proposed solution covers against the case where some component in the call stack starts a non-daemon thread, and then throws an Error that kills the main thread.

I'm not sure if any of the components you are currently testing against show this behaviour (I assumed they did, since @RainNapper brought it up). Even if there are currently no such components, it may still be worth protecting against them. :-)

@pawelpasterz
Copy link
Contributor

@remkop got your point, I'll change my fix then. Thanks for pointing this out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants