-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix hung launcher on shutdown of Safari #38
fix: Fix hung launcher on shutdown of Safari #38
Conversation
In some cases (notably macOS Safari under GitHub Actions), a call to forceKill() would be triggered during another call to forceKill(). This could cause the second Promise to go unresolved, leading to a hang. On GitHub, eventually, the workflow would be cancelled. This fixes the nested calls to forceKill by making them both resolve on the same event (the shutdown triggered by the first call). This also adds a timeout for shutting down a WebDriver session. Although this does not appear to be the root cause of the hang we were experiencing in GitHub Actions workflows, it should be safer to have this timeout. If we can't stop a WebDriver session gracefully, we will timeout after 5s and end the launcher anyway.
I was led to this change by lots of console.log-based debugging and tracing through both Karma and the launcher. With the logging still in place, I can see that the mitigations are active. With this change, I've been able to run Shaka's tests on Safari, in the GitHub Actions environment, |
@@ -116,9 +119,18 @@ const LocalWebDriverBase = function(baseBrowserDecorator, args, logger) { | |||
await this.stopWebdriver_(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say, what's the difference between the kill
and forceKill
operations? They both just call stopWebdriver_
.
Is kill
called in such a way that we don't need to worry about them being nested, or about a forceKIll
being nested inside a kill
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill() and forceKill() are part of the launcher API, but in my logs, I don't see kill() being called. In a purely external-process-based launcher, the base classes would send different POSIX signals to the subprocess for kill() and forceKill() if I recall correctly.
I initially thought forceKill() should never fail or be infinitely delayed, so I tried adding a timeout at the level of forceKill(). But that didn't solve the issue. In the end, I added the timeout to the WebDriver quit command, since that's the only thing that can fail or be delayed.
So after all the debugging and logging, I concluded that kill() and forceKill() can still effectively be the same, except for the state tracking part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I was going to suggest moving the new logic for saving the operation as a promise into stopWebdriver_
, instead of forceKill
. But if kill
isn't being called I guess it doesn't matter.
In some cases (notably macOS Safari under GitHub Actions), a call to
forceKill() would be triggered during another call to forceKill().
This could cause the second Promise to go unresolved, leading to a
hang. On GitHub, eventually, the workflow would be cancelled.
This fixes the nested calls to forceKill by making them both resolve
on the same event (the shutdown triggered by the first call).
This also adds a timeout for shutting down a WebDriver session.
Although this does not appear to be the root cause of the hang we
were experiencing in GitHub Actions workflows, it should be safer to
have this timeout. If we can't stop a WebDriver session gracefully,
we will timeout after 5s and end the launcher anyway.
Closes #24