Call stays in active state upon channel error #900
Comments
It looks like a communication issue between Verboice and Surveda: the former timed out while trying to fetch the MP3 file. If I understand the code correctly, the Twilio PBX crashed, the session handler noticed it and logged an internal error, but the call state never got changed to closed?
I forgot to say that I reviewed the issues logged in Sentry and found nothing remotely related, hence my suspicion of a network issue.
Errata: the
Now, it seems Verboice considers the call to still be open, even though the Broker properly failed and reported the failure. Maybe the CallLog state is incorrectly updated in
I tried hard to reproduce this bug with no success. I did reproduce the timeout error and got the same logs in this issue. My strongest hypothesis is very simple: this update fails. As a consequence of this failure, in the Mumbai instance, 0.01% of the started calls remained "active" when they actually failed.

SELECT sum(case when finished_at is null then 1 else 0 end) `unfinished`,
       count(1) `started`,
       sum(case when finished_at is null then 1 else 0 end) / count(1) * 100 as `percentage`
FROM `call_logs`
WHERE started_at is not null

|------------|---------|------------|
| unfinished | started | percentage |
|------------|---------|------------|
| 237        | 1725747 | 0.0137     |
|------------|---------|------------|

#912 doesn't solve this failure. It will continue happening. But its consequences (the active calls) won't be there anymore.
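For anyone who wants to try the percentage query without access to the production database, here is a minimal sketch against an in-memory SQLite table. The rows are made up for illustration, and only the two columns the query uses are modelled. Note one difference from MySQL, which is what the original query ran on: SQLite does integer division on integers, so the sketch multiplies by `100.0` first.

```python
import sqlite3

# Hypothetical miniature of the call_logs table; only the two columns
# the query relies on are modelled, and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_logs (started_at TEXT, finished_at TEXT)")
rows = [
    ("2020-01-01 10:00:00", "2020-01-01 10:05:00"),  # finished normally
    ("2020-01-01 11:00:00", "2020-01-01 11:02:00"),  # finished normally
    ("2020-01-01 12:00:00", None),                   # started, never finished
    (None, None),                                    # never started: excluded
]
conn.executemany("INSERT INTO call_logs VALUES (?, ?)", rows)

# 100.0 forces floating-point division; SQLite would otherwise truncate
# the integer division to 0, unlike MySQL's "/" operator.
unfinished, started, percentage = conn.execute(
    """
    SELECT sum(CASE WHEN finished_at IS NULL THEN 1 ELSE 0 END),
           count(1),
           100.0 * sum(CASE WHEN finished_at IS NULL THEN 1 ELSE 0 END) / count(1)
    FROM call_logs
    WHERE started_at IS NOT NULL
    """
).fetchone()

print(unfinished, started, round(percentage, 4))
```

With the real figures from the table above, the same arithmetic is 237 / 1725747 * 100, which rounds to 0.0137.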
When a call remains active for too long, Verboice considers there was an error and cancels it. For #900
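The "cancel calls that stay active for too long" idea can be sketched roughly as below. This is not Verboice's actual implementation (the broker is not written in Python). The `state` and `fail_reason` columns, the two-hour threshold, and the `cancel_stale_calls` name are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta

MAX_ACTIVE = timedelta(hours=2)  # hypothetical threshold, not Verboice's value


def cancel_stale_calls(conn, now, max_active=MAX_ACTIVE):
    """Mark calls that have been 'active' for longer than max_active as failed.

    Returns the number of rows that were updated.
    """
    cutoff = (now - max_active).isoformat(sep=" ")
    cur = conn.execute(
        "UPDATE call_logs SET state = 'failed', finished_at = ?, "
        "fail_reason = 'timeout' "
        "WHERE state = 'active' AND started_at < ?",
        (now.isoformat(sep=" "), cutoff),
    )
    conn.commit()
    return cur.rowcount


# Tiny in-memory demo with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE call_logs (id INTEGER PRIMARY KEY, state TEXT, "
    "started_at TEXT, finished_at TEXT, fail_reason TEXT)"
)
conn.executemany(
    "INSERT INTO call_logs (state, started_at) VALUES (?, ?)",
    [
        ("active", "2020-01-01 08:00:00"),  # active for 4 hours: stale
        ("active", "2020-01-01 11:30:00"),  # active for 30 minutes: kept
    ],
)
now = datetime(2020, 1, 1, 12, 0, 0)
cancelled = cancel_stale_calls(conn, now)
states = [row[0] for row in conn.execute("SELECT state FROM call_logs ORDER BY id")]
```

A sweep like this only hides the symptom: the failed update still happens, but the call no longer stays active forever.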
huh... is it possible that we retain a MySQL connection which gets closed during the X minutes timeout? which is only noticed when we try to update (EPIPE)? could the connection be missing an auto-reconnect or something?
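To make the hypothesis concrete, here is a rough sketch of what "reconnect and retry when the update hits a dead connection" could look like. It uses a fake connection class instead of a real MySQL driver, so `FlakyConnection`, `update_with_reconnect` and the `reconnect` callback are hypothetical names for illustration only, not anything from the Verboice code base.

```python
class FlakyConnection:
    """Hypothetical stand-in for a DB connection the server may have closed."""

    def __init__(self, alive=True):
        self.alive = alive
        self.updates = []

    def update(self, call_id, state):
        if not self.alive:
            # What writing to a socket closed by the peer looks like (EPIPE).
            raise BrokenPipeError("EPIPE: connection closed by peer")
        self.updates.append((call_id, state))


def update_with_reconnect(conn, reconnect, call_id, state, retries=1):
    """Try the update; on a connection error, reconnect and try again.

    Returns the connection that finally performed the update. Re-raises
    the error once the retries are exhausted.
    """
    for attempt in range(retries + 1):
        try:
            conn.update(call_id, state)
            return conn
        except ConnectionError:  # BrokenPipeError is a subclass
            if attempt == retries:
                raise
            conn = reconnect()


# The idle connection was closed during the timeout; a fresh one works.
stale = FlakyConnection(alive=False)
fresh = FlakyConnection()
used = update_with_reconnect(stale, lambda: fresh, 42, "failed")
```

If the real cause is something else (for example the update itself being rejected), a retry like this would not help, so treat it as a way to test the hypothesis, not as a fix.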
Today it seems we had a couple of (consistent) occurrences of this issue in STG when hanging up calls using Callcentric. I'm sharing the logs.
As outlined by @ggiraldez, as soon as we get the hang-up during the Gather operation, we try to update a row of
I reproduced it with the Twilio simulator:
Maybe this is how the "capture timeout" above gets triggered: Twilio reports an error status to
See #778 for a similar issue.
We've observed some calls that stay with state "active" for weeks in Verboice, even if it was finished/cancelled/errored. The one we've just seen (call in Verboice, broker's logs below) was sent via Twilio (call SID CA381882b5e5f24f714d0ba28eee084f3c in InSTEDD 4 NCD project). The error was internal to the broker (maybe a communication issue with Surveda?), there are no errors seen in Twilio. We should check what the error is, and how to properly handle it.
CC: @ggiraldez