In a recent CRI 568174889 the Functions Host got into a state where no workers were running but the host did not try to start/restart a worker, causing all function invocations to fail with error "Did not find any initialized language workers".
Below is a Kusto query showing the sequence of events that led to the app getting into this broken state. Here's the relevant timeline:
At 2024-11-19 23:36:29.396 a concurrency bug happened during function loading
"System.InvalidOperationException : Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct."
HandleWorkerFunctionLoadError gets triggered to handle this exception
This causes RpcFunctionInvocationDispatcher.WorkerError to invoke DisposeAndRestartWorkerChannel
As part of this method, ShouldRestartWorkerChannel determines whether to restart the worker
We see the log "Restarting worker channel for runtime: 'python'" after this
However, the worker startup timed out at 2024-11-19 23:36:59.395 with message "Initializing worker process failed" error "System.TimeoutException : The operation has timed out."
After this point, no further attempts are made to restart the worker, and the host stays in a broken state. All function invocations fail until the Function App is restarted by the customer. Perhaps when timeouts happen, our logic to restart the worker doesn't kick in?
FunctionsLogs
| where PreciseTimeStamp between (datetime(2024-11-19 20:00) .. datetime(2024-11-20))
| where Host == "pl1mdlwk000BLC"
| where RoleInstance == "pl1MediumDedicatedLinuxWebWorkerRole_IN_15024"
| project PreciseTimeStamp, Level, AppName, FunctionName, Source, EventName, HostInstanceId, Summary, Details, HostVersion
| order by PreciseTimeStamp asc
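To make the suspected gap from the timeline concrete (a restart attempt that times out and is never followed by another attempt), here is a minimal Python sketch of a restart loop that treats a startup timeout as retryable. It is an illustrative model only, not the Functions Host implementation: `WorkerSupervisor`, `start_worker`, `max_attempts`, and `startup_timeout` are hypothetical names invented for this example.

```python
import asyncio


class WorkerSupervisor:
    """Hypothetical model of a worker restart loop (not the Functions Host code)."""

    def __init__(self, start_worker, max_attempts=3, startup_timeout=0.05):
        self.start_worker = start_worker        # coroutine function that starts a worker
        self.max_attempts = max_attempts        # assumed retry budget
        self.startup_timeout = startup_timeout  # seconds allowed for one startup
        self.attempts = 0
        self.worker_running = False

    async def restart_worker(self):
        # Treat a startup timeout like any other worker error: try again
        # until the retry budget is spent, instead of giving up silently.
        while self.attempts < self.max_attempts:
            self.attempts += 1
            try:
                await asyncio.wait_for(self.start_worker(), self.startup_timeout)
                self.worker_running = True
                return True
            except asyncio.TimeoutError:
                continue
        return False


def make_flaky_start(hang_first_n):
    """Return a start function whose first `hang_first_n` calls hang past the timeout."""
    calls = {"n": 0}

    async def start():
        calls["n"] += 1
        if calls["n"] <= hang_first_n:
            await asyncio.sleep(1)  # longer than the startup timeout

    return start


def run_demo(hang_first_n, max_attempts):
    supervisor = WorkerSupervisor(make_flaky_start(hang_first_n), max_attempts=max_attempts)
    ok = asyncio.run(supervisor.restart_worker())
    return ok, supervisor.attempts
```

With `max_attempts=1` the model reproduces the behavior reported here: one timed-out startup leaves no worker running. With a larger budget, the second attempt succeeds. Whether the host should retry indefinitely, back off, or recycle itself after exhausting the budget is a design question this sketch does not answer.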