GrpcWorkerChannel concurrency bugs #10682

Open
mathewc opened this issue Dec 10, 2024 · 0 comments

mathewc commented Dec 10, 2024

While investigating a recent CRI (568174889), we discovered a concurrency bug on the worker indexing path. Here is one example stack trace (there are others):

System.InvalidOperationException : Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
    at System.ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported()
    at System.Collections.Generic.Dictionary`2.TryInsert(TKey key,TValue value,InsertionBehavior behavior)
    at System.Collections.Generic.Dictionary`2.set_Item(TKey key,TValue value)
    at Microsoft.Azure.WebJobs.Script.Grpc.GrpcWorkerChannel.LoadResponse(FunctionLoadResponse loadResponse) at /src/azure-functions-host/src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs : 821
    at Microsoft.Azure.WebJobs.Script.Grpc.GrpcWorkerChannel.PendingItem.SetResult(InboundGrpcEvent message) at /src/azure-functions-host/src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs : 1778

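The code referenced in the stack trace above is GrpcWorkerChannel.LoadResponse (GrpcWorkerChannel.cs, line 821). The Dictionary`2.set_Item frame shows a non-thread-safe Dictionary being written to, and the exception indicates that another thread was updating it at the same time. It appears we're modifying these collections concurrently without proper concurrency controls.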
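
To illustrate the general class of fix, here is a minimal, self-contained sketch that records results through a `ConcurrentDictionary` instead of a plain `Dictionary`. It is not the host's actual code: `LoadResultTracker`, `RecordLoadResult`, and the `string`/`bool` key and value types are hypothetical names chosen for the example, and the real fields in `GrpcWorkerChannel` may call for a different approach (for example, a lock around a group of related updates).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the channel state; all names here are illustrative only.
public sealed class LoadResultTracker
{
    // ConcurrentDictionary supports concurrent writers; a plain Dictionary does not.
    private readonly ConcurrentDictionary<string, bool> _loadResults =
        new ConcurrentDictionary<string, bool>();

    // Safe to call from multiple response callbacks at the same time.
    public void RecordLoadResult(string functionId, bool succeeded)
    {
        _loadResults[functionId] = succeeded;
    }

    public int Count => _loadResults.Count;
}

public static class Program
{
    public static void Main()
    {
        var tracker = new LoadResultTracker();

        // Simulate many load responses arriving concurrently.
        Parallel.For(0, 10_000, i => tracker.RecordLoadResult($"function-{i}", true));

        Console.WriteLine(tracker.Count);
    }
}
```

Swapping the collection type only makes individual reads and writes safe. If several collections must be updated together as one consistent step, a shared lock around those updates would be the more appropriate tool.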

Here's a Kusto query showing some occurrences across apps:

FunctionsLogs
| where PreciseTimeStamp > ago(1d)
| where Level == 2
| where Summary == "Loading function failed."
| where Details startswith "System.InvalidOperationException : Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct."
| project PreciseTimeStamp, Level, AppName, FunctionName, Source, EventName, HostInstanceId, Summary, Details, HostVersion
| summarize dcount(AppName) by HostVersion, Details