Backfill mutex not unlocked when context is canceled in CleanupBackfills
#1576
Labels
kind/bug
Something isn't working
What happened:
A backfill stopped being acknowledged because it was no longer needed. After a while it expired and was scheduled for cleanup. The CleanupBackfills routine was interrupted because its context was cancelled. As a result, the mutex lock on the backfill was not released until the backfillLockTimeout was reached.
Within this period the game server needed backfilling again and started trying to re-acknowledge the backfill, but ran into continuous errors of
rpc error: code = Unknown desc = redsync: failed to acquire lock
until the lock timeout was reached and we finally got a response that it had expired. Only then could it react to that state and trigger creation of a new backfill. This is in part an issue with our current implementation, which is quite early stage: we don't implement deleting of backfills right now and just let them expire when they're not needed.
However, I do believe there's a bug with the mutex handling in DeleteBackfillsCompletely, where the deferred func that releases the lock uses the same context that is provided. In the case of a cancellation that context will already be canceled, so I think it will never properly release the lock: https://github.com/googleforgames/open-match/blob/main/internal/statestore/backfill.go#L245
It might be better to give it context.Background() or a new context with a timeout.
Logs:
What you expected to happen:
Cancelling the context should not leave mutexes in a locked state.
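To illustrate the difference, here is a minimal, self-contained sketch of the two unlock patterns. It is not the Open Match code: fakeMutex is a hypothetical stand-in for a distributed lock whose Unlock refuses to run on a canceled context, the function names are made up for illustration, and the 5 second unlock timeout is an arbitrary assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// fakeMutex is a hypothetical stand-in for a distributed lock. Like a
// network-backed lock, its Unlock fails when the context is already done.
type fakeMutex struct {
	mu     sync.Mutex
	locked bool
}

func (m *fakeMutex) Lock(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.locked {
		return errors.New("failed to acquire lock")
	}
	m.locked = true
	return nil
}

func (m *fakeMutex) Unlock(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err // the release request is never sent
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.locked = false
	return nil
}

func (m *fakeMutex) Locked() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.locked
}

// deleteWithCallerContext reuses the caller's context in the deferred unlock,
// which is the pattern this issue suspects is at fault.
func deleteWithCallerContext(ctx context.Context, m *fakeMutex, work func()) error {
	if err := m.Lock(ctx); err != nil {
		return err
	}
	defer m.Unlock(ctx) // fails silently once ctx is canceled
	work()
	return ctx.Err()
}

// deleteWithDetachedUnlock releases the lock with a fresh context that has
// its own timeout, so a canceled caller context cannot block the release.
func deleteWithDetachedUnlock(ctx context.Context, m *fakeMutex, work func()) error {
	if err := m.Lock(ctx); err != nil {
		return err
	}
	defer func() {
		unlockCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = m.Unlock(unlockCtx)
	}()
	work()
	return ctx.Err()
}

// stillLockedAfterCancel runs del with "work" that cancels the context
// mid-operation and reports whether the lock is still held afterwards.
func stillLockedAfterCancel(del func(context.Context, *fakeMutex, func()) error) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	m := &fakeMutex{}
	_ = del(ctx, m, cancel)
	return m.Locked()
}

func main() {
	fmt.Println("caller context, still locked:", stillLockedAfterCancel(deleteWithCallerContext))
	fmt.Println("detached context, still locked:", stillLockedAfterCancel(deleteWithDetachedUnlock))
}
```

With the caller's context the lock stays held after cancellation, while the detached context releases it. On Go 1.21 or newer, context.WithoutCancel(ctx) could be used in place of context.Background() to keep the values carried by the original context.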
How to reproduce it (as minimally and precisely as possible):
Not quite sure what caused the context cancellation in the first place. It looks like it happens when all the callers (director FetchMatches calls) are done.
Anything else we need to know?:
Output of kubectl version:
Client Version: v1.27.1
Kustomize Version: v5.0.1
Server Version: v1.24.11-gke.1000
Cloud Provider/Platform (AKS, GKE, Minikube etc.):
GKE
Open Match Release Version:
1.5.0
Install Method(yaml/helm):
Helm to generate yaml applied with kubectl.