Fix sporadic duplicate key errors in mysql queue implementation #2802
What changed?
This fixes a concurrency bug in the MySQL queue implementation used for xdc namespace changes.
Why?
Fixing flaky tests. Real-world impact should be very low.
When more than two writers try to enqueue concurrently, they queue up behind the range lock taken by `templateGetLastMessageIDQuery`. The query returns 1 row to the 2nd writer (the 1st in the queue of blocked writers), 2 rows to the 3rd, and so on. Before this fix, `GetContext` would return the first row (in database order), which is probably not the correct (highest) one. Now the query selects the `MAX`, so it always returns the correct one.

How did you test it?
go test -v $(pwd)/common/persistence/persistence-tests/ -run TestMySQLQueuePersistence
`TestNamespaceReplicationQueue` and `TestNamespaceReplicationDLQ` used to fail with a frequency proportional to their concurrency.

Potential risks
If the `MAX` somehow interferes with the MySQL gap lock, this will break queueing. The MySQL docs suggest this won't happen (search for "For other search conditions"), but you know how docs can be. The actual impact would be errors from `CreateNamespace`/`UpdateNamespace` when they are called concurrently.
Is hotfix candidate?
Probably not. I think this only affects xdc, and the natural call paths (`CreateNamespace`, `UpdateNamespace`) are unlikely to hit the bug.