Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sporadic duplicate key errors in mysql queue implementation #2802

Merged
merged 4 commits into from
May 9, 2022

Conversation

nagl-temporal
Copy link
Contributor

@nagl-temporal nagl-temporal commented May 4, 2022

What changed?
This fixes a concurrency bug in the mysql queue implementation for xdc namespace changes.

Why?
Fixing flaky tests. Real-world impact should be very low.

When >2 writers try to enqueue concurrently, they queue against the range lock taken by templateGetLastMessageIDQuery. The query returns 1 row to the 2nd writer (1st in the queue of blocked writers), 2 rows to the 3rd, and so on. Before this fix, GetContext would return the first row (in database-order), which is probably not the correct (highest) one. Now it always returns the correct one.

How did you test it?
go test -v $(pwd)/common/persistence/persistence-tests/ -run TestMySQLQueuePersistence

TestNamespaceReplicationQueue and TestNamespaceReplicationDLQ used to fail with frequency proportional to their concurrency.

Potential risks
If the MAX somehow interferes with the mysql gap lock, this will break queueing. The docs suggest this won't happen (ctrl-F "For other search conditions"), but you know how docs can be. Actual impact would be errors from (Create/Update)Namespace when called concurrently.

Is hotfix candidate?
Probably not. I think this only impacts xdc and the natural callpaths (CreateNamespace, UpdateNamespace) are unlikely to hit the bug.

@nagl-temporal nagl-temporal requested a review from a team as a code owner May 4, 2022 22:29
@CLAassistant
Copy link

CLAassistant commented May 4, 2022

CLA assistant check
All committers have signed the CLA.

@nagl-temporal nagl-temporal enabled auto-merge (squash) May 9, 2022 16:16
@nagl-temporal nagl-temporal merged commit 9c9b4c9 into temporalio:master May 9, 2022
@nagl-temporal nagl-temporal deleted the mysql-queue-flake branch May 9, 2022 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants