Prevent a possible race condition in blocking requests #3656

dnephin · 2022-11-16T16:50:50Z

The new blocking requests (aka requests used for long polling) have (I believe) a possible race condition between retrieving the next ID from the seq_update_index sequence, and committing the transaction that uses that ID. Note I haven't actually reproduced this race, so it's possible that it's unlikely in practice. We should try to reproduce before attempting to fix the problem.

Problem

There is no synchronization between nextval('seq_update_index') and committing of the transaction that uses the value. This means that if two CreateGrant or DeleteGrant requests are made at the same time, it's possible for them to be ordered like this:

CreateGrant1 calls nextval('seq_update_index') to get value 1000
CreateGrant2 calls nextval('seq_update_index') to get value 1001
CreateGrant2 transaction commits, which sends a notify, and the blocking query reads value 1001
CreateGrant1 transaction commits, which sends a notify, and the blocking query reads value 1001 (which means it has missed the update with id 1000)

This is only a problem if both requests (CreateGrant1 and CreateGrant2) are for the same organization and same destination (or in the future same org same user may also be a problem). If either the org or destination are different, then the race doesn't break anything.

Given that the query has to receive the notification and then repeat a query, it seems like this race is very unlikely, because even if it happens, as long as the second commit completes before the blocking request does its query, the problem will be hidden. None the less, I think it's important we have some way to fix it.

Proposed Solution

One way I've found to fix this is by using locks. https://www.postgresql.org/docs/current/explicit-locking.html says this:

PostgreSQL provides various lock modes to control concurrent access to data in tables. These modes can be used for application-controlled locking in situations where MVCC does not give the desired behavior.

Which sounds like exactly the scenario we are in.

I'm not sure yet exactly which locking mode we need, but I think what we want is a lock that synchronizes the nextval('seq_update_index') with the commit of the transaction. Since locks are automatically released on commit, all we need is to acquire the right lock before doing a CreateGrant or DeleteGrant.

The lock does not need to be global, it should be per-organization, so one org will never block other orgs. We could get even more fine grained locks by destination and user, but I think it's unlikely we need that complexity.

I don't think we want to lock the grants tables, and we can't lock individual rows in the grants table, because inserts will not be on an existing row.

One idea is to either use the organizations table (since the lock is scoped to an org), or to create a new table used exclusively for this purpose. The table would have at least one row per org (but if we end up with different updateIndex sequences maybe we add more).

The solution would be for every CreateGrant and DeleteGrant to do this:

-- this should get a per-organization row lock
SELECT FOR UPDATE * FROM organization_locks WHERE organization = ?

-- Create or Delete grant
INSERT ... nextval('seq_update_index')

-- release the lock
COMMIT

I believe that will prevent any races, and also ensure that one org can never block another.

If an org starts running into problems, we can direct them to the new bulk grants edit endpoint. That endpoint allows multiple CREATE/DELETE in the same transaction, which allows an org to get a lot more throughput without lock contention.

The text was updated successfully, but these errors were encountered:

dnephin added area/api Issue or PR related to the Infra API. kind/improvement A report of a quality problem, or a change that addresses a quality problem. status/never-stale Indicates to actions/stale that the issue or PR should never be marked stale. labels Nov 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent a possible race condition in blocking requests #3656

Prevent a possible race condition in blocking requests #3656

dnephin commented Nov 16, 2022 •

edited

Loading

Prevent a possible race condition in blocking requests #3656

Prevent a possible race condition in blocking requests #3656

Comments

dnephin commented Nov 16, 2022 • edited Loading

Problem

Proposed Solution

dnephin commented Nov 16, 2022 •

edited

Loading