I've observed a silent data loss problem when (auto)vacuum, pg_repack's table swap, and INSERT run concurrently. The root cause is that while pg_repack is swapping the target table and its temp table, the two tables have different OIDs but use the same underlying file (i.e., the same relfilenode), so two vacuum processes can concurrently process the same underlying file. I've confirmed this problem with PostgreSQL 15.4 and pg_repack 1.4.8; I think it affects other PostgreSQL and pg_repack versions as well.
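For reference (my addition, not part of the original report), the OID-to-relfilenode mapping that this issue hinges on can be inspected from pg_class or via the built-in helper functions; the table name `test` is the one used in the repro below:

```sql
-- OID vs. on-disk file: vacuum ultimately operates on the relfilenode.
SELECT oid, relname, relfilenode FROM pg_class WHERE relname = 'test';

-- Equivalent helpers showing the filenode and the path of the backing file.
SELECT 'test'::regclass::oid        AS table_oid,
       pg_relation_filenode('test') AS filenode,
       pg_relation_filepath('test') AS file_path;
```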
The steps to reproduce are:
Preparation
```sql
create table test (id int primary key, a int);
insert into test values (1, 1), (2, 2), (3, 3);
```
These tuples are inserted at TID (0, 1), (0, 2), and (0, 3) of the `test` table, respectively.
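To confirm the tuple locations (a verification step I'm adding for illustration; it is not part of the original report), the `ctid` system column shows each tuple's TID:

```sql
-- ctid is (block number, line pointer offset); the three rows should
-- come back as (0,1), (0,2), and (0,3).
SELECT ctid, id, a FROM test;
```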
Steps
1. Run pg_repack under a debugger for the test table, and stop it after the table data has been copied to the temp table by setting a breakpoint, for example at pg_repack.c:1537.
2. Update a tuple of the test table, say `UPDATE test SET id = 999, a = 999 WHERE id = 1`. This change is recorded in its log table.
3. Let pg_repack resume; it applies the log. Stop pg_repack again after all log entries have been applied, for example at pg_repack.c:1598.
   - In the temp table, the tuple at (0, 1) is now garbage.
   - Note that conn2 already holds an AccessExclusiveLock on the test table, but nobody holds a lock on the temp table.
4. Run vacuum on the temp table and stop it before it calls lazy_vacuum_heap().
   - The vacuum scans the temp table and collects TID (0, 1) as a dead tuple, but does not actually remove its item pointer yet.
5. Let pg_repack resume; it swaps the test table and the temp table. Stop pg_repack right after the swap, for example at pg_repack.c:1617.
   - Now both the test table and the temp table use the same relfilenode (see the query sketch after these steps), and conn2 has already released its AccessExclusiveLock on the test table.
6. Run another vacuum, this time on the test table; it removes the tuple at TID (0, 1), since the first vacuum has not removed it yet.
7. Insert a new tuple into the test table, say `INSERT INTO test VALUES (4, 4)`.
   - This new tuple is inserted at TID (0, 1).
8. Let the first vacuum resume; it ends up removing the tuple at TID (0, 1) that we inserted at step 7.
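As a verification sketch for step 5 (my addition; the temp table name pattern `repack.table_<oid>` is what pg_repack normally uses, but treat it as an assumption here), the shared relfilenode can be observed from a separate session right after the swap:

```sql
-- The target table and pg_repack's temp table should report the same
-- relfilenode even though their OIDs differ.
SELECT n.nspname, c.relname, c.oid, c.relfilenode
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'test'
   OR (n.nspname = 'repack' AND c.relname LIKE 'table\_%');
```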
In the above procedure I used the VACUUM command, but what I actually observed was autovacuum processes running on the test table and its temp table because their relfrozenxid age exceeded autovacuum_freeze_max_age.
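As an aside (my illustration, not from the original report), whether a table is old enough to trigger an anti-wraparound autovacuum can be checked like this:

```sql
-- Tables whose relfrozenxid age exceeds autovacuum_freeze_max_age are
-- vacuumed by autovacuum even if autovacuum is otherwise disabled.
SELECT relname,
       age(relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```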
A simple solution would be to acquire AccessExclusiveLock on both the target table and its temp table before swapping them.
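A minimal SQL-level sketch of that idea (my illustration, assuming the temp table is named `repack.table_<oid>`; the real fix belongs in pg_repack's C code):

```sql
BEGIN;
-- Lock both relations before the swap so no (auto)vacuum can run on
-- either side while the relfilenodes are being exchanged.
LOCK TABLE test IN ACCESS EXCLUSIVE MODE;
LOCK TABLE repack.table_16384 IN ACCESS EXCLUSIVE MODE;  -- 16384 = target table OID, illustrative
-- ... perform the swap and cleanup here, then ...
COMMIT;
```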
Previously, while swapping the target table and its temp table, we
acquired an AccessExclusiveLock only on the target table but not on
its temp table. So the following scenario could happen:
1. Run (auto) vacuum on the temp table.
2. Swap the target table's relfilenode and the temp table's
relfilenode, and commit.
3. Run (auto) vacuum on the target table.
1 and 3 run concurrently on different tables, but these tables use
the same relfilenode. In the reported case, since concurrent changes
(INSERT/UPDATE/DELETE) can also be running, the vacuum started in
step 1 ended up removing newly inserted tuples.
To fix the issue, this commit changes the pg_repack command so that
it acquires an AccessExclusiveLock also on the temp table before
swapping these tables.
Issue: reorg#399