Silent data loss due to two vacuums running on logically different tables that use the same relfilenode #399

Closed
MasahikoSawada opened this issue May 18, 2024 · 1 comment
Hi,

I've observed a silent data loss problem when (auto)vacuum, pg_repack's table swap, and INSERT run concurrently. The root cause is that while pg_repack is swapping the target table and its temp table, the two tables have different OIDs but use the same underlying file (i.e., the same relfilenode), so two vacuum processes can concurrently process the same underlying file. I've confirmed this problem with PostgreSQL 15.4 and pg_repack 1.4.8; I believe it affects other PostgreSQL and pg_repack versions as well.
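For reference, which on-disk file backs a relation can be inspected from the catalogs (standard PostgreSQL, nothing pg_repack-specific):

-- relfilenode identifies the on-disk file backing a relation;
-- pg_relation_filepath() shows the actual file path under the data directory.
select relname, relfilenode, pg_relation_filepath('test')
from pg_class
where relname = 'test';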

The reproduction steps are:

Preparation

create table test (id int primary key, a int);
insert into test values (1, 1), (2, 2), (3, 3);

These tuples are inserted at TID (0, 1), (0, 2), and (0, 3) in the test table, respectively.
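The TIDs can be confirmed via the ctid system column (standard PostgreSQL, no pg_repack involvement):

-- ctid exposes each row's (block, offset) TID; after the preparation the
-- three rows should sit at (0,1), (0,2), and (0,3).
select ctid, id, a from test order by ctid;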

Steps

  1. Run pg_repack under a debugger on the test table, and stop it after it has copied the table data to its temp table, for example with a breakpoint at pg_repack.c:1537.
  2. Update a tuple of the test table, say UPDATE test SET id = 999, a = 999 WHERE id = 1. This change is recorded in the log table.
  3. Let pg_repack resume; it applies the log. Stop pg_repack after all log entries are applied, for example at pg_repack.c:1598.
        - In the temp table, the tuple at TID (0, 1) is now garbage (the old version of the updated row).
        - Note that pg_repack's second connection (conn2) already holds an AccessExclusiveLock on the test table, but nobody holds a lock on the temp table.
  4. Run vacuum on the temp table and stop it before it calls lazy_vacuum_heap().
        - This vacuum scans the temp table and collects TID (0,1) as a dead tuple, but has not actually removed the item pointer there yet.
  5. Let pg_repack resume; it swaps the test table and the temp table. Stop pg_repack right after the swap, for example at pg_repack.c:1617.
        - Now both the test table and the temp table use the same relfilenode (see the catalog check after this list), and conn2 has already released its AccessExclusiveLock on the test table.
  6. Run another vacuum on the test table; it removes the tuple at TID (0,1) (the first vacuum has not removed it yet).
  7. Insert a new tuple into the test table, say INSERT INTO test VALUES (4,4).
        - This new tuple is inserted at TID (0,1).
  8. Let the first vacuum resume; it ends up removing the tuple at TID (0,1) that we inserted in step 7, silently losing the new row (see the verification query after this list).
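As a rough sanity check for steps 5 and 8, plain catalog and table queries are enough. This is only a sketch: it assumes pg_repack's work tables live in the repack schema, as they do by default.

-- After step 5, both relations report the same relfilenode.
select c.oid, n.nspname, c.relname, c.relfilenode
from pg_class c join pg_namespace n on n.oid = c.relnamespace
where c.relname = 'test' or n.nspname = 'repack';

-- After step 8, the row committed in step 7 has silently disappeared.
select * from test where id = 4;   -- returns no rows despite the committed INSERT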

In the above procedure I used the VACUUM command, but what I actually observed was autovacuum processes running on the test table and its temp table because their relfrozenxid age exceeded autovacuum_freeze_max_age.
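For reference, the age that triggers such an anti-wraparound autovacuum can be checked with a standard catalog query:

-- Anti-wraparound autovacuum is triggered once age(relfrozenxid) exceeds
-- autovacuum_freeze_max_age (200 million by default).
select relname, age(relfrozenxid) as xid_age,
       current_setting('autovacuum_freeze_max_age') as freeze_max_age
from pg_class
where relkind = 'r' and relname = 'test';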

A simple solution would be to acquire an AccessExclusiveLock on both the target table and its temp table before swapping them.
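In SQL terms, the idea is roughly the following, issued on pg_repack's connection inside the swap transaction. This is only a sketch; repack.table_12345 is a placeholder for the temp table name pg_repack actually generates.

begin;
-- Lock both relations so no (auto)vacuum can start on either side of the swap.
lock table test in access exclusive mode;
lock table repack.table_12345 in access exclusive mode;  -- placeholder temp table name
-- ... swap the relfilenodes here (done by pg_repack internally), then commit.
commit;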

MasahikoSawada added a commit to MasahikoSawada/pg_repack that referenced this issue May 20, 2024
Previously, while swapping the target table and its temp table, we
used to acquire an AccessExclusiveLock on only the target table but
not on its temp table. So the following scenario could happen:

1. Run (auto) vacuum on the temp table.
2. Swap the target table's relfilenode and the temp table's
relfilenode, and commit.
3. Run (auto) vacuum on the target table.

Steps 1 and 3 run on logically different tables, but these tables use
the same relfilenode. In the reported case, since changes
(INSERT/UPDATE/DELETE) can also run concurrently, the second vacuum
ended up removing newly added tuples.

To fix the issue, this commit changes the pg_repack command so that it
acquires an AccessExclusiveLock also on the temp table before swapping
these tables.

Issue: reorg#399
@MasahikoSawada (Contributor, Author) commented:
It's resolved by commit f5aa388.
