Silent data loss due to two vacuums running on logically different tables that use the same relfilenode #399

Closed
MasahikoSawada opened this issue May 18, 2024 · 1 comment
Hi,

I've observed a silent data loss problem when (auto)vacuum, pg_repack's table swap, and INSERT run concurrently. The root cause is that while pg_repack is swapping the target table and its temp table, the two tables have different OIDs but use the same underlying file (i.e., the same relfilenode), so two vacuum processes can concurrently process the same underlying file. I've confirmed this problem with PostgreSQL 15.4 and pg_repack 1.4.8; I believe it affects other PostgreSQL and pg_repack versions as well.
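For reference, which on-disk file backs a relation can be inspected from the catalogs (standard PostgreSQL, nothing pg_repack-specific):

-- relfilenode identifies the on-disk file backing a relation;
-- pg_relation_filepath() shows the actual file path under the data directory.
select relname, relfilenode, pg_relation_filepath('test')
from pg_class
where relname = 'test';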

The reproduction steps are:

Preparation

create table test (id int primary key, a int);
insert into test values (1, 1), (2, 2), (3, 3);

These tuples are inserted at TID (0, 1), (0, 2), and (0, 3) in the test table, respectively.
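The TIDs can be confirmed via the ctid system column (standard PostgreSQL, no pg_repack involvement):

-- ctid exposes each row's (block, offset) TID; after the preparation the
-- three rows should sit at (0,1), (0,2), and (0,3).
select ctid, id, a from test order by ctid;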

Steps

  1. Run pg_repack under a debugger on the test table, and stop it after it has copied the table data to its temp table, for example with a breakpoint at pg_repack.c:1537.
  2. Update a tuple of the test table, say UPDATE test SET id = 999, a = 999 WHERE id = 1. This change is recorded in the log table.
  3. Let pg_repack resume; it applies the log. Stop pg_repack after all log entries are applied, for example at pg_repack.c:1598.
        - In the temp table, the tuple at TID (0, 1) is now garbage (the old version of the updated row).
        - Note that pg_repack's second connection (conn2) already holds an AccessExclusiveLock on the test table, but nobody holds a lock on the temp table.
  4. Run vacuum on the temp table and stop it before it calls lazy_vacuum_heap().
        - This vacuum scans the temp table and collects TID (0,1) as a dead tuple, but has not actually removed the item pointer there yet.
  5. Let pg_repack resume; it swaps the test table and the temp table. Stop pg_repack right after the swap, for example at pg_repack.c:1617.
        - Now both the test table and the temp table use the same relfilenode (see the catalog check after this list), and conn2 has already released its AccessExclusiveLock on the test table.
  6. Run another vacuum on the test table; it removes the tuple at TID (0,1) (the first vacuum has not removed it yet).
  7. Insert a new tuple into the test table, say INSERT INTO test VALUES (4,4).
        - This new tuple is inserted at TID (0,1).
  8. Let the first vacuum resume; it ends up removing the tuple at TID (0,1) that we inserted in step 7, silently losing the new row (see the verification query after this list).
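As a rough sanity check for steps 5 and 8, plain catalog and table queries are enough. This is only a sketch: it assumes pg_repack's work tables live in the repack schema, as they do by default.

-- After step 5, both relations report the same relfilenode.
select c.oid, n.nspname, c.relname, c.relfilenode
from pg_class c join pg_namespace n on n.oid = c.relnamespace
where c.relname = 'test' or n.nspname = 'repack';

-- After step 8, the row committed in step 7 has silently disappeared.
select * from test where id = 4;   -- returns no rows despite the committed INSERT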

In the above procedure I used the VACUUM command, but what I actually observed was autovacuum processes running on the test table and its temp table because their relfrozenxid age exceeded autovacuum_freeze_max_age.
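For reference, the age that triggers such an anti-wraparound autovacuum can be checked with a standard catalog query:

-- Anti-wraparound autovacuum is triggered once age(relfrozenxid) exceeds
-- autovacuum_freeze_max_age (200 million by default).
select relname, age(relfrozenxid) as xid_age,
       current_setting('autovacuum_freeze_max_age') as freeze_max_age
from pg_class
where relkind = 'r' and relname = 'test';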

A simple solution would be to acquire an AccessExclusiveLock on both the target table and its temp table before swapping them.
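In SQL terms, the idea is roughly the following, issued on pg_repack's connection inside the swap transaction. This is only a sketch; repack.table_12345 is a placeholder for the temp table name pg_repack actually generates.

begin;
-- Lock both relations so no (auto)vacuum can start on either side of the swap.
lock table test in access exclusive mode;
lock table repack.table_12345 in access exclusive mode;  -- placeholder temp table name
-- ... swap the relfilenodes here (done by pg_repack internally), then commit.
commit;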

MasahikoSawada added a commit to MasahikoSawada/pg_repack that referenced this issue May 20, 2024
Previously, while swapping the target table and its temp table, we
used to acquire an AccessExclusiveLock on only the target table but
not on its temp table. So the following scenario could happen:

1. Run (auto) vacuum on the temp table.
2. Swap the target table's relfilenode and the temp table's
relfilenode, and commit.
3. Run (auto) vacuum on the target table.

Steps 1 and 3 run on logically different tables, but these tables use
the same relfilenode. In the reported case, since changes
(INSERT/UPDATE/DELETE) can also run concurrently, the second vacuum
ended up removing newly added tuples.

To fix the issue, this commit changes the pg_repack command so that it
acquires an AccessExclusiveLock also on the temp table before swapping
these tables.

Issue: reorg#399
@MasahikoSawada (Contributor, Author) commented:
It's resolved by commit f5aa388.
