
[Bug] Backup stuck forever on v0.18.0, eating up resources #3261

Closed

ghost opened this issue Sep 5, 2023 · 16 comments

Comments

@ghost

ghost commented Sep 5, 2023

I'm no expert on PocketBase or SQLite, so I have no clue what's going on here.

This is something I did not observe on previous versions, but it has been happening 80-90% of the time (though not always!) on 0.18.0.

Taking a backup:

  • gets stuck forever
  • consumes more and more disk space, eventually filling the entire disk
  • drives CPU usage to 100%

Hard resetting the machine or killing the container resets disk usage to normal.

Backups on the instance are configured to save to S3 (confirmed working previously).

There is nothing relevant in stdout/stderr.

@ganigeorgiev
Member

I'm not able to reproduce it. We also have tests that run without error.

How big is your database?
Could you test with an earlier version to rule out a v0.18.0-specific issue?

@ghost
Author

ghost commented Sep 5, 2023

You are right, it is also happening with the previous version (0.17.7).

DB size:

kalmard@instance-1:~$ sudo du -hsc /data/pb_data/*
32K     /data/pb_data/backups
139M    /data/pb_data/data.db
32K     /data/pb_data/data.db-shm
3.1M    /data/pb_data/data.db-wal
160M    /data/pb_data/logs.db
32K     /data/pb_data/logs.db-shm
120M    /data/pb_data/logs.db-wal
420M    total

I admit I have recently been experimenting with Litestream; I now suspect that adding (and then removing) it has caused this issue. Unfortunately I still have no clue how to fix it.

@ganigeorgiev
Member

I don't think this is PocketBase related; most likely it is some misconfiguration with Litestream, but I'm not very familiar with it and I'm not sure how to help you debug it.

Note that when performing PocketBase backups we wrap the archive generation in a transaction, so double check that Litestream isn't holding a lock on the db or doing something else similar.
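
For reference, one quick way to check whether another process is still holding a write lock on the db is a small standalone Go program along these lines (a sketch only, not PocketBase code; it assumes the mattn/go-sqlite3 driver and the /data/pb_data/data.db path from the du output above):

package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	// Path taken from the earlier du output; adjust for your own setup.
	db, err := sql.Open("sqlite3", "file:/data/pb_data/data.db?_busy_timeout=1000")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ctx := context.Background()
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// BEGIN IMMEDIATE asks SQLite for a write lock right away; if another
	// process (e.g. Litestream) is holding one, this fails with
	// "database is locked" after the 1s busy timeout.
	if _, err := conn.ExecContext(ctx, "BEGIN IMMEDIATE"); err != nil {
		fmt.Println("could not acquire a write lock:", err)
		return
	}
	conn.ExecContext(ctx, "ROLLBACK")
	fmt.Println("no other process appears to be holding a write lock")
}

If the BEGIN IMMEDIATE fails while PocketBase itself is idle, something else is still attached to the database.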

In any case, if you think that this is a PocketBase issue, please provide more information about your deployment setup and ideally some minimal reproducible steps and I'll have a look.

@ghost
Author

ghost commented Sep 5, 2023

This persists even with Litestream removed, so at the very least it's not holding a lock, right?

@ganigeorgiev
Member

ganigeorgiev commented Sep 5, 2023

I've tested it both locally (Linux) and with S3 (Wasabi free trial) and it works fine for me, both the automated backups and manually initiated ones. My test pb_data is around 500MB.

If you use Docker, ensure that the volume is mounted properly and accessible.

If you use S3, double check your storage configuration and permissions.
It could also be an issue with a vendor's S3 API compatibility (in the past several users reported issues with Cloudflare R2 - #2952, #2068).
To rule out an S3 issue, you can temporarily disable it and test with the local filesystem.

There is something specific to your setup going on, but I'm not really sure how to help you further without more info or at least some reproduction steps.

@ganigeorgiev
Member

Also, I forgot: if you have custom Go/JS hooks, try temporarily commenting them out to rule out a conflicting operation.
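
For context, a typical custom Go hook registration that this refers to looks something like the sketch below (illustrative only; the actual hooks and file layout depend on your own code, this just uses the v0.17/v0.18-era API):

package main

import (
	"log"

	"github.com/pocketbase/pocketbase"
	"github.com/pocketbase/pocketbase/core"
)

func main() {
	app := pocketbase.New()

	// Custom hooks like this one are what to temporarily comment out while
	// debugging, to rule out a conflicting operation during the backup.
	app.OnBeforeServe().Add(func(e *core.ServeEvent) error {
		log.Println("custom hook: server starting")
		return nil
	})

	if err := app.Start(); err != nil {
		log.Fatal(err)
	}
}

Commenting out the Add(...) call (or the whole registration) and rebuilding rules out interference from custom code.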

@ghost
Author

ghost commented Sep 5, 2023

Thank you for the responses, I'm looking into these options. But practically speaking, backups were working fine and nothing has changed besides:

  • upgrading from 0.17.7 to 0.18.0, then downgrading for verification
  • setting up litestream, then disabling it for verification

For the record, I did a manual SQLite VACUUM on the DBs, which brought down their size, but the issue still persists.

@ghost
Author

ghost commented Sep 5, 2023

A few more things I tried with no luck:

  • disabling S3 for backup storage
  • removing all functionality from pocketbase, running only the core app
  • running with --debug to see if there's any useful information

@ganigeorgiev
Member

Again, without information about your deployment setup I'm not really sure how to help you.

Where is PocketBase running and what command are you executing?
Are you able to reproduce it consistently?
Can you reproduce the issue locally?

@ghost
Author

ghost commented Sep 5, 2023

I understand that you can't help without more information; at this point I'm just writing down my process for record keeping. I will try to share more if I can't resolve this otherwise.

Right now it is 100% reproducible on both versions 0.17.7 and 0.18.0. I have not yet tried to reproduce it locally, but I will.

PB is running in a Docker container on a GCP VM; the command to run it is ./backend serve (for 0.18.0, and the old syntax for the previous version).

@ghost
Author

ghost commented Sep 5, 2023

I may have found the issue. I realized there are some leftover Litestream files in pb_data:

kalmard@instance-1:/data/pb_data$ find .
.
./logs.db-shm
./.autocert_cache
find: ‘./.autocert_cache’: Permission denied
./logs.db-wal
./.logs.db-litestream
./.logs.db-litestream/generation
./.logs.db-litestream/generations
./.logs.db-litestream/generations/459f183b154035ce
./.logs.db-litestream/generations/459f183b154035ce/wal
./.logs.db-litestream/generations/459f183b154035ce/wal/00000083.wal
./.logs.db-litestream/generations/459f183b154035ce/wal/00000082.wal
./data.db
./.data.db-litestream
./.data.db-litestream/generation
./.data.db-litestream/generations
./.data.db-litestream/generations/f23109187d440d8a
./.data.db-litestream/generations/f23109187d440d8a/wal
./.data.db-litestream/generations/f23109187d440d8a/wal/00000037.wal
./.data.db-litestream/generations/f23109187d440d8a/wal/00000038.wal
./backups
./backups/pb_backup_20230814172737.zip.attrs
./backups/pb_backup_20230814172737.zip
./data.db-shm
./logs.db
./data.db-wal

The Litestream folders were tiny, <100k each. However, removing them fixed the backup issue (so far). I have now managed to take backups consistently on both tested versions.

I'll try to find the time to reproduce the problem based on this information.
Thanks again for the help!

@ghost
Author

ghost commented Sep 5, 2023

I tried to recreate this issue and failed. The only thing I can think of is that the Litestream files were very large on disk, causing significant load during compression, which basically killed the system. Unfortunately I'm not sure about their actual size.

@ghost
Author

ghost commented Sep 5, 2023

Actually, after more investigation I managed to reproduce the issue locally. Based on this, I believe there is a genuine bug in PocketBase.

A minimal Dockerfile for repro:

FROM alpine:3.18
ADD https://github.com/pocketbase/pocketbase/releases/download/v0.18.0/pocketbase_0.18.0_linux_amd64.zip /tmp/pocketbase.zip
RUN unzip /tmp/pocketbase.zip -d /usr/local/bin && rm /tmp/pocketbase.zip
ADD /tmp /pb_data
CMD ["pocketbase", "serve", "--http=0.0.0.0:8090", "--debug"]

The contents of tmp:

$ find tmp/
tmp/
tmp/.data.db-litestream
tmp/.data.db-litestream/generation
tmp/.data.db-litestream/generations
tmp/.data.db-litestream/generations/b8788b0b2b06c882
tmp/.data.db-litestream/generations/b8788b0b2b06c882/wal
tmp/.data.db-litestream/generations/b8788b0b2b06c882/wal/00000000.wal
tmp/.logs.db-litestream
tmp/.logs.db-litestream/generation
tmp/.logs.db-litestream/generations
tmp/.logs.db-litestream/generations/6940142b5547a8f1
tmp/.logs.db-litestream/generations/6940142b5547a8f1/wal
tmp/.logs.db-litestream/generations/6940142b5547a8f1/wal/00000005.wal
tmp/.logs.db-litestream/generations/6940142b5547a8f1/wal/00000006.wal

File sizes:

$ du -ah tmp/
1.0K    tmp/.data.db-litestream/generation
8.0K    tmp/.data.db-litestream/generations/b8788b0b2b06c882/wal/00000000.wal
8.0K    tmp/.data.db-litestream/generations/b8788b0b2b06c882/wal
8.0K    tmp/.data.db-litestream/generations/b8788b0b2b06c882
8.0K    tmp/.data.db-litestream/generations
9.0K    tmp/.data.db-litestream
1.0K    tmp/.logs.db-litestream/generation
200K    tmp/.logs.db-litestream/generations/6940142b5547a8f1/wal/00000005.wal
196K    tmp/.logs.db-litestream/generations/6940142b5547a8f1/wal/00000006.wal
396K    tmp/.logs.db-litestream/generations/6940142b5547a8f1/wal
396K    tmp/.logs.db-litestream/generations/6940142b5547a8f1
396K    tmp/.logs.db-litestream/generations
397K    tmp/.logs.db-litestream
406K    tmp/

I'm not sure if the exact contents of the files matter. However I'd rather not share them publicly as I'm not sure if they contain anything sensitive. Let me know how to get them to you.

When the server starts up, /pb_data is as expected:

/ # du -hsc /pb_data/
1.2M    /pb_data/

However, once you create an admin user and trigger a backup, the backup keeps running and the temporary backup file in /pb_data grows indefinitely:

$ du -hsc /pb_data/.pb_temp_to_delete/pb_backup_3DPT
5.5G /pb_data/.pb_temp_to_delete/pb_backup_3DPT

@ghost
Author

ghost commented Sep 5, 2023

After playing around with the backup code (https://github.com/pocketbase/pocketbase/blob/f3fc7f78d747c7e8b9c2da74ffe7c7bf73039910/tools/archive/create.go), the explanation is pretty simple: the in-progress backup zip is itself picked up during the iteration over the files to be backed up, creating a recursive backup in which the system tries to back up the current backup archive into itself indefinitely.

I'm not sure why it's triggered in this specific case; I believe it may be due to a race condition depending on the size and number of files being iterated.
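
To illustrate the failure mode outside of PocketBase, here is a minimal, self-contained Go sketch (not the actual backup code) that zips a directory into a file created inside that same directory; without a skip rule, the walk eventually visits the growing archive and tries to copy it into itself:

package main

import (
	"archive/zip"
	"io"
	"log"
	"os"
	"path/filepath"
)

func main() {
	src := "./example_data"                  // hypothetical directory to archive
	dest := filepath.Join(src, "backup.zip") // the archive is created INSIDE the walked directory

	out, err := os.Create(dest)
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	zw := zip.NewWriter(out)
	defer zw.Close()

	err = filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		// With no skip rule, "backup.zip" itself is eventually visited here.
		// Copying it into the archive makes the archive grow while it is being
		// read, so the copy can chase a moving EOF and the output balloons.
		w, err := zw.Create(filepath.Base(path))
		if err != nil {
			return err
		}
		in, err := os.Open(path)
		if err != nil {
			return err
		}
		defer in.Close()
		_, err = io.Copy(w, in)
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
}

Depending on timing and compression ratios this may eventually terminate, but in practice the output grows far beyond the input, which matches the 5.5G temp file above.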

It can be solved trivially by adding LocalTempDirName to the skipped paths when calling archive.Create(), which currently reads:

return archive.Create(app.DataDir(), tempPath, LocalBackupsDirName)
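
Concretely, assuming archive.Create has the variadic signature Create(src, dest string, skipPaths ...string) error (as it appears to in tools/archive/create.go for v0.18.0), the proposed change would be along the lines of:

// also skip the local temp dir so the in-progress archive can never be
// picked up by the directory walk
return archive.Create(app.DataDir(), tempPath, LocalBackupsDirName, LocalTempDirName)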

@ganigeorgiev
Member

ganigeorgiev commented Sep 6, 2023

Indeed it could be a race condition introduced with v0.17.3 (7a3223e).

I'll investigate it further today, but in any case, as you've noted, LocalTempDirName should definitely be excluded.

@ganigeorgiev
Member

I wasn't able to make it get "stuck" by simulating your directory structure, but it is certainly a bug for the local temp dir to be included in the backups, so that was removed in v0.18.1. Sorry for this.

Feel free to let me know if you stumble on the same issue again and I'll attempt to investigate it further.
