[Bug] Backup stuck forever on v0.18.0, eating up resources #3261
Comments
I'm not able to reproduce it. We also have tests that run without error. How big is your database?
You are right, it is also happening with the previous version (0.17.7). DB size:
I admit I have recently been experimenting with Litestream; I now suspect that adding (and then removing) it has caused this issue. I still have no clue how to fix it, unfortunately.
I don't think this is PocketBase related; most likely it is some misconfiguration with Litestream, but I'm not very familiar with it and I'm not sure how to help you debug it. Note that when performing PocketBase backups we wrap the archive generation in a transaction, so double check that Litestream is not holding a lock on the db or something else similar. In any case, if you think that this is a PocketBase issue, please provide more information about your deployment setup and ideally some minimal reproducible steps and I'll have a look.
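In case it helps, here is a rough way to check for a lingering write lock from outside PocketBase. This is a minimal sketch, not PocketBase code, and it assumes the pure-Go modernc.org/sqlite driver and the default pb_data/data.db path:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "modernc.org/sqlite" // registers the "sqlite" driver name
)

func main() {
	// Assumed default PocketBase data file - adjust to your setup.
	db, err := sql.Open("sqlite", "pb_data/data.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ctx := context.Background()

	// Pin a single connection so BEGIN and ROLLBACK run on the same one.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// BEGIN IMMEDIATE tries to take the write lock right away;
	// a "database is locked" error means another process holds it.
	if _, err := conn.ExecContext(ctx, "BEGIN IMMEDIATE"); err != nil {
		fmt.Println("could not acquire write lock:", err)
		return
	}
	if _, err := conn.ExecContext(ctx, "ROLLBACK"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("write lock acquired and released - nothing else is holding it")
}
```

If BEGIN IMMEDIATE fails while the server is idle, something external (e.g. Litestream) is still attached to the file.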
This persists even with Litestream removed. So at the very least it's not holding a lock, right?
I've tested it both locally (Linux) and on S3 (Wasabi free trial) and it works fine for me, both for the automated backups and for manually initiated ones. If you use Docker, ensure that the volume is mounted properly and accessible. If you use S3, double check your storage configuration and permissions. There is something off in your setup, but I'm not really sure how to help you further without more info or at least some reproducible steps.
Also, I forgot: if you have custom Go/JS hooks, try temporarily commenting them out to rule out a conflicting operation.
Thank you for the responses; I'm looking into these options. But practically speaking, backups were working fine and nothing has changed besides:
For the record, I did a manual SQLite VACUUM on the DBs, which brought down their size, but the issue still persists.
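For reference, a VACUUM can also be run from a tiny Go program. This is only a sketch of one way to do it (the reporter may well have used the sqlite3 CLI instead); it assumes the modernc.org/sqlite driver and the default pb_data file names, and should be run while the server is stopped:

```go
package main

import (
	"database/sql"
	"log"

	_ "modernc.org/sqlite"
)

func main() {
	// Assumed default PocketBase database files - adjust paths to your setup.
	for _, path := range []string{"pb_data/data.db", "pb_data/logs.db"} {
		db, err := sql.Open("sqlite", path)
		if err != nil {
			log.Fatal(err)
		}
		// VACUUM rewrites the file, reclaiming space left by deleted rows.
		if _, err := db.Exec("VACUUM"); err != nil {
			log.Fatalf("%s: %v", path, err)
		}
		db.Close()
		log.Printf("vacuumed %s", path)
	}
}
```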
A few more things I tried with no luck:
Again, without information about your deployment setup I'm not really sure how to help you. Where is PocketBase running and what command are you executing?
I understand that you can't help without more information; at this point I'm just writing down my process for record keeping. I will try to share more if I can't resolve this otherwise. Right now it is 100% reproducible on both version 0.17.7 and 0.18.0. I have not yet tried to reproduce it locally, but I will. PB is running in a Docker container on a GCP VM; the command to run it is ./backend serve (for 0.18.0, and the old syntax for the previous version).
I may have found the issue: I realized that pb_data still contained leftover Litestream folders. The Litestream folders were tiny, <100k each. However, removing them fixed the backup issue (so far). I managed to take backups consistently on both tested versions now. I'll try to find the time to reproduce the problem based on this information.
I tried to recreate this issue and failed. The only thing I can think of is that the Litestream files were very large on disk, causing significant load during compression, which basically killed the system. Unfortunately I'm not sure about their actual size.
Actually, after more investigation I managed to reproduce the issue locally. Based on this I believe there is a genuine bug in PocketBase. A minimal Dockerfile for repro:
The contents of tmp:
File sizes:
I'm not sure if the exact contents of the files matter. However, I'd rather not share them publicly as I'm not sure whether they contain anything sensitive. Let me know how to get them to you. When the server starts up, /pb_data is as expected:
However, once you create an admin user and trigger a backup, the backup keeps running and the temporary backup file in /pb_data grows indefinitely:
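If it helps anyone reproducing this, a tiny watcher makes the runaway growth easy to see. The path below is a placeholder, not the actual temp file name; substitute whatever in-progress archive appears in your pb_data while the backup is running:

```go
package main

import (
	"log"
	"os"
	"time"
)

func main() {
	// Placeholder path - replace with the in-progress backup archive
	// that shows up in your pb_data directory during the backup.
	const path = "pb_data/backup-in-progress.zip"

	for {
		info, err := os.Stat(path)
		if err != nil {
			log.Println("stat:", err)
		} else {
			log.Printf("%s: %d bytes", path, info.Size())
		}
		time.Sleep(2 * time.Second)
	}
}
```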
After playing around with the backup code (https://github.com/pocketbase/pocketbase/blob/f3fc7f78d747c7e8b9c2da74ffe7c7bf73039910/tools/archive/create.go), the explanation is pretty simple: the zip file containing the backup is picked up during the iteration of the files to be backed up, creating a recursive backup where the system tries to back up the current backup archive into itself indefinitely. I'm not sure why it's triggered only in this specific case; I believe it may be due to a race condition depending on the size and number of files to be iterated. It can be solved trivially by excluding the in-progress archive from the walk (see pocketbase/core/base_backup.go, line 71 at f3fc7f7).
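To make the failure mode concrete, here is a simplified, self-contained sketch (not PocketBase's actual archive code; names and paths are illustrative) of how a zip writer that walks the directory containing its own output file ends up copying the growing archive into itself, and how skipping the destination path avoids it:

```go
package main

import (
	"archive/zip"
	"io"
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

// zipDir archives src into dest. If skipSelf is false and dest lives inside
// src, the walk eventually reaches the partially written archive and starts
// copying it into itself - the output then grows until the disk fills up.
func zipDir(src, dest string, skipSelf bool) error {
	out, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer out.Close()

	zw := zip.NewWriter(out)
	defer zw.Close()

	absDest, err := filepath.Abs(dest)
	if err != nil {
		return err
	}

	return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}

		absPath, err := filepath.Abs(path)
		if err != nil {
			return err
		}
		// The fix: never add the archive we are currently writing.
		if skipSelf && absPath == absDest {
			return nil
		}

		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		w, err := zw.Create(rel)
		if err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(w, f)
		return err
	})
}

func main() {
	// The destination sits inside the directory being archived, mirroring the
	// situation in the report. Passing false instead of true reproduces the
	// runaway growth.
	if err := zipDir("pb_data", "pb_data/backup.zip", true); err != nil {
		log.Fatal(err)
	}
}
```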
Indeed, it could be a race condition introduced with v0.17.3 (7a3223e). I'll investigate it further today, but in all cases, as you've noted, the in-progress archive shouldn't be included in the files being backed up.
I wasn't able to make it "stuck" by simulating your directory structure, but it is certainly a bug to have the local temp dir as part of the backups, so that was removed in v0.18.1. Sorry for this. Feel free to let me know if you stumble on the same issue again and I'll investigate it further.
I'm no expert on PocketBase or SQLite, so I have no clue what's going on here.
This is something I did not observe on previous versions but has been happening 80-90% (but not always!) of the time on 0.18.0.
Taking a backup:
Hard resetting the machine or killing the container resets disk usage to normal.
Backups on the instance are configured to save to S3 (confirmed working previously).
There is nothing relevant in stdout/stderr.