
Unhandled Restic Exit Code for 'No Space Left on Device' Condition #642

Open
ziotibia81 opened this issue Jan 17, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@ziotibia81

Describe the bug
Related to Issue #641

Restic exited after several attempts due to a 'no space left on device' condition. However, Backrest failed to detect this termination, and the restore operation remained in an 'in progress' status until a Backrest restart.

To Reproduce
Steps to reproduce the behavior:

  1. Select a Snapshot
  2. Browse into a directory of this snapshot
  3. Select "Restore to path"
  4. Insert a path that has insufficient space to contain the largest file in the snapshot. (e.g. for a 1GB file, select a path with 500MB of free space.)

Expected behavior
Backrest should detect that Restic has exited and report a failure instead of leaving the operation in an 'in progress' status.

Platform Info

  • Ubuntu 24.04.1 LTS x64
  • Backrest 1.7.0
  • Restic 0.17.3
  • Restore path is in BTRFS filesystem
@ziotibia81 ziotibia81 added the bug Something isn't working label Jan 17, 2025
@garethgeorge
Owner

garethgeorge commented Jan 23, 2025

Thanks for the report -- interestingly, I've seen this happen a few times, but I'm not able to reproduce it right now when testing on a small ramdisk.

Does this happen for you reproducibly?

@ziotibia81
Author

ziotibia81 commented Jan 23, 2025

My repository is rest-server based. It contains a VirtualBox VM directory with two vdi files (120G and 543G).

I've just tested with the following steps, but I am unable to reproduce the issue. I believe I know the reason.

  • created an LVM volume of 20G
  • formatted it as ext4
  • mounted it
  • started a restore job onto this path
  • Restic processed all files, reporting many 'no space left on device' errors ([restic] {"message_type":"error","error":{"message":"write ...)
  • During this time, Backrest increased the progress bar and the 'Bytes Done' counter without any warning (it might be useful if Backrest also surfaced these error messages)
  • After 1h and 22m, Backrest showed "Restore error"
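For reference, the volume setup in the steps above can be sketched as the following commands (run as root; the vg0 volume group and the backrestbug names come from the transcript below — adjust for your system):

```shell
# Create a deliberately undersized 20G logical volume in vg0
lvcreate -L 20G -n backrestbug vg0
# Format it as ext4 and mount it where the restore will be targeted
mkfs.ext4 /dev/vg0/backrestbug
mkdir -p /media/backrestbug
mount /dev/vg0/backrestbug /media/backrestbug
# Then start a Backrest restore of the large snapshot onto /media/backrestbug
```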
administrator@bknastest:~$ sudo lvdisplay /dev/vg0/backrestbug
  --- Logical volume ---
  LV Path                /dev/vg0/backrestbug
  LV Name                backrestbug
  VG Name                vg0
  LV UUID                1ucH6J-PLfp-lP5y-1gCo-uH6x-P1KT-cPRCaZ
  LV Write Access        read/write
  LV Creation host, time bknastest, 2025-01-23 11:00:37 +0100
  LV Status              available
  # open                 1
  LV Size                20,00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:5

administrator@bknastest:~$ mount | grep tbug
/dev/mapper/vg0-backrestbug on /media/backrestbug type ext4 (rw,relatime)

administrator@bknastest:~$ df -h /media/backrestbug
Filesystem                   Size  Used   Avail Use% Mounted on
/dev/mapper/vg0-backrestbug   20G   20G       0 100% /media/backrestbug

The filesystem is full, even though Restic is still running and reporting errors:

[screenshot]

Restic Exited:

[screenshot]

I believe that in the previous attempt, Backrest failed to capture the Restic exit code: the restore was writing to /, so the partition containing the Backrest SQLite database was also full, which caused the status update to fail.

@garethgeorge
Owner

Hmm, that being the case, it sounds like what's happening here is that the restore operation runs to completion but logs a failure for each item it's unable to write to disk (no disk space available).

I think this is restic's expected behavior -- but I agree that it's a bit weird. It might actually be something worth filing a bug with restic upstream. I think something could be done to short-circuit in these scenarios and early-terminate the restore if the error is probably non-recoverable.

@ziotibia81
Author

One improvement to Backrest would be to intercept message_type: "error" during Restic operations and display the last message in the "Restore Detail" section.

Something like this:

[screenshot]

This enhancement would provide visibility into errors, enabling manual termination of the operation. Currently, this information is only available by examining the logs. The presence of a progress bar and an increasing byte count gives a false impression of success: I only discovered the failure (due to insufficient device space) after an hour and a half, even though the actual failure occurred after five minutes.

@garethgeorge
Owner

I like the idea that backrest could better surface item errors e.g. on a restore operation (similar to how it does for backups today).

I took a small step in this direction in 1.7.1 with 82f05d8, which focuses log output for backup / restore tasks on actionable items only, e.g. errors, unexpected plain text, and the summary event at the end. This means that backrest does a much better job of capturing long streams of errors during a backup -- but it's still not presented very front and center.

I think something analogous to what backup tasks show (re: item errors) would make sense for the restore task as well to show specific items that didn't restore successfully.
