
Unhandled Restic Exit Code for 'No Space Left on Device' Condition #642

Open
ziotibia81 opened this issue Jan 17, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@ziotibia81

Describe the bug
Related to Issue #641

Restic exited after several attempts due to a 'no space left on device' condition. However, Backrest failed to detect this termination, and the restore operation remained in an 'in progress' status until a Backrest restart.

To Reproduce
Steps to reproduce the behavior:

  1. Select a Snapshot
  2. Browse into a directory of this snapshot
  3. Select "Restore to path"
  4. Insert a path that has insufficient space to contain the largest file in the snapshot. (e.g. for a 1GB file, select a path with 500MB of free space.)

Expected behavior
Backrest should detect that Restic has exited and report a failure instead of leaving the operation in an 'in progress' status.

Platform Info

  • Ubuntu 24.04.1 LTS x64
  • Backrest 1.7.0
  • Restic 0.17.3
  • Restore path is in BTRFS filesystem
@ziotibia81 ziotibia81 added the bug Something isn't working label Jan 17, 2025
@garethgeorge
Owner

garethgeorge commented Jan 23, 2025

Thanks for the report -- interestingly, I've seen this happen a few times, but I'm not able to reproduce it right now when testing on a small ramdisk.

Does this happen for you reproducibly?

@ziotibia81
Author

ziotibia81 commented Jan 23, 2025

My repository is rest-server based. It contains a VirtualBox VM directory with two vdi files (120G and 543G).

I've just tested with the following steps, but I am unable to reproduce the issue. I believe I know the reason.

  • created an LVM volume of 20G
  • formatted it as ext4
  • mounted it
  • started a restore job onto this path
  • Restic processed all files, reporting many 'no space left on device' errors ([restic] {"message_type":"error","error":{"message":"write ...)
  • During this time, Backrest increased the progress bar and the 'Bytes Done' counter without any warning (it might be useful if Backrest also surfaced these error messages)
  • After 1h and 22m, Backrest showed "Restore error"
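For reference, the volume setup in the steps above can be sketched as the following commands (run as root; the vg0 volume group and the backrestbug names come from the transcript below — adjust for your system):

```shell
# Create a deliberately undersized 20G logical volume in vg0
lvcreate -L 20G -n backrestbug vg0
# Format it as ext4 and mount it where the restore will be targeted
mkfs.ext4 /dev/vg0/backrestbug
mkdir -p /media/backrestbug
mount /dev/vg0/backrestbug /media/backrestbug
# Then start a Backrest restore of the large snapshot onto /media/backrestbug
```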
administrator@bknastest:~$ sudo lvdisplay /dev/vg0/backrestbug
  --- Logical volume ---
  LV Path                /dev/vg0/backrestbug
  LV Name                backrestbug
  VG Name                vg0
  LV UUID                1ucH6J-PLfp-lP5y-1gCo-uH6x-P1KT-cPRCaZ
  LV Write Access        read/write
  LV Creation host, time bknastest, 2025-01-23 11:00:37 +0100
  LV Status              available
  # open                 1
  LV Size                20,00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:5

administrator@bknastest:~$ mount | grep tbug
/dev/mapper/vg0-backrestbug on /media/backrestbug type ext4 (rw,relatime)

administrator@bknastest:~$ df -h /media/backrestbug
Filesystem                   Size  Used   Avail Use% Mounted on
/dev/mapper/vg0-backrestbug   20G   20G       0 100% /media/backrestbug

The filesystem is full, even though Restic is still running and reporting errors:

[screenshot]

Restic Exited:

[screenshot]

I believe that in the previous attempt, Backrest failed to capture the Restic exit code: the restore was writing to /, so the partition containing the Backrest SQLite database was also full, which caused the status update to fail.

@garethgeorge
Owner

Hmm, that being the case, it sounds like what's happening here is that the restore operation runs to completion but logs a failure for each item it's unable to write to disk (no disk space available).

I think this is restic's expected behavior -- but I agree that it's a bit weird. It might actually be something worth filing a bug with restic upstream. I think something could be done to short-circuit in these scenarios and early-terminate the restore if the error is probably non-recoverable.

@ziotibia81
Author

One improvement to Backrest would be to intercept message_type: "error" during Restic operations and display the last message in the "Restore Detail" section.

Something like this:

[screenshot]

This enhancement would provide visibility into errors, enabling manual termination of the operation. Currently, this information is only available by examining the logs. The presence of a progress bar and an increasing byte count gives a false impression of success: I only discovered the failure (due to insufficient device space) after an hour and a half, even though the actual failure occurred after five minutes.

@garethgeorge
Owner

I like the idea that backrest could better surface item errors e.g. on a restore operation (similar to how it does for backups today).

I took a small step in this direction in 1.7.1 with 82f05d8, which focuses log output for backup / restore tasks on actionable items only, e.g. errors, unexpected plain text, and the summary event at the end. This means that backrest does a much better job of capturing long streams of errors during a backup -- but it's still not presented very front and center.

I think something analogous to what backup tasks show (re: item errors) would make sense for the restore task as well to show specific items that didn't restore successfully.
