Unhandled Restic Exit Code for 'No Space Left on Device' Condition #642
Comments
Thanks for the report. Interestingly, I've seen this happen a few times, but I am not able to reproduce it right now when testing on a small ramdisk. Does this happen for you reproducibly?
My repository is rest-server based. I've just tested with the following steps, but I am unable to reproduce the issue. I believe I know the reason.
The filesystem is full while Restic is still running and reporting errors, and Restic then exited. I believe that in the previous attempt, Backrest did not capture the Restic exit code because the partition holding the Backrest SQLite database was also full (the restore was writing to /), so the status update failed.
Hmm, that being the case, it sounds like the restore operation runs to completion but logs a failure for each item it is unable to write to disk (no space available). I think this is restic's expected behavior, but I agree that it's a bit weird, and it might be worth filing a bug with restic upstream. I think restic could short-circuit in these scenarios and terminate the restore early when the error is probably non-recoverable.
One improvement to Backrest would be to intercept messages with message_type: "error" during Restic operations and display the most recent one in the "Restore Detail" section. This would make errors visible and allow the user to terminate the operation manually; currently, this information is only available by examining the logs. The advancing progress bar and increasing byte count give a false impression of success: I only discovered the failure (insufficient space on the device) after an hour and a half, even though it actually occurred after five minutes.
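A minimal Go sketch of that interception, assuming restic is run with `--json` so that each output line is a JSON object. The `message_type` field comes from the comment above; the nested `error.message` and `item` fields are an assumption about restic's JSON error events and should be checked against the restic version in use. `errorTracker` is a hypothetical name, not part of Backrest.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
)

// resticMessage models only the fields this sketch reads. The nested
// error.message and item fields are assumed, not confirmed, restic JSON fields.
type resticMessage struct {
	MessageType string `json:"message_type"`
	Item        string `json:"item"`
	Error       struct {
		Message string `json:"message"`
	} `json:"error"`
}

// errorTracker (hypothetical) counts restic error events and remembers the
// most recent one, so a "Restore Detail" view could show it mid-operation.
type errorTracker struct {
	Count int
	Last  string
}

// Observe inspects one line of restic output; lines that are not JSON, or
// whose message_type is not "error", are ignored.
func (t *errorTracker) Observe(line []byte) {
	var msg resticMessage
	if err := json.Unmarshal(line, &msg); err != nil || msg.MessageType != "error" {
		return
	}
	t.Count++
	t.Last = msg.Error.Message
	if msg.Item != "" {
		t.Last = msg.Item + ": " + t.Last
	}
}

// Consume feeds every line read from r (e.g. restic's stdout pipe) to Observe.
func (t *errorTracker) Consume(r io.Reader) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		t.Observe(sc.Bytes())
	}
	return sc.Err()
}
```

Calling `Observe` on each line as it arrives keeps `Last` current while the restore is still running, so the first "no space left on device" error could be surfaced within minutes instead of after the operation ends.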
I like the idea that backrest could better surface item errors, e.g. on a restore operation (similar to how it does for backups today). I took a small step in this direction in 1.7.1 with 82f05d8, which focuses log output for backup / restore tasks on actionable items only: errors, unexpected plain text, and the summary event at the end. This means that backrest now does a much better job of capturing long streams of errors during a backup, but they are still not presented front and center. I think something analogous to what backup tasks show for item errors would make sense for the restore task as well, listing the specific items that didn't restore successfully.
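The log filtering described for 82f05d8 can be sketched as a per-line predicate. This is a hypothetical re-implementation in Go for illustration, not the code from that commit: it keeps error and summary events plus any non-JSON line (unexpected plain text), and drops routine JSON events such as status updates, along with blank lines.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// keepLogLine (hypothetical) decides whether a line of restic output is worth
// storing in the operation log.
func keepLogLine(line []byte) bool {
	if len(bytes.TrimSpace(line)) == 0 {
		return false // blank lines carry no information
	}
	var msg struct {
		MessageType string `json:"message_type"`
	}
	if err := json.Unmarshal(line, &msg); err != nil {
		return true // unexpected plain text, e.g. a fatal error printed by restic
	}
	switch msg.MessageType {
	case "error", "summary":
		return true // actionable: item errors and the final summary
	default:
		return false // routine JSON events such as progress status
	}
}
```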
Describe the bug
Related to Issue #641
Restic exited after several attempts due to a 'no space left on device' condition. However, Backrest failed to detect this termination, and the restore operation remained in an 'in progress' status until a Backrest restart.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Backrest must detect that Restic has exited and report a failure instead of remaining in an 'in progress' status.
Platform Info