Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic handler, that saves unsaved work to temporary files #290

Open
cessen opened this issue Jun 17, 2021 · 25 comments
Open

Panic handler, that saves unsaved work to temporary files #290

cessen opened this issue Jun 17, 2021 · 25 comments
Labels
A-core Area: Helix core improvements C-enhancement Category: Improvements

Comments

@cessen
Copy link
Contributor

cessen commented Jun 17, 2021

It would be good to have a panic handler that saves the user's work before crashing, so that when a panicking bug is encountered the user doesn't lose all their work.

It should probably save the work to temporary files, so that if the user didn't want to save they don't lose the on-disk data either. Maybe append ".crash-01/02/03/..." or something to the filename.

@kirawi
Copy link
Member

kirawi commented Jun 17, 2021

We could possibly dump the Document's ChangeSet directly to the temporary file since it's unlikely that users will make radically large changes that would negate the space benefit of simply dumping the ChangeSet.

@archseer
Copy link
Member

I think persistent undo would partly solve this as well

@cessen
Copy link
Contributor Author

cessen commented Jun 18, 2021

@kirawi:
I don't know if optimizing for space/speed/etc. is the right thing to do in this case. This is an exceptional case (a crash), and I think reliability is most important.

So I would still advocate for simply dumping all unsaved buffers to disk (probably in utf8, regardless of what the original file was). Basically, just get the user's unsaved work to the disk in the dumbest, hardest-to-screw-up way possible. And in a format they can easily open with anything, so that if something odd is going on (Helix did crash, after all) they aren't beholden to Helix for conveniently accessing their data again.

In other words, I think this is a feature that should do absolutely nothing fancy. We can save the fancy stuff for when Helix is functioning as expected (which is ideally 100% of the time).

@cessen
Copy link
Contributor Author

cessen commented Jun 18, 2021

I think persistent undo would partly solve this as well

Yeah, that's true to an extent. But unless we're really careful about how we implement those kinds of features, things can still potentially get into a weird state (e.g. maybe the on-disk undo history and on-disk file are out of sync, and it's hard to figure out where to start in the undo history to correctly recover the lost data).

This feature is basically like wearing your seat belt: you don't expect to crash, but just in case you do, you'd like to minimize irrecoverable damage. And I think it should be pretty simple to implement, at least to the extent that we're willing to trust Ropey to always stay in a valid state.

@pickfire
Copy link
Contributor

Maybe swap file will do? For every change we keep a swap file in /tmp then if it panics we have a panic handler to report the swap file within /tmp.

@kirawi
Copy link
Member

kirawi commented Jun 18, 2021

@cessen How about file compression then? If I was editing a big file for whatever reason and Helix crashed, I wouldn't want it to write it completely down. File compression and decompression on text files is exceptionally fast and well optimized. Maybe https://github.com/BurntSushi/rust-snappy

@cessen
Copy link
Contributor Author

cessen commented Jun 18, 2021

Maybe swap file will do?

Yeah, we could do that. Basically like auto-save "light". Then on crash, the swap file will always still be there. That's probably more robust than the approach I've been suggesting.

How about file compression then?

The key point I'm trying to make is that this code will only run in exceptional circumstances, and ideally it will never run at all. It's also code that only runs when something has gone wrong (potentially seriously wrong).

All of these factors mean that it should be the dumbest, hardest-to-screw-up thing possible. Every additional layer of anything that we add on top of it is another code path that may have its own bugs. And for something that is supposed to be the safeguard in case of crashes, minimizing the potential for bugs trumps even major efficiency gains, I think.

Usually I'm all for making things faster, more efficient, etc. But this is one of the few cases where I think that's basically a non-consideration.

@cessen
Copy link
Contributor Author

cessen commented Jun 19, 2021

Over the last day or so I used cargo-fuzz to fuzz the mutation methods on Ropey's Rope type. I let it run for ~14 hours on my 24-core workstation, for a total of ~1 billion iterations.

Each iteration includes a check that panics if the Rope doesn't pass any of Ropey's own validation checks. So the fuzzing is effectively searching not just for unexpected panics/memory-safety-violations/etc., but also for anything that can put the Rope into an invalid state by Ropey's own definition.

It found a single issue within the first 10 minutes or so (a benign validation-check violation), which I've fixed. But other than that, it's found zero issues. That's not proof of perfection, of course. But it does indicate that Ropey's Rope type is at least very difficult to get into an invalid state. So I think we can reasonably depend on it, if we want to go the route of dumping from memory.

@kirawi kirawi added A-core Area: Helix core improvements C-enhancement Category: Improvements labels Aug 19, 2021
@kirawi
Copy link
Member

kirawi commented Sep 26, 2021

Should we prefer #401 to this?

@antoyo
Copy link
Contributor

antoyo commented Mar 3, 2022

Should we prefer #401 to this?

I personally see those as two different features: for instance, I'd probably want to disable persistent undo as this is not a feature I want yet I'd like to have swap files in case of problems.

@sqwxl
Copy link

sqwxl commented Oct 11, 2022

#4215 author here. How difficult would you estimate this fix would be to implement? I'm a more or less advanced beginner when it comes to rust and i'd be interested in tackling this issue if it isn't too far out of my league.

@Philipp-M
Copy link
Contributor

#4215 author here. How difficult would you estimate this fix would be to implement? I'm a more or less advanced beginner when it comes to rust and i'd be interested in tackling this issue if it isn't too far out of my league.

I think first a decision about how this should be implemented should be made. I also lean towards a general solution as #401.
I don't think that this necessarily means persistent undo, but rather some kind of general session state that may be configurable, that is serialized to disk also in a panic handler (at least, if possible without serializing inconsistent state).

@josephholsten
Copy link

I think we're missing the fact that most process crashes aren't due to internal panic but external factors, ie unexpected system shutdown, out-of-memory, some idiot forgetting his laptop is set to automatically log out after an hour of inactivity.

So while a panic handler is great, a swap file or operation journal is vital feature.

@6lj6
Copy link

6lj6 commented Dec 24, 2022

So there is no method now to get my lost file back?

@WarpspeedSCP
Copy link

WarpspeedSCP commented Jun 20, 2023

I'd really appreciate a swap file feature; I just lost about an hour's work because I assumed helix would at least hold onto my unsaved work in a swap file when I closed a console tab (I mean, even vim does it!)

@bcspragu
Copy link
Contributor

Similar to @sqwxl, I'm happy to contribute here and care mostly about the "something went wrong (internally or externally) and the program went down hard, where did the last <duration> of my work end up?". Especially when building from main, I've hit (and filed) a number of hard crashes that cost me a lot of time. On the bright side, I'm now a compulsive :w-er.

I'm not opinionated on the exact fix (swap files, dumping changesets, etc), just opinionated that there should be a safety net in place to recover work after a crash.

@kirawi
Copy link
Member

kirawi commented Sep 15, 2023

Part of the problem is that there is a bug that prevents swap files from working: #4276. It's hard to create a minimal reproducible case as well.

@pascalkuthe
Copy link
Member

I also don't really see how this would actually work.

A panic handler only has access to statics so we would need to store our document data in a static which would require locking on every transaction (which I don't see as acceptable).

I think one way to handle this ks to run the main UI loop in it's own thread outside of the tokio executer (tokio recommends that anyway) and have editor as a thread local (it's non-sync anyway). That way the panic handler could access it. But that's a pretty large refactor

@kirawi
Copy link
Member

kirawi commented Sep 15, 2023

I think that #4276 (if the bug is found) + persistent undo would cover this. Although the MVP persistent undo only does it on write, there's no reason mathematically (I think) that it needs that restriction. It should always work for any arbitrary state (as long as it evolved from History::default)

@pascalkuthe
Copy link
Member

persistant undo would work but you wouldn't want to write persistent undo to disk on every keypress either. So you would still need to write to diisk at some point. That point would be when saving so that doens't dicetly solve it until we save to disk in the panic handler (or write the undofile).

That shill has the problem that we don't have access to either undofile or document state in the panic handler (since that can only access statics.

You might work around this with catch unwind instead I suppose but I am not sure how the tokio runtime handles panics (I think the entire runtime just shuts down so you can't catch panics in a a different worker?)

@kirawi
Copy link
Member

kirawi commented Sep 15, 2023

How does vim/kakoune handle this, if at all?

@pascalkuthe
Copy link
Member

Vim only has lockfiles that are auto saved after 4 seoncds of no changes. Since C doesn't really have exceptions writing a proper crash handler there is hard. Once we implement autosave we could simply offer three options:no autafe, swap file and enable.

Kakoune doesn't have swapfiles but since c++ does have exceptions (altough it still can and does hard crash too) they catch exceptions and write all buffers to disk. That is essentially the same I said above but it's harder to do in rust/with how we handle things.

It seems catching the panic may actually work tough.

@ardijanr
Copy link

ardijanr commented Sep 26, 2023

New to helix and just got bitten by this myself, lost about 2 hours of work. Idk why I just assumed I would not loose the work.

If it can be done with panic then I guess I am all for it, but I somehow doubt you could make it as robust as "save every X seconds". Good luck saving when I am holding down the power button, or my server looses mains.

My opinion is that helix should still be able to recover at least some of the work.

I would suggest that swap files and history state is saved in the same location.

If this is still open when I have time, I will give it a go.
I would not be implementing a complicated panic thing though, just a swap and history recovery, if I am able.

@kirawi
Copy link
Member

kirawi commented Sep 26, 2023

See the above linked PRs that already do essentially the same thing. Saving on panic would be more robust than saving every X seconds by its nature, unless somehow the Document state gets messed up which would probably be a major bug in it of itself.

@ardijanr
Copy link

ardijanr commented Oct 2, 2023

I see, just found #5608 which means #401 seems to be the chosen solution. I'll leave it there, cannot seem to find the new PR though so I assume its not ready?

Let us know about new PR when/if you want any help on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements C-enhancement Category: Improvements
Projects
None yet
Development

No branches or pull requests