Check for external file modifications when writing #5805

divarvel · 2023-02-03T14:46:56Z

This is the continuation of work by @kov in #4734

I applied feedback from

@the-mikedavis by adding a new prevent-external-modifications (name is not definitive, bikeshed away)
@pascalkuthe by making the last_saved_time field part of the Debug impl for Document
@LeoniePhiline by renaming a helper for consistency in tests

I also merged master and resolved a couple conflicts (i merged instead of rebasing to keep @kov's commit unchanged). My additions are in a single commit so that the original commit from @kov can be kept unchanged when merging / rebasing this branch.

I’m not super satisfied with editor.save() taking a Config instead of getting it via self, but since editor.save() borrows self as mut, i was not able to borrow it immutably at the same time. not an issue any more.

When saving, check that the file has not changed before our last known save before writing the file. Only overwrite external changes if asked to force.

pascalkuthe

Thanks for taking this over! I left some minor comments.

I have been looking more into FS watching recently and think that mtime is definitely the right solution. Most fswatcher (even those based on inotify and similar APIs) use mtime internally (watchmen does too). With the option to disable this features in odd environments using mtime seems fine tome.

helix-view/src/document.rs

helix-view/src/editor.rs

helix-view/src/document.rs

divarvel · 2023-02-03T15:17:08Z

alright, i think i've handled everything

pascalkuthe

LGTM, thanks for working on this. Now that the extra config parameter is gone the changes are quite small (half of the additions are just test cases). While providing a pretty great QOL improvement.

kirawi · 2023-02-03T15:22:36Z

It doesn't look like @kov 's account is credited in the first commit. You might also have to add their email?

pascalkuthe · 2023-02-03T15:25:46Z

It doesn't look like @kov 's account is credited in the first commit. You might also have to add their email?

It's exactly the same commit as in the original PR (exact same hash). Seems @kov just used a different email for his commit there. I think leaving the original commit untouched would be the right thing to do

divarvel · 2023-02-03T15:31:52Z

Thanks for taking this over! I left some minor comments.

I have been looking more into FS watching recently and think that mtime is definitely the right solution. Most fswatcher (even those based on inotify and similar APIs) use mtime internally (watchmen does too). With the option to disable this features in odd environments using mtime seems fine tome.

one improvement would be to periodically check for external modifications on the current buffer (and check when changing buffers), but that's a bit more work.

pascalkuthe · 2023-02-03T15:33:38Z

Thanks for taking this over! I left some minor comments.
I have been looking more into FS watching recently and think that mtime is definitely the right solution. Most fswatcher (even those based on inotify and similar APIs) use mtime internally (watchmen does too). With the option to disable this features in odd environments using mtime seems fine tome.

one improvement would be to periodically check for external modifications on the current buffer (and check when changing buffers), but that's a bit more work.

That's what is called a polling filewatcher. That doesn't perform great so using something like inotify might be better but getting that right is hard (I am working on it tough). That kind of work is best kept to a separate PR as it potentially solves a whole host of other issues but is quite complex to get right.

divarvel · 2023-02-03T15:34:44Z

It doesn't look like @kov 's account is credited in the first commit. You might also have to add their email?

It's exactly the same commit as in the original PR (exact same hash). Seems @kov just used a different email for his commit there. I think leaving the original commit untouched would be the right thing to do

that was the reason why i merged master instead of rebasing the branch. That makes things a bit more complex but i think that's worth it. As for my commits, keeping them as fixups when merging will make the history strange. Maybe i can merge master first, and then squash my changes in a single commit (idk how important is a clean commit history in helix)

divarvel · 2023-02-03T15:36:57Z

Thanks for taking this over! I left some minor comments.
I have been looking more into FS watching recently and think that mtime is definitely the right solution. Most fswatcher (even those based on inotify and similar APIs) use mtime internally (watchmen does too). With the option to disable this features in odd environments using mtime seems fine tome.

one improvement would be to periodically check for external modifications on the current buffer (and check when changing buffers), but that's a bit more work.

That's what is called a polling filewatcher. That doesn't perform great so using something like inotify might be better but getting that right is hard (I am working on it tough). That kind of work is best kept to a separate PR as it potentially solves a whole host of other issues but is quite complex to get right.

yeah it's definitely out of scope here. akaik, fanotify is more recent than inotify and performs better (it also provides more features but they're not useful there (i think)).

pascalkuthe · 2023-02-03T15:39:36Z

It doesn't look like @kov 's account is credited in the first commit. You might also have to add their email?

It's exactly the same commit as in the original PR (exact same hash). Seems @kov just used a different email for his commit there. I think leaving the original commit untouched would be the right thing to do

that was the reason why i merged master instead of rebasing the branch. That makes things a bit more complex but i think that's worth it. As for my commits, keeping them as fixups when merging will make the history strange. Maybe i can merge master first, and then squash my changes in a single commit (idk how important is a clean commit history in helix)

Normally we just squash small PRs but in this vase it would be nice to keep the original commit. I think the easiest solution if you just squash your commits together and add a descriptive title for your changes: Make prevent-external-modifications configurable and address minor review comments

Co-authored-by: LeoniePhiline <[email protected]> Co-authored-by: Pascal Kuthe <[email protected]>

divarvel · 2023-02-03T15:53:08Z

alright, i've cleaned up commits, and added @LeoniePhiline and you as co-authors to account for the suggestions you made through the PR UI.

archseer · 2023-02-04T10:13:58Z

book/src/configuration.md

@@ -57,6 +57,7 @@ on unix operating systems.
 | `rulers` | List of column positions at which to display the rulers. Can be overridden by language specific `rulers` in `languages.toml` file. | `[]` |
 | `bufferline` | Renders a line at the top of the editor displaying open buffers. Can be `always`, `never` or `multiple` (only shown if more than one buffer is in use) | `never` |
 | `color-modes` | Whether to color the mode indicator with different colors depending on the mode itself | `false` |
+| `prevent-external-modifications` | Prevent saving a file that has been modified from the outside, based on its last modification date (`mtime`). Disabling this check can be useful on systems where `mtime` values are unreliable | `true` |


I'm a bit hesitant about adding a config option here. We should check for a nonsensical mtime value, but a user should also be able to manually bump the file via touch or just use :w! to bypass the check

It was added at the request of @the-mikedavis: #4734 (comment)

:w! already works, so removing the configuration bit would make things more annoying on systems with unreliable mtime, but not impossible.

We could also give the setting a scary name and a scarier description if it makes sense to keep it (personally, i would rather use :w! than disabling the check entirely, but that's just me)

Adding a config option was originally requested by @the-mikedavis: #4734 (comment). He thought thought that fighting the editor (w!) for every write would be annoying in those nieche cases.

Detecting non-sensival mtime is a bit hard. We xould vhrck that the mtime is between the last saved mtime and now but here is usually a certain amount of racing going with tien/fs events so that could lead to false positives. That also still doesn't necessarily cstvh all false positives.

We could changes this option to fallback to sha1 hashes instead. Mtime is preferable bbecause it js much faster to retrieve and usually reliable. We don't usually want to compute hashes because it requires reading the file again and sha1 runs at around 250MB/sec so there would be some delay for multi gig files (although @cessen has developed a non-cryptographic alternative hash that is much faster but it's not quite mature enough yet but we could switch to that in the future)

I thaugh just adding an option to toggle this on/off would be easiest. What are your thoughts?

although @cessen has developed a non-cryptographic alternative hash that is much faster

A couple of thoughts about using hashes:

In the simple case where you just want to see if the current in-memory version of the file is consistent with the on-disk version, I suspect the fastest thing is to just directly compare their contents. Hashing requires not only reading the file, but also doing the extra work to compute a hash, which even for very fast hashes I suspect is slower than a simple equality test with the in-memory data.

The case where hashes come in useful is when you don't have the data to directly compare against, because storing it doesn't make sense for one reason or another. This could be the case, for example, if the in-memory version of the file has unsaved changes and thus you want to check against the last saved state rather than the current in-memory state. But with Ropey as the backing buffer, I wonder if it doesn't make sense to just clone the rope to track the last-saved state, since rope clones share data, so the only storage overhead would be modified nodes/chunks.

I think where a hash is really going to come in handy is for things like persistent undo, where the hash can be stored with the on-disk undo data so that it can be appropriately invalidated if the file has been changed the next time it's loaded. Because you probably don't want to be storing an entire copy of the file just for undo stack invalidation.

In that case we shkuld read the file into a buffer that is initalzied with 2KB (chunks are tsually 1k but can be a bit larger I believe, I know you don't want to expose ropey internals like that so I would just heal allocate and grow if a chunk really is larger).

I don't think you need to get that fancy. If you take a look at Ropey's PartialEq impl for RopeSlice, it solves basically the same problem (chunks not aligning) without any allocations. (In fact, IIRC you fixed a bug in one of those impls.) I think you can do the same thing here, which would let you read the file in fixed-size chunks, no additional allocations needed.

I guess this approach sort of just defers the clone in a sense

In the sense of actually duplicating data in memory, yeah. And it does that deferred cloning incrementally as well (a single edit will only duplicate log N of the tree's data).

It doesn't really cause any measuable peformance overhead tough

In benchmarks post-clone edits are notably slower (by several times, I think). But given that Ropey already does 1-3 million edits a second (e.g. you could have well over 10,000 cursors and still edit at 60fps), in practice I doubt being a handful of times slower is going to have any real impact. Also, after the first edit in an area of the text, more edits to the same area won't have the post-cloning performance overhead anymore.

Then again, hashing instead of cloning could still make sense. The benefit of the hashing approach is that it only happens during syncing with disk, so even if it's a bit slower than a direct comparison, it probably won't impact usability in any meaningful way. Whereas holding on to a clone impacts the performance of all initial edits after a sync. Additionally, if we're eventually going to use hashing for persistent undo invalidation, then we'll be computing the hash on sync anyway.

Dunno. Both approaches have tradeoffs. I mainly just wanted to put the idea out there as a viable alternative to consider.

I think you are right about cloning the rope here instead of hashing.

Just to clarify with diff gutters we currently always pay the overhead of cloning each chuck when it's edited.
The diff worker holds a clone of the rope in a different thread which is usually only replaced by the new rope shortly after the edit is finished.

I definitely want to remove this cost at some-point. That will likely require the addition of a WeakRope to ropey (just a wrapper around std::arc::Weak which can be upgraded back to a full Rope).

But since this not a big problem in practice I think this simpler approach would be better suited than storing a hash since for this particular case this problem would only appear once.

I would be ok with leaving out the option from this PR. We can revisit it if we see reports of the mtime check failing. I don't really remember the details of when I was fighting with mtime - I doubt that I will run into it again

i've removed the option

kov · 2023-02-05T01:09:00Z

Thank you for picking this up @divarvel! Should we close the other PR, then? I bless this derivative =)

divarvel · 2023-02-05T18:23:42Z

Considering feedback, i think it would make sense to ship mtime-based checks now, even if it can in some cases trigger false positives, since it would prevent data loss (based on chat messages on matrix, it happens regularly).

If we have a convincing plan for handling such false positives later on, it might be better to not add a settings item and suffer the inconvenience on systems with unreliable mtime, so that we don't need to remove it later on when we have a more robust solution.

After discussing it further, it's best to have it done every time, a couple options can be explored to reduce false positives in the future.

archseer · 2023-02-08T07:52:13Z

it might be better to not add a settings item and suffer the inconvenience on systems with unreliable mtime, so that we don't need to remove it later on when we have a more robust solution.

I agree with this! It seems an obscure setting that most users would be better off with leaving on. We should mostly avoid the need for this change once file system notifications are available.

Protect against overwriting external changes

a17e53d

When saving, check that the file has not changed before our last known save before writing the file. Only overwrite external changes if asked to force.

divarvel mentioned this pull request Feb 3, 2023

Protect against overwriting external changes #4734

Closed

pascalkuthe requested changes Feb 3, 2023

View reviewed changes

helix-view/src/document.rs Outdated Show resolved Hide resolved

helix-view/src/editor.rs Outdated Show resolved Hide resolved

helix-view/src/editor.rs Outdated Show resolved Hide resolved

pascalkuthe added C-enhancement Category: Improvements E-easy Call for participation: Experience needed to fix: Easy / not much S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 3, 2023

pascalkuthe reviewed Feb 3, 2023

View reviewed changes

helix-view/src/document.rs Outdated Show resolved Hide resolved

pascalkuthe approved these changes Feb 3, 2023

View reviewed changes

Merge remote-tracking branch 'origin/master' into overwrite-new

db1fbcf

divarvel force-pushed the protect-against-overwrites branch from b2404c4 to d04efed Compare February 3, 2023 15:49

Make overwrite protection configurable + small fixes

0343d92

Co-authored-by: LeoniePhiline <[email protected]> Co-authored-by: Pascal Kuthe <[email protected]>

divarvel force-pushed the protect-against-overwrites branch from d04efed to 0343d92 Compare February 3, 2023 15:51

archseer reviewed Feb 4, 2023

View reviewed changes

remove config field for overwrite protection

f64ff10

After discussing it further, it's best to have it done every time, a couple options can be explored to reduce false positives in the future.

pascalkuthe added S-waiting-on-review Status: Awaiting review from a maintainer. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 6, 2023

archseer requested a review from the-mikedavis February 8, 2023 07:53

archseer approved these changes Feb 8, 2023

View reviewed changes

the-mikedavis approved these changes Feb 8, 2023

View reviewed changes

the-mikedavis changed the title ~~Protect against overwriting external changes (update)~~ Check for external file modifications when writing Feb 8, 2023

the-mikedavis merged commit f386ff7 into helix-editor:master Feb 8, 2023

pascalkuthe mentioned this pull request Feb 12, 2023

Persistent Undo #5608

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check for external file modifications when writing #5805

Check for external file modifications when writing #5805

divarvel commented Feb 3, 2023 •

edited

Loading

pascalkuthe left a comment

divarvel commented Feb 3, 2023

pascalkuthe left a comment

kirawi commented Feb 3, 2023 •

edited

Loading

pascalkuthe commented Feb 3, 2023 •

edited

Loading

divarvel commented Feb 3, 2023

pascalkuthe commented Feb 3, 2023 •

edited

Loading

divarvel commented Feb 3, 2023

divarvel commented Feb 3, 2023

pascalkuthe commented Feb 3, 2023

divarvel commented Feb 3, 2023

archseer Feb 4, 2023

divarvel Feb 4, 2023 •

edited

Loading

divarvel Feb 4, 2023

pascalkuthe Feb 4, 2023 •

edited

Loading

cessen Feb 4, 2023

cessen Feb 5, 2023 •

edited

Loading

cessen Feb 5, 2023 •

edited

Loading

pascalkuthe Feb 5, 2023

the-mikedavis Feb 5, 2023

divarvel Feb 6, 2023

kov commented Feb 5, 2023

divarvel commented Feb 5, 2023

archseer commented Feb 8, 2023

Check for external file modifications when writing #5805

Check for external file modifications when writing #5805

Conversation

divarvel commented Feb 3, 2023 • edited Loading

pascalkuthe left a comment

Choose a reason for hiding this comment

divarvel commented Feb 3, 2023

pascalkuthe left a comment

Choose a reason for hiding this comment

kirawi commented Feb 3, 2023 • edited Loading

pascalkuthe commented Feb 3, 2023 • edited Loading

divarvel commented Feb 3, 2023

pascalkuthe commented Feb 3, 2023 • edited Loading

divarvel commented Feb 3, 2023

divarvel commented Feb 3, 2023

pascalkuthe commented Feb 3, 2023

divarvel commented Feb 3, 2023

archseer Feb 4, 2023

Choose a reason for hiding this comment

divarvel Feb 4, 2023 • edited Loading

Choose a reason for hiding this comment

divarvel Feb 4, 2023

Choose a reason for hiding this comment

pascalkuthe Feb 4, 2023 • edited Loading

Choose a reason for hiding this comment

cessen Feb 4, 2023

Choose a reason for hiding this comment

cessen Feb 5, 2023 • edited Loading

Choose a reason for hiding this comment

cessen Feb 5, 2023 • edited Loading

Choose a reason for hiding this comment

pascalkuthe Feb 5, 2023

Choose a reason for hiding this comment

the-mikedavis Feb 5, 2023

Choose a reason for hiding this comment

divarvel Feb 6, 2023

Choose a reason for hiding this comment

kov commented Feb 5, 2023

divarvel commented Feb 5, 2023

archseer commented Feb 8, 2023

divarvel commented Feb 3, 2023 •

edited

Loading

kirawi commented Feb 3, 2023 •

edited

Loading

pascalkuthe commented Feb 3, 2023 •

edited

Loading

pascalkuthe commented Feb 3, 2023 •

edited

Loading

divarvel Feb 4, 2023 •

edited

Loading

pascalkuthe Feb 4, 2023 •

edited

Loading

cessen Feb 5, 2023 •

edited

Loading

cessen Feb 5, 2023 •

edited

Loading