Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for external file modifications when writing #5805

Merged

Conversation

divarvel
Copy link
Contributor

@divarvel divarvel commented Feb 3, 2023

This is the continuation of work by @kov in #4734

I applied feedback from

  • @the-mikedavis by adding a new prevent-external-modifications (name is not definitive, bikeshed away)
  • @pascalkuthe by making the last_saved_time field part of the Debug impl for Document
  • @LeoniePhiline by renaming a helper for consistency in tests

I also merged master and resolved a couple conflicts (i merged instead of rebasing to keep @kov's commit unchanged). My additions are in a single commit so that the original commit from @kov can be kept unchanged when merging / rebasing this branch.

I’m not super satisfied with editor.save() taking a Config instead of getting it via self, but since editor.save() borrows self as mut, i was not able to borrow it immutably at the same time. not an issue any more.

When saving, check that the file has not changed before our last known
save before writing the file. Only overwrite external changes if asked
to force.
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this over! I left some minor comments.

I have been looking more into FS watching recently and think that mtime is definitely the right solution. Most fswatcher (even those based on inotify and similar APIs) use mtime internally (watchmen does too). With the option to disable this features in odd environments using mtime seems fine tome.

helix-view/src/document.rs Outdated Show resolved Hide resolved
helix-view/src/editor.rs Outdated Show resolved Hide resolved
helix-view/src/editor.rs Outdated Show resolved Hide resolved
@pascalkuthe pascalkuthe added C-enhancement Category: Improvements E-easy Call for participation: Experience needed to fix: Easy / not much S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 3, 2023
@divarvel
Copy link
Contributor Author

divarvel commented Feb 3, 2023

alright, i think i've handled everything

Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this. Now that the extra config parameter is gone the changes are quite small (half of the additions are just test cases). While providing a pretty great QOL improvement.

@kirawi
Copy link
Member

kirawi commented Feb 3, 2023

It doesn't look like @kov 's account is credited in the first commit. You might also have to add their email?

@pascalkuthe
Copy link
Member

pascalkuthe commented Feb 3, 2023

It doesn't look like @kov 's account is credited in the first commit. You might also have to add their email?

It's exactly the same commit as in the original PR (exact same hash). Seems @kov just used a different email for his commit there. I think leaving the original commit untouched would be the right thing to do

@divarvel
Copy link
Contributor Author

divarvel commented Feb 3, 2023

Thanks for taking this over! I left some minor comments.

I have been looking more into FS watching recently and think that mtime is definitely the right solution. Most fswatcher (even those based on inotify and similar APIs) use mtime internally (watchmen does too). With the option to disable this features in odd environments using mtime seems fine tome.

one improvement would be to periodically check for external modifications on the current buffer (and check when changing buffers), but that's a bit more work.

@pascalkuthe
Copy link
Member

pascalkuthe commented Feb 3, 2023

Thanks for taking this over! I left some minor comments.
I have been looking more into FS watching recently and think that mtime is definitely the right solution. Most fswatcher (even those based on inotify and similar APIs) use mtime internally (watchmen does too). With the option to disable this features in odd environments using mtime seems fine tome.

one improvement would be to periodically check for external modifications on the current buffer (and check when changing buffers), but that's a bit more work.

That's what is called a polling filewatcher. That doesn't perform great so using something like inotify might be better but getting that right is hard (I am working on it tough). That kind of work is best kept to a separate PR as it potentially solves a whole host of other issues but is quite complex to get right.

@divarvel
Copy link
Contributor Author

divarvel commented Feb 3, 2023

It doesn't look like @kov 's account is credited in the first commit. You might also have to add their email?

It's exactly the same commit as in the original PR (exact same hash). Seems @kov just used a different email for his commit there. I think leaving the original commit untouched would be the right thing to do

that was the reason why i merged master instead of rebasing the branch. That makes things a bit more complex but i think that's worth it. As for my commits, keeping them as fixups when merging will make the history strange. Maybe i can merge master first, and then squash my changes in a single commit (idk how important is a clean commit history in helix)

@divarvel
Copy link
Contributor Author

divarvel commented Feb 3, 2023

Thanks for taking this over! I left some minor comments.
I have been looking more into FS watching recently and think that mtime is definitely the right solution. Most fswatcher (even those based on inotify and similar APIs) use mtime internally (watchmen does too). With the option to disable this features in odd environments using mtime seems fine tome.

one improvement would be to periodically check for external modifications on the current buffer (and check when changing buffers), but that's a bit more work.

That's what is called a polling filewatcher. That doesn't perform great so using something like inotify might be better but getting that right is hard (I am working on it tough). That kind of work is best kept to a separate PR as it potentially solves a whole host of other issues but is quite complex to get right.

yeah it's definitely out of scope here. akaik, fanotify is more recent than inotify and performs better (it also provides more features but they're not useful there (i think)).

@pascalkuthe
Copy link
Member

It doesn't look like @kov 's account is credited in the first commit. You might also have to add their email?

It's exactly the same commit as in the original PR (exact same hash). Seems @kov just used a different email for his commit there. I think leaving the original commit untouched would be the right thing to do

that was the reason why i merged master instead of rebasing the branch. That makes things a bit more complex but i think that's worth it. As for my commits, keeping them as fixups when merging will make the history strange. Maybe i can merge master first, and then squash my changes in a single commit (idk how important is a clean commit history in helix)

Normally we just squash small PRs but in this vase it would be nice to keep the original commit. I think the easiest solution if you just squash your commits together and add a descriptive title for your changes: Make prevent-external-modifications configurable and address minor review comments

@divarvel divarvel force-pushed the protect-against-overwrites branch from b2404c4 to d04efed Compare February 3, 2023 15:49
Co-authored-by: LeoniePhiline <[email protected]>
Co-authored-by: Pascal Kuthe <[email protected]>
@divarvel divarvel force-pushed the protect-against-overwrites branch from d04efed to 0343d92 Compare February 3, 2023 15:51
@divarvel
Copy link
Contributor Author

divarvel commented Feb 3, 2023

alright, i've cleaned up commits, and added @LeoniePhiline and you as co-authors to account for the suggestions you made through the PR UI.

@@ -57,6 +57,7 @@ on unix operating systems.
| `rulers` | List of column positions at which to display the rulers. Can be overridden by language specific `rulers` in `languages.toml` file. | `[]` |
| `bufferline` | Renders a line at the top of the editor displaying open buffers. Can be `always`, `never` or `multiple` (only shown if more than one buffer is in use) | `never` |
| `color-modes` | Whether to color the mode indicator with different colors depending on the mode itself | `false` |
| `prevent-external-modifications` | Prevent saving a file that has been modified from the outside, based on its last modification date (`mtime`). Disabling this check can be useful on systems where `mtime` values are unreliable | `true` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit hesitant about adding a config option here. We should check for a nonsensical mtime value, but a user should also be able to manually bump the file via touch or just use :w! to bypass the check

Copy link
Contributor Author

@divarvel divarvel Feb 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added at the request of @the-mikedavis: #4734 (comment)

:w! already works, so removing the configuration bit would make things more annoying on systems with unreliable mtime, but not impossible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also give the setting a scary name and a scarier description if it makes sense to keep it (personally, i would rather use :w! than disabling the check entirely, but that's just me)

Copy link
Member

@pascalkuthe pascalkuthe Feb 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a config option was originally requested by @the-mikedavis: #4734 (comment). He thought thought that fighting the editor (w!) for every write would be annoying in those nieche cases.

Detecting non-sensival mtime is a bit hard. We xould vhrck that the mtime is between the last saved mtime and now but here is usually a certain amount of racing going with tien/fs events so that could lead to false positives. That also still doesn't necessarily cstvh all false positives.

We could changes this option to fallback to sha1 hashes instead. Mtime is preferable bbecause it js much faster to retrieve and usually reliable. We don't usually want to compute hashes because it requires reading the file again and sha1 runs at around 250MB/sec so there would be some delay for multi gig files (although @cessen has developed a non-cryptographic alternative hash that is much faster but it's not quite mature enough yet but we could switch to that in the future)

I thaugh just adding an option to toggle this on/off would be easiest. What are your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although @cessen has developed a non-cryptographic alternative hash that is much faster

A couple of thoughts about using hashes:

  1. In the simple case where you just want to see if the current in-memory version of the file is consistent with the on-disk version, I suspect the fastest thing is to just directly compare their contents. Hashing requires not only reading the file, but also doing the extra work to compute a hash, which even for very fast hashes I suspect is slower than a simple equality test with the in-memory data.
  2. The case where hashes come in useful is when you don't have the data to directly compare against, because storing it doesn't make sense for one reason or another. This could be the case, for example, if the in-memory version of the file has unsaved changes and thus you want to check against the last saved state rather than the current in-memory state. But with Ropey as the backing buffer, I wonder if it doesn't make sense to just clone the rope to track the last-saved state, since rope clones share data, so the only storage overhead would be modified nodes/chunks.

I think where a hash is really going to come in handy is for things like persistent undo, where the hash can be stored with the on-disk undo data so that it can be appropriately invalidated if the file has been changed the next time it's loaded. Because you probably don't want to be storing an entire copy of the file just for undo stack invalidation.

Copy link
Contributor

@cessen cessen Feb 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we shkuld read the file into a buffer that is initalzied with 2KB (chunks are tsually 1k but can be a bit larger I believe, I know you don't want to expose ropey internals like that so I would just heal allocate and grow if a chunk really is larger).

I don't think you need to get that fancy. If you take a look at Ropey's PartialEq impl for RopeSlice, it solves basically the same problem (chunks not aligning) without any allocations. (In fact, IIRC you fixed a bug in one of those impls.) I think you can do the same thing here, which would let you read the file in fixed-size chunks, no additional allocations needed.

Copy link
Contributor

@cessen cessen Feb 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this approach sort of just defers the clone in a sense

In the sense of actually duplicating data in memory, yeah. And it does that deferred cloning incrementally as well (a single edit will only duplicate log N of the tree's data).

It doesn't really cause any measuable peformance overhead tough

In benchmarks post-clone edits are notably slower (by several times, I think). But given that Ropey already does 1-3 million edits a second (e.g. you could have well over 10,000 cursors and still edit at 60fps), in practice I doubt being a handful of times slower is going to have any real impact. Also, after the first edit in an area of the text, more edits to the same area won't have the post-cloning performance overhead anymore.

Then again, hashing instead of cloning could still make sense. The benefit of the hashing approach is that it only happens during syncing with disk, so even if it's a bit slower than a direct comparison, it probably won't impact usability in any meaningful way. Whereas holding on to a clone impacts the performance of all initial edits after a sync. Additionally, if we're eventually going to use hashing for persistent undo invalidation, then we'll be computing the hash on sync anyway.

Dunno. Both approaches have tradeoffs. I mainly just wanted to put the idea out there as a viable alternative to consider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right about cloning the rope here instead of hashing.

Just to clarify with diff gutters we currently always pay the overhead of cloning each chuck when it's edited.
The diff worker holds a clone of the rope in a different thread which is usually only replaced by the new rope shortly after the edit is finished.

I definitely want to remove this cost at some-point. That will likely require the addition of a WeakRope to ropey (just a wrapper around std::arc::Weak which can be upgraded back to a full Rope).

But since this not a big problem in practice I think this simpler approach would be better suited than storing a hash since for this particular case this problem would only appear once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be ok with leaving out the option from this PR. We can revisit it if we see reports of the mtime check failing. I don't really remember the details of when I was fighting with mtime - I doubt that I will run into it again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've removed the option

@kov
Copy link

kov commented Feb 5, 2023

Thank you for picking this up @divarvel! Should we close the other PR, then? I bless this derivative =)

@divarvel
Copy link
Contributor Author

divarvel commented Feb 5, 2023

Considering feedback, i think it would make sense to ship mtime-based checks now, even if it can in some cases trigger false positives, since it would prevent data loss (based on chat messages on matrix, it happens regularly).

If we have a convincing plan for handling such false positives later on, it might be better to not add a settings item and suffer the inconvenience on systems with unreliable mtime, so that we don't need to remove it later on when we have a more robust solution.

After discussing it further, it's best to have it done every time, a
couple options can be explored to reduce false positives in the future.
@pascalkuthe pascalkuthe added S-waiting-on-review Status: Awaiting review from a maintainer. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 6, 2023
@archseer
Copy link
Member

archseer commented Feb 8, 2023

it might be better to not add a settings item and suffer the inconvenience on systems with unreliable mtime, so that we don't need to remove it later on when we have a more robust solution.

I agree with this! It seems an obscure setting that most users would be better off with leaving on. We should mostly avoid the need for this change once file system notifications are available.

@the-mikedavis the-mikedavis changed the title Protect against overwriting external changes (update) Check for external file modifications when writing Feb 8, 2023
@the-mikedavis the-mikedavis merged commit f386ff7 into helix-editor:master Feb 8, 2023
@pascalkuthe pascalkuthe mentioned this pull request Feb 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: Improvements E-easy Call for participation: Experience needed to fix: Easy / not much S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants