Check for external file modifications when writing #5805
Conversation
When saving, check that the file has not changed since our last known save before writing it. Only overwrite external changes if asked to force.
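For illustration, here is a minimal sketch of that kind of check (not Helix's actual implementation; the function name, error message, and `force` parameter are only for the example), comparing the on-disk `mtime` against the last known save time:

```rust
use std::{fs, io, path::Path, time::SystemTime};

/// Hypothetical sketch: refuse to write if the file on disk was modified
/// after our last known save, unless the caller asked to force (`:w!`).
fn check_external_modification(
    path: &Path,
    last_saved_time: SystemTime,
    force: bool,
) -> io::Result<()> {
    if !force {
        if let Ok(metadata) = fs::metadata(path) {
            // `modified()` can fail on platforms without mtime support.
            if metadata.modified()? > last_saved_time {
                return Err(io::Error::new(
                    io::ErrorKind::Other,
                    "file was modified by an external program, use :w! to overwrite",
                ));
            }
        }
    }
    Ok(())
}
```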
Thanks for taking this over! I left some minor comments.
I have been looking more into FS watching recently and think that `mtime` is definitely the right solution. Most FS watchers (even those based on inotify and similar APIs) use `mtime` internally (watchman does too). With the option to disable this feature in odd environments, using `mtime` seems fine to me.
alright, i think i've handled everything
LGTM, thanks for working on this. Now that the extra config parameter is gone, the changes are quite small (half of the additions are just test cases) while providing a pretty great QOL improvement.
It doesn't look like @kov's account is credited in the first commit. You might also have to add their email?
It's exactly the same commit as in the original PR (exact same hash). Seems @kov just used a different email for his commit there. I think leaving the original commit untouched would be the right thing to do.
one improvement would be to periodically check for external modifications on the current buffer (and check when changing buffers), but that's a bit more work.
That's what is called a polling file watcher. That doesn't perform great, so using something like inotify might be better, but getting that right is hard (I am working on it though). That kind of work is best kept to a separate PR as it potentially solves a whole host of other issues but is quite complex to get right.
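For reference, a polling watcher in its simplest form boils down to something like the sketch below (illustrative only, not what this PR or Helix does): wake up on a timer, stat the file, and compare `mtime`s.

```rust
use std::{
    fs,
    path::Path,
    thread,
    time::{Duration, SystemTime},
};

// Minimal illustration of a polling file watcher: periodically compare the
// file's mtime with the last one we saw. Event-based watchers (inotify,
// kqueue, watchman, ...) avoid this wake-up loop by letting the OS push events.
fn poll_for_changes(path: &Path, mut last_seen: SystemTime) {
    loop {
        thread::sleep(Duration::from_secs(1));
        if let Ok(mtime) = fs::metadata(path).and_then(|m| m.modified()) {
            if mtime > last_seen {
                last_seen = mtime;
                println!("{} changed on disk", path.display());
            }
        }
    }
}
```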
that was the reason why i merged master instead of rebasing the branch. That makes things a bit more complex but i think that's worth it. As for my commits, keeping them as fixups when merging will make the history strange. Maybe i can merge master first, and then squash my changes into a single commit (idk how important a clean commit history is in helix).
yeah it's definitely out of scope here. afaik,
Normally we just squash small PRs but in this case it would be nice to keep the original commit. I think the easiest solution is if you just squash your commits together and add a descriptive title for your changes: Make prevent-external-modifications configurable and address minor review comments
Force-pushed from b2404c4 to d04efed (compare)
Co-authored-by: LeoniePhiline <[email protected]>
Co-authored-by: Pascal Kuthe <[email protected]>
Force-pushed from d04efed to 0343d92 (compare)
alright, i've cleaned up commits, and added @LeoniePhiline and you as co-authors to account for the suggestions you made through the PR UI.
book/src/configuration.md (Outdated)
@@ -57,6 +57,7 @@ on unix operating systems.
| `rulers` | List of column positions at which to display the rulers. Can be overridden by language specific `rulers` in `languages.toml` file. | `[]` |
| `bufferline` | Renders a line at the top of the editor displaying open buffers. Can be `always`, `never` or `multiple` (only shown if more than one buffer is in use) | `never` |
| `color-modes` | Whether to color the mode indicator with different colors depending on the mode itself | `false` |
| `prevent-external-modifications` | Prevent saving a file that has been modified from the outside, based on its last modification date (`mtime`). Disabling this check can be useful on systems where `mtime` values are unreliable | `true` |
I'm a bit hesitant about adding a config option here. We should check for a nonsensical `mtime` value, but a user should also be able to manually bump the file via `touch` or just use `:w!` to bypass the check.
It was added at the request of @the-mikedavis: #4734 (comment)
`:w!` already works, so removing the configuration bit would make things more annoying on systems with unreliable mtime, but not impossible.
We could also give the setting a scary name and a scarier description if it makes sense to keep it (personally, i would rather use `:w!` than disabling the check entirely, but that's just me).
Adding a config option was originally requested by @the-mikedavis: #4734 (comment). He thought that fighting the editor (`w!`) for every write would be annoying in those niche cases.
Detecting nonsensical mtime values is a bit hard. We could check that the mtime is between the last saved mtime and now, but there is usually a certain amount of racing going on with time/FS events, so that could lead to false positives. That also still doesn't necessarily catch all false positives.
We could change this option to fall back to sha1 hashes instead. Mtime is preferable because it is much faster to retrieve and usually reliable. We don't usually want to compute hashes because it requires reading the file again, and sha1 runs at around 250MB/sec, so there would be some delay for multi-gig files (although @cessen has developed a non-cryptographic alternative hash that is much faster, it's not quite mature enough yet; we could switch to that in the future).
I thought just adding an option to toggle this on/off would be easiest. What are your thoughts?
> although @cessen has developed a non-cryptographic alternative hash that is much faster
A couple of thoughts about using hashes:
- In the simple case where you just want to see if the current in-memory version of the file is consistent with the on-disk version, I suspect the fastest thing is to just directly compare their contents. Hashing requires not only reading the file, but also doing the extra work to compute a hash, which even for very fast hashes I suspect is slower than a simple equality test with the in-memory data.
- The case where hashes come in useful is when you don't have the data to directly compare against, because storing it doesn't make sense for one reason or another. This could be the case, for example, if the in-memory version of the file has unsaved changes and thus you want to check against the last saved state rather than the current in-memory state. But with Ropey as the backing buffer, I wonder if it doesn't make sense to just clone the rope to track the last-saved state, since rope clones share data, so the only storage overhead would be modified nodes/chunks.
I think where a hash is really going to come in handy is for things like persistent undo, where the hash can be stored with the on-disk undo data so that it can be appropriately invalidated if the file has been changed the next time it's loaded. Because you probably don't want to be storing an entire copy of the file just for undo stack invalidation.
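As a rough illustration of that last point (this is not Helix's `Document`; the struct and method names are made up), keeping a last-saved snapshot with Ropey is just a clone, and checking for divergence is an equality test:

```rust
use ropey::Rope;

// Illustrative only: Rope clones share their nodes, so the snapshot costs
// almost nothing up front; only chunks touched by later edits get duplicated.
struct DocState {
    text: Rope,
    last_saved: Rope, // snapshot taken at save time
}

impl DocState {
    fn on_save(&mut self) {
        // Cheap: bumps reference counts instead of copying the text.
        self.last_saved = self.text.clone();
    }

    fn modified_since_save(&self) -> bool {
        self.text != self.last_saved
    }
}
```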
In that case we should read the file into a buffer that is initialized with 2KB (chunks are usually 1k but can be a bit larger I believe; I know you don't want to expose ropey internals like that, so I would just heap allocate and grow if a chunk really is larger).
I don't think you need to get that fancy. If you take a look at Ropey's `PartialEq` impl for `RopeSlice`, it solves basically the same problem (chunks not aligning) without any allocations. (In fact, IIRC you fixed a bug in one of those impls.) I think you can do the same thing here, which would let you read the file in fixed-size chunks, no additional allocations needed.
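For what it's worth, here is one way such a chunk-wise comparison could look using Ropey's public `chunks()` iterator; a hedged sketch (the helper name and buffer size are arbitrary), not the approach this PR takes:

```rust
use std::fs::File;
use std::io::{BufReader, Read};
use std::path::Path;

use ropey::Rope;

/// Illustrative helper: compare a rope against the file at `path` without
/// loading the whole file or allocating per-chunk buffers.
fn rope_matches_file(rope: &Rope, path: &Path) -> std::io::Result<bool> {
    let mut file = BufReader::new(File::open(path)?);
    let mut buf = [0u8; 8192]; // fixed-size read buffer
    // Walk the rope's bytes chunk by chunk while streaming the file.
    let mut rope_bytes = rope.chunks().flat_map(|c| c.as_bytes().iter().copied());
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            // File exhausted: equal only if the rope is exhausted too.
            return Ok(rope_bytes.next().is_none());
        }
        for &b in &buf[..n] {
            if rope_bytes.next() != Some(b) {
                return Ok(false);
            }
        }
    }
}
```

A real implementation would compare whole slices the way Ropey's `PartialEq` impls do rather than single bytes, but the shape (no per-chunk allocations) is the same.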
> I guess this approach sort of just defers the clone in a sense

In the sense of actually duplicating data in memory, yeah. And it does that deferred cloning incrementally as well (a single edit will only duplicate `log N` of the tree's data).
> It doesn't really cause any measurable performance overhead though
In benchmarks post-clone edits are notably slower (by several times, I think). But given that Ropey already does 1-3 million edits a second (e.g. you could have well over 10,000 cursors and still edit at 60fps), in practice I doubt being a handful of times slower is going to have any real impact. Also, after the first edit in an area of the text, more edits to the same area won't have the post-cloning performance overhead anymore.
Then again, hashing instead of cloning could still make sense. The benefit of the hashing approach is that it only happens during syncing with disk, so even if it's a bit slower than a direct comparison, it probably won't impact usability in any meaningful way. Whereas holding on to a clone impacts the performance of all initial edits after a sync. Additionally, if we're eventually going to use hashing for persistent undo invalidation, then we'll be computing the hash on sync anyway.
Dunno. Both approaches have tradeoffs. I mainly just wanted to put the idea out there as a viable alternative to consider.
I think you are right about cloning the rope here instead of hashing.
Just to clarify: with diff gutters we currently always pay the overhead of cloning each chunk when it's edited.
The diff worker holds a clone of the rope in a different thread which is usually only replaced by the new rope shortly after the edit is finished.
I definitely want to remove this cost at some point. That will likely require the addition of a `WeakRope` to ropey (just a wrapper around `std::sync::Weak` which can be upgraded back to a full `Rope`). But since this is not a big problem in practice, I think this simpler approach would be better suited than storing a hash, since for this particular case this problem would only appear once.
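To illustrate the `WeakRope` idea in isolation (purely hypothetical; Ropey does not expose its internal `Arc` today, so this sketch uses `Arc<String>` as a stand-in for the rope's root):

```rust
use std::sync::{Arc, Weak};

// Hypothetical WeakRope-style handle: holds a Weak pointer that does not keep
// the text alive, but can be upgraded back to a strong handle when needed.
struct WeakText(Weak<String>);

impl WeakText {
    fn downgrade(strong: &Arc<String>) -> Self {
        WeakText(Arc::downgrade(strong))
    }

    /// Returns a strong handle if the text has not been dropped yet.
    fn upgrade(&self) -> Option<Arc<String>> {
        self.0.upgrade()
    }
}

fn main() {
    let text = Arc::new(String::from("hello"));
    let weak = WeakText::downgrade(&text);
    assert!(weak.upgrade().is_some()); // still alive
    drop(text);
    assert!(weak.upgrade().is_none()); // gone, and no clone was kept around
}
```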
I would be ok with leaving out the option from this PR. We can revisit it if we see reports of the mtime check failing. I don't really remember the details of when I was fighting with mtime - I doubt that I will run into it again
i've removed the option
Thank you for picking this up @divarvel! Should we close the other PR, then? I bless this derivative =)
Considering feedback, i think it would make sense to ship mtime-based checks now, even if they can in some cases trigger false positives, since it would prevent data loss (based on chat messages on matrix, it happens regularly). If we have a convincing plan for handling such false positives later on, it might be better to not add a settings item and suffer the inconvenience on systems with unreliable mtime, so that we don't need to remove it later on when we have a more robust solution.
After discussing it further, it's best to have it done every time; a couple of options can be explored to reduce false positives in the future.
I agree with this! It seems an obscure setting that most users would be better off leaving on. We should mostly avoid the need for this change once file system notifications are available.
This is the continuation of work by @kov in #4734
I applied feedback from the review of #4734:
- added a `prevent-external-modifications` config option (name is not definitive, bikeshed away)
- made the `last_saved_time` field part of the `Debug` impl for `Document`
I also merged master and resolved a couple conflicts (i merged instead of rebasing to keep @kov's commit unchanged). My additions are in a single commit so that the original commit from @kov can be kept unchanged when merging / rebasing this branch.
~~I'm not super satisfied with `editor.save()` taking a `Config` instead of getting it via `self`, but since `editor.save()` borrows `self` as `mut`, i was not able to borrow it immutably at the same time.~~ not an issue any more.