Check for external file modifications when writing #5805

Merged
1 change: 1 addition & 0 deletions book/src/configuration.md
@@ -57,6 +57,7 @@ on unix operating systems.
 | `rulers` | List of column positions at which to display the rulers. Can be overridden by language specific `rulers` in `languages.toml` file. | `[]` |
 | `bufferline` | Renders a line at the top of the editor displaying open buffers. Can be `always`, `never` or `multiple` (only shown if more than one buffer is in use) | `never` |
 | `color-modes` | Whether to color the mode indicator with different colors depending on the mode itself | `false` |
+| `prevent-external-modifications` | Prevent saving a file that has been modified from the outside, based on its last modification date (`mtime`). Disabling this check can be useful on systems where `mtime` values are unreliable | `true` |
Member

I'm a bit hesitant about adding a config option here. We should check for a nonsensical mtime value, but a user should also be able to manually bump the file via touch or just use :w! to bypass the check

Contributor Author

@divarvel Feb 4, 2023

It was added at the request of @the-mikedavis: #4734 (comment)

:w! already works, so removing the configuration bit would make things more annoying on systems with unreliable mtime, but not impossible.
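
The interplay between the mtime check and the `:w!` bypass can be sketched as follows. This is a synchronous, std-only approximation, not the code from this PR: the function name `ensure_not_modified_externally` is hypothetical, `force` stands for `:w!`, and, as in the PR, a failure to read the metadata or the mtime is treated as "no objection".

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical helper: refuse to write when the file on disk is newer than
/// our last save, unless the write is forced (the `:w!` case).
fn ensure_not_modified_externally(
    path: &Path,
    last_saved_time: SystemTime,
    force: bool,
) -> io::Result<()> {
    if force {
        return Ok(());
    }
    // A missing file or an unsupported mtime does not block the write.
    if let Ok(mtime) = fs::metadata(path).and_then(|m| m.modified()) {
        if last_saved_time < mtime {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "file modified by an external process, use :w! to overwrite",
            ));
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("mtime_check_demo.txt");
    fs::write(&path, "extremely important content")?;

    // Pretend our last save happened long before the file was written.
    let stale = SystemTime::UNIX_EPOCH;
    println!("plain write: {:?}", ensure_not_modified_externally(&path, stale, false).is_ok());
    println!("forced write: {:?}", ensure_not_modified_externally(&path, stale, true).is_ok());

    fs::remove_file(&path)
}
```

On a system with unreliable mtime values the plain write would keep failing, which is the annoyance the configuration option was meant to avoid.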

Contributor Author

We could also give the setting a scary name and a scarier description if it makes sense to keep it (personally, I would rather use :w! than disable the check entirely, but that's just me).

Member

@pascalkuthe Feb 4, 2023

Adding a config option was originally requested by @the-mikedavis: #4734 (comment). He thought that fighting the editor (:w!) for every write would be annoying in those niche cases.

Detecting a nonsensical mtime is a bit hard. We could check that the mtime is between the last saved mtime and now, but there is usually a certain amount of racing going on with time/fs events, so that could lead to false positives. That also still doesn't necessarily catch all false positives.

We could change this option to fall back to SHA-1 hashes instead. Mtime is preferable because it is much faster to retrieve and usually reliable. We don't usually want to compute hashes because that requires reading the file again, and SHA-1 runs at around 250MB/sec, so there would be some delay for multi-gig files (although @cessen has developed a non-cryptographic alternative hash that is much faster; it's not quite mature enough yet, but we could switch to that in the future).

I thought just adding an option to toggle this on/off would be easiest. What are your thoughts?
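
The "between the last saved mtime and now" idea could look roughly like the sketch below. It is only an illustration: `MtimeCheck`, `classify_mtime` and the `tolerance` parameter are hypothetical names, and the tolerance is one assumed way to absorb the clock/filesystem races mentioned above. It does not catch every false positive.

```rust
use std::time::{Duration, SystemTime};

/// Outcome of comparing an on-disk mtime with what the editor remembers.
#[derive(Debug, PartialEq, Eq)]
enum MtimeCheck {
    /// mtime is not newer than the last save.
    Unchanged,
    /// mtime lies between the last save and "now" (plus tolerance).
    ModifiedExternally,
    /// mtime lies in the future: the clock or filesystem looks unreliable.
    Implausible,
}

/// Hypothetical helper, not part of this PR.
fn classify_mtime(
    last_saved: SystemTime,
    mtime: SystemTime,
    now: SystemTime,
    tolerance: Duration,
) -> MtimeCheck {
    if mtime <= last_saved {
        MtimeCheck::Unchanged
    } else if mtime <= now + tolerance {
        MtimeCheck::ModifiedExternally
    } else {
        MtimeCheck::Implausible
    }
}

fn main() {
    let now = SystemTime::now();
    let last_saved = now - Duration::from_secs(60);
    let tolerance = Duration::from_secs(2);

    // A file touched ten seconds ago, after our last save.
    let mtime = now - Duration::from_secs(10);
    println!("{:?}", classify_mtime(last_saved, mtime, now, tolerance));
}
```

An `Implausible` result could be where a slower content-based fallback (hashing or direct comparison) takes over, instead of rejecting the write outright.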

Contributor

> although @cessen has developed a non-cryptographic alternative hash that is much faster

A couple of thoughts about using hashes:

  1. In the simple case where you just want to see if the current in-memory version of the file is consistent with the on-disk version, I suspect the fastest thing is to just directly compare their contents. Hashing requires not only reading the file, but also doing the extra work to compute a hash, which even for very fast hashes I suspect is slower than a simple equality test with the in-memory data.
  2. The case where hashes come in useful is when you don't have the data to directly compare against, because storing it doesn't make sense for one reason or another. This could be the case, for example, if the in-memory version of the file has unsaved changes and thus you want to check against the last saved state rather than the current in-memory state. But with Ropey as the backing buffer, I wonder if it doesn't make sense to just clone the rope to track the last-saved state, since rope clones share data, so the only storage overhead would be modified nodes/chunks.

I think where a hash is really going to come in handy is for things like persistent undo, where the hash can be stored with the on-disk undo data so that it can be appropriately invalidated if the file has been changed the next time it's loaded. Because you probably don't want to be storing an entire copy of the file just for undo stack invalidation.
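
A minimal sketch of that invalidation scheme, under loud assumptions: 64-bit FNV-1a is used here only as a simple, well-known non-cryptographic stand-in (it is not the hash mentioned in this thread), and `UndoRecord` is a hypothetical type. The hash is fed chunk by chunk, so the result does not depend on where the chunk boundaries fall.

```rust
/// 64-bit FNV-1a, updated incrementally.
struct Fnv1a(u64);

impl Fnv1a {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    fn new() -> Self {
        Fnv1a(Self::OFFSET_BASIS)
    }

    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(Self::PRIME);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hash a text given as a sequence of chunks (as a rope would provide).
fn hash_chunks<'a>(chunks: impl IntoIterator<Item = &'a str>) -> u64 {
    let mut hasher = Fnv1a::new();
    for chunk in chunks {
        hasher.update(chunk.as_bytes());
    }
    hasher.finish()
}

/// Hypothetical on-disk undo record: only the hash of the file contents is
/// stored next to the undo data, not a copy of the file.
struct UndoRecord {
    file_hash: u64,
}

impl UndoRecord {
    /// The stored undo data is usable only if the file still hashes the same.
    fn is_valid_for<'a>(&self, chunks: impl IntoIterator<Item = &'a str>) -> bool {
        self.file_hash == hash_chunks(chunks)
    }
}

fn main() {
    let record = UndoRecord { file_hash: hash_chunks(vec!["hello ", "world\n"]) };
    println!("same content: {}", record.is_valid_for(vec!["hel", "lo world\n"]));
    println!("changed content: {}", record.is_valid_for(vec!["hello there\n"]));
}
```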

Contributor

@cessen Feb 5, 2023

> In that case we should read the file into a buffer that is initialized with 2KB (chunks are usually 1k but can be a bit larger I believe; I know you don't want to expose Ropey internals like that, so I would just heap allocate and grow if a chunk really is larger).

I don't think you need to get that fancy. If you take a look at Ropey's PartialEq impl for RopeSlice, it solves basically the same problem (chunks not aligning) without any allocations. (In fact, IIRC you fixed a bug in one of those impls.) I think you can do the same thing here, which would let you read the file in fixed-size chunks, no additional allocations needed.
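
One way to sketch this allocation-free comparison is shown below. It is not Ropey's `PartialEq` code: `chunks_match_reader` is a hypothetical function, the 4096-byte buffer size is an arbitrary choice, and the chunks are taken as byte slices (with Ropey one would presumably pass something like `rope.chunks().map(str::as_bytes)`, which assumes the file on disk is UTF-8).

```rust
use std::io::{self, Read};

/// Returns `Ok(true)` when the bytes produced by `reader` equal the
/// concatenation of `chunks`. The file is read in fixed-size pieces and
/// compared against whatever part of the current chunk overlaps, so chunk
/// and read boundaries do not need to line up.
fn chunks_match_reader<'a, I, R>(chunks: I, mut reader: R) -> io::Result<bool>
where
    I: IntoIterator<Item = &'a [u8]>,
    R: Read,
{
    let mut buf = [0u8; 4096]; // fixed-size stack buffer, never grown
    let mut chunks = chunks.into_iter();
    let mut current: &[u8] = &[];

    loop {
        let n = match reader.read(&mut buf) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            // Reader exhausted: equal only if no in-memory bytes remain.
            return Ok(current.is_empty() && chunks.all(|c| c.is_empty()));
        }
        let mut disk = &buf[..n];
        while !disk.is_empty() {
            // Move on to the next non-empty in-memory chunk when needed.
            while current.is_empty() {
                match chunks.next() {
                    Some(c) => current = c,
                    None => return Ok(false), // the file has extra bytes
                }
            }
            let len = disk.len().min(current.len());
            if disk[..len] != current[..len] {
                return Ok(false);
            }
            disk = &disk[len..];
            current = &current[len..];
        }
    }
}

fn main() -> io::Result<()> {
    let on_disk = io::Cursor::new(b"hello world".to_vec());
    let in_memory = vec![&b"hel"[..], &b"lo wor"[..], &b"ld"[..]];
    println!("{}", chunks_match_reader(in_memory, on_disk)?);
    Ok(())
}
```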

Contributor

@cessen Feb 5, 2023

> I guess this approach sort of just defers the clone in a sense

In the sense of actually duplicating data in memory, yeah. And it does that deferred cloning incrementally as well (a single edit will only duplicate log N of the tree's data).
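
The deferred, incremental duplication can be illustrated with a toy model. `ToyRope` below is a hypothetical flat list of shared chunks, not Ropey's tree, so it only shows the copy-on-write idea: a clone shares every chunk, and the first edit to a chunk copies just that chunk.

```rust
use std::sync::Arc;

/// Toy stand-in for a rope: a flat list of reference-counted chunks.
#[derive(Clone)]
struct ToyRope {
    chunks: Vec<Arc<String>>,
}

impl ToyRope {
    fn from_chunks(chunks: &[&str]) -> Self {
        ToyRope {
            chunks: chunks.iter().map(|c| Arc::new(c.to_string())).collect(),
        }
    }

    /// Append `text` to chunk `idx`. `Arc::make_mut` copies the chunk only
    /// if another clone still shares it.
    fn push_str(&mut self, idx: usize, text: &str) {
        Arc::make_mut(&mut self.chunks[idx]).push_str(text);
    }

    /// Number of chunks physically shared with `other`.
    fn shared_chunks(&self, other: &ToyRope) -> usize {
        self.chunks
            .iter()
            .zip(&other.chunks)
            .filter(|&(a, b)| Arc::ptr_eq(a, b))
            .count()
    }

    fn text(&self) -> String {
        self.chunks.iter().map(|c| c.as_str()).collect()
    }
}

fn main() {
    let mut doc = ToyRope::from_chunks(&["fn main", "() {", "}\n"]);
    let last_saved = doc.clone(); // shares all three chunks

    doc.push_str(1, " // edited"); // copies only chunk 1
    println!("shared after edit: {}", doc.shared_chunks(&last_saved));
    println!("last saved text: {:?}", last_saved.text());
}
```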

> It doesn't really cause any measurable performance overhead though

In benchmarks post-clone edits are notably slower (by several times, I think). But given that Ropey already does 1-3 million edits a second (e.g. you could have well over 10,000 cursors and still edit at 60fps), in practice I doubt being a handful of times slower is going to have any real impact. Also, after the first edit in an area of the text, more edits to the same area won't have the post-cloning performance overhead anymore.
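
As a back-of-the-envelope check of those numbers: the 1 million edits per second is the lower bound quoted above, while the 4x slowdown is purely an assumed value for "several times", not a measurement.

```rust
/// How many single-cursor edits fit into one frame at a given throughput.
fn max_cursors_per_frame(edits_per_second: u64, frames_per_second: u64) -> u64 {
    edits_per_second / frames_per_second
}

fn main() {
    // Lower bound of the quoted throughput at 60 fps.
    println!("{}", max_cursors_per_frame(1_000_000, 60)); // prints 16666

    // Same throughput with an assumed 4x post-clone slowdown.
    println!("{}", max_cursors_per_frame(1_000_000 / 4, 60)); // prints 4166
}
```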

Then again, hashing instead of cloning could still make sense. The benefit of the hashing approach is that it only happens during syncing with disk, so even if it's a bit slower than a direct comparison, it probably won't impact usability in any meaningful way. Whereas holding on to a clone impacts the performance of all initial edits after a sync. Additionally, if we're eventually going to use hashing for persistent undo invalidation, then we'll be computing the hash on sync anyway.

Dunno. Both approaches have tradeoffs. I mainly just wanted to put the idea out there as a viable alternative to consider.

Member

I think you are right about cloning the rope here instead of hashing.

Just to clarify: with diff gutters we currently always pay the overhead of cloning each chunk when it's edited.
The diff worker holds a clone of the rope in a different thread, which is usually only replaced by the new rope shortly after the edit is finished.

I definitely want to remove this cost at some point. That will likely require the addition of a WeakRope to Ropey (just a wrapper around std::sync::Weak which can be upgraded back to a full Rope).

But since this is not a big problem in practice, I think this simpler approach would be better suited than storing a hash, since for this particular case this problem would only appear once.
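
For illustration only, a weak handle of the kind described could be shaped like this. Ropey has no `WeakRope` today; `Snapshot` and `WeakSnapshot` are hypothetical stand-ins built on a plain `Arc<String>`, and how Ropey would integrate such a type is not covered here.

```rust
use std::sync::{Arc, Weak};

/// Stand-in for `Rope`: a cheaply clonable, reference-counted text.
#[derive(Clone)]
struct Snapshot(Arc<String>);

/// Stand-in for the proposed `WeakRope`: does not keep the text alive.
struct WeakSnapshot(Weak<String>);

impl Snapshot {
    fn downgrade(&self) -> WeakSnapshot {
        WeakSnapshot(Arc::downgrade(&self.0))
    }

    /// Number of strong owners of the underlying text.
    fn strong_count(&self) -> usize {
        Arc::strong_count(&self.0)
    }
}

impl WeakSnapshot {
    /// Returns a full `Snapshot` again if the text is still alive.
    fn upgrade(&self) -> Option<Snapshot> {
        self.0.upgrade().map(Snapshot)
    }
}

fn main() {
    let doc = Snapshot(Arc::new(String::from("some text")));
    let weak = doc.downgrade();

    // The weak handle is not counted as an owner.
    println!("strong owners: {}", doc.strong_count());
    println!("upgrade while alive: {}", weak.upgrade().is_some());

    drop(doc);
    println!("upgrade after drop: {}", weak.upgrade().is_some());
}
```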

Member

I would be ok with leaving out the option from this PR. We can revisit it if we see reports of the mtime check failing. I don't really remember the details of when I was fighting with mtime - I doubt that I will run into it again

Contributor Author

i've removed the option


### `[editor.statusline]` Section

2 changes: 1 addition & 1 deletion helix-term/tests/test/commands.rs
@@ -67,7 +67,7 @@ async fn test_buffer_close_concurrent() -> anyhow::Result<()> {
     const RANGE: RangeInclusive<i32> = 1..=1000;

     for i in RANGE {
-        let cmd = format!("%c{}<esc>:w<ret>", i);
+        let cmd = format!("%c{}<esc>:w!<ret>", i);
         command.push_str(&cmd);
     }

6 changes: 6 additions & 0 deletions helix-term/tests/test/helpers.rs
@@ -319,6 +319,12 @@ impl AppBuilder {
     }
 }

+pub async fn run_event_loop_until_idle(app: &mut Application) {
+    let (_, rx) = tokio::sync::mpsc::unbounded_channel();
+    let mut rx_stream = UnboundedReceiverStream::new(rx);
+    app.event_loop_until_idle(&mut rx_stream).await;
+}
+
 pub fn assert_file_has_content(file: &mut File, content: &str) -> anyhow::Result<()> {
     file.flush()?;
     file.sync_all()?;
36 changes: 34 additions & 2 deletions helix-term/tests/test/write.rs
@@ -1,5 +1,5 @@
 use std::{
-    io::{Read, Write},
+    io::{Read, Seek, SeekFrom, Write},
     ops::RangeInclusive,
 };

@@ -37,6 +37,38 @@ async fn test_write() -> anyhow::Result<()> {
     Ok(())
 }

+#[tokio::test(flavor = "multi_thread")]
+async fn test_overwrite_protection() -> anyhow::Result<()> {
+    let mut file = tempfile::NamedTempFile::new()?;
+    let mut app = helpers::AppBuilder::new()
+        .with_file(file.path(), None)
+        .build()?;
+
+    helpers::run_event_loop_until_idle(&mut app).await;
+
+    file.as_file_mut()
+        .write_all(helpers::platform_line("extremely important content").as_bytes())?;
+
+    file.as_file_mut().flush()?;
+    file.as_file_mut().sync_all()?;
+
+    test_key_sequence(&mut app, Some(":x<ret>"), None, false).await?;
+
+    file.as_file_mut().flush()?;
+    file.as_file_mut().sync_all()?;
+
+    file.seek(SeekFrom::Start(0))?;
+    let mut file_content = String::new();
+    file.as_file_mut().read_to_string(&mut file_content)?;
+
+    assert_eq!(
+        helpers::platform_line("extremely important content"),
+        file_content
+    );
+
+    Ok(())
+}
+
 #[tokio::test(flavor = "multi_thread")]
 async fn test_write_quit() -> anyhow::Result<()> {
     let mut file = tempfile::NamedTempFile::new()?;
@@ -76,7 +108,7 @@ async fn test_write_concurrent() -> anyhow::Result<()> {
         .build()?;

     for i in RANGE {
-        let cmd = format!("%c{}<esc>:w<ret>", i);
+        let cmd = format!("%c{}<esc>:w!<ret>", i);
         command.push_str(&cmd);
     }

27 changes: 26 additions & 1 deletion helix-view/src/document.rs
@@ -19,6 +19,7 @@ use std::future::Future;
 use std::path::{Path, PathBuf};
 use std::str::FromStr;
 use std::sync::Arc;
+use std::time::SystemTime;

 use helix_core::{
     encoding,
@@ -135,6 +136,10 @@ pub struct Document {

     pub savepoint: Option<Transaction>,

+    // Last time we wrote to the file. This will carry the time the file was last opened if there
+    // were no saves.
+    last_saved_time: SystemTime,
+
     last_saved_revision: usize,
     version: i32, // should be usize?
     pub(crate) modified_since_accessed: bool,
@@ -160,6 +165,7 @@ impl fmt::Debug for Document {
             .field("changes", &self.changes)
             .field("old_state", &self.old_state)
             // .field("history", &self.history)
+            .field("last_saved_time", &self.last_saved_time)
             .field("last_saved_revision", &self.last_saved_revision)
             .field("version", &self.version)
             .field("modified_since_accessed", &self.modified_since_accessed)
@@ -382,6 +388,7 @@ impl Document {
             version: 0,
             history: Cell::new(History::default()),
             savepoint: None,
+            last_saved_time: SystemTime::now(),
             last_saved_revision: 0,
             modified_since_accessed: false,
             language_server: None,
@@ -577,9 +584,13 @@ impl Document {

         let encoding = self.encoding;

+        let last_saved_time = self.last_saved_time;
+
+        let prevent_external_modifications = self.config.load().prevent_external_modifications;
+
         // We encode the file according to the `Document`'s encoding.
         let future = async move {
-            use tokio::fs::File;
+            use tokio::{fs, fs::File};
             if let Some(parent) = path.parent() {
                 // TODO: display a prompt asking the user if the directories should be created
                 if !parent.exists() {
@@ -591,6 +602,17 @@ impl Document {
                 }
             }

+            // Protect against overwriting changes made externally
+            if !force && prevent_external_modifications {
+                if let Ok(metadata) = fs::metadata(&path).await {
+                    if let Ok(mtime) = metadata.modified() {
+                        if last_saved_time < mtime {
+                            bail!("file modified by an external process, use :w! to overwrite");
+                        }
+                    }
+                }
+            }
+
             let mut file = File::create(&path).await?;
             to_writer(&mut file, encoding, &text).await?;

@@ -668,6 +690,8 @@ impl Document {
         self.append_changes_to_history(view);
         self.reset_modified();

+        self.last_saved_time = SystemTime::now();
+
         self.detect_indent_and_line_ending();

         match provider_registry.get_diff_base(&path) {
@@ -1016,6 +1040,7 @@ impl Document {
             rev
         );
         self.last_saved_revision = rev;
+        self.last_saved_time = SystemTime::now();
     }

     /// Get the document's latest saved revision.
3 changes: 3 additions & 0 deletions helix-view/src/editor.rs
@@ -274,6 +274,8 @@ pub struct Config {
     /// Whether to color modes with different colors. Defaults to `false`.
     pub color_modes: bool,
     pub soft_wrap: SoftWrap,
+    /// Whether to check for external modifications upon saving. Defaults to `true`.
+    pub prevent_external_modifications: bool,
 }

 #[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
@@ -764,6 +766,7 @@ impl Default for Config {
             indent_guides: IndentGuidesConfig::default(),
             color_modes: false,
             soft_wrap: SoftWrap::default(),
+            prevent_external_modifications: true,
         }
     }
 }