Safety of Mmap::as_slice ? #25
Comments
See also: #4 |
So is the conclusion that file-based mmap is basically impossible to use safely? (Unless you somehow control the entire filesystem.) This does not match how common this pattern seems to be. |
@SimonSapin I feel similarly as you, but the argument in favor of |
Yep, @BurntSushi is correct. I wouldn't expect something like a system package manager updating a package to be an issue, since I would expect it to write a new file instead of modifying an existing one, but obviously that's down to the implementation.
It's impossible for this library to guarantee that the mmap is being used correctly, but it's not impossible for a combination of the application developer and operator. File locks (even advisory), permissions, containers, etc. can all be used to make sure the invariants aren't broken. |
Apologies, but I'm confused. When would safe code mutate a As an aside, an interesting way to address this for new Rust code might be another crate wrapping this crate that provides a family of checksums on the file contents and updates them when some reference guard type goes out of scope. You'd want a checksum that was locally malleable in the data in O(1) time, so no cryptographic checksums like Poly1305. Also, if the checksum were malleable in both the key and data, then you could mutate the key when mapping the file, so that the current file's checksum key would never exist on disk until you closed it, or maybe wrote it out as part of a commit operation and updated it in memory again. Any processes that want to share the file could share the in-memory key via another IPC channel. All this malleability means a weak checksum, but one could make up for that by making it larger. |
@burdges The |
I misread your statement up thread as saying that marking the function as |
@burdges Ah I see ya, definitely didn't mean that! Apologies. |
Couldn't it return an &[Cell] in order to be safe? |
@ubsan Can you expand more on that suggestion? What is |
Seems reasonable. |
Wouldn't simply opening the File with write access ensure that no other process could access it? (At least on Windows, I think it would give you some guarantees.) |
@DoumanAsh "at least on Windows" I think is the key part. :-) |
Sadly I'm no expert on Linux, but at least there are some capabilities to set a lock through fcntl (http://man7.org/linux/man-pages/man2/fcntl.2.html), so I suppose it all comes down to OS capabilities. Then there is a non-POSIX thing that would work only on Linux Considering that |
Right, advisory locking is opt-in. An application itself can use it to provide a form of internally consistent safety, but it can never prevent some other random process from mutating its |
To summarize, are we saying that concurrently modifying a memmap'd file to which Rust holds a &[u8] or &mut [u8] is officially Undefined Behavior, but "probably" will not cause anything bad if you are just reading bytes out? |
@dtolnay I mean, yeah? but if you're not just reading bytes out, it'll probably bite you at some point. It's the same issue many old C++ codebases are having with newer compilers that assume you're not doing multithreaded accesses without atomics; this would be equivalent to multithreaded accesses without atomics. |
One thing that worries me is that we're changing the API to have a safe deref to |
@danburkert I mean, if you're not doing UB, you're fine ;) |
@ubsan Could you elaborate on that please? |
@BurntSushi Basically, you need to use s/volatile/atomic, thanks talchas |
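The idea raised in this exchange (handing out a slice of cells or atomics instead of a plain byte slice, and reading through atomic loads) can be sketched roughly as below. This is only an illustration of the type-level idea using the standard library: `region_from`, `snapshot`, and `demo_atomic_snapshot` are hypothetical names, the "region" is an ordinary in-process buffer standing in for a mapping, and the sketch does not settle whether atomic reads are sufficient when another process modifies a mapped file.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical stand-in for a mapped region: ordinary bytes wrapped in atomics.
fn region_from(bytes: &[u8]) -> Vec<AtomicU8> {
    bytes.iter().map(|&b| AtomicU8::new(b)).collect()
}

/// Copies the region out one relaxed atomic load at a time.
/// Each byte is read atomically, but the copy as a whole is not a
/// consistent snapshot if a writer is active during the loop.
fn snapshot(shared: &[AtomicU8]) -> Vec<u8> {
    shared.iter().map(|b| b.load(Ordering::Relaxed)).collect()
}

/// Shows that the contents may change behind a shared reference
/// without any `&mut` access and without `unsafe`.
fn demo_atomic_snapshot() -> (Vec<u8>, Vec<u8>) {
    let region = region_from(b"hello");
    let before = snapshot(&region);
    // Simulates a writer that only holds a shared reference to the region.
    region[0].store(b'j', Ordering::Relaxed);
    let after = snapshot(&region);
    (before, after)
}
```

Because the element type is `AtomicU8` rather than `u8`, the compiler no longer assumes the bytes are immutable while a shared reference exists, which is the property a plain `&[u8]` cannot offer here.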
Would it be possible to open the file, i.e. have a file-handle, delete the file from the filesystem, check that only one file handle to that file exists by iterating through all open files and then return to the user? |
@iqualfragile what specifically do you mean by un-deleted? |
well, you would have to restore the file to the file system when you close the mmap (on drop probably) otherwise the file is just gone. You got a file descriptor so you can re-add it to the file system. |
@iqualfragile what specifically do you mean by re-add it to the file system? I'm not aware of an API to do that (even OS/FS dependent), so I'm intrigued. If there's no API and it requires writing the entire file back out you may as well not use a mmap at all. |
on linux you can access the list of file descriptors of a process in /proc/pid/fd/descriptor. you can then use ln -L to re-link the file (inode) to a name in the file system tree. doing so does not invoke a copy. i do not know how ln does it though. |
doing some further reading this is achieved by using the rather new linkat() syscall, passing the AT_SYMLINK_FOLLOW flag. |
Hmm, I don't think this is an assumption that could really be baked in here. Inconsistent state doesn't necessarily mean irrecoverable state. |
true, but in that case you are building a database and are probably ok with the unsafe interface? Another even more specific option would be to snapshot/cp --reflink the file before deleting on filesystems that support it. Would effectively limit the full solution to work on linux using btrfs or zfs. (i think xfs is working on patches for cow and there are some patches to make ext do cow). |
The documentation for this method includes:
But since the filesystem is shared with other processes that might do anything, this seems very difficult to ensure. (I imagine that an application could be run in a container like Docker to give it a private filesystem?)
So what’s the worst that could happen? https://stackoverflow.com/questions/21286870/how-safe-are-memory-mapped-files-for-reading-input-files seems a bit hand-wavy but suggests: not much.
I’m considering using this method to read (hopefully more efficiently than with File::read) files that are usually not modified, but they might be modified for example when the system’s package manager updates them to a new version.
I do not mind if reading a byte at the same location twice gives different values, or if reading two locations gives inconsistent values (because a write has happened in between the two reads). This might cause my program to unexpectedly return Err or (safely) panic, but that’s ok.
I do mind if this is Undefined Behavior of the sort that can cause anything to happen, including potentially being exploited for remote code execution or other fun stuff.
If only the former can happen, should this method really be unsafe?
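For comparison, the safe baseline the question alludes to (reading the file into memory the process owns, rather than mapping it) might look like the sketch below. It uses only the standard library; `read_private_copy`, `demo_private_copy`, and the temporary file name are made up for illustration. The point is that once the read returns, later modifications of the file cannot change the bytes the program holds, at the cost of a copy. A write that races with the read itself can still produce mixed content, just not Undefined Behavior.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the whole file into a buffer owned by this process.
fn read_private_copy(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}

/// Writes a file, copies it, overwrites the file, and returns
/// (the private copy, the new on-disk contents).
fn demo_private_copy() -> io::Result<(Vec<u8>, Vec<u8>)> {
    // Hypothetical scratch file in the system temp directory.
    let path = std::env::temp_dir()
        .join(format!("mmap_issue_demo_{}.txt", std::process::id()));
    fs::write(&path, b"version 1")?;
    let copy = read_private_copy(&path)?;
    // Simulates the file being updated after we read it.
    fs::write(&path, b"version 2")?;
    let on_disk = fs::read(&path)?;
    fs::remove_file(&path)?;
    Ok((copy, on_disk))
}
```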