Safety of Mmap::as_slice ? #25

SimonSapin · 2016-11-07T00:10:00Z

The documentation for this method includes:

Unsafety

The caller must ensure that the file is not concurrently modified.

But since the filesystem is shared with other processes that might do anything, this seems very difficult to ensure. (I imagine that an application could be run in a container like Docker to give it a private filesystem?)

So what’s the worst that could happen? https://stackoverflow.com/questions/21286870/how-safe-are-memory-mapped-files-for-reading-input-files seems a bit hand-wavy but suggests: not much.

I’m considering using this method to read (hopefully more efficiently than with File::read) files that are usually not modified, but they might be modified for example when the system’s package manager updates them to a new version.

I do not mind if reading a byte at the same location twice gives different values, or if reading two locations give inconsistent values (because a write has happened in between the two reads). This might cause my program to unexpectedly return Err or (safely) panic, but that’s ok.

I do mind if this is Undefined Behavior of the sort that can cause anything to happen, including potentially being exploited for remote code execution or other fun stuff.

If only the former can happen, should this method really be unsafe?

The text was updated successfully, but these errors were encountered:

BurntSushi · 2016-11-07T00:32:39Z

See also: #4

SimonSapin · 2016-11-07T00:48:09Z

So is the conclusion that file-based mmap is basically impossible to use safely? (Unless you somehow control the entire filesystem.)

This does not match how common this pattern seems to be.

BurntSushi · 2016-11-07T00:52:42Z

@SimonSapin I feel similarly as you, but the argument in favor of unsafe is somewhat convincing. In particular, without unsafe one could have an &[u8] that can be mutated in safe code, which generally seems like a violation of what one would expect when given an &[u8].

danburkert · 2016-11-07T01:02:28Z

Yep, @BurntSushi is correct. I wouldn't expect something like a system package manager updating a package to be an issue, since I would expect it to write a new file instead of modifying an existing one, but obviously that's down to the implementation.

So is the conclusion that file-based mmap is basically impossible to use safely?

It's impossible for this library to guarantee that the mmap is being used correctly, but it's not impossible for a combination of the application developer and operator. File locks (even advisory), permissions, containers, etc. can all be used to make sure the invariants aren't broken.

burdges · 2016-11-07T14:30:27Z

Apologies, but I'm confused. When would safe code mutate a &[u8] that some unknown function gave it? Wouldn't only a &mut [u8] have that problem? I understand that a function could be marked unsafe to push out the "unsafe boundary" that Niko's blog recent posts discuss in relation to future optimizations, but you'll never know if you're the only owner of a &[u8] given to you by another crate, so that does not seem to apply. What am I missing?

As an aside, an interesting way to address this for new rust code might be another crate wrapping this crate that provides a family of checksums on the file contents and updates them when some reference guard type goes out of scope. You'd want a checksum that was locally mailable in the data in O(1) time, so no cryptographic checksums like polay1305. Also, if the checksum were mailable in both the key and data, then you could mutate the key when mapping the file, so that the current file's checksum key would never exist on disk until you closed it, or maybe wrote it out as part of a commit operation and updated it in memory again. Any processes that want to share the file could share the in-memory key via another IPC channel. All this malleability means a weak checksum, but one could make up for that by making it larger.

BurntSushi · 2016-11-07T14:32:56Z

@burdges The &[u8] from a memory map is backed by a file, which means it can be mutated by modifying the file itself.

burdges · 2016-11-07T14:48:06Z

I miss-read your statement up thread as saying that marking the function as unsafe changed the compiler's behavior in some more subtle way. In this case, the unsafe just reminds the programmer that providing a safe abstraction requires more work though. Ok

BurntSushi · 2016-11-07T14:51:10Z

@burdges Ah I see ya, definitely didn't mean that! Apologies.

strega-nil · 2016-12-05T05:29:41Z

Couldn't it return an &[Cell] in order to be safe?

BurntSushi · 2016-12-05T12:04:45Z

@ubsan Can you expand more on that suggestion? What is Cell?

burdges · 2016-12-05T14:30:53Z

I suppose @ubsan means std::cell::Cell<u8> but to answer the question..

Cell<T> is not Sync. AtomicU8 is Sync, but not stable, and rather expensive. If you want mmap then you likely want a lighter weight courser grained locking mechanism anyways.

strega-nil · 2016-12-06T01:59:36Z

Seems reasonable.

DoumanAsh · 2017-03-30T13:25:02Z

Wouldn't simply opening File with write access to ensure that no other process could access it? (at least on Windows I think it would give you some guarantees)

BurntSushi · 2017-03-30T13:29:55Z

@DoumanAsh "at least on Windows" I think is the key part. :-)

DoumanAsh · 2017-03-30T13:40:33Z

Sadly i'm not expert in regard of Linux, but at least there is some capabilities to set lock through (fcntl)[http://man7.org/linux/man-pages/man2/fcntl.2.html]

So i suppose it all comes to OS capabilities.
fcntl can lock through Advisory record locking, but doesn't guarantee safety i think?

Then there is non-posix thing that would work only on Linux
See section Open file description locks (non-POSIX)
Which is stil advisory...

Considering that Mandatory locking is buggy according to above link, there is no way on Linux and therefore no way to guarantee safety on unix\linux platforms

BurntSushi · 2017-03-30T13:43:38Z

Right, advisory locking is opt-in. An application itself can use it to provide a form of internally consistent safety, but it can never prevent some other random process from mutating its &[u8].

dtolnay · 2017-07-31T06:02:13Z

To summarize, are we saying that concurrently modifying a memmap'd file to which Rust holds a &[u8] or &mut [u8] is officially Undefined Behavior, but "probably" will not cause anything bad if you are just reading bytes out?

strega-nil · 2017-07-31T06:57:03Z

@dtolnay I mean, yeah? but if you're not just reading bytes out, it'll probably bite you at some point. It's the same issue many old C++ codebases are having with newer compilers that assume you're not doing multithreaded accesses without atomics; this would be equivalent to multithreaded accesses without atomics.

danburkert · 2017-08-17T02:42:56Z

One thing that worries me is that we're changing the API to have a safe deref to &[u8] on Mmap, but make it unsafe to create a Mmap from a file. @ubsan is this going to cause problems with the unsafe code guidelines, since it seems like it may allow the unsafe behavior (accessing shared mmap'd memory) to be far from the unsafe annotation (creating the mmap)?

strega-nil · 2017-08-18T07:47:36Z

@danburkert I mean, if you're not doing UB, you're fine ;)

BurntSushi · 2017-08-18T10:47:02Z

@ubsan Could you elaborate on that please?

strega-nil · 2017-08-19T01:18:50Z

@BurntSushi unsafe doesn't affect whether something is UB - at least, I'm really working hard to make that happen. Therefore, the location of the keyword doesn't matter. (and anyways, the kind of UB we're talking about here will never be defined, as it's equivalent to unprotected threaded alias).

Basically, you need to use atomic accesses for every single access to a MAP_SHARED file, if you're going to be changing the backing file. It doesn't matter where the unsafe goes.

s/volatile/atomic, thanks talchas

iqualfragile · 2018-06-30T10:05:34Z

Would it be possible to open the file, i.e. have a file-handle, delete the file from the filesystem, check that only one file handle to that file exists by iterating through all open files and then return to the user?
That would probably cause problems if the file is not un-deleted before the program exits.

danburkert · 2018-06-30T13:58:20Z

@iqualfragile what specifically do you mean by un-deleted?

iqualfragile · 2018-06-30T14:10:29Z

well, you would have to restore the file to the file system when you close the mmap (on drop probably) otherwise the file is just gone. You got a file descriptor so you can re-add it to the file system.

danburkert · 2018-06-30T17:15:55Z

@iqualfragile what specifically do you mean by re-add it to the file system? I'm not aware of an API to do that (even OS/FS dependent), so I'm intrigued.

If there's no API and it requires writing the entire file back out you may as well not use a mmap at all.

iqualfragile · 2018-06-30T17:29:23Z

on linux you can access the list of file descriptors of a process in /proc/pid/fd/descriptor. you can then use ln -L to re-link the file (inode) to a name in the file system tree. doing so does not invoke a copy. i do not know how ln does it though.
so basically: have a file open in a program, optionally delete it, cd to /proc/pid/fd/, ls -lah to find which descriptor you want, ln -L it to where you want it, if you delete it you can restore it to the same name, otherwise you get two hardlinked files.
hope that makes it clear.
thinking about it this solution is probably not as unworkable as i thought when i first suggested it because an unclean shutdown of your program tends to leave an inconsistent state on disk anyway. so it might be ok/better if the file is gone instead.

iqualfragile · 2018-06-30T17:41:52Z

doing some further reading this is archived by using the rather new linkat() syscall, passing the AT_SYMLINK_FOLLOW flag.

KodrAus · 2018-06-30T23:39:57Z

... an unclean shutdown of your program tends to leave an inconsistent state on disk anyway. so it might be ok/better if the file is gone instead.

Hmm, I don't think this an assumption that could really be baked in here. Inconsistent state doesn't necessarily mean irrecoverable state.

iqualfragile · 2018-07-01T08:11:20Z

true, but in that case you are building a database and are probably ok with the unsafe interface? Another even more specific option would be to snapshot/cp --reflink the file before deleting on filesystems that support it. Would effectively limit the full solution to work on linux using btrfs or zfs. (i think xfs is working on patches for cow and there are some patches to make ext do cow).

dtolnay mentioned this issue Aug 6, 2017

Triage the issue tracker #50

Closed

This was referenced Sep 4, 2018

Docs should say why file-backed maps are unsafe #70

Closed

Split library and binary eevee/idchoppers#6

Merged

cramertj mentioned this issue Nov 1, 2018

Use Mmap to open the rmeta file. rust-lang/rust#55556

Merged

quininer mentioned this issue Nov 11, 2020

Should make create Map to be unsafe? kalamay/vmap-rs#15

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Safety of Mmap::as_slice ? #25

Safety of Mmap::as_slice ? #25

SimonSapin commented Nov 7, 2016

Unsafety

BurntSushi commented Nov 7, 2016

SimonSapin commented Nov 7, 2016

BurntSushi commented Nov 7, 2016

danburkert commented Nov 7, 2016

burdges commented Nov 7, 2016 •

edited

Loading

BurntSushi commented Nov 7, 2016

burdges commented Nov 7, 2016

BurntSushi commented Nov 7, 2016

strega-nil commented Dec 5, 2016

BurntSushi commented Dec 5, 2016

burdges commented Dec 5, 2016

strega-nil commented Dec 6, 2016

DoumanAsh commented Mar 30, 2017

BurntSushi commented Mar 30, 2017

DoumanAsh commented Mar 30, 2017

BurntSushi commented Mar 30, 2017

dtolnay commented Jul 31, 2017

strega-nil commented Jul 31, 2017

danburkert commented Aug 17, 2017

strega-nil commented Aug 18, 2017

BurntSushi commented Aug 18, 2017

strega-nil commented Aug 19, 2017 •

edited

Loading

iqualfragile commented Jun 30, 2018

danburkert commented Jun 30, 2018

iqualfragile commented Jun 30, 2018

danburkert commented Jun 30, 2018

iqualfragile commented Jun 30, 2018 •

edited

Loading

iqualfragile commented Jun 30, 2018

KodrAus commented Jun 30, 2018

iqualfragile commented Jul 1, 2018

Safety of Mmap::as_slice ? #25

Safety of Mmap::as_slice ? #25

Comments

SimonSapin commented Nov 7, 2016

Unsafety

BurntSushi commented Nov 7, 2016

SimonSapin commented Nov 7, 2016

BurntSushi commented Nov 7, 2016

danburkert commented Nov 7, 2016

burdges commented Nov 7, 2016 • edited Loading

BurntSushi commented Nov 7, 2016

burdges commented Nov 7, 2016

BurntSushi commented Nov 7, 2016

strega-nil commented Dec 5, 2016

BurntSushi commented Dec 5, 2016

burdges commented Dec 5, 2016

strega-nil commented Dec 6, 2016

DoumanAsh commented Mar 30, 2017

BurntSushi commented Mar 30, 2017

DoumanAsh commented Mar 30, 2017

BurntSushi commented Mar 30, 2017

dtolnay commented Jul 31, 2017

strega-nil commented Jul 31, 2017

danburkert commented Aug 17, 2017

strega-nil commented Aug 18, 2017

BurntSushi commented Aug 18, 2017

strega-nil commented Aug 19, 2017 • edited Loading

iqualfragile commented Jun 30, 2018

danburkert commented Jun 30, 2018

iqualfragile commented Jun 30, 2018

danburkert commented Jun 30, 2018

iqualfragile commented Jun 30, 2018 • edited Loading

iqualfragile commented Jun 30, 2018

KodrAus commented Jun 30, 2018

iqualfragile commented Jul 1, 2018

burdges commented Nov 7, 2016 •

edited

Loading

strega-nil commented Aug 19, 2017 •

edited

Loading

iqualfragile commented Jun 30, 2018 •

edited

Loading