Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Mmap to open the rmeta file. #55556

Merged
merged 1 commit into from
Nov 15, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/Cargo.lock
Original file line number Diff line number Diff line change
Expand Up @@ -2278,12 +2278,14 @@ version = "0.0.0"
dependencies = [
"flate2 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.5 (registry+https://github.com/rust-lang/crates.io-index)",
"memmap 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)",
"proc_macro 0.0.0",
"rustc 0.0.0",
"rustc_data_structures 0.0.0",
"rustc_errors 0.0.0",
"rustc_target 0.0.0",
"serialize 0.0.0",
"stable_deref_trait 1.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
"syntax 0.0.0",
"syntax_ext 0.0.0",
"syntax_pos 0.0.0",
Expand Down
4 changes: 3 additions & 1 deletion src/librustc_metadata/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,14 @@ crate-type = ["dylib"]
[dependencies]
flate2 = "1.0"
log = "0.4"
memmap = "0.6"
proc_macro = { path = "../libproc_macro" }
rustc = { path = "../librustc" }
rustc_target = { path = "../librustc_target" }
rustc_data_structures = { path = "../librustc_data_structures" }
rustc_errors = { path = "../librustc_errors" }
rustc_target = { path = "../librustc_target" }
serialize = { path = "../libserialize" }
stable_deref_trait = "1.0.0"
syntax = { path = "../libsyntax" }
syntax_ext = { path = "../libsyntax_ext" }
syntax_pos = { path = "../libsyntax_pos" }
2 changes: 2 additions & 0 deletions src/librustc_metadata/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@
extern crate libc;
#[macro_use]
extern crate log;
extern crate memmap;
extern crate stable_deref_trait;
#[macro_use]
extern crate syntax;
extern crate syntax_pos;
Expand Down
26 changes: 23 additions & 3 deletions src/librustc_metadata/locator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -243,12 +243,14 @@ use std::cmp;
use std::fmt;
use std::fs;
use std::io::{self, Read};
use std::ops::Deref;
use std::path::{Path, PathBuf};
use std::time::Instant;

use flate2::read::DeflateDecoder;

use rustc_data_structures::owning_ref::OwningRef;

pub struct CrateMismatch {
path: PathBuf,
got: String,
Expand Down Expand Up @@ -856,6 +858,19 @@ fn get_metadata_section(target: &Target,
return ret;
}

/// A trivial wrapper for `Mmap` that implements `StableDeref`.
struct StableDerefMmap(memmap::Mmap);

impl Deref for StableDerefMmap {
type Target = [u8];

fn deref(&self) -> &[u8] {
self.0.deref()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preferred way to write this is &*self.0, but YMMV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I agree, but I think this way is clearer within a deref method :)

}
}

unsafe impl stable_deref_trait::StableDeref for StableDerefMmap {}

fn get_metadata_section_imp(target: &Target,
flavor: CrateFlavor,
filename: &Path,
Expand Down Expand Up @@ -892,9 +907,14 @@ fn get_metadata_section_imp(target: &Target,
}
}
CrateFlavor::Rmeta => {
let buf = fs::read(filename).map_err(|_|
format!("failed to read rmeta metadata: '{}'", filename.display()))?;
rustc_erase_owner!(OwningRef::new(buf).map_owner_box())
// mmap the file, because only a small fraction of it is read.
let file = std::fs::File::open(filename).map_err(|_|
format!("failed to open rmeta metadata: '{}'", filename.display()))?;
let mmap = unsafe { memmap::Mmap::map(&file) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obligatory "not actually safe to map this because of concurrent modification"

See danburkert/memmap-rs#25.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do some simple locking here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the inherent unsafety be clarified? I'm not super familiar with the relation between mmap and concurrent modification, but rustc does sometimes load (and later reject) files which are being concurrently modified. This can happen when cargo fires of a number of rustc builds, and as one is loading dependency information another is writing that information. The one loading may read a halfway-written file, but it'll be guaranteed to reject it at some point later because it's not actually needed (or otherwise Cargo would sequence the compilations differently).

In light of this, though, is this something we actually need to fix in rustc or owrry about from an unsafety point of view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the problem: we mmap the file's contents into a Vec at T1, and that Vec's contents are theoretically set. However, in practice they aren't actually set until they are read, at T2, whereupon the contents will be page-faulted in. But it's possible that the file's contents have changed between T1 and T2. (Even worse, there is actually a different T2 for every separate page in the Vec.)

File locking won't help unless we lock the file for the entire period that its contents may be read, which doesn't seem like a good idea.

Also, even if you use the normal file-reading operations, there's still a chance you'll end up with inconsistent data if the file is overwritten in the middle of that operation. But the window of opportunity is potentially much smaller. (I think this is what @alexcrichton is referring to.)

Given that the gains turned out to be minor, I would be ok with abandoning this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the gains turned out to be minor, I would be ok with abandoning this.

They did? In both time and memory? That's a shame, then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be legal for LLVM to optimize 4 relaxed atomic 1-byte loads into one relaxed atomic 4-byte load. (The same optimization would not be legal for volatile.) Merging loads is okay with atomics, just splitting them is not. I am not sure if LLVM does anything like that though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping it would do something like that, but then my code was 10x slower when I used relaxed atomics :( I didn't spend much time wrestling with it, though, and I'm no optimization expert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I expect that there is lots of unused potential around optimizations for atomics. But of course that doesn't help right now, that would take someone to improve LLVM's analyses.

As far as this PR is concerned, the good news is that this is UB only if mutation actually happens. The fact that this is shared is invisible to the program until something else writes to it. That is unlikely to happen. So the end result is that the compiler may fail in arbitrarily unexpected ways if someone mutates the mmap'ed file, but absent that nothing strange can happen.

There's no way to ask the kernel to do copy-on-write for an mmap'ed file so that memory, once actually mapped (on the first access), is frozen and not affected by others mutating the file -- is there?

Copy link
Member

@eddyb eddyb Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that I think we have to read things byte by byte anyway, so I have a doubt that AtomicU8 will pessimize anything.

EDIT: other than maybe getting a &[u8] to read a String.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the whole point of this PR is that typically only 10--20% of the mapped memory in question is actually read. There are no loops here, so the overhead of atomics are probably not high.

let mmap = mmap.map_err(|_|
format!("failed to mmap rmeta metadata: '{}'", filename.display()))?;

rustc_erase_owner!(OwningRef::new(StableDerefMmap(mmap)).map_owner_box())
}
};
let blob = MetadataBlob(raw_bytes);
Expand Down