Skip to content

Commit

Permalink
More fingerprint and metadata comments.
Browse files Browse the repository at this point in the history
  • Loading branch information
ehuss committed Apr 9, 2020
1 parent cd396f3 commit 14e86cc
Show file tree
Hide file tree
Showing 2 changed files with 76 additions and 31 deletions.
17 changes: 16 additions & 1 deletion src/cargo/core/compiler/context/compilation_files.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,14 +13,20 @@ use crate::core::compiler::{CompileMode, CompileTarget, Unit};
use crate::core::{Target, TargetKind, Workspace};
use crate::util::{self, CargoResult};

/// The `Metadata` is a hash used to make unique file names for each unit in a build.
/// The `Metadata` is a hash used to make unique file names for each unit in a
/// build. It is also use for symbol mangling.
///
/// For example:
/// - A project may depend on crate `A` and crate `B`, so the package name must be in the file name.
/// - Similarly a project may depend on two versions of `A`, so the version must be in the file name.
///
/// In general this must include all things that need to be distinguished in different parts of
/// the same build. This is absolutely required or we override things before
/// we get chance to use them.
///
/// It is also used for symbol mangling, because if you have two versions of
/// the same crate linked together, their symbols need to be differentiated.
///
/// We use a hash because it is an easy way to guarantee
/// that all the inputs can be converted to a valid path.
///
Expand All @@ -39,6 +45,15 @@ use crate::util::{self, CargoResult};
/// more space than needed. This makes not including something in `Metadata`
/// a form of cache invalidation.
///
/// You should also avoid anything that would interfere with reproducible
/// builds. For example, *any* absolute path should be avoided. This is one
/// reason that `RUSTFLAGS` is not in `Metadata`, because it often has
/// absolute paths (like `--remap-path-prefix` which is fundamentally used for
/// reproducible builds and has absolute paths in it). Also, in some cases the
/// mangled symbols need to be stable between different builds with different
/// settings. For example, profile-guided optimizations need to swap
/// `RUSTFLAGS` between runs, but needs to keep the same symbol names.
///
/// Note that the `Fingerprint` is in charge of tracking everything needed to determine if a
/// rebuild is needed.
#[derive(Copy, Clone, Hash, Eq, PartialEq, Ord, PartialOrd)]
Expand Down
90 changes: 60 additions & 30 deletions src/cargo/core/compiler/fingerprint.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,23 +5,30 @@
//! (needs to be recompiled) or "fresh" (it does not need to be recompiled).
//! There are several mechanisms that influence a Unit's freshness:
//!
//! - The `Metadata` hash isolates each Unit on the filesystem by being
//! embedded in the filename. If something in the hash changes, then the
//! output files will be missing, and the Unit will be dirty (missing
//! outputs are considered "dirty").
//! - The `Fingerprint` is another hash, saved to the filesystem in the
//! `.fingerprint` directory, that tracks information about the inputs to a
//! Unit. If any of the inputs changes from the last compilation, then the
//! Unit is considered dirty. A missing fingerprint (such as during the
//! first build) is also considered dirty.
//! - Whether or not input files are actually present. For example a build
//! script which says it depends on a nonexistent file `foo` is always rerun.
//! - Propagation throughout the dependency graph of file modification time
//! information, used to detect changes on the filesystem. Each `Fingerprint`
//! keeps track of what files it'll be processing, and when necessary it will
//! check the `mtime` of each file (last modification time) and compare it to
//! dependencies and output to see if files have been changed or if a change
//! needs to force recompiles of downstream dependencies.
//! - The `Fingerprint` is a hash, saved to the filesystem in the
//! `.fingerprint` directory, that tracks information about the Unit. If the
//! fingerprint is missing (such as the first time the unit is being
//! compiled), then the unit is dirty. If any of the fingerprint fields
//! change (like the name of the source file), then the Unit is considered
//! dirty.
//!
//! The `Fingerprint` also tracks the fingerprints of all its dependencies,
//! so a change in a dependency will propagate the "dirty" status up.
//!
//! - Filesystem mtime tracking is also used to check if a unit is dirty.
//! See the section below on "Mtime comparison" for more details. There
//! are essentially two parts to mtime tracking:
//!
//! 1. The mtime of a Unit's output files is compared to the mtime of all
//! its dependencies' output file mtimes (see `check_filesystem`). If any
//! output is missing, or is older than a dependency's output, then the
//! unit is dirty.
//! 2. The mtime of a Unit's source files is compared to the mtime of its
//! dep-info file in the fingerprint directory (see `find_stale_file`).
//! The dep-info file is used as an anchor to know when the last build of
//! the unit was done. See the "dep-info files" section below for more
//! details. If any input files are missing, or are newer than the
//! dep-info, then the unit is dirty.
//!
//! Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking
//! is notoriously imprecise and problematic. Only a small part of the
Expand All @@ -33,6 +40,12 @@
//!
//! ## Fingerprints and Metadata
//!
//! The `Metadata` hash is a hash added to the output filenames to isolate
//! each unit. See the documentation in the `compilation_files` module for
//! more details. NOTE: Not all output files are isolated via filename hashes
//! (like dylibs), but the fingerprint directory always has the `Metadata`
//! hash in its directory name.
//!
//! Fingerprints and Metadata are similar, and track some of the same things.
//! The Metadata contains information that is required to keep Units separate.
//! The Fingerprint includes additional information that should cause a
Expand Down Expand Up @@ -69,10 +82,11 @@
//!
//! When deciding what should go in the Metadata vs the Fingerprint, consider
//! that some files (like dylibs) do not have a hash in their filename. Thus,
//! if a value changes, only the fingerprint will detect the change. Fields
//! that are only in Metadata generally aren't relevant to the fingerprint
//! because they fundamentally change the output (like target vs host changes
//! the directory where it is emitted).
//! if a value changes, only the fingerprint will detect the change (consider,
//! for example, swapping between different features). Fields that are only in
//! Metadata generally aren't relevant to the fingerprint because they
//! fundamentally change the output (like target vs host changes the directory
//! where it is emitted).
//!
//! ## Fingerprint files
//!
Expand Down Expand Up @@ -378,19 +392,35 @@ pub fn prepare_target<'a, 'cfg>(

// Clear out the old fingerprint file if it exists. This protects when
// compilation is interrupted leaving a corrupt file. For example, a
// project with a lib.rs and integration test:
// project with a lib.rs and integration test (two units):
//
// 1. Build the integration test.
// 2. Make a change to lib.rs.
// 3. Build the integration test, hit Ctrl-C while linking (with gcc).
// 1. Build the library and integration test.
// 2. Make a change to lib.rs (NOT the integration test).
// 3. Build the integration test, hit Ctrl-C while linking. With gcc, this
// will leave behind an incomplete executable (zero size, or partially
// written). NOTE: The library builds successfully, it is the linking
// of the integration test that we are interrupting.
// 4. Build the integration test again.
//
// Without this line, then step 4 will think the integration test is
// "fresh" because the mtime of the output file is newer than all of its
// dependencies. But the executable is corrupt and needs to be rebuilt.
// Clearing the fingerprint ensures that Cargo never mistakes it as
// up-to-date until after a successful build.
// Without the following line, then step 3 will leave a valid fingerprint
// on the disk. Then step 4 will think the integration test is "fresh"
// because:
//
// - There is a valid fingerprint hash on disk (written in step 1).
// - The mtime of the output file (the corrupt integration executable
// written in step 3) is newer than all of its dependencies.
// - The mtime of the integration test fingerprint dep-info file (written
// in step 1) is newer than the integration test's source files, because
// we haven't modified any of its source files.
//
// But the executable is corrupt and needs to be rebuilt. Clearing the
// fingerprint at step 3 ensures that Cargo never mistakes a partially
// written output as up-to-date.
if loc.exists() {
// Truncate instead of delete so that compare_old_fingerprint will
// still log the reason for the fingerprint failure instead of just
// reporting "failed to read fingerprint" during the next build if
// this build fails.
paths::write(&loc, b"")?;
}

Expand Down

0 comments on commit 14e86cc

Please sign in to comment.