document content-based chunk hashing (#16)

evanw · Jul 29, 2020 · dcf5560
1 parent 3f04a08

Showing 2 changed files with 56 additions and 0 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,21 @@
 # Changelog
 
+## Unreleased
+
+* Code splitting chunks now use content hashes ([#16](https://github.com/evanw/esbuild/issues/16))
+
+    Code that is shared between multiple entry points is separated out into "chunk" files when code splitting is enabled. These files are named `chunk.HASH.js` where `HASH` is a string of characters derived from a hash (e.g. `chunk.iJkFSV6U.js`).
+
+    Previously the hash was computed from the paths of all entry points which needed that chunk. This was done because it was a simple way to ensure that each chunk was unique, since each chunk represents shared code from a unique set of entry points. But it meant that changing the contents of the chunk did not cause the chunk name to change.
+
+    Now the hash is computed from the contents of the chunk file instead. This better aligns esbuild with the behavior of other bundlers. Because changing the contents of a chunk now always changes its name, you can serve these files with a very large `max-age` so the browser never re-requests them from your server once they are cached.
+
+    Note that the names of entry points _do not_ currently contain a hash, so this optimization does not apply to entry points. Do not serve entry point files with a very large `max-age` or the browser may not re-request them even when they are updated. Including a hash in the names of entry point files has not been done in this release because that would be a breaking change. This release is an intermediate step to a state where all output file names contain content hashes.
+
+    This wasn't done earlier because it makes chunk generation more complex. Generating the contents of a chunk involves generating import statements for the other chunks that it depends on. Now that chunk names include a content hash, those import paths aren't known until the dependency chunks have been generated, so generating a chunk must wait until its dependency chunks have finished. This more complex behavior has now been implemented.
+
+    Care was taken to still parallelize as much as possible despite parts of the code having to block. Each input file in a chunk is still printed to a string fully in parallel. Waiting was only introduced in the chunk assembly stage where input file strings are joined together. In practice, this change doesn't appear to have slowed down esbuild by a noticeable amount.
+
 ## 0.6.10
 
 * Revert the binary operator chain change

diff --git a/internal/bundler/bundler_splitting_test.go b/internal/bundler/bundler_splitting_test.go
@@ -708,3 +708,43 @@ export {
 		},
 	})
 }
+
+func TestSplittingDuplicateChunkCollision(t *testing.T) {
+	expectBundled(t, bundled{
+		files: map[string]string{
+			"/a.js": `
+				import "./ab"
+			`,
+			"/b.js": `
+				import "./ab"
+			`,
+			"/c.js": `
+				import "./cd"
+			`,
+			"/d.js": `
+				import "./cd"
+			`,
+			"/ab.js": `
+				console.log(123)
+			`,
+			"/cd.js": `
+				console.log(123)
+			`,
+		},
+		entryPaths: []string{"/a.js", "/b.js", "/c.js", "/d.js"},
+		options: config.Options{
+			IsBundling:       true,
+			CodeSplitting:    true,
+			RemoveWhitespace: true,
+			OutputFormat:     config.FormatESModule,
+			AbsOutputDir:     "/out",
+		},
+		expected: map[string]string{
+			"/out/a.js":              "import\"./chunk.sQ4Fr0TC.js\";\n",
+			"/out/b.js":              "import\"./chunk.sQ4Fr0TC.js\";\n",
+			"/out/c.js":              "import\"./chunk.sQ4Fr0TC.js\";\n",
+			"/out/d.js":              "import\"./chunk.sQ4Fr0TC.js\";\n",
+			"/out/chunk.sQ4Fr0TC.js": "console.log(123);\n",
+		},
+	})
+}