Don't build with webpack on content changes (no more data.json) #11982
Comments
I really like this approach and we can probably even make it better in the long run.
This one is really hard to solve if your build system does not have incremental builds itself. Perhaps webpack will support this soon, or a Bazel rule could help us here. With webpack, we might be able to save the dependency tree ourselves and manually traverse it to see if work needs to be done (unsure if this will work).
On Gatsby Cloud, we could even use HTTP/2 push to reduce the network time so the browser gets both responses at once.
We can still do something similar on the client to actually prefetch manifest files depending on all …
@wardpeet Just trying to understand your first comment. Maybe I'm missing something, but webpack provides a compilation hash for the entire build in the stats.json output. So every time there is a new build, there should be a new hash. This is what I'm proposing we use. To be clear, I'm not relying on any incremental functionality in webpack itself. Any time there is any source file change anywhere, we'll rebuild everything. We just won't have to rebuild on data changes.
Yep, exactly. I propose we keep Gatsby's existing functionality here, i.e. it triggers a prefetch whenever a …
We could also use something like https://github.com/gaearon/react-side-effect for the …
@KyleAMathews nice, we'd still have the back-references problem, but …
@KyleAMathews raised the idea of including query results in the new page-manifest:

```json
{
  "componentChunkName": "...",
  "data": {
    "allMarkdownRemark": {}
  },
  "pageContext": {
    "path": "/blah"
  }
}
```

The downside obviously is that query results would no longer be infinitely cacheable. But since page-manifests force the browser to download something each time, it's not so bad. On the plus side, the server will be able to monitor whether the underlying file has changed and send back …
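A combined manifest like the one above would let the client resolve a page in one request. A hypothetical sketch of consuming it (function and field names assumed from the JSON shape above):

```javascript
// Hypothetical: a combined page-manifest carries both routing info and
// query results, so a single fetch resolves everything needed to render.
function resolvePage(manifest) {
  return {
    chunkName: manifest.componentChunkName, // which component bundle to load
    queryData: manifest.data,               // inlined query results
    path: manifest.pageContext.path,        // the page this manifest describes
  };
}
```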
@Moocar sorry for taking so long to get back to you, but the compilation hash seems great. I was looking at file hashes, so don't mind that comment 😄 Looking forward to some code!
Fixed by #13004 and released in …
Summary
I've been looking into ways to get rid of `data.json`. It's a complicated problem so any feedback on the below would be appreciated.

Background
Gatsby writes a `data.json` (also called `pages-manifest`) file on every build that maps pages to their `componentChunkName` and `dataPath`. This is imported by `async-requires.js`, which is in turn imported by `production-app.js`.

The upsides of this approach are:
- the app can find all `Link` elements, retrieve their "to" path, and start prefetching the data for each page the user might click on.

A global `data.json` has downsides though:

Solution: compilation-specific non-cached page-manifest
No more `data.json`; therefore `async-requires.js` only contains import statements for components. Most sites with lots of pages use templated components, so this should be a small file.

Before building page HTML, we produce a page-manifest file. It contains the `componentChunkName` and `dataPath` for the page. It is named with a webpack compilation hash and json name:
`[webpack-compilation-hash]/[jsonName]-manifest.json`
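For illustration, the slimmed-down `async-requires.js` described above might reduce to a map of component imports only. The chunk names and file paths here are invented; this is just a sketch of the shape:

```javascript
// Hypothetical shape of a slimmed-down async-requires.js: with data gone,
// only lazy component imports remain, keyed by componentChunkName.
const components = {
  "component---src-templates-blog-post-js": () =>
    import("../src/templates/blog-post.js"),
  "component---src-pages-index-js": () =>
    import("../src/pages/index.js"),
};

module.exports = { components };
```

Because templated sites share a handful of components across many pages, this map stays small even when the page count is large.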
`cache-dir/static-entry.js` no longer references `data.json`. Instead, it reads the page's manifest file. It also adds the webpack compilation hash to the window CDATA.

When navigating to a page, the Gatsby app behaves the same, except that when resolving a page's component/dataPath, it makes a request for the page-manifest file. It knows the compilation hash and json name, so it can get this info. Once it has the manifest, it can use the dataPath to download the page's data.
When a query result changes, we only need to update that page's manifest. Since webpack has not been rerun, the data in the query result will be compatible with the running browser's component implementation (right? might need to double check this).
When a webpack rebuild occurs, we must generate new page manifest files for every page.
pros

- no more `data.json`
- pages no longer wait on a global `data.json`, so prefetching can occur earlier

cons
NOTE: for sites with 100,000 pages, we'll end up with a compilation directory with 100,000 files. So we might need to use a similar approach to `static/d` to bucket those files under subdirectories.

Shout out to @pieh and @KyleAMathews for the ideas/brainstorming
Alternatives
drop compilation-specific component
In this approach, we'd just save the page-manifest without the compilation hash in the filename. The problem is that the page-manifest lists the `componentChunkName`, not a link to the actual component. So if a build occurs in the background and the component changes, the frontend won't know to refresh. It might then try to load the new query result into an old component, resulting in all kinds of errors (e.g. `field` is undefined on `result.data.node.field`).

A middle ground is to include the compilation hash as a field in each manifest. That way, at the very least, the frontend can compare the manifest's compilation hash to its own to see if a rebuild has occurred. The benefit of this would be less disk usage, since we wouldn't have multiple manifests per page.
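That middle-ground check is cheap on the client. A sketch, assuming each manifest embeds its build's hash in a `compilationHash` field (the field name is hypothetical):

```javascript
// Sketch: the frontend compares the manifest's embedded hash against the
// hash it booted with; a mismatch means a rebuild happened in the
// background, so a full reload is needed to pick up new components.
function rebuildOccurred(manifest, runningHash) {
  return manifest.compilationHash !== runningHash;
}
```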
Calculate linked pages server side
Using react-tree-walker, while server-side rendering components, we can walk the page component and find all the `Link` elements. We therefore know at build time which pages a page links to. In theory, then, we could construct a `linked-pages.json` for each page. For each linked page, it would include the `componentChunkName`, `dataPath`, and `linkedPagePath`. The entire file would be content-hashed and immutable so that it was specific to this build. And then, when we build each page's HTML, we could reference its linkedPagePath in the `window` data.

Now, when we navigate to a new page, the app simply looks up the appropriate entry in linkedPages and knows exactly which component and dataPath to load. Even better, it also knows the navigated-to page's linkedPagesPath too, so it can repeat the process.
Unfortunately, I can't figure out a way to reliably build the `linkedPage.json` files for each path. The problem is that when you draw out a graph of the links on a site, there are many back references, e.g. index -> blogs -> blog -> index (via a header div, for example). So it's a cyclic graph, and we therefore can't build a dependency graph due to circular dependencies.

path-dependent buckets of page metadata
@kyle wrote an awesome PR (#6651) that produces buckets of page manifests depending on path segments. It would work great if we were going to run webpack on every data change, but that's what we're trying to avoid.
Related Issues