Improve recompilation avoidance in the presence of TH #2316
Conversation
This is too big to review in one go. Any chance that you could break it into smaller PRs?
I guess I could separate out the VFS and versioning changes, but they aren't that big. The bulk of it is synergistic changes to the recompilation checking logic; it is not immediately clear to me how that could be separated out.
@shapr you might want to try this out.
This is fantastic, thank you =D Is this synergistic with @pepeiborra's architecture changes in the v1.5 release that he discussed in his recent talk, or is there some overlap?
I wish the rebase web UI allowed doing it in many commits. Many of these files seem pretty rebaseable.
@wz1000 What is the status of this PR from your point of view? @pepeiborra explained that it was a huge perf gain for HLS use cases, and would work for previous GHCs as well, so I'd love to see it pushed through. Do you need help? Thank you for doing this!
Force-pushed from 3c0a9f6 to e84c6ed
Looks like it needs a rebase, and CI is failing.
Force-pushed from e84c6ed to dc08ce4
This patch does two things:

1. It allows us to track the versions of `Values` which don't come from the VFS, as long as those particular `Values` depended on the `GetModificationTime` rule. This is necessary for the recompilation avoidance scheme implemented in #2316.
2. It removes the `VFSHandle` type and instead relies on snapshots of the VFS state, taken on every rebuild of the shake session, to ensure that we see a consistent VFS state throughout each individual build.

With regard to 2, this is necessary because the lsp library mutates its VFS file store as changes come in. This can lead to scenarios where the HLS build session sees inconsistent views of the VFS. One such scenario:

1. HLS build starts, with VFS state A.
2. An LSP change request comes in, and lsp updates its internal VFS state to B.
3. The HLS build continues, now consulting VFS state B.
4. lsp calls the HLS file change handler, interrupting the build and restarting it. However, the build might already have completed, or cached results computed using an inconsistent VFS state.
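The snapshotting idea in point 2 can be sketched as follows. This is a minimal, self-contained model (a plain `Data.Map` behind an `IORef` standing in for the live file store), not the actual lsp or ghcide API: because the snapshot is an immutable value taken at session restart, a change event that mutates the live VFS mid-build cannot leak into rules that are already running.

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for the VFS: path -> in-memory contents.
type VFS = Map.Map FilePath String

-- Take an immutable snapshot of the live VFS when the session (re)starts.
snapshotVFS :: IORef VFS -> IO VFS
snapshotVFS = readIORef

-- A rule reads only from the snapshot, never from the live, mutable VFS.
readFileRule :: VFS -> FilePath -> Maybe String
readFileRule snap path = Map.lookup path snap

main :: IO ()
main = do
  liveVFS <- newIORef (Map.fromList [("A.hs", "x = 1")])
  snap <- snapshotVFS liveVFS                          -- build starts: state A
  -- An LSP change arrives mid-build and mutates the live VFS (state B) ...
  modifyIORef liveVFS (Map.insert "A.hs" "x = 2")
  -- ... but the in-flight build still sees the consistent state A.
  print (readFileRule snap "A.hs")                     -- prints Just "x = 1"
```

Since `Map` is an immutable structure, the snapshot is free; only the `IORef` holding the "current" map is ever mutated.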
Force-pushed from dc08ce4 to d4a6141
Force-pushed from d4a6141 to 9d2f06a
Force-pushed from 80689e9 to 35aad7b
My last commit tries to change the lsp-types benchmark to better showcase the improvements to recompilation avoidance from this patch, but unfortunately it doesn't quite succeed. This patch shows the most improvement when you have a "deep" module graph and you ask for recompilation of modules both at the bottom and the top of the graph, and the modules in between are a mixture of requiring and not requiring TH or linking. Here are the benchmarks for
The "after edit" benchmarks show some reduction in total time and rules built. Unfortunately some of the code action benchmarks fail (on both this patch and upstream), so I will revert the changes from the last commit in this patch. Here is another set of benchmarks for "get definition after edit" run on https://github.com/hasura/graphql-engine, which is a large project satisfying the constraints mentioned above:
Force-pushed from cfdb552 to 0987bef
Impressive results. The next HLS release is going to be so good for TH users
There's a lot of careful reasoning about how this scheme works in the PR description. Can we make sure that this is captured somewhere for posterity, maybe a note in the code?
Force-pushed from 8dd8349 to aa7a9c1
I've added a note to this effect.
… at the top of the hierarchy (no incoming edges) weren't being recorded properly
…e it. The SourceUnmodifiedAndStable check in loadInterface wasn't doing much for us because we use bytecode and there is no bytecode on disk so we always had to recompile.
The old recompilation avoidance scheme performs quite poorly when code generation is needed. We end up needing to recompile modules basically any time anything in their transitive dependency closure changes.

Most versions of GHC we currently support don't have a working implementation of code unloading for object code, and no version of GHC supports this on certain platforms like Windows. This makes it completely infeasible for interactive use, as symbols from previous compiles will shadow all future compiles. This means that we need to use bytecode when generating code for Template Haskell. Unfortunately, we can't serialize bytecode, so we will always need to recompile when the IDE starts. However, we can put in place a much tighter recompilation avoidance scheme for subsequent compiles:

1. If the source file changes, then we always need to recompile.
   a. For files of interest, we will get explicit `textDocument/change` events that will let us invalidate our build products.
   b. For files we read from disk, we can detect source file changes by comparing the mtime of the source file with the build product (.hi/.o) file on disk.
2. If GHC's recompilation avoidance scheme based on interface file hashes says that we need to recompile, then we need to recompile.
3. If the file in question requires code generation, then we need to recompile if we don't have the appropriate kind of build products.
   a. If we already have the build products in memory, and the conditions 1 and 2 hold, then we don't need to recompile.
   b. If we are generating object code, then we can also search for it on disk and ensure it is up to date. Notably, we did _not_ previously re-use old bytecode from memory when hls-graph/shake decided to rebuild the `HiFileResult` for some reason.
4. If the file in question used Template Haskell on the previous compile, then we need to recompile if any `Linkable` in its transitive closure changed. This sounds bad, but it is possible to make some improvements.
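The four checks above can be sketched as one pure decision function. The types and field names below are illustrative stand-ins, not ghcide's actual API; each field corresponds to one numbered check:

```haskell
data LinkableKind = ByteCode | ObjectCode deriving (Eq, Show)

-- Hypothetical summary of everything the recompilation check consults.
data CompileInputs = CompileInputs
  { sourceChanged      :: Bool                -- 1: VFS change event or newer mtime
  , ifaceSaysRecompile :: Bool                -- 2: GHC's interface-hash check
  , neededLinkable     :: Maybe LinkableKind  -- 3: kind of codegen required, if any
  , cachedLinkable     :: Maybe LinkableKind  -- 3: kind of build product we have
  , usedTH             :: Bool                -- 4: module ran TH on the last compile
  , usedLinkableStale  :: Bool                -- 4: some Linkable it used has changed
  }

needsRecompile :: CompileInputs -> Bool
needsRecompile ci =
  sourceChanged ci                                                        -- 1
    || ifaceSaysRecompile ci                                              -- 2
    || (neededLinkable ci /= Nothing
          && neededLinkable ci /= cachedLinkable ci)                      -- 3
    || (usedTH ci && usedLinkableStale ci)                                -- 4

main :: IO ()
main = do
  -- A TH module whose bytecode is cached and whose used Linkables are unchanged:
  let hit = CompileInputs False False (Just ByteCode) (Just ByteCode) True False
  print (needsRecompile hit)                          -- prints False
  -- The same module after we lose the cached bytecode:
  print (needsRecompile hit { cachedLinkable = Nothing })  -- prints True
```

The point of the sketch is that check 3 compares against build products we may already hold in memory, which is exactly the bytecode re-use the old scheme was missing.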
In particular, we only need to recompile if any of the `Linkable`s actually used during the previous compile change.

How can we tell if a `Linkable` was actually used while running some TH? GHC provides a `hscCompileCoreExprHook` which lets us intercept bytecode as it is being compiled and linked. We can inspect the bytecode to see which `Linkable` dependencies it requires, and record this for use in recompilation checking. We record all the home package modules of the free names that occur in the bytecode. The `Linkable`s required are then the transitive closure of these modules in the home-package environment. This is the same scheme as used by GHC to find the correct things to link in before running bytecode.

This works fine if we already have previous build products in memory, but what if we are reading an interface from disk? Well, we can smuggle in the necessary information (linkable `Module`s required as well as the time they were generated) using `Annotation`s, which provide a somewhat general purpose way to serialise arbitrary information along with interface files. Then, when deciding whether to recompile, we need to check that the versions of the linkables used during a previous compile match whatever is currently in the HPT.

The changes that were made to `ghcide` in order to implement this scheme include:

1. Add `RuleWithOldValue` to define Rules which have access to the previous value. This is the magic bit that lets us re-use bytecode from previous compiles.
2. The `IsHiFileStable` rule was removed, as we don't need it with this scheme in place.
3. Everything in the store is properly versioned with a `FileVersion`, not just FOIs.
4. The `VFSHandle` type was removed. Instead we now take a VFS snapshot on every restart, and use this snapshot for all the `Rules` in that build. This ensures that Rules see a consistent version of the VFS for each run, and also makes 3 possible. The `setVirtualFileContents` function was removed since it was not being used anywhere. If needed in the future, we can easily just modify the VFS using functions from `lsp`.
5. Fix a bug with the `DependencyInformation` calculation, where modules at the top of the hierarchy (no incoming edges) weren't being recorded properly.

A possible future improvement is to use object code on the first load (so we have a warm cache) and use bytecode for subsequent compiles.
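The "transitive closure of home-package modules" computation described above can be sketched like this. The plain map from module to direct home-package dependencies is a hypothetical stand-in for the home-package environment, and the root set stands in for the modules of the free names found in the intercepted bytecode:

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Module = String

-- Hypothetical stand-in for the home-package environment:
-- module -> direct home-package dependencies.
type ModGraph = Map.Map Module [Module]

-- Transitive closure of the modules whose free names occur in the compiled
-- bytecode: these are exactly the Linkables that must be unchanged for a
-- cached TH result to remain valid.
linkablesNeeded :: ModGraph -> Set.Set Module -> Set.Set Module
linkablesNeeded graph = go Set.empty . Set.toList
  where
    go seen [] = seen
    go seen (m : rest)
      | m `Set.member` seen = go seen rest
      | otherwise =
          let deps = Map.findWithDefault [] m graph
          in go (Set.insert m seen) (deps ++ rest)

main :: IO ()
main = do
  -- C imports B, B imports A; the bytecode only mentions names from C.
  let graph = Map.fromList [("C", ["B"]), ("B", ["A"]), ("A", [])]
  print (linkablesNeeded graph (Set.fromList ["C"]))
  -- prints fromList ["A","B","C"]
```

The `seen` set makes the traversal terminate even on cyclic graphs, which matters because the real module graph is only acyclic per GHC invariant, not by construction here.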
avoid coerce; remove redundant import; remove trace and format imports; hlints and comments