Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Technology preview: Keep track of changes to minimize rebuilds #1862

Merged
merged 4 commits into from
Jun 2, 2021

Conversation

pepeiborra
Copy link
Collaborator

@pepeiborra pepeiborra commented May 23, 2021

NOTE: This is a technology preview of something I have been working on for several months. It requires extending his-graph to accept the set of keys that have changed since the last build as an input, of which there isn't a final implementation yet. There is a Shake implementation (ndmitchell/shake#802), but it might end up being reimplemented in #1759 instead. Details still being worked out.

This PR extends ghcide to keep track of the set of rules that have changed since the last successful build. This information is then shard with his-graph, which can then use it to minimize the amount of build rules visited.

Motivation

When his-graph runs a rule, it first tells it whether its dependencies have changed. Ghcide uses this information to skip evaluation and perform a lookup on the Values state when the dependencies have not changed. Even though this lookup is fast, the constant overheads are significant and the cost can add up. Currently, in order to evaluate a TypeCheck Some.Module rule, Ghcide needs to perform a number of lookups proportional to the transitive imports of Some.Module. This PR makes it technically possible to make the number of lookups independent of the number of transitive imports.

Example

Consider the "edit" experiment on the Cabal 3.0.0.0 example of the benchmark suite, which performs a whitespace edit on the Distribution.Simple module and measures the typecheck time. Ghcide installs an LSP handler for the TextDocumentDidChange event that triggers a rebuild of the TypeCheck Distribution.Simple rule. However, this doesn't tell his-graph that the only thing that has changed since the last build is the GetModificationTime Distribution.Simple rule. Without this information, hls-graph needs to evaluate all the dependencies (collected in the previous run) of the TypeCheck Distribution.Simple rule to find out if they have changed since last run, and only then it can run the TypeCheck rule.

The TypeCheck Distribution.Simple rule depends on two other rules:

  • GhcSessionDeps Distribution.Simple
  • GetParsedModule Distribution.Simple

Again, before evaluating either of these two other rules, his-graph needs to run their dependencies to find out whether they have changed. In the case of GhcSessionDeps, this includes:

  1. GetTransitiveDependencies D.S
  2. GetModIface X for all X in the transitive dependencies of D.S
  3. GhcSession D.S

Because of 2, his-graph needs to run a number of build rules proportional to the transitive deps of the module being type checked. We know that these runs will be just lookups, since nothing has changed in their dependencies all the way to the leaf rules, but still it's a significant overhead that could be avoided if we were to tell his-graph that the only rule that has changed is the file contents of Distribution.Simple. That's what this PR does.

Visually, the open telemetry traces below show the before and after (using ndmitchell/shake#802) of the example. The traces show all the rules evaluated by Shake during the evaluation of TypeCheck Distribution.Simple.

Before

Takes around 700ms to do a TypeCheck of the module and the statistics table shows all the other rules visited during the evaluation.
Before detail

After

Typecheck completes in only 70ms and the number of rules visited, while still proportional to the count of transitive imports, is much lower. Why is it still proportional? I haven't investigated but I suspect it's related to the NeedsCompilation rules depending on the whole project via the ModuleGraph rule.
After detail

Implementation Details

There is a new dirtyKeys mutable set to keep track of world changes, which are collected from LSP notifications. We already register file watchers for haskell source files, but this needs to be extended to any other files that we care about, e.g. cradle dependencies. Need to check whether this is already done.

  • File watches for cradle dependencies and TH dependencies

The set of dirty keys is handed over to his-graph in the call to shakeRunDatabase. his-graph is responsible for keeping track of the transitive dirty set across builds. For example, if GetModificationTime D.S has become dirty, all its reverse dependencies must be tagged as dirty too. his-graph can remove a rule from the dirty set once it's been evaluated, which may not happen in this build.

Other bits:

  • Rules that are not in the dirty set will not be evaluated, even if they are alwaysRerun. Missing rules will lead to stale results and hard to diagnose bugs. There is a flag optRunSubset that controls whether we call his-graph in the old way (everything has potentially changed) or in the new way (only these rules have changed) intended to debug these issues.
  • Rules cannot be removed from the dirty set until positive that they have been evaluated by hls-graph, specially since builds can be interrupted at any time.
  • The tracking of files of interest now becomes much more precise: currently everything depends on GetFilesOfInterest, this rule is now gone.

Merging

This work has been sitting in my fork for too long. Before I forget the details, I want to get it reviewed and possibly merged too even if there isn't a working hls-graph implementation available yet. The changes should be a no-op in terms of functionality and the costs of tracking dirty rules are negligible, so I think it should be safe to merge.

A version of these changes running against ndmitchell/shake#802 and exercising the test and bench suites is available at pepeiborra#9

@pepeiborra pepeiborra requested review from ndmitchell and wz1000 May 23, 2021 11:42
@@ -116,7 +116,7 @@ highlightAtPoint file pos = runMaybeT $ do
refsAtPoint :: NormalizedFilePath -> Position -> Action [Location]
refsAtPoint file pos = do
ShakeExtras{hiedb} <- getShakeExtras
fs <- HM.keys <$> getFilesOfInterest
fs <- HM.keys <$> getFilesOfInterestUntracked
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, why is it renamed?

@pepeiborra pepeiborra force-pushed the keysChanged-placeholders branch from 904122c to 3bbdf40 Compare May 23, 2021 16:38
@mpickering
Copy link
Contributor

Did you fix the bug I reported on IRC about stale diagnostics in parent files?

@pepeiborra
Copy link
Collaborator Author

Did you fix the bug I reported on IRC about stale diagnostics in parent files?

That bug is still there. There's probably more bugs, and it will take some effort to weed them out. But right now there isn't a convenient way to test the new functionality in CI, so I intend to wait until hls-graph supports reverse dependencies

@pepeiborra pepeiborra force-pushed the keysChanged-placeholders branch from e8aa39b to 3ebe65d Compare June 2, 2021 10:07
@pepeiborra
Copy link
Collaborator Author

Rebased on top of #1880 which fixes all the remaining test failures with optRunSubset enabled (see pepeiborra#9 )

Ready to merge, can I have a stamp @wz1000 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants