-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collect reverse dependencies #802
Conversation
d31fc6d
to
d48eddc
Compare
58be020
to
b96ee3b
Compare
There is a tricky issue with async exceptions that I am struggling to solve. To me, shakeRunDatabaseForKeys is an unsound approximation since it bypasses the version number checks. It is possible that a key K is not up to date and yet, because it doesn't belong to the dirty set, gets reused without updating. In what cases can this happen? The most common one is when a previous call to shakeRunDatabaseForKeys marked K dirty, but the evaluation didn't touch K so it didn't get recomputed. If the current call to shakeRunDatabaseForKeys isn't aware of this, and if the new dirty set doesn't include K but the evaluation uses it, the stale value will be reused. To avoid this, I have added state to the database to track the dirty set across runs. A key is removed from the persisted dirty set as it is refreshed. The tricky bit with async exceptions is that a call to shakeRunDatabaseForKeys can get interrupted before it has a chance to add the keys to the dirty set. I'm not sure how to solve this. The obvious solution is to mask, but it seems ugly and out of place in the Shake codebase. We could put the mask in the caller, by changing shakeRunDatabaseForKeys to return an |
43b94b8
to
e811813
Compare
I opted for masking in Shake. The masking minimal and happens in |
5971acc
to
c0b281e
Compare
-- of actions to run after the database was closed, as added with | ||
-- 'Development.Shake.runAfter' and 'Development.Shake.removeFilesAfter'. | ||
shakeRunDatabaseForKeys | ||
:: Maybe [SomeShakeValue] -- ^ Set of keys changed since last run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't find the caller where Just
value is used. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an external API. See for example how it is used in HLS:
09e5396
to
857044d
Compare
I think this masking is not essential and can be removed, since the caller is always able to find out whether a rule was actually evaluated and use this information to keep an up-to-date state of the dirty keys on the caller side. |
If the caller provides us with a set of keys that have changed, the reverse dependencies are used to compute the transitive dependencies changed avoiding a call to firstJustWaitUnordered
SomeShakeValue is a solution for heterogeneous lists of keys
shakeRunDatabaseForKeys is an unsound approximation since it bypasses the version number checks. It is possible that a key K is not up to date and yet, because it doesn't belong to the dirty set, gets reused without updating. In what cases can this happen? The most common one is when a previous call to shakeRunDatabaseForKeys marked K dirty, but the evaluation didn't touch K so it didn't get recomputed. The current call to shakeRunDatabaseForKeys doesn't know this, and if the new dirty set doesn't include K but the evaluation uses it, the stale value will be reused. To avoid this, we persist (in memory only) the dirty set and remember it across runs. A key is removed from the persisted dirty set as it is refreshed.
Add an option to enable the tracking of reverse dependencies. Zero overhead for clients who do not care about them
When I ported reverse dependencies from Shake[1] I missed an important detail. While Shake models alwaysRerun as a dependency on an actual rule (AlwaysRerun), hls-graph models alwaysRerun by setting actionDeps to Nothing. This is important because dependencies are not computed for these rules, and therefore reverse dependency tracking doesn't do anything, which breaks correctness of dirty rebuilds This commit adds dependency tracking for alwaysRerun rules, and fixes reverse dependency tracking. The alternative would be following the Shake approach but I'm not sure what other implications this might have. [1] - ndmitchell/shake#802
When I ported reverse dependencies from Shake[1] I missed an important detail. While Shake models alwaysRerun as a dependency on an actual rule (AlwaysRerun), hls-graph models alwaysRerun by setting actionDeps to Nothing. This is important because dependencies are not computed for these rules, and therefore reverse dependency tracking doesn't do anything, which breaks correctness of dirty rebuilds This commit adds dependency tracking for alwaysRerun rules, and fixes reverse dependency tracking. The alternative would be following the Shake approach but I'm not sure what other implications this might have. [1] - ndmitchell/shake#802
When I ported reverse dependencies from Shake[1] I missed an important detail. While Shake models alwaysRerun as a dependency on an actual rule (AlwaysRerun), hls-graph models alwaysRerun by setting actionDeps to Nothing. This is important because dependencies are not computed for these rules, and therefore reverse dependency tracking doesn't do anything, which breaks correctness of dirty rebuilds This commit adds dependency tracking for alwaysRerun rules, and fixes reverse dependency tracking. The alternative would be following the Shake approach but I'm not sure what other implications this might have. [1] - ndmitchell/shake#802
When I ported reverse dependencies from Shake[1] I missed an important detail. While Shake models alwaysRerun as a dependency on an actual rule (AlwaysRerun), hls-graph models alwaysRerun by setting actionDeps to Nothing. This is important because dependencies are not computed for these rules, and therefore reverse dependency tracking doesn't do anything, which breaks correctness of dirty rebuilds This commit adds dependency tracking for alwaysRerun rules, and fixes reverse dependency tracking. The alternative would be following the Shake approach but I'm not sure what other implications this might have. [1] - ndmitchell/shake#802
When I ported reverse dependencies from Shake[1] I missed an important detail. While Shake models alwaysRerun as a dependency on an actual rule (AlwaysRerun), hls-graph models alwaysRerun by setting actionDeps to Nothing. This is important because dependencies are not computed for these rules, and therefore reverse dependency tracking doesn't do anything, which breaks correctness of dirty rebuilds This commit adds dependency tracking for alwaysRerun rules, and fixes reverse dependency tracking. The alternative would be following the Shake approach but I'm not sure what other implications this might have. [1] - ndmitchell/shake#802
When I ported reverse dependencies from Shake[1] I missed an important detail. While Shake models alwaysRerun as a dependency on an actual rule (AlwaysRerun), hls-graph models alwaysRerun by setting actionDeps to Nothing. This is important because dependencies are not computed for these rules, and therefore reverse dependency tracking doesn't do anything, which breaks correctness of dirty rebuilds This commit adds dependency tracking for alwaysRerun rules, and fixes reverse dependency tracking. The alternative would be following the Shake approach but I'm not sure what other implications this might have. [1] - ndmitchell/shake#802
Closing this since the work has moved on to HLS-graph |
Thanks for all the hard work doing this @pepeiborra - it's very nice work. I think the end conclusion is that HLS-graph and Shake have diverged significantly enough that doing one for both isn't all that helpful. I shared some more thoughts about the size of build systems and where Shake fits in https://neilmitchell.blogspot.com/2021/09/reflecting-on-shake-build-system.html |
This PR makes the following changes:
extends Shake to collect the reverse dependencies of every 'Result' by performing some bookkeeping on every evaluation. The amount of work done is O(E) on the number of edges in the graph and I have measured it to be around 3% for a very dense graph with 150.000 nodes and 1e8 edges, so it's not bad at all, and could be put behind a flag if desired. Note that in this PR the set of reverse dependencies is overestimated since ephemeral reverse edges are never deleted.
adds a new function
shakeRunDatabaseFromKeys
which extendsshakeRunDatabase
by accepting, in addition to a set of actions (the goals), a set D of keys that have changed since the last run. The transitive closure of D is the set of all keys that have potentially changed since the last run - all keys that do not belong to this set are guaranteed unchanged and can be excluded from thebuildRunDependenciesChanged
query.When the set D is small compared to the size of the build graph, this optimisation renders performance gains of one order of magnitude, up to
70X20X in some cases. I have extended[1] HLS to keep track of what has changed since the last full Shake run and performed several experiments:[1] - https://github.com/pepeiborra/ide/tree/keysChanged
Closes #800
EDIT: updated performance numbers