Remove IDs from definitions, and unify local and global variables #323

georgefst · 2022-03-17T13:07:04Z

Based on #322.

The original motivation for using IDs was to make renaming easy, as mentioned in https://github.com/hackworthltd/vonnegut/pull/248/commits/7d888645f642831a719f4c4dd72ae8fe4d080aea. But we eventually decided to take a traditional approach to name shadowing, so we lose much of the benefit of using IDs, but continue to carry around both a Name and an ID for little reason. Furthermore, we never applied the same logic to other constructs such as TypeDefs, to which we don't assign IDs. So in this PR we simplify everything by going back to using IDs only for their original purpose of identifying things which don't have names, i.e. subexpressions.

Once we've done that, there's little benefit in keeping Var and GlobalVar in separate constructors in Expr, so the second commit also unifies these. An expression shouldn't care whether a variable it references is global or not. Note that we do still separate the local and global typechecker context, even though they now have very similar types. The distinction seems like it may be important to how the typechecker works, but I'm not sure that I fully understand it. Perhaps this too could be simplified.

From speaking to @brprice, the first part of this should be uncontroversial, but the second maybe less so. I've kept them together for now, not least because the test Action.Prog.unit_rename_def fails at the in-between point, because I implemented renaming for global variables by reusing renameVar. If we were to decide to split them up, this could be averted easily. This is still marked as a draft while we decide on this (and while the commit messages have not been fleshed out). edit: We've agreed on a modified solution, where we unify them behind a new type VarRef, which allows us to distinguish the two types of variables in the cases where this is necessary, and gives us the best of both worlds.

Pros

consistency with typedefs, which don't have IDs
debugging is often easier, as we see names rather than IDs in Show output
simpler API, potentially making things easier for a frontend
when we have module identifiers, it may be simpler if these were always associated with a name rather than an ID
- Add identifiers to modules #320 refers to "globally-referable identifiers (Names and IDs)" - now it's just Names
removing hardcoded IDs makes a number of tests less brittle, and much more readable
we are able to easily check for variable capture with globals: the added test unit_rename_def_capture would have previously failed
- although this is really a nice side effect, as opposed to something we couldn't do before
we get nicer output from API.viewTreeExpr because we have the name to hand (and we no longer need a special case for global variables)
- though again, I'm sure in the long run this wouldn't have been too much effort without this change
code is simplified in various other places, including:
- all tests using withPrimDefs (which is more-or-less all tests involving primitives) become a lot simpler, as we don't need to remember the IDs of definitions
- due ultimately to the ChooseVariable action, where we don't really care whether the chosen variable is local or global, there were previously a lot of places in Primer.Action where we passed around a variable reference which can be local or global as an Either ID Name - aside from the fact that this is a weak and undescriptive type, each use site went on to require two code paths for treating local and global variables in essentially the same way

Cons

The RenameDef action becomes more expensive, having to traverse all expressions. Though this is only as expensive as is already the case for renaming local variables, types etc.

The diff here is almost so big as to be unreviewable. But the changes are largely mechanical, and tests pass.

I can't shake a nagging feeling that this is morally the the wrong way round, and that we should instead separate display names from identifiers across the board. But that's a much bigger change that we don't have time for any time soon. And it's not clear that it would actually buy us any concrete benefit aside from cheaper renaming.

dhess · 2022-03-17T13:20:37Z

Sorry, I guess I'm out of the loop right now, but what purpose does this PR serve? It's not at all clear from the PR title, the OP, or the commit messages what the point of these changes are.

georgefst · 2022-03-17T13:51:50Z

Sorry, I guess I'm out of the loop right now, but what purpose does this PR serve? It's not at all clear from the PR title, the OP, or the commit messages what the point of these changes are.

Basically, @brprice and I have some changes that we've been making as part of / precursors to #267 and #320 in which we're concerned about merge conflicts. So we're splitting them out in to separate PRs so that we can get back on the same page as soon as possible, rather than continuing to diverge. #267 is still a few days away, not least because I've already had to perform some very awkward rebasing on top of #293.

I'll be filling in details (including in commit messages) shortly, but seeing as the two of us have just discussed all this over a video call, that needn't block @brprice when it comes to rebasing.

dhess · 2022-03-17T14:12:04Z

Thanks for the explanation. That all sounds smart. Please just make sure you backfill some motivation in the OP & commit messages before you commit anything, assuming this is the PR you end up using.

brprice · 2022-03-17T15:42:02Z

Could you add a test that renaming a referred-to definition updates references properly? e.g. in foo = ... ; bar = foo we should be able to rename foo to quux and get bar = quux

georgefst · 2022-03-17T18:25:45Z

Could you add a test that renaming a referred-to definition updates references properly? e.g. in foo = ... ; bar = foo we should be able to rename foo to quux and get bar = quux

Done: 1206e61. With different names, for ease of reusing main and other as present for other tests.

brprice · 2022-03-21T10:49:45Z

I'm unsure about unifying local and global variables, wrt my upcoming change to add module names. I am thinking that references to globals will need a module name, but locals will not: always "in the current module". This could be handled by something like data Expr = ... | Var (Maybe ModuleName, Name), but it may be nicer to just keep them separate?

primer/test/Tests/Action/Prog.hs

primer/src/Primer/ZipperCxt.hs

primer/src/Primer/Action.hs

dhess · 2022-03-22T16:44:16Z

Is this old issue relevant here? #125

primer/src/Primer/App.hs

primer/src/Primer/Action/Available.hs

primer/src/Primer/Eval.hs

primer/test/Gen/Core/Raw.hs

brprice · 2022-03-23T16:39:49Z

The RenameDef action becomes more expensive, having to traverse all expressions. Though this is only as expensive as is already the case for renaming local variables, types etc.

Nitpicking: This is as expensive as renaming types, but renaming local variables should be much cheaper as we need only traverse the subexpression they scope over, rather than all definitions!

primer/src/Primer/Action/Available.hs

primer/src/Primer/App.hs

primer/src/Primer/ZipperCxt.hs

primer/test/Tests/Action/Prog.hs

primer/test/Tests/Eval.hs

primer/test/Tests/EvalFull.hs

primer/src/Primer/Eval.hs

brprice

This looks pretty good, but I have a few small requests above

georgefst · 2022-03-24T12:00:05Z

Thanks for the very thorough review @brprice! I clearly should have been a bit more thorough myself when this switched from a quick exploration in to something we were actually likely to merge.

History here is now slightly messy:

I put all the review stuff in one commit for simplicity. There are a lot of very small changes.
The two remaining unresolved comments are closely related, and are about the history being confusing, due to a test failing at one point, pending the second part of the refactor.

Is it worth going back, marking the test broken, editing the commit message to mention this, and possibly squashing all the other changes in to other commits, and force-pushing? I'm inclined to say no, but I'm happy to do it if anyone feels strongly.

dhess · 2022-03-24T12:09:55Z

I'm ambivalent. I certainly don't think we should make a habit of developing this way, as having a clean history is important as documentation. On the other hand, we increasingly have less time to be pedantic about these things and your commit messages are good enough to explain what's going on, at least. In any case, I'll defer to @brprice.

dhess · 2022-03-24T12:12:00Z

FYI, I noticed that this test failed in CI on your final push: https://buildkite.com/hackworthltd/primer/builds/321#d684977b-6fe8-4dac-9af5-bf42cbc3d37f

That looks like a race condition with tmp-postgres, so I restarted it and it passed. We'll need to keep an eye on that one in case it comes up repeatedly, but in any case, your PR looks good from the CI perspective now.

brprice · 2022-03-24T12:36:30Z

I personally would like to see

Remove IDs from definitions, and unify local and global variables #323 (review) (fixing the logic of RenameDef) split out from the "Unify handling" commit, as it is unrelated and makes for very confusing history. I don't mind whether it gets squashed into a different commit though, or how it is ordered wrt the "Unify handling" commit.
Some change to how we reinstantiate the comments: either a commit message addition saying "re-add (and reword) some accidentally-deleted comments) or rebasing so they never were removed

My aim is that the history is not misleading: from looking at the contents of one commit and the messages of the surrounding few commits it would be nice if one wasn't left wondering "why are the IDs in the types of even and odd affected?" (hopefully clarified by adding a line to the commit message saying "re-add some comments", then it would be obvious that one should look at the diff of that commit), or "how on earth should this test pass?" (hopefully clarified by splitting that out into a separate commit, rather than smuggling it with unrelated stuff, so one can see there is a related change in another commit)

dhess · 2022-03-24T13:11:59Z

OK @georgefst, I said I'd defer to @brprice, so could you please work with him to get this PR fixed up per his requests?

We now identify them by `Name` instead. The original motivation for using IDs was to make renaming easy, as mentioned in hackworthltd/vonnegut@7d88864. But we eventually decided to take a traditional approach to name shadowing, so we have to ensure that names don't clash anyway, and thus lose much of the benefit of using IDs, but continue to carry around both a Name and an ID for little reason. Furthermore, we never applied the same logic to other constructs such as TypeDefs, to which we don't assign IDs. So in this commit we simplify everything by going back to using IDs only for their original purpose of identifying things which don't have names, i.e. subexpressions. For now, the `RenameDef` action is broken, since we don't update references to the renamed definition. This will be fixed in the subsequent commit, as we'll be able to use the `renameVar` function which currently only works for local variables.

georgefst · 2022-03-24T15:26:55Z

OK, I've doctored the history as requested.

(fixing the logic of RenameDef) split out from the "Unify handling" commit, as it is unrelated

It's not completely unrelated. The implementation was broken by changes in the first commit, but then able to be fixed due to changes in the second. But you're right that we could move half of the diff in to the first commit, which is actually enough to have tests always passing. The test exercising the broken part wasn't introduced until later on with unit_rename_def_referenced (I did consider moving the commit which introduced that test forward, and perhaps marking it with tasty-expected-failure, but that really seemed like overkill).

Anyway, the history here is now nice and clean, giving the appearance that we got everything right first time.

We introduce a new type `VarRef`. This will allow us to abstract over differences between local and global variables, especially once we have modules, and global variable references need to be scoped to a module. Most of the changes here are very mechanical. Exceptions are: - We factor out some commonality between `Typecheck.synth` and `Action.getVarType` in to `Typecheck.lookupVar`. - This also gives us slightly more structured errors for the failure cases in `getVarType` (after modifying `SaturatedApplicationError` and `RefineError` to be able to contain a `TypeError`). - We add an `UnknownTypeVariable` constructor to `TypeError`, to be thrown in the obvious scenario. This is necessary since `UnknownVariable` now takes a `VarRef`, and thus only makes sense for term-level variables.

This would have failed prior to the previous commit.

This is mostly in light of the recent change to unify global and local variables. Although `var` has always been a bad name, since it actually constructs only a particular sort of variable (a local one). Note that this commit is the result of running: ``` retrie --adhoc 'var = lvar' retrie --adhoc 'global = gvar' retrie --adhoc 'varref = var' fourmolu -i $(find primer primer-rel8 primer-service -name '*.hs' -not -path 'primer/test/outputs/*') ``` and manually fixing up a few imports, due to limitations of `retrie`.

brprice · 2022-03-24T16:57:02Z

OK, I've doctored the history as requested.

Thanks! I'll merge this now

georgefst mentioned this pull request Mar 17, 2022

Carry ID rather than full definition in Selection #322

Merged

georgefst mentioned this pull request Mar 21, 2022

Refactoring names #326

Merged

georgefst commented Mar 21, 2022

View reviewed changes

primer/test/Tests/Action/Prog.hs Show resolved Hide resolved

brprice reviewed Mar 21, 2022

View reviewed changes

primer/src/Primer/ZipperCxt.hs Show resolved Hide resolved

brprice reviewed Mar 21, 2022

View reviewed changes

primer/src/Primer/Action.hs Outdated Show resolved Hide resolved

georgefst force-pushed the georgefst/def-no-id branch from 2400a3f to 07f39e9 Compare March 22, 2022 12:53

brprice reviewed Mar 22, 2022

View reviewed changes

primer/src/Primer/App.hs Outdated Show resolved Hide resolved

brprice reviewed Mar 22, 2022

View reviewed changes

primer/src/Primer/Action/Available.hs Show resolved Hide resolved

georgefst force-pushed the georgefst/def-no-id branch 4 times, most recently from 09fc928 to 9f8592d Compare March 22, 2022 21:03

georgefst mentioned this pull request Mar 23, 2022

What can we do to help the user avoid free variable capture (and should we)? #120

Open