Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove IDs from definitions, and unify local and global variables #323

Merged
merged 5 commits into from
Mar 24, 2022

Conversation

georgefst
Copy link
Contributor

@georgefst georgefst commented Mar 17, 2022

Based on #322.

The original motivation for using IDs was to make renaming easy, as mentioned in https://github.com/hackworthltd/vonnegut/pull/248/commits/7d888645f642831a719f4c4dd72ae8fe4d080aea. But we eventually decided to take a traditional approach to name shadowing, so we lose much of the benefit of using IDs, but continue to carry around both a Name and an ID for little reason. Furthermore, we never applied the same logic to other constructs such as TypeDefs, to which we don't assign IDs. So in this PR we simplify everything by going back to using IDs only for their original purpose of identifying things which don't have names, i.e. subexpressions.

Once we've done that, there's little benefit in keeping Var and GlobalVar in separate constructors in Expr, so the second commit also unifies these. An expression shouldn't care whether a variable it references is global or not. Note that we do still separate the local and global typechecker context, even though they now have very similar types. The distinction seems like it may be important to how the typechecker works, but I'm not sure that I fully understand it. Perhaps this too could be simplified.

From speaking to @brprice, the first part of this should be uncontroversial, but the second maybe less so. I've kept them together for now, not least because the test Action.Prog.unit_rename_def fails at the in-between point, because I implemented renaming for global variables by reusing renameVar. If we were to decide to split them up, this could be averted easily. This is still marked as a draft while we decide on this (and while the commit messages have not been fleshed out). edit: We've agreed on a modified solution, where we unify them behind a new type VarRef, which allows us to distinguish the two types of variables in the cases where this is necessary, and gives us the best of both worlds.

Pros

  • consistency with typedefs, which don't have IDs
  • debugging is often easier, as we see names rather than IDs in Show output
  • simpler API, potentially making things easier for a frontend
  • when we have module identifiers, it may be simpler if these were always associated with a name rather than an ID
  • removing hardcoded IDs makes a number of tests less brittle, and much more readable
  • we are able to easily check for variable capture with globals: the added test unit_rename_def_capture would have previously failed
    • although this is really a nice side effect, as opposed to something we couldn't do before
  • we get nicer output from API.viewTreeExpr because we have the name to hand (and we no longer need a special case for global variables)
    • though again, I'm sure in the long run this wouldn't have been too much effort without this change
  • code is simplified in various other places, including:
    • all tests using withPrimDefs (which is more-or-less all tests involving primitives) become a lot simpler, as we don't need to remember the IDs of definitions
    • due ultimately to the ChooseVariable action, where we don't really care whether the chosen variable is local or global, there were previously a lot of places in Primer.Action where we passed around a variable reference which can be local or global as an Either ID Name - aside from the fact that this is a weak and undescriptive type, each use site went on to require two code paths for treating local and global variables in essentially the same way

Cons

The RenameDef action becomes more expensive, having to traverse all expressions. Though this is only as expensive as is already the case for renaming local variables, types etc.

The diff here is almost so big as to be unreviewable. But the changes are largely mechanical, and tests pass.

I can't shake a nagging feeling that this is morally the the wrong way round, and that we should instead separate display names from identifiers across the board. But that's a much bigger change that we don't have time for any time soon. And it's not clear that it would actually buy us any concrete benefit aside from cheaper renaming.

@dhess
Copy link
Member

dhess commented Mar 17, 2022

Sorry, I guess I'm out of the loop right now, but what purpose does this PR serve? It's not at all clear from the PR title, the OP, or the commit messages what the point of these changes are.

@georgefst
Copy link
Contributor Author

Sorry, I guess I'm out of the loop right now, but what purpose does this PR serve? It's not at all clear from the PR title, the OP, or the commit messages what the point of these changes are.

Basically, @brprice and I have some changes that we've been making as part of / precursors to #267 and #320 in which we're concerned about merge conflicts. So we're splitting them out in to separate PRs so that we can get back on the same page as soon as possible, rather than continuing to diverge. #267 is still a few days away, not least because I've already had to perform some very awkward rebasing on top of #293.

I'll be filling in details (including in commit messages) shortly, but seeing as the two of us have just discussed all this over a video call, that needn't block @brprice when it comes to rebasing.

@dhess
Copy link
Member

dhess commented Mar 17, 2022

Thanks for the explanation. That all sounds smart. Please just make sure you backfill some motivation in the OP & commit messages before you commit anything, assuming this is the PR you end up using.

@brprice
Copy link
Contributor

brprice commented Mar 17, 2022

Could you add a test that renaming a referred-to definition updates references properly? e.g. in foo = ... ; bar = foo we should be able to rename foo to quux and get bar = quux

@georgefst
Copy link
Contributor Author

Could you add a test that renaming a referred-to definition updates references properly? e.g. in foo = ... ; bar = foo we should be able to rename foo to quux and get bar = quux

Done: 1206e61. With different names, for ease of reusing main and other as present for other tests.

@brprice
Copy link
Contributor

brprice commented Mar 21, 2022

I'm unsure about unifying local and global variables, wrt my upcoming change to add module names. I am thinking that references to globals will need a module name, but locals will not: always "in the current module". This could be handled by something like data Expr = ... | Var (Maybe ModuleName, Name), but it may be nicer to just keep them separate?

@georgefst georgefst mentioned this pull request Mar 21, 2022
@georgefst georgefst force-pushed the georgefst/def-no-id branch from 2400a3f to 07f39e9 Compare March 22, 2022 12:53
@dhess
Copy link
Member

dhess commented Mar 22, 2022

Is this old issue relevant here? #125

@georgefst georgefst marked this pull request as ready for review March 23, 2022 12:22
@georgefst georgefst requested a review from brprice March 23, 2022 12:23
@georgefst georgefst force-pushed the georgefst/def-no-id branch 3 times, most recently from fcce2dd to 6a15639 Compare March 23, 2022 13:06
@brprice
Copy link
Contributor

brprice commented Mar 23, 2022

The RenameDef action becomes more expensive, having to traverse all expressions. Though this is only as expensive as is already the case for renaming local variables, types etc.

Nitpicking: This is as expensive as renaming types, but renaming local variables should be much cheaper as we need only traverse the subexpression they scope over, rather than all definitions!

Copy link
Contributor

@brprice brprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good, but I have a few small requests above

@georgefst georgefst force-pushed the georgefst/def-no-id branch from 6a15639 to d8ab446 Compare March 24, 2022 11:59
@georgefst
Copy link
Contributor Author

Thanks for the very thorough review @brprice! I clearly should have been a bit more thorough myself when this switched from a quick exploration in to something we were actually likely to merge.

History here is now slightly messy:

  • I put all the review stuff in one commit for simplicity. There are a lot of very small changes.
  • The two remaining unresolved comments are closely related, and are about the history being confusing, due to a test failing at one point, pending the second part of the refactor.

Is it worth going back, marking the test broken, editing the commit message to mention this, and possibly squashing all the other changes in to other commits, and force-pushing? I'm inclined to say no, but I'm happy to do it if anyone feels strongly.

@dhess
Copy link
Member

dhess commented Mar 24, 2022

I'm ambivalent. I certainly don't think we should make a habit of developing this way, as having a clean history is important as documentation. On the other hand, we increasingly have less time to be pedantic about these things and your commit messages are good enough to explain what's going on, at least. In any case, I'll defer to @brprice.

@dhess
Copy link
Member

dhess commented Mar 24, 2022

FYI, I noticed that this test failed in CI on your final push: https://buildkite.com/hackworthltd/primer/builds/321#d684977b-6fe8-4dac-9af5-bf42cbc3d37f

That looks like a race condition with tmp-postgres, so I restarted it and it passed. We'll need to keep an eye on that one in case it comes up repeatedly, but in any case, your PR looks good from the CI perspective now.

@brprice
Copy link
Contributor

brprice commented Mar 24, 2022

I personally would like to see

  • Remove IDs from definitions, and unify local and global variables #323 (review) (fixing the logic of RenameDef) split out from the "Unify handling" commit, as it is unrelated and makes for very confusing history. I don't mind whether it gets squashed into a different commit though, or how it is ordered wrt the "Unify handling" commit.
  • Some change to how we reinstantiate the comments: either a commit message addition saying "re-add (and reword) some accidentally-deleted comments) or rebasing so they never were removed

My aim is that the history is not misleading: from looking at the contents of one commit and the messages of the surrounding few commits it would be nice if one wasn't left wondering "why are the IDs in the types of even and odd affected?" (hopefully clarified by adding a line to the commit message saying "re-add some comments", then it would be obvious that one should look at the diff of that commit), or "how on earth should this test pass?" (hopefully clarified by splitting that out into a separate commit, rather than smuggling it with unrelated stuff, so one can see there is a related change in another commit)

@dhess
Copy link
Member

dhess commented Mar 24, 2022

OK @georgefst, I said I'd defer to @brprice, so could you please work with him to get this PR fixed up per his requests?

We now identify them by `Name` instead.

The original motivation for using IDs was to make renaming easy, as mentioned in hackworthltd/vonnegut@7d88864. But we eventually decided to take a traditional approach to name shadowing, so we have to ensure that names don't clash anyway, and thus lose much of the benefit of using IDs, but continue to carry around both a Name and an ID for little reason. Furthermore, we never applied the same logic to other constructs such as TypeDefs, to which we don't assign IDs. So in this commit we simplify everything by going back to using IDs only for their original purpose of identifying things which don't have names, i.e. subexpressions.

For now, the `RenameDef` action is broken, since we don't update references to the renamed definition. This will be fixed in the subsequent commit, as we'll be able to use the `renameVar` function which currently only works for local variables.
@georgefst georgefst force-pushed the georgefst/def-no-id branch from d8ab446 to fe47d6c Compare March 24, 2022 15:26
@georgefst
Copy link
Contributor Author

OK, I've doctored the history as requested.

(fixing the logic of RenameDef) split out from the "Unify handling" commit, as it is unrelated

It's not completely unrelated. The implementation was broken by changes in the first commit, but then able to be fixed due to changes in the second. But you're right that we could move half of the diff in to the first commit, which is actually enough to have tests always passing. The test exercising the broken part wasn't introduced until later on with unit_rename_def_referenced (I did consider moving the commit which introduced that test forward, and perhaps marking it with tasty-expected-failure, but that really seemed like overkill).

Anyway, the history here is now nice and clean, giving the appearance that we got everything right first time.

We introduce a new type `VarRef`. This will allow us to abstract over differences between local and global variables, especially once we have modules, and global variable references need to be scoped to a module.

Most of the changes here are very mechanical. Exceptions are:
- We factor out some commonality between `Typecheck.synth` and `Action.getVarType` in to `Typecheck.lookupVar`.
  - This also gives us slightly more structured errors for the failure cases in `getVarType` (after modifying `SaturatedApplicationError` and `RefineError` to be able to contain a `TypeError`).
- We add an `UnknownTypeVariable` constructor to `TypeError`, to be thrown in the obvious scenario. This is necessary since `UnknownVariable` now takes a `VarRef`, and thus only makes sense for term-level variables.
This would have failed prior to the previous commit.
This is mostly in light of the recent change to unify global and local variables. Although `var` has always been a bad name, since it actually constructs only a particular sort of variable (a local one).

Note that this commit is the result of running:
```
retrie --adhoc 'var = lvar'
retrie --adhoc 'global = gvar'
retrie --adhoc 'varref = var'
fourmolu -i $(find primer primer-rel8 primer-service -name '*.hs' -not -path 'primer/test/outputs/*')
```
and manually fixing up a few imports, due to limitations of `retrie`.
@georgefst georgefst force-pushed the georgefst/def-no-id branch from fe47d6c to a6b5c9b Compare March 24, 2022 15:41
@brprice
Copy link
Contributor

brprice commented Mar 24, 2022

OK, I've doctored the history as requested.

Thanks! I'll merge this now

@brprice brprice added the Ready to merge Ready to merge label Mar 24, 2022
@brprice brprice self-requested a review March 24, 2022 16:58
mergify bot added a commit that referenced this pull request Mar 24, 2022
@mergify mergify bot merged commit 40b71c9 into main Mar 24, 2022
@mergify mergify bot deleted the georgefst/def-no-id branch March 24, 2022 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready to merge Ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants