Un-inline the enums in the CST #698

Merged: 3 commits, Dec 8, 2023

@Xanewok Xanewok commented Dec 6, 2023

Part of #638

As we discussed, inlining some node types causes us to lose information that is otherwise hard or inconvenient to reconstruct, so un-inlining them should make the CST more usable on its own.

@Xanewok Xanewok requested a review from a team as a code owner December 6, 2023 15:07

changeset-bot bot commented Dec 6, 2023

⚠️ No Changeset found

Latest commit: 5329d30

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


Comment on lines +111 to +114
- VariableDeclarationType (Rule): # 157..166 "\t\tuint256"
- TypeName (Rule): # 157..166 "\t\tuint256"
- ElementaryType (Rule): # 157..166 "\t\tuint256"
- UintKeyword (Token): "uint256" # 159..166
@OmarTawfik OmarTawfik Dec 6, 2023

One idea we discussed before when we talked about this is flattening the tree structure of the tests a little bit to make it more readable/usable. So, for the below example:

  • Instead of each node on a separate line, we can group rules that have only one child (that is also a rule) into the same line, as they will have the same exact range and preview comment.
  • Removing (Rule) and (Token) suffixes, since we already make the distinction through the YAML value/LHS on the same line.

Just putting the idea out here. Definitely not blocking for this PR, as we should probably do it in a subsequent PR to make it easier to review.

Suggested change

Current:
- VariableDeclarationType (Rule): # 157..166 "\t\tuint256"
- TypeName (Rule): # 157..166 "\t\tuint256"
- ElementaryType (Rule): # 157..166 "\t\tuint256"
- UintKeyword (Token): "uint256" # 159..166

Suggested:
- VariableDeclarationType > TypeName > ElementaryType: # 157..166 "\t\tuint256"
- UintKeyword: "uint256" # 159..166
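
The flattening proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Node` and `render` are hypothetical names, not slang's actual CST types or test-snapshot code, and the range/preview comments (`# 157..166 ...`) are left out for brevity. A chain of rules is joined onto one line for as long as a rule has exactly one child that is itself a rule.

```rust
// Hypothetical, simplified CST node. Not slang's real types.
#[derive(Debug, Clone)]
enum Node {
    Rule { kind: String, children: Vec<Node> },
    Token { kind: String, text: String },
}

/// Renders a tree as indented YAML-like lines, joining each chain of rules
/// that have exactly one child (itself a rule) onto a single line with " > ".
fn render(node: &Node, depth: usize, out: &mut Vec<String>) {
    let indent = "  ".repeat(depth);
    match node {
        Node::Token { kind, text } => {
            // Tokens are leaves: print the kind and the quoted text.
            out.push(format!("{indent}- {kind}: {text:?}"));
        }
        Node::Rule { kind, children } => {
            let mut names = vec![kind.as_str()];
            let mut current = children;
            // Follow the chain while the only child is another rule.
            while let [Node::Rule { kind, children }] = current.as_slice() {
                names.push(kind.as_str());
                current = children;
            }
            out.push(format!("{indent}- {}:", names.join(" > ")));
            // Whatever remains (tokens, or several children) is nested below.
            for child in current {
                render(child, depth + 1, out);
            }
        }
    }
}
```

Calling `render` on a `VariableDeclarationType` rule wrapping `TypeName`, `ElementaryType` and a `UintKeyword` token would collapse the three rules into one line, with the token nested beneath it.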

I'll add a TODO for myself.

@@ -322,8 +322,7 @@ fn resolve_grammar_element(ident: &Identifier, ctx: &mut ResolveCtx) -> GrammarE
     let thunk = Rc::new(NamedParserThunk {
         name: ident.to_string().leak(),
         context: lex_ctx,
-        // Enums have a single reference per variant, so they should be inlined.
-        is_inline: matches!(elem.as_ref(), Item::Enum { .. }),
+        is_inline: false,
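
To make the effect of this flag concrete, here is a minimal sketch of what inlining means for the resulting tree. It is an assumption-laden illustration, not slang's parser code: `Node` and `finish` are hypothetical names. An inlined parser hands its children straight to the parent, so its own kind never appears in the CST; a non-inlined parser wraps them in a node carrying that kind.

```rust
// Hypothetical, simplified CST node. Not slang's real types.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Rule { kind: &'static str, children: Vec<Node> },
    Token { kind: &'static str, text: &'static str },
}

/// Finishes a named parser's result (hypothetical helper).
/// - inline: the children are spliced directly into the parent.
/// - not inline: the children are wrapped in a rule node of `kind`.
fn finish(kind: &'static str, is_inline: bool, children: Vec<Node>) -> Vec<Node> {
    if is_inline {
        children
    } else {
        vec![Node::Rule { kind, children }]
    }
}
```

With `is_inline` always `false`, as in this change, an enum such as `ElementaryType` shows up as its own node above the `UintKeyword` token instead of disappearing into its parent.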
@OmarTawfik OmarTawfik Dec 6, 2023

I would like to get @AntonyBlakey's eyes on this one. Some of these nodes look very useful, for example ContractMember and ConstructorAttribute, as it makes it easier to find/match on these elements.

However, some of them do look extraneous, and are only there for the purpose of authoring the grammar/versioning. For example:

  • ElementaryType and YulLiteral which will (almost) always have a unique parent that can be matched against. Maybe we can refactor the grammar a bit to make this more accurate?

  • TypedTupleMember and UntypedTupleMember which only exist to make parsing/backtracking correct, but provide no additional meaning. Not sure how to avoid it.

I’m trying to avoid adding optional inlining to the DSL unless we absolutely need it. Given our new design decisions and the AST structure, it would make the mapping from grammar to CST/AST less obvious and add another layer of complexity that users have to deal with.

Without inlining, people can rely on the fact that any NonTerminal node is convertible to its matching AST type, and vice versa. If we start to have some inlined enums, it won’t be obvious which CST nodes can be converted to AST types directly. Conversely, it would be confusing if some AST types started returning a root node that has a different NonTerminalKind.
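
The one-to-one mapping described above might look something like the sketch below. Everything here is hypothetical and only illustrates the invariant: `CstNode`, the two-variant `NonTerminalKind`, the AST struct `ElementaryType` and its `root_kind` method are stand-ins, not slang's actual API.

```rust
use std::convert::TryFrom;

// Hypothetical subset of nonterminal kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NonTerminalKind {
    ElementaryType,
    TypeName,
}

// Hypothetical CST node: a kind plus the source text it covers.
#[derive(Debug)]
struct CstNode {
    kind: NonTerminalKind,
    text: String,
}

// Hypothetical AST type matching the `ElementaryType` nonterminal.
#[derive(Debug)]
struct ElementaryType {
    text: String,
}

impl TryFrom<&CstNode> for ElementaryType {
    type Error = String;

    // CST -> AST: succeeds exactly when the node has the matching kind.
    fn try_from(node: &CstNode) -> Result<Self, Self::Error> {
        if node.kind == NonTerminalKind::ElementaryType {
            Ok(ElementaryType { text: node.text.clone() })
        } else {
            Err(format!("expected ElementaryType, found {:?}", node.kind))
        }
    }
}

impl ElementaryType {
    // AST -> CST: the root node always has the kind named after the type.
    fn root_kind(&self) -> NonTerminalKind {
        NonTerminalKind::ElementaryType
    }
}
```

If some enums were inlined, a conversion like this could no longer be keyed on the node kind alone, which is the ambiguity the comment is warning about.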

So, if we are happy with these changes, I’m fine with merging the PR as-is for now, and I can manually go over the enums to see if any of them can be better structured. For example, inlining something like ElementaryType variants into the types it references, since it is almost always used inside another Enum. It will probably be more nuanced than that though.

What do you think?

Enums that are artefacts of the grammar machinations are not of interest to the CST or the AST. We should, I think, find a better way to achieve whatever they accomplish. I am concerned about the 'almost' comment, though, because it seems to imply that the constraint is not a logical necessity, and so it must be surfaced as a parent + child.

Could we make the non-interesting enum into something else in the grammar, i.e. effectively add a non-outlined characteristic by using a different name for the concept? It sure sounds like it has a different purpose.

f2f: we think the PR is good enough for now, and let's review the added kinds later to see if anything can be pruned/removed.

@Xanewok
Copy link
Contributor Author

Xanewok commented Dec 8, 2023

Rebased and force-pushed; this is still just c4c2f4a + re-generated files and adjusted test.

@Xanewok Xanewok enabled auto-merge December 8, 2023 11:24
@Xanewok Xanewok added this pull request to the merge queue Dec 8, 2023
Merged via the queue into NomicFoundation:main with commit 678545e Dec 8, 2023
1 check passed
@Xanewok Xanewok deleted the uninline-enums branch December 8, 2023 11:46
OmarTawfik added a commit to OmarTawfik-forks/slang that referenced this pull request Dec 12, 2023
github-merge-queue bot pushed a commit that referenced this pull request Jan 10, 2024
Context:
#698 (comment)

- combined parents with a single child on the same line
- using the `꞉` unicode character instead of colon `:` to separate node
name and kind, in order not to break YAML parsing/formatting.
- surround entire nodes with parentheses instead of just the kind, to
make it easier to read.
- include whitespace in the snapshots, since they now take less visual
space, and it will make it easier to spot changes to trivia during
development.