Transition rustc Parser to proc_macro token model #63689

Open
matklad opened this issue Aug 18, 2019 · 8 comments
Labels
A-parser Area: The parsing of Rust source code to an AST C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@matklad
Member

matklad commented Aug 18, 2019

Currently, there are two different approaches for dealing with composite tokens like >> in rustc.

  1. Keep tokens in composed form, and split into pieces, > and >, when necessary.
  2. Keep tokens decomposed, with jointness information, and join tokens when necessary.

At the moment, the first approach is used by the parser, and the second approach is used by the proc_macro API. It would be good to move the parser to the decomposed approach as well, as it is somewhat more natural, more future-compatible (one can introduce new tokens) and having two of a thing is bad in itself!
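To make the second approach concrete, here is a minimal sketch of a decomposed token stream with jointness. The names `Spacing`, `Punct`, `lex_puncts` and `starts_with_op` are made up for this illustration (they only loosely mirror the proc_macro vocabulary) and are not rustc's actual types or functions. The toy lexer looks only at ASCII punctuation and skips everything else.

```rust
/// Whether a punctuation character is immediately followed by another
/// punctuation character, with no whitespace in between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Spacing {
    Joint,
    Alone,
}

/// One single-character punctuation token in the decomposed model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Punct {
    pub ch: char,
    pub spacing: Spacing,
}

/// Toy lexer (an assumption of this sketch): keeps only ASCII punctuation
/// and records jointness for each character.
pub fn lex_puncts(src: &str) -> Vec<Punct> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    for (i, &ch) in chars.iter().enumerate() {
        if !ch.is_ascii_punctuation() {
            continue;
        }
        let joint = chars
            .get(i + 1)
            .map_or(false, |next| next.is_ascii_punctuation());
        let spacing = if joint { Spacing::Joint } else { Spacing::Alone };
        out.push(Punct { ch, spacing });
    }
    out
}

/// "Join when necessary": does the stream start with the operator `op`?
/// Every character except the last one must be joint with its successor.
pub fn starts_with_op(tokens: &[Punct], op: &str) -> bool {
    let want: Vec<char> = op.chars().collect();
    if want.is_empty() || tokens.len() < want.len() {
        return false;
    }
    want.iter().enumerate().all(|(i, &c)| {
        let t = tokens[i];
        let last = i + 1 == want.len();
        t.ch == c && (last || t.spacing == Spacing::Joint)
    })
}
```

In this model, closing a nested generic needs no splitting at all: the parser asks for a single `>` and takes the first half of `>>`, while an expression parser asks for `>>` and gets it only when the two characters are joint.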

Here are some relevant bits of the code that handle the composed model:

  • Composed tokens as produced by rustc_lexer
  • Composed tokens preserved by the token cooking
  • Here's the bit when we produce a TokenTree, consumed by the parser. Note that, although we are tracking jointness here, the tokens are composed.
  • Here's the bit of the parser which decomposes tokens on the fly.
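For readers unfamiliar with the last point, the following is a rough sketch of what "decomposing on the fly" means in the composed model. `Tok`, `Parser`, `eat_gt` and `parse_type` are hypothetical names for a toy grammar, not the rustc parser's API: when the parser needs a single `>` but the current token is `>>`, it rewrites that token to `>` in place and reports success without advancing.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tok {
    Lt,                  // <
    Gt,                  // >
    Shr,                 // >> as one composed token
    Ident(&'static str),
}

pub struct Parser {
    tokens: Vec<Tok>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Tok>) -> Self {
        Parser { tokens, pos: 0 }
    }

    pub fn is_done(&self) -> bool {
        self.pos == self.tokens.len()
    }

    /// Consume one `>`. If the current token is `>>`, split it:
    /// eat the first half and leave a plain `>` behind.
    pub fn eat_gt(&mut self) -> bool {
        match self.tokens.get(self.pos).copied() {
            Some(Tok::Gt) => {
                self.pos += 1;
                true
            }
            Some(Tok::Shr) => {
                self.tokens[self.pos] = Tok::Gt;
                true
            }
            _ => false,
        }
    }

    /// Toy grammar: type := IDENT ( '<' type '>' )?
    /// Returns the generic nesting depth, or None on a parse error.
    pub fn parse_type(&mut self) -> Option<usize> {
        match self.tokens.get(self.pos).copied() {
            Some(Tok::Ident(_)) => self.pos += 1,
            _ => return None,
        }
        if self.tokens.get(self.pos).copied() == Some(Tok::Lt) {
            self.pos += 1;
            let inner = self.parse_type()?;
            if !self.eat_gt() {
                return None;
            }
            return Some(inner + 1);
        }
        Some(0)
    }
}
```

Parsing the token sequence for `Vec<Vec<u8>>` with this sketch consumes the trailing `>>` in two steps, which is exactly the special-casing the decomposed model would make unnecessary.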

Here are the bits relevant to the decomposed model:

Note that the tt matcher in macro_rules eats one composed token, and this affects the language specification.
That is, when we transition to the decomposed model, we'll need to fix this code so that it still eats one composed token, to maintain backwards compatibility.
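One way the tt matcher could keep its current behaviour on top of decomposed tokens is to glue greedily at match time. The sketch below is only an illustration under assumptions: tokens are modelled as `(char, is_joint)` pairs, `COMPOSITES` is an illustrative operator table written for this example, and `tt_len` is a hypothetical helper, not an existing rustc function.

```rust
/// Illustrative table of multi-character operators, longest first so that
/// the first hit is the longest match. Not taken from rustc.
const COMPOSITES: &[&str] = &[
    "<<=", ">>=", "...", "..=",
    "==", "!=", "<=", ">=", "&&", "||",
    "+=", "-=", "*=", "/=", "%=", "^=", "&=", "|=",
    "<<", ">>", "..", "::", "->", "=>",
];

/// Given decomposed punctuation as (char, is_joint) pairs, return how many
/// of the leading tokens a `$x:tt` matcher should eat so that it still
/// consumes exactly one composed token. Returns 0 for an empty stream.
pub fn tt_len(tokens: &[(char, bool)]) -> usize {
    if tokens.is_empty() {
        return 0;
    }
    for op in COMPOSITES {
        let n = op.chars().count();
        if tokens.len() < n {
            continue;
        }
        // Every character must match, and all but the last must be joint.
        let matches = op
            .chars()
            .zip(tokens)
            .enumerate()
            .all(|(i, (c, &(ch, joint)))| ch == c && (i + 1 == n || joint));
        if matches {
            return n;
        }
    }
    1
}
```

With this, `<<` written without whitespace is eaten as one tt (two decomposed tokens), while `< <` is eaten as two separate tts, which matches what the composed model does today.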

@jonas-schievink jonas-schievink added A-parser Area: The parsing of Rust source code to an AST C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 18, 2019
@matklad
Member Author

matklad commented Aug 18, 2019

cc @petrochenkov

Centril added a commit to Centril/rust that referenced this issue Aug 20, 2019
…henkov

Move token gluing to token stream parsing

work towards rust-lang#63689, this moves token gluing from the lexer to the token tree layer. This is only a minimal step, but I like the negative diff here.

r? @petrochenkov
bors added a commit that referenced this issue Aug 20, 2019
Move token gluing to token stream parsing

work towards #63689, this moves token gluing from the lexer to the token tree layer. This is only a minimal step, but I like the negative diff here.

r? @petrochenkov
@matklad
Member Author

matklad commented Aug 31, 2019

made some initial stabs in matklad@0d46730.

The idea is to remove cases from Token::glue one by one, until no tokens are glued together, except for tt matcher.

Faced a couple of problems:

  • parse_assert accepts an &[TokenTree], which throws away jointness info, so assert!(1 != 1) does not parse
  • NtTT should be changed from holding a TokenTree to holding a TokenStream, to account for the fact that $tt:tt eats <<, which is two tokens in the new model

EDIT: more stabs at https://github.com/matklad/rust/tree/decomposed-neq. Fixed all parser problems; now it looks like we are losing jointness info somewhere.

@matklad
Member Author

matklad commented Sep 16, 2019

Found the next obstacle: in macros by example, quoted::TokenTree erases jointness information, so

macro_rules! m { () => (==) }

produces = =. Note that jointness seems to be correctly preserved by macro invocations. I guess we should change quoted::Delimited to store token trees with jointness, to better mirror the TokenStream.
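The effect is easy to reproduce in a toy model. In the sketch below, tokens are `(char, is_joint)` pairs, and `render` and `erase_jointness` are hypothetical helpers written for this illustration; they are not part of rustc. Dropping the jointness bit is enough to turn `==` into `= =`.

```rust
/// Print decomposed punctuation: a space follows every token that is not
/// joint with its successor (except the last token).
pub fn render(tokens: &[(char, bool)]) -> String {
    let mut out = String::new();
    for (i, &(ch, joint)) in tokens.iter().enumerate() {
        out.push(ch);
        if !joint && i + 1 != tokens.len() {
            out.push(' ');
        }
    }
    out
}

/// Models a representation that stores tokens without jointness:
/// every token comes back marked as not joint.
pub fn erase_jointness(tokens: &[(char, bool)]) -> Vec<(char, bool)> {
    tokens.iter().map(|&(ch, _)| (ch, false)).collect()
}
```

Rendering `[('=', true), ('=', false)]` gives `==`, while rendering the same tokens after `erase_jointness` gives `= =`. Storing jointness alongside each token in the quoted representation avoids that loss.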

@matklad
Member Author

matklad commented Sep 16, 2019

@petrochenkov is ^ a good plan? Or are there any bigger plans for refactoring quote, which we should do first?

@petrochenkov
Contributor

I never thought about this case.
Preserving jointness would probably be a good start unconditionally.

IIRC, the stuff in syntax::ext::tt::quoted should behave exactly like usual token streams, it's just re-hashed slightly for more convenient work with macro_rules DSL.
(Maybe it doesn't even add too much convenience and can be removed? Who knows.)
Anyway, if it behaves differently than regular token streams, it's something that's better fixed.

@matklad
Member Author

matklad commented Sep 16, 2019

> (Maybe it doesn't even add too much convenience and can be removed? Who knows.)

Hm, so that we represent $var:ty as literally $var:ty, and just match them on the fly while transcribing? I guess I'll try this approach for mbe in rust-analyzer. Currently, we blindly copy rustc's approach with a duplicated TokenStream.

For rustc, I feel like it's better to stick with the current model for the time being.

@matklad
Member Author

matklad commented Sep 16, 2019

Note there's another place, besides tt matcher, where we leak jointness: separators in repetitions:

macro_rules! m {
    ($()>>=*) => ()
}

m!(>>=  >>=);

playground

@matklad
Member Author

matklad commented Sep 17, 2019

> Maybe it doesn't even add too much convenience and can be removed?

This worked out quite nicely for rust-analyzer: rust-lang/rust-analyzer#1858. I think we should maybe do this for rustc as well, but probably only after moving to the decomposed model.

matklad added a commit to matklad/rust that referenced this issue Aug 14, 2020
matklad added a commit to matklad/rust that referenced this issue Aug 31, 2020
After the recent refactorings, we can actually completely hide this
type. It should help with rust-lang#63689.