Remove type parameter from `parse_*` methods #9466
Conversation
CodSpeed Performance Report: Merging #9466 will degrade performances by 6.76%.
This seems reasonable to me given the explanation (especially around the perceived regression) in the summary.
I buy what you're selling.
```diff
  let marker_token = (Tok::start_marker(mode), TextRange::default());
- let lexer = std::iter::once(Ok(marker_token)).chain(lxr);
+ let lexer = std::iter::once(Ok(marker_token)).chain(TokenSource::new(tokens));
```
I was trying to figure out why this is a dedicated type instead of a `tokens.into_iter().filter(...)`, but couldn't come up with a reason. (Usually it's because you want to name the type somewhere, but maybe I'm missing that.)
It's mainly to have a named type in #9152 and a place where we can implement lookahead without using `MultiPeek`.
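For readers following along, here is a minimal, self-contained sketch of the trade-off being discussed (toy stand-ins for the real token types; not the PR's code): a `filter(...)` adapter's type contains an unnameable closure, so a struct storing it must stay generic, whereas a dedicated wrapper type can be spelled out directly.

```rust
struct Tok; // toy stand-in for the real token type
type LexResult = Result<Tok, String>; // toy stand-in for the lexer's result

// Filter-based version: the closure's type cannot be written out, so the
// iterator can only appear behind `impl Trait` or a generic parameter.
fn filtered(tokens: Vec<LexResult>) -> impl Iterator<Item = LexResult> {
    tokens.into_iter().filter(|result| result.is_ok())
}

// Dedicated type: nameable, so it can appear in struct fields and in other
// signatures (e.g. the Parser in #9152) without a type parameter.
struct TokenSource {
    tokens: std::vec::IntoIter<LexResult>,
}

impl TokenSource {
    fn new(tokens: Vec<LexResult>) -> Self {
        Self { tokens: tokens.into_iter() }
    }
}

impl Iterator for TokenSource {
    type Item = LexResult;
    fn next(&mut self) -> Option<LexResult> {
        self.tokens.next()
    }
}

struct Parser {
    source: TokenSource, // concrete field type; no `I` parameter needed
}
```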
Ah gotcha, don't mind me then. :)
LGTM. I was actually going to do something like this later; glad you saved me the work 😊
This PR removes the `I` (tokens iterator) type parameter from all `parse_*` methods. This is done in preparation for #9152 to avoid accidentally monomorphizing the parser more than once.

The `Parser` struct defined in #9152 is parametrized by the underlying tokens iterator. This is dangerous because Rust will monomorphize the entire parser for each distinct `I` type parameter.

This PR removes the `I` type parameters from all `parse_*` methods and introduces a new `TokenSource` type in the parser (sketched below) that:

- implements lookahead without `MultiPeek` (which has an overhead when reading tokens)
- is used by the `Parser` to read tokens
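Roughly, such a token source could look like the following (assumed names and simplified types, not the PR's exact implementation). Because the tokens already sit in a `Vec`, arbitrary lookahead is plain indexing rather than `MultiPeek`-style buffering:

```rust
struct Tok; // toy stand-in for the real token type
type LexResult = Result<Tok, String>; // toy stand-in for the lexer's result

struct TokenSource {
    tokens: Vec<LexResult>,
    position: usize,
}

impl TokenSource {
    fn new(tokens: Vec<LexResult>) -> Self {
        Self { tokens, position: 0 }
    }

    /// Look `n` tokens ahead without consuming anything.
    fn peek_nth(&self, n: usize) -> Option<&LexResult> {
        self.tokens.get(self.position + n)
    }

    /// Consume and return the next token.
    fn bump(&mut self) -> Option<&LexResult> {
        let token = self.tokens.get(self.position)?;
        self.position += 1;
        Some(token)
    }
}
```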
The downside is that all `parse_*` methods now take a `Vec<LexResult>`, which requires collecting the lexer result into a `Vec` before parsing. Our micro-benchmarks show that this is slower than streaming the tokens one by one. However, it turns out that both the linter and the formatter collect the tokens before parsing them to an AST. That means the path tested by the micro-benchmark isn't used in the actual product, and the introduced regression doesn't affect users.
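To make the caller-side change concrete, here is a hedged sketch with illustrative stand-ins (`lex` and `parse_program` are not the crate's actual API):

```rust
type LexResult = Result<u32, String>; // toy stand-in for the real LexResult

fn lex(source: &str) -> impl Iterator<Item = LexResult> + '_ {
    source.bytes().map(|b| Ok(u32::from(b)))
}

fn parse_program(tokens: Vec<LexResult>) -> usize {
    tokens.len() // a real parser would build an AST here
}

fn main() {
    // The extra `collect` is what the micro-benchmark pays for; the linter
    // and formatter already collected the tokens before this PR anyway.
    let tokens: Vec<LexResult> = lex("x = 1").collect();
    let _node_count = parse_program(tokens);
}
```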
You may wonder why this is only a problem now but hasn't been a problem with lalrpop. The problem existed with lalrpop as well, but to a much smaller extent, because lalrpop separates the parser into two parts:

- the parser methods themselves, which aren't generic over the tokens iterator
- a small state machine, generic over `I`, that consumes the tokens and calls into the parser methods (only passing the tokens)

Only the state machine gets monomorphized, which is fine because it is limited in size. Another way to think about it is that lalrpop pushes the tokens into the parser (and, therefore, the parser is decoupled from `I`). In contrast, the handwritten parser pulls the tokens (and, therefore, is generic over `I`).
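A toy illustration of the push/pull split (compressed stand-ins, not lalrpop's or Ruff's actual code):

```rust
type Token = u32; // toy stand-in

// "Push" (lalrpop-style): the parser is a concrete type; only the small
// driver loop that feeds it is generic over the token iterator.
struct PushParser {
    state: usize,
}

impl PushParser {
    fn push_token(&mut self, _token: Token) {
        self.state += 1; // imagine the full grammar here -- compiled once
    }
}

fn drive<I: Iterator<Item = Token>>(tokens: I) -> PushParser {
    let mut parser = PushParser { state: 0 };
    for token in tokens {
        parser.push_token(token); // only this tiny loop is stamped out per I
    }
    parser
}

// "Pull" (handwritten-style): the parser owns the iterator, so the *entire*
// parser body is monomorphized for each distinct I.
struct PullParser<I: Iterator<Item = Token>> {
    tokens: I,
}

impl<I: Iterator<Item = Token>> PullParser<I> {
    fn parse(&mut self) -> usize {
        let mut state = 0;
        while self.tokens.next().is_some() {
            state += 1; // imagine the full grammar here -- duplicated per I
        }
        state
    }
}
```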
Alternatives
Alternatives that may be less affected by the performance regression are:

- a `TokenSource` enum with two variants: one storing the lexed tokens and the other storing the lexer, to consume the tokens lazily
- using a `Box<dyn Iterator>` in the handwritten parser
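The first alternative could look roughly like this (a sketch under assumed names, not code that was proposed in the PR):

```rust
struct Tok; // toy stand-in
type LexResult = Result<Tok, String>; // toy stand-in

enum TokenSource<'a> {
    /// Eager: the tokens were collected up front (what this PR does).
    Lexed(std::vec::IntoIter<LexResult>),
    /// Lazy: pull tokens from the lexer on demand, paying for dynamic
    /// dispatch instead of a type parameter on the parser.
    Lexer(Box<dyn Iterator<Item = LexResult> + 'a>),
}

impl Iterator for TokenSource<'_> {
    type Item = LexResult;

    fn next(&mut self) -> Option<LexResult> {
        match self {
            TokenSource::Lexed(tokens) => tokens.next(),
            TokenSource::Lexer(lexer) => lexer.next(),
        }
    }
}
```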
I went with the above approach because we don't need the flexibility of lexing either lazily or eagerly in Ruff outside the parser benchmark. Thus, using a `Vec<LexResult>` seemed the easiest solution.

CC: @LaBatata101