Speed up type literal lexing and make it more strict. #4430
Conversation
```cpp
llvm::StringRef suffix = word.substr(1);
if (!CanLexInt(emitter_, suffix)) {
```
Looking at other callers for CanLexInt, should string literals be sharing the logic? i.e., should CanLexInt be factored into just numeric literals?
Maybe? I'm happy to look at those next?
SG
Actually, thinking about this further... maybe you should look at it here, because you're taking a shared function and implementing a specialized version of it, rather than continuing to share?
As discussed, happy to look at this, but would prefer in a follow-up.
Thanks, PTAL!
```cpp
llvm::StringRef suffix = word.substr(1);
if (!CanLexInt(emitter_, suffix)) {
```
Approving, because I expect you're going to be of a different mind about the importance of sharing code.
Thanks, will look at the other things in follow-ups!
This rejects type literals with more digits than we can lex without APInt's help, using a custom diagnostic. This is a fairly arbitrary implementation limit; I'm wide open to even stricter rules here.

Despite no special casing and a very simplistic approach, avoiding APInt completely eliminates the lexing overhead for `i32` in the generated compilation benchmark, where that specific type literal is very common. We see a roughly 10% improvement in lexing there:

```
BM_CompileAPIFileDenseDecls<Phase::Lex>/256      39.0µs ± 4%   34.8µs ± 2%  -10.86%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024      180µs ± 1%    158µs ± 2%  -12.22%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096      731µs ± 2%    641µs ± 1%  -12.31%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384    3.20ms ± 2%   2.86ms ± 2%  -10.47%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536    13.8ms ± 1%   12.4ms ± 2%   -9.78%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144   64.0ms ± 2%   58.4ms ± 2%   -8.70%  (p=0.000 n=19+18)
```

This starts to fix a TODO in the diagnostic for these by giving a reasonably good diagnostic about a very large type literal. However, in practice it regresses the diagnostics because error tokens currently produce noisy extraneous diagnostics from parse and check. Leaving the TODO there; I have a follow-up PR to start improving the extraneous diagnostics.
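The approach described above could be sketched roughly as follows. This is a minimal standalone model, not Carbon's actual code: `std::string_view` stands in for `llvm::StringRef`, and the function name and the 18-digit cap are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical sketch: lex a type literal suffix (the "32" in "i32")
// without arbitrary-precision arithmetic, by first rejecting suffixes
// with more digits than a native integer can safely hold.
constexpr std::size_t MaxDigits = 18;  // Any 18-digit value fits in int64_t.

// Returns the parsed width, or -1 if the suffix is invalid or too long.
auto LexTypeLiteralWidth(std::string_view suffix) -> int64_t {
  if (suffix.empty() || suffix.size() > MaxDigits) {
    return -1;  // Too many digits to lex without APInt-style support.
  }
  // Be strict: reject leading zeroes so "i032" is not treated as "i32".
  if (suffix.size() > 1 && suffix.front() == '0') {
    return -1;
  }
  int64_t value = 0;
  for (char c : suffix) {
    if (c < '0' || c > '9') {
      return -1;  // Non-digit character in the suffix.
    }
    value = value * 10 + (c - '0');  // Cannot overflow: at most 18 digits.
  }
  return value;
}
```

The digit-count bound is what makes the plain accumulation loop safe without overflow checks, which is where the win over a general-purpose APInt path comes from.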
:sigh: Looks like |
I've restored the for loop for now, but maybe there is another approach? Nothing really simple comes to mind, and the loop is at least somewhat simple... FWIW, it appears to be a bit faster to use the loop on x86. (About the same on Arm for me.)
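As an aside on loop alternatives: one standard-library option is `std::from_chars` (C++17), which also parses without locales or allocation; whether it beats a hand-rolled loop is target-dependent, consistent with the x86-vs-Arm observation above. A hedged sketch, with illustrative names:

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string_view>

// Sketch of an alternative to the hand-rolled digit loop. std::from_chars
// reports overflow via its error code, so no explicit digit-count cap is
// needed here.
auto ParseWidth(std::string_view digits) -> int64_t {
  int64_t value = 0;
  const char* first = digits.data();
  const char* last = first + digits.size();
  auto [ptr, ec] = std::from_chars(first, last, value);
  // Require that every character was consumed and no overflow occurred.
  if (ec != std::errc() || ptr != last) {
    return -1;
  }
  return value;
}
```

Note this accepts leading zeroes (`"032"` parses as 32), so a stricter lexer would still need its own check for that.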
An invalid parse due to an error token is unlikely to yield a useful diagnostic, as the problem will already have been diagnosed by the lexer. A common case to start handling is when the parser encounters an invalid token while expecting an expression. This removes a number of unhelpful diagnostics after the lexer has already done a good job diagnosing. It also means there may be parse tree errors that aren't diagnosed when there are lexer-diagnosed errors, so track that. Follow-up to #4430 that almost finishes addressing its diagnostic TODO.
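The suppression described here can be sketched as follows. This is a minimal standalone model, not Carbon's actual parser types; the enum and method names are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token kinds; Error marks a token the lexer already
// diagnosed (e.g. an over-long type literal).
enum class TokenKind { Identifier, IntLiteral, CloseParen, Error };

struct Parser {
  std::vector<std::string> diagnostics;
  bool has_errors = false;

  // When an expression is expected but the token is invalid:
  auto HandleExpectedExpression(TokenKind kind) -> void {
    if (kind == TokenKind::Error) {
      // The lexer already diagnosed this token. Skip the redundant
      // "expected expression" message, but still record that the parse
      // tree is in error so later phases know not to trust it.
      has_errors = true;
      return;
    }
    diagnostics.push_back("expected expression");
    has_errors = true;
  }
};
```

The key design point is the separate `has_errors` flag: suppressing the diagnostic must not make the tree look valid, which is why the error state is tracked independently of the emitted messages.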