-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(pycodestyle): Reduce allocations when computing logical lines #3715
Conversation
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
PR Check ResultsEcosystem✅ ecosystem check detected no changes. BenchmarkLinux
Windows
|
f0fff40
to
f86cb60
Compare
f86cb60
to
2c8fbb7
Compare
@@ -106,61 +358,54 @@ fn build_line<'a>( | |||
|| ((prev_text != "{" && prev_text != "[" && prev_text != "(") | |||
&& (text != "}" && text != "]" && text != ")")) | |||
{ | |||
logical.push(' '); | |||
length += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use text.len()
to compute the offsets.
2c8fbb7
to
58b2a7a
Compare
58b2a7a
to
8a08a96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is stellar, I feel like I'm learning a lot even just from reading through your code here.
if is_keyword_token(tok) { | ||
flags.insert(TokenFlags::KEYWORD); | ||
if is_keyword_token(token) { | ||
line.flags.insert(TokenFlags::KEYWORD); | ||
} | ||
|
||
// TODO(charlie): "Mute" strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this TODO is still relevant, it looks like we are "muting" strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary until we remove the regex handling (I think I commented the reason why it exists in one of the upper PRs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, is this comment still relevant? We already do "mute" strings. So isn't the TODO resolved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still necessary (at least for this PR). I understood this is a TODO to avoid the muting :D. I'll leave it there because it will be gone soon
Tok::Rbrace | Tok::Rpar | Tok::Rsqb => { | ||
parens -= 1; | ||
} | ||
Tok::Newline | Tok::NonLogicalNewline | Tok::Comment(_) if parens == 0 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my own understanding, does this mean that the comment consumes the newline after it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so... I don't really know to be honest. I only moved this code around.
I remember that I tried removing the comment branch because I assumed that it isn't needed because there should be a new line token after, finishing the line. But some tests started failing. It may be worth looking into this again on the top branch.
Yep, it seems to be the case that there's no newline
token after Comment
token:
def f():
"""Docstring goes here."""
# Comment goes here.
x = 1
f()"#;
Generates the following lines without the Tok::Comment
[
"def f():",
""""Docstring goes here."""",
"x = 1",
"f()"
]
Where the comment is missing (because comments don't get written to the text string)
8a08a96
to
f62f750
Compare
f62f750
to
7389522
Compare
This PR refactors the logical lines computation that the pycodestyle rules use from using
O(lines)
allocations toO(1)
.The current implementation allocates for every line:
String
for the line's textVec
for the logical-line textoffset
toLocation
mappingsVec
for the line's tokenThis PR reduces the allocations by allocating a single
String
for the text, two Vec's for the mappings and tokens, and oneVec
for storing the line-metadata. The implementation slices into the shared text, mappings, and tokens structures to get the view for a single line.Allocating the data structures for all lines has the advantage of reserving the containers with the right capacity because we know the number of tokens. This avoids the guessing of the old implementation that used some more-or-less arbitrary numbers for the tokens-capacity.
Performance
no-logical
: Logical lines feature disabledbase-logical
: Logical lines feature enabled on mainpr3715
: This branchThis change improves performance overall but the pycodestyle still add a significant overhead.