Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lexing escapes in string literal and minor refactor #1079

Merged
merged 23 commits into from
Jan 20, 2021

Conversation

jevancc
Copy link
Contributor

@jevancc jevancc commented Jan 18, 2021

This Pull Request fixes/closes #1078 and some bugs mentioned in the discussion of this PR.

It changes the following:

  • Minor refactor string literal lexer (for future use in lexing unicode escape in identifier name)
  • Fix lexing octal escape sequence in string literal
  • Fix behavior of zero escape sequence
  • Fix \ followed by a line terminator sequence
  • Fix \v
  • Fix lexing \8 and \9 in strict mode.
  • Add more tests

@codecov
Copy link

codecov bot commented Jan 18, 2021

Codecov Report

Merging #1079 (17eb410) into master (5a5061c) will increase coverage by 0.04%.
The diff coverage is 61.37%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1079      +/-   ##
==========================================
+ Coverage   58.70%   58.74%   +0.04%     
==========================================
  Files         176      176              
  Lines       12391    12459      +68     
==========================================
+ Hits         7274     7319      +45     
- Misses       5117     5140      +23     
Impacted Files Coverage Δ
boa/src/syntax/lexer/template.rs 48.78% <ø> (ø)
boa/src/syntax/lexer/string.rs 60.74% <60.00%> (+7.65%) ⬆️
boa/src/builtins/map/mod.rs 71.08% <70.00%> (-1.29%) ⬇️
boa/src/builtins/map/ordered_map.rs 57.57% <0.00%> (-6.07%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a5061c...6fa0642. Read the comment docs.

@tofpie tofpie added bug Something isn't working lexer Issues surrounding the lexer labels Jan 18, 2021
@Razican
Copy link
Member

Razican commented Jan 18, 2021

Test262 conformance changes:

Test result master count PR count difference
Total 78,497 78,497 0
Passed 24,813 24,836 +23
Ignored 15,587 15,587 0
Failed 38,097 38,074 -23
Panics 20 20 0
Conformance 31.61 31.64 +0.03%

@jevancc
Copy link
Contributor Author

jevancc commented Jan 18, 2021

Test262 conformance changes:

Test result master count PR count difference
Total 78,497 78,497 0
Passed 24,805 24,811 +6
Ignored 15,587 15,587 0
Failed 38,105 38,099 -6
Panics 26 27 +1
Conformance 31.60 31.61 +0.01%

The new panic is fixed. Now the changes become Passed +7.

Copy link
Member

@RageKnify RageKnify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good job!

@tofpie
Copy link
Contributor

tofpie commented Jan 18, 2021

Thanks a lot. This looks good to me.

There is a very last tiny corner case that is not handled: when a string contains \8 or \9. This results into a 8 or 9 in non-strict mode, but in strict mode (or in template literals), this should throw a syntax error.

And this is not related to octal escape sequence, but this is part of escape sequence, there is an issue with \v that is currently not handled correctly, it produces v instead of Vertical Tabulation (Unicode 0x000B). Maybe you could also add this fix to this PR.

@jevancc jevancc changed the title Fix octal escape sequence in string literal Fix lexing escapes in string literal and minor refactor Jan 18, 2021
@jevancc
Copy link
Contributor Author

jevancc commented Jan 18, 2021

Thanks a lot. This looks good to me.

There is a very last tiny corner case that is not handled: when a string contains \8 or \9. This results into a 8 or 9 in non-strict mode, but in strict mode (or in template literals), this should throw a syntax error.

And this is not related to octal escape sequence, but this is part of escape sequence, there is an issue with \v that is currently not handled correctly, it produces v instead of Vertical Tabulation (Unicode 0x000B). Maybe you could also add this fix to this PR.

Ok, I just renamed this PR and will do all the fixes in it. I will re-request reviews after I finish it.

For recording the issue that will be fixed in this PR: \ followed by a line terminator is also wrong.

@jevancc
Copy link
Contributor Author

jevancc commented Jan 19, 2021

@tofpie the bugs you mentioned are addressed. I cannot request your review through github so I tagged you in this comment :)

Copy link
Member

@RageKnify RageKnify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like the refactor, separating into functions helps a lot in understanding the code.
Don't like the loose unsafe, either we remove it for a minor performance penalty or add a comment (I'm fine with either solution).

Updated @Razican's comment with the conformance numbers, improved a bit more.

boa/src/syntax/lexer/string.rs Outdated Show resolved Hide resolved
boa/src/syntax/lexer/string.rs Outdated Show resolved Hide resolved
boa/src/syntax/lexer/string.rs Outdated Show resolved Hide resolved
@RageKnify RageKnify merged commit 00fc5e2 into boa-dev:master Jan 20, 2021
Razican pushed a commit that referenced this pull request May 22, 2021
* Refactor StringLiteral

* Fix octal escape in string literal

* Add tests

* Fix zero escape

* Fix zero escape lookahead

* Rename variables

* Rename helper functions

* Refactor match arms

* Fix escape line terminator sequence

* Fix single character escape

* Fix escape followed by unicode char

* Add NonOctalDecimalEscapeSequence

* Fix comment

* Refactor

* Modify error message

* Add tests

* Rename tests

* Add test for error

* Add comments for unsafe bytes to str

* Update boa/src/syntax/lexer/string.rs

Co-authored-by: tofpie <[email protected]>

* Minor refactor

* Remove unsafe bytes to str

* Fix panic when reading invalid utf-8 chars

Co-authored-by: tofpie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working lexer Issues surrounding the lexer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Octal escape sequence in string doesn't work
5 participants