Add UTF-16 input parsing #3538
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3538      +/-   ##
==========================================
- Coverage   47.42%   47.37%   -0.05%
==========================================
  Files         470      472       +2
  Lines       45690    45643      -47
==========================================
- Hits        21667    21625      -42
+ Misses      24023    24018       -5
Test262 conformance changes
Fixed tests (8):
Really great to finally have proper UTF-16 parsing! I just have some nitpicks that don't block merging.
Great work @raskad!
Just a nitpick :)
We currently have UTF-16 handling at runtime through our `JsString` type. One thing that is missing is support for UTF-16 input itself. One case where this is relevant for boa is the handling of `eval`: because we only allow UTF-8 input, we have to convert the input of `eval` to UTF-8 before parsing it.

To solve this, I added a `ReadChar` trait to the parser crate that allows the parser to handle different input encodings. The parsing itself is done on Unicode code points. To make this work I removed all of the parsing that was done on bytes, since that assumed we only handle UTF-8 input. Most of that work is in 55752f2.

I added a UTF-16 input type. To get some first positive results I also adjusted the regex parsing to work for non-UTF-8 inputs. That should give us some `eval` and regex related tests that pass now.
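
For anyone skimming the thread, here is a rough sketch of the idea behind a code-point reader that is independent of the input encoding. It is not the code from this PR: the `next_char` signature, the `Utf8Input` and `Utf16Input` type names, and the `collect_code_points` helper are all assumptions made for illustration. Lone surrogates are passed through unchanged in this sketch, on the assumption that the consumer wants to see them as they appear in the input.

```rust
use std::io;

/// Hypothetical shape of the trait; the real signature in the PR may differ.
trait ReadChar {
    /// Returns the next Unicode code point, or `None` at end of input.
    fn next_char(&mut self) -> io::Result<Option<u32>>;
}

/// UTF-8 input backed by a string slice (illustrative type).
struct Utf8Input<'a> {
    chars: std::str::Chars<'a>,
}

impl<'a> Utf8Input<'a> {
    fn new(src: &'a str) -> Self {
        Self { chars: src.chars() }
    }
}

impl ReadChar for Utf8Input<'_> {
    fn next_char(&mut self) -> io::Result<Option<u32>> {
        Ok(self.chars.next().map(u32::from))
    }
}

/// UTF-16 input backed by a slice of code units (illustrative type).
struct Utf16Input<'a> {
    units: &'a [u16],
    pos: usize,
}

impl<'a> Utf16Input<'a> {
    fn new(units: &'a [u16]) -> Self {
        Self { units, pos: 0 }
    }
}

impl ReadChar for Utf16Input<'_> {
    fn next_char(&mut self) -> io::Result<Option<u32>> {
        let first = match self.units.get(self.pos) {
            Some(&unit) => unit,
            None => return Ok(None),
        };
        self.pos += 1;

        // A high surrogate followed by a low surrogate encodes one
        // code point above U+FFFF.
        if (0xD800..=0xDBFF).contains(&first) {
            if let Some(&second) = self.units.get(self.pos) {
                if (0xDC00..=0xDFFF).contains(&second) {
                    self.pos += 1;
                    let high = u32::from(first) - 0xD800;
                    let low = u32::from(second) - 0xDC00;
                    return Ok(Some(0x10000 + (high << 10) + low));
                }
            }
        }

        // BMP code unit or lone surrogate: pass through unchanged.
        Ok(Some(u32::from(first)))
    }
}

/// Drains any reader into a vector of code points. A parser would
/// consume the same stream without knowing the source encoding.
fn collect_code_points<R: ReadChar>(mut input: R) -> io::Result<Vec<u32>> {
    let mut out = Vec::new();
    while let Some(cp) = input.next_char()? {
        out.push(cp);
    }
    Ok(out)
}
```

With this shape, the same source text yields the same code point sequence whether it arrives as `&str` or as `&[u16]`, so nothing downstream of the reader has to assume UTF-8.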