Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement latin1 encoded JsStrings #3450

Merged
merged 1 commit into from
May 1, 2024
Merged

Implement latin1 encoded JsStrings #3450

merged 1 commit into from
May 1, 2024

Conversation

HalidOdat
Copy link
Member

@HalidOdat HalidOdat commented Nov 3, 2023

This fixes #3097

This memory optimization is implemented by both spidermonkey and v8.

This PR encodes strings that can be represented as latin1 as a byte array, instead of u16 array, this cuts latin1 strings size by 2x.

This may also have some interesting optimization that could be applied when we know that the string is in the ASCII range. Decided to not preserve asciiness though it can work with asciiness preserved it would complicate other string optimizations. AFIK v8 does not preserve the asciiness. Preserving the asciiness could lead to some interesting optimizations but at the cost of a lot of complexity which from my testing doesn't seem to be worth it.

@HalidOdat HalidOdat added the performance Performance related changes and issues label Nov 3, 2023
Copy link

github-actions bot commented Nov 3, 2023

Test262 conformance changes

Test result main count PR count difference
Total 50,731 50,731 0
Passed 42,973 42,973 0
Ignored 1,395 1,395 0
Failed 6,363 6,363 0
Panics 18 18 0
Conformance 84.71% 84.71% 0.00%

@HalidOdat HalidOdat force-pushed the ascii-string branch 3 times, most recently from 2bcb954 to b542069 Compare November 4, 2023 00:23
Copy link

codecov bot commented Nov 4, 2023

Codecov Report

Attention: Patch coverage is 62.01084% with 631 lines in your changes are missing coverage. Please review.

Project coverage is 44.47%. Comparing base (6ddc2b4) to head (52d192a).
Report is 148 commits behind head on main.

❗ Current head 52d192a differs from pull request most recent head bc980e4. Consider uploading reports for the commit bc980e4 to get more accurate results

Files Patch % Lines
boa_cli/src/debug/string.rs 0.00% 47 Missing ⚠️
boa_engine/src/builtins/string/mod.rs 66.41% 44 Missing ⚠️
boa_engine/src/string/mod.rs 79.27% 40 Missing ⚠️
boa_engine/src/builtins/temporal/calendar/mod.rs 47.76% 35 Missing ⚠️
boa_engine/src/string/slice.rs 65.11% 30 Missing ⚠️
boa_engine/src/builtins/intl/collator/mod.rs 10.00% 27 Missing ⚠️
boa_engine/src/builtins/regexp/mod.rs 66.66% 27 Missing ⚠️
boa_engine/src/string/str.rs 52.08% 23 Missing ⚠️
boa_engine/src/builtins/intl/segmenter/mod.rs 15.00% 17 Missing ⚠️
boa_engine/src/builtins/intl/plural_rules/mod.rs 15.78% 16 Missing ⚠️
... and 75 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3450      +/-   ##
==========================================
- Coverage   47.24%   44.47%   -2.77%     
==========================================
  Files         476      490      +14     
  Lines       46892    50360    +3468     
==========================================
+ Hits        22154    22398     +244     
- Misses      24738    27962    +3224     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@HalidOdat HalidOdat force-pushed the ascii-string branch 2 times, most recently from 7829b91 to c71731c Compare November 4, 2023 19:50
@HalidOdat HalidOdat mentioned this pull request Nov 5, 2023
@jedel1043 jedel1043 added the waiting-on-author Waiting on PR changes from the author label Nov 29, 2023
@HalidOdat HalidOdat force-pushed the ascii-string branch 9 times, most recently from 6b1a8b5 to 2873a3b Compare March 11, 2024 01:10
@HalidOdat HalidOdat changed the title Implement WIP Ascii JsString Implement latin1 encoded JsStrings Mar 31, 2024
@HalidOdat HalidOdat force-pushed the ascii-string branch 2 times, most recently from 09305a7 to cea4f38 Compare March 31, 2024 21:11
@HalidOdat HalidOdat added this to the v0.18.1 milestone Mar 31, 2024
@HalidOdat HalidOdat marked this pull request as ready for review April 1, 2024 14:32
@HalidOdat
Copy link
Member Author

This is ready for review/merge 🥳

In terms of performance it's pretty much the same as on main, a very slight regression that doesn't effect the overall score.

I did some analysis on what is the percentage of latin1 and u16 through out the combined.js execution for allocation strings (so static strings are not counted) and found that ~99.63 are latin1 strings, some methods still converts to u16, we could reduce this number even more by not eagerly converting to u16.

latin1: 9459373,
latin1_size: 64732594 (in bytes),
u16: 34755,
u16_size: 4346430 (in bytes),

There was also a reduction on binary size by ~64KB!

Checked locally and there is not difference between main and this PR in conformance, it seems that data repo's test262 data is not being updated?

@HalidOdat HalidOdat requested a review from a team April 1, 2024 15:09
@jedel1043
Copy link
Member

Checked locally and there is not difference between main and this PR in conformance, it seems that data repo's test262 data is not being updated?

That's weird. I checked the output of more recent PRs such as dependabot's and everything looks fine.

@HalidOdat HalidOdat added memory PRs and Issues related to the memory management or memory footprint. Internal Category for changelog labels Apr 1, 2024
@HalidOdat
Copy link
Member Author

That's weird. I checked the output of more recent PRs such as dependabot's and everything looks fine.

I reran the test262 workflow and now it shows the correct conformance! :)

@HalidOdat HalidOdat added waiting-on-review Waiting on reviews from the maintainers and removed waiting-on-author Waiting on PR changes from the author labels Apr 1, 2024
Copy link
Member

@raskad raskad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just got a few comments, otherwise looks great!

core/engine/src/string/mod.rs Outdated Show resolved Hide resolved
docs/boa_object.md Outdated Show resolved Hide resolved
docs/boa_object.md Outdated Show resolved Hide resolved
@HalidOdat HalidOdat requested a review from a team April 4, 2024 00:41
@HalidOdat HalidOdat requested a review from a team April 10, 2024 02:55
Copy link
Member

@jedel1043 jedel1043 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really impressive work!

@jedel1043 jedel1043 added this pull request to the merge queue May 1, 2024
Merged via the queue into main with commit 3f6ee22 May 1, 2024
13 checks passed
@jedel1043 jedel1043 deleted the ascii-string branch May 1, 2024 19:30
@raskad raskad modified the milestones: v0.18.1, v0.19.0 Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internal Category for changelog memory PRs and Issues related to the memory management or memory footprint. performance Performance related changes and issues waiting-on-review Waiting on reviews from the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ASCII encoding JsStrings
3 participants