Implement latin1 encoded `JsString`s #3450

HalidOdat · 2023-11-03T22:06:06Z

This fixes #3097

This memory optimization is implemented by both SpiderMonkey and V8.

This PR stores strings that can be represented as latin1 in a byte array instead of a u16 array, which halves the size of latin1 strings.
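To illustrate the idea, here is a minimal sketch of a two-variant string storage. The `StrRepr` type and its methods are hypothetical names made up for this example; they are not Boa's actual `JsString` layout or API. The sketch only assumes that a string is narrowed to one byte per code unit when every code unit is at most `0xFF`.

```rust
/// Hypothetical two-variant storage, for illustration only.
/// This is NOT Boa's real `JsString` representation.
#[derive(Debug, PartialEq)]
enum StrRepr {
    /// Every code unit fits in one byte (U+0000..=U+00FF).
    Latin1(Vec<u8>),
    /// At least one code unit needs the full 16 bits.
    Utf16(Vec<u16>),
}

impl StrRepr {
    /// Picks the narrowest representation for the given UTF-16 code units.
    fn from_code_units(units: &[u16]) -> Self {
        if units.iter().all(|&u| u <= 0xFF) {
            // The check above guarantees the cast cannot truncate.
            StrRepr::Latin1(units.iter().map(|&u| u as u8).collect())
        } else {
            StrRepr::Utf16(units.to_vec())
        }
    }

    /// Bytes used by the code-unit payload (ignoring any header).
    fn payload_bytes(&self) -> usize {
        match self {
            StrRepr::Latin1(bytes) => bytes.len(),
            StrRepr::Utf16(units) => units.len() * 2,
        }
    }

    /// Reads one code unit, widening latin1 bytes on the fly
    /// instead of converting the whole string to u16.
    fn code_unit_at(&self, index: usize) -> Option<u16> {
        match self {
            StrRepr::Latin1(bytes) => bytes.get(index).map(|&b| u16::from(b)),
            StrRepr::Utf16(units) => units.get(index).copied(),
        }
    }
}
```

With this sketch, `StrRepr::from_code_units(&"caf\u{e9}".encode_utf16().collect::<Vec<u16>>())` yields the `Latin1` variant with a 4-byte payload, where a u16 array would need 8 bytes.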

~~This may also have some interesting optimization that could be applied when we know that the string is in the ASCII range.~~ I decided not to preserve asciiness. Although it can work with asciiness preserved, doing so would complicate other string optimizations. AFAIK V8 does not preserve asciiness either. Preserving it could enable some interesting optimizations, but at the cost of a lot of complexity, which from my testing doesn't seem to be worth it.

github-actions · 2023-11-03T22:25:35Z

Test262 conformance changes

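| Test result | main count | PR count | difference |
| :---------- | ---------: | -------: | ---------: |
| Total       |     50,731 |   50,731 |          0 |
| Passed      |     42,973 |   42,973 |          0 |
| Ignored     |      1,395 |    1,395 |          0 |
| Failed      |      6,363 |    6,363 |          0 |
| Panics      |         18 |       18 |          0 |
| Conformance |     84.71% |   84.71% |      0.00% |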

codecov · 2023-11-04T17:55:09Z

Codecov Report

Attention: Patch coverage is 62.01084%, with 631 lines in your changes missing coverage. Please review.

Project coverage is 44.47%. Comparing base (6ddc2b4) to head (52d192a).
Report is 148 commits behind head on main.

❗ Current head 52d192a differs from pull request most recent head bc980e4. Consider uploading reports for the commit bc980e4 to get more accurate results

| Files | Patch % | Lines |
| :---- | ------: | :---- |
| boa_cli/src/debug/string.rs | 0.00% | 47 Missing ⚠️ |
| boa_engine/src/builtins/string/mod.rs | 66.41% | 44 Missing ⚠️ |
| boa_engine/src/string/mod.rs | 79.27% | 40 Missing ⚠️ |
| boa_engine/src/builtins/temporal/calendar/mod.rs | 47.76% | 35 Missing ⚠️ |
| boa_engine/src/string/slice.rs | 65.11% | 30 Missing ⚠️ |
| boa_engine/src/builtins/intl/collator/mod.rs | 10.00% | 27 Missing ⚠️ |
| boa_engine/src/builtins/regexp/mod.rs | 66.66% | 27 Missing ⚠️ |
| boa_engine/src/string/str.rs | 52.08% | 23 Missing ⚠️ |
| boa_engine/src/builtins/intl/segmenter/mod.rs | 15.00% | 17 Missing ⚠️ |
| boa_engine/src/builtins/intl/plural_rules/mod.rs | 15.78% | 16 Missing ⚠️ |

... and 75 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3450      +/-   ##
==========================================
- Coverage   47.24%   44.47%   -2.77%     
==========================================
  Files         476      490      +14     
  Lines       46892    50360    +3468     
==========================================
+ Hits        22154    22398     +244     
- Misses      24738    27962    +3224

HalidOdat · 2024-04-01T15:08:35Z

This is ready for review/merge 🥳

In terms of performance, it's pretty much the same as on main: a very slight regression that doesn't affect the overall score.

I analyzed the share of latin1 and u16 strings allocated throughout the combined.js execution (static strings are not counted) and found that ~99.63% are latin1 strings. Some methods still convert to u16; we could shrink the u16 share even further by not eagerly converting to u16.

latin1: 9459373,
latin1_size: 64732594 (in bytes),
u16: 34755,
u16_size: 4346430 (in bytes),
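As a rough cross-check of these figures, the shares and mean sizes can be recomputed as in the sketch below. The constant and function names are made up for this example, and it assumes the `*_size` values are total bytes across all counted allocations; the comment does not say whether those sizes include per-string headers.

```rust
// Allocation figures quoted above for the combined.js run.
const LATIN1_COUNT: u64 = 9_459_373;
const LATIN1_BYTES: u64 = 64_732_594;
const U16_COUNT: u64 = 34_755;
const U16_BYTES: u64 = 4_346_430;

/// Share of `part` in `part + rest`, as a percentage.
fn share_percent(part: u64, rest: u64) -> f64 {
    part as f64 / (part + rest) as f64 * 100.0
}

/// Mean size in bytes per allocated string.
fn mean_bytes(total_bytes: u64, count: u64) -> f64 {
    total_bytes as f64 / count as f64
}

/// Returns (latin1 share by count, latin1 share by bytes,
/// mean latin1 size, mean u16 size).
fn summarize() -> (f64, f64, f64, f64) {
    (
        share_percent(LATIN1_COUNT, U16_COUNT),
        share_percent(LATIN1_BYTES, U16_BYTES),
        mean_bytes(LATIN1_BYTES, LATIN1_COUNT),
        mean_bytes(U16_BYTES, U16_COUNT),
    )
}
```

By count this reproduces the ~99.63% latin1 share. By bytes the latin1 share is lower (roughly 93.7%), because the few u16 strings are much larger on average (about 125 bytes each versus about 7 bytes for latin1 strings).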

There was also a reduction in binary size of ~64KB!

Checked locally and there is no difference between main and this PR in conformance. It seems that the data repo's test262 data is not being updated?

jedel1043 · 2024-04-01T15:34:13Z

> Checked locally and there is no difference between main and this PR in conformance. It seems that the data repo's test262 data is not being updated?

That's weird. I checked the output of more recent PRs such as dependabot's and everything looks fine.

HalidOdat · 2024-04-01T16:39:44Z

> That's weird. I checked the output of more recent PRs such as dependabot's and everything looks fine.

I reran the test262 workflow and now it shows the correct conformance! :)

raskad

Just got a few comments, otherwise looks great!

core/engine/src/string/mod.rs

docs/boa_object.md

core/engine/src/object/mod.rs

jedel1043

Really impressive work!

HalidOdat added the performance (Performance related changes and issues) label Nov 3, 2023

HalidOdat force-pushed the ascii-string branch from 89ca28e to 4b344b9 November 3, 2023 22:16

HalidOdat force-pushed the ascii-string branch 3 times, most recently from 2bcb954 to b542069 November 4, 2023 00:23

HalidOdat force-pushed the ascii-string branch 2 times, most recently from 7829b91 to c71731c November 4, 2023 19:50

HalidOdat mentioned this pull request Nov 5, 2023

Refactor interner #3452

Closed

jedel1043 reviewed Nov 7, 2023

boa_engine/src/string/mod.rs (outdated, resolved)

HalidOdat force-pushed the ascii-string branch from c71731c to 52d192a November 12, 2023 20:51

jedel1043 added the waiting-on-author (Waiting on PR changes from the author) label Nov 29, 2023

HalidOdat force-pushed the ascii-string branch 9 times, most recently from 6b1a8b5 to 2873a3b March 11, 2024 01:10

HalidOdat mentioned this pull request Mar 24, 2024

Dense array storage variants for i32 and f64 #3760

Merged

HalidOdat force-pushed the ascii-string branch from 2873a3b to 6a7c327 March 28, 2024 05:05

HalidOdat changed the title ~~Implement WIP Ascii JsString~~ Implement latin1 encoded JsStrings Mar 31, 2024

HalidOdat force-pushed the ascii-string branch 2 times, most recently from 09305a7 to cea4f38 March 31, 2024 21:11

HalidOdat added this to the v0.18.1 milestone Mar 31, 2024

HalidOdat marked this pull request as ready for review April 1, 2024 14:32

HalidOdat requested a review from a team April 1, 2024 15:09

HalidOdat added the memory (PRs and Issues related to the memory management or memory footprint) and Internal (Category for changelog) labels Apr 1, 2024

HalidOdat added waiting-on-review (Waiting on reviews from the maintainers) and removed waiting-on-author (Waiting on PR changes from the author) labels Apr 1, 2024

raskad reviewed Apr 3, 2024

core/engine/src/string/mod.rs (outdated, resolved)

docs/boa_object.md (outdated, resolved)

docs/boa_object.md (outdated, resolved)

HalidOdat force-pushed the ascii-string branch from cea4f38 to 1614e3e April 4, 2024 00:41

HalidOdat requested a review from a team April 4, 2024 00:41

raskad approved these changes Apr 4, 2024

HalidOdat requested a review from a team April 10, 2024 02:55

jedel1043 reviewed Apr 12, 2024

core/engine/src/object/mod.rs (outdated, resolved)

HalidOdat force-pushed the ascii-string branch from 1614e3e to 707cddb April 18, 2024 05:00

HalidOdat force-pushed the ascii-string branch from 707cddb to 1da4a27 April 25, 2024 19:07

HalidOdat requested a review from a team April 26, 2024 14:54

HalidOdat mentioned this pull request Apr 27, 2024

Separate JsString into its own crate #3831

Closed

Implement Latin1 JsString

bc980e4

HalidOdat force-pushed the ascii-string branch from 1da4a27 to bc980e4 April 28, 2024 17:21

jedel1043 approved these changes May 1, 2024

jedel1043 added this pull request to the merge queue May 1, 2024

Merged via the queue into main with commit 3f6ee22 May 1, 2024
13 checks passed

jedel1043 deleted the ascii-string branch May 1, 2024 19:30

raskad modified the milestones: v0.18.1, v0.19.0 Jul 8, 2024
