[browser][non-icu] `HybridGlobalization` checking for prefix/suffix #84920

ilonatommy · 2023-04-17T10:26:03Z

Implements part of the web-API-based globalization. This is part of the `HybridGlobalization` feature and contributes to #79989.

Old, icu-based private API: GlobalizationNative_StartsWith, GlobalizationNative_EndsWith

New, non-icu private API: Interop.JsGlobalization.StartsWith, Interop.JsGlobalization.EndsWith

Affected public API (see: tests in CompareInfoTests.IsPrefix.cs, CompareInfoTests.IsSuffix):

CompareInfo.IsPrefix
CompareInfo.IsSuffix
String.StartsWith
String.EndsWith

This implementation hovers with 64kB strings. The average processing time for 16kB strings is ~200ms.

measurement (64kB strings)	time ICU	time Hybrid	time CoreCLR
String, CompareInfo IsPrefix	5.3665ms	? ms	4.300ms
String, CompareInfo IsSuffix	10.1670ms	? ms	6.700ms
String, String StartsWith	5.4105ms	? ms	4.900ms
String, String EndsWith	10.4519ms	? ms	6.700ms

All changes in behavior are listed in docs\design\features\hybrid-globalization.md.

cc @SamMonoRT

pavelsavara · 2023-04-17T11:57:56Z

src/mono/wasm/runtime/net6-legacy/hybrid-globalization.ts

+    // unless we have < 2 empty chars at the beginning
+    if (segment.length === 1 || (segment.length > 1 && "".localeCompare(segment[1].segment, undefined) !== 0))
+    {
+        segment.shift();


Could we pass `currentIndex: number` and increment it rather than mutating the array with `.shift()`?

I think we could, I will try to edit the logic.
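
A minimal sketch of what the index-based variant could look like, next to the mutating one from the diff. The `Seg` interface and both function names are hypothetical; the PR itself works on `Intl.SegmentData[]`, and the surrounding loop is not shown in this thread, so this only illustrates the suggestion.

```typescript
// Hypothetical minimal shape; the PR uses Intl.SegmentData[].
interface Seg { segment: string }

// Mutating variant, same condition as in the diff: removes the first element in place.
function dropFirstMutating(segments: Seg[]): void {
    if (segments.length === 1 ||
        (segments.length > 1 && "".localeCompare(segments[1].segment, undefined) !== 0)) {
        segments.shift();
    }
}

// Index-based variant: same condition expressed relative to currentIndex.
// Returns the next index to read from and leaves the array untouched.
function advanceIndex(segments: Seg[], currentIndex: number): number {
    const remaining = segments.length - currentIndex;
    if (remaining === 1 ||
        (remaining > 1 && "".localeCompare(segments[currentIndex + 1].segment, undefined) !== 0)) {
        return currentIndex + 1;
    }
    return currentIndex;
}
```

With the index-based variant the caller keeps a cursor instead of shrinking the array, so the segmented input can be reused and no elements need to be moved on each step.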

pavelsavara · 2023-04-17T12:13:10Z

src/mono/wasm/runtime/net6-legacy/hybrid-globalization.ts

+
+export function segment_string_locale_sensitive(string: string, locale: string | undefined) : Intl.SegmentData[]
+{
+    const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });


I would appreciate a comment explaining why we segment the string into "user-perceived letters" instead of comparing the whole string. Is that because we need to know the match index?

Would it be possible to assume that most calls return false (no match) and optimize for that case by comparing the unsegmented strings first?

Is creating a Segmenter object expensive?

It's because we need a locale-sensitive comparison. We cannot guess how many chars n letters consist of. Consider the case:

s1 = "A\u0300" // 2 chars, 1 letter
s2 = "\u00C0" // 1 char, 1 letter

The segmenter recognises both as 1 letter, so we can easily compare them. If we were to compare s1[0] === s2[0], we would get false.
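
The case above can be reproduced with a short, self-contained sketch. The helper name `graphemes` is hypothetical (the PR's function is `segment_string_locale_sensitive` and returns `Intl.SegmentData[]`); the snippet assumes a runtime and TypeScript lib that provide `Intl.Segmenter`.

```typescript
// Split a string into grapheme clusters ("user-perceived letters").
// `graphemes` is a hypothetical helper used only for this illustration.
function graphemes(text: string, locale?: string): string[] {
    const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
    return Array.from(segmenter.segment(text), (s) => s.segment);
}

const s1 = "A\u0300"; // 2 UTF-16 code units, 1 letter
const s2 = "\u00C0";  // 1 UTF-16 code unit, 1 letter

const g1 = graphemes(s1);
const g2 = graphemes(s2);

// Each string yields exactly one segment.
const sameSegmentCount = g1.length === 1 && g2.length === 1;

// Comparing raw code units sees "A" vs "À" and reports a mismatch.
const codeUnitsEqual = s1[0] === s2[0];

// Comparing the segments with localeCompare treats the two forms as equivalent.
const segmentsEqual = g1[0].localeCompare(g2[0], undefined) === 0;

console.log({ sameSegmentCount, codeUnitsEqual, segmentsEqual });
```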

Could we normalize both strings instead?
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
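
A rough sketch of what the normalization idea could look like for the example pair. The helper `startsWithNormalized` is hypothetical and not part of the PR; it only covers canonical equivalence via NFC and is an ordinal check after normalization, so it does not by itself handle locale rules or comparison options. Whether that would be sufficient here is not settled in this thread.

```typescript
// Hypothetical helper: ordinal prefix check on NFC-normalized strings.
// Assumption: canonical equivalence is the only difference we need to bridge.
function startsWithNormalized(source: string, prefix: string): boolean {
    return source.normalize("NFC").startsWith(prefix.normalize("NFC"));
}

const decomposed = "A\u0300bc"; // "A" + combining grave accent, then "bc"
const composed = "\u00C0";      // precomposed "À"

// Without normalization the code units differ, so the prefix is not found.
const rawMatch = decomposed.startsWith(composed);

// After NFC normalization both sides use the precomposed form.
const normalizedMatch = startsWithNormalized(decomposed, composed);

console.log({ rawMatch, normalizedMatch });
```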

pavelsavara · 2023-04-17T12:14:14Z

src/mono/wasm/runtime/net6-legacy/hybrid-globalization.ts

+    const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
+    return Array.from(segmenter.segment(string));
+}
+


locale.split on each call to compare_strings is probably expensive.

I added performance tests (for the whole operation, not only segmentation) and it seems it's not that bad.

measurement	time ICU	time Hybrid
String, CompareInfo IsPrefix	5.3330ms	6.0471ms
String, CompareInfo IsSuffix	0.0091ms	0.0093ms
String, String StartsWith	5.5313ms	6.0296ms
String, String EndsWith	0.0094ms	0.0102ms

I will have to re-investigate this: the numbers are a bit too similar in both cases. I expected bigger differences, so something might be wrong with this PR.

Checked. There was an error connected with prefix testing, but it does not change the range of the ICU vs. HG difference. Updated numbers:

measurement	time ICU	time Hybrid	time CoreCLR
String, CompareInfo IsPrefix	5.3665ms	5.6897ms	4.300ms
String, CompareInfo IsSuffix	10.1670ms	10.7293ms	6.700ms
String, String StartsWith	5.4105ms	6.4558ms	4.900ms
String, String EndsWith	10.4519ms	12.8267ms	6.700ms

ilonatommy · 2023-04-19T11:09:37Z

Closing. The PR will be redesigned to improve performance and then re-posted. For this reason we will need to drop support for the matchLength calculation, but based on the usage check, giving it up in exchange for higher performance is the more beneficial choice.

ilonatommy added 2 commits April 17, 2023 10:47

Working version.

9a232e5

Optimization.

198a9ab

ilonatommy added arch-wasm WebAssembly architecture area-System.Globalization labels Apr 17, 2023

ilonatommy requested a review from mkhamoyan April 17, 2023 10:26

ilonatommy self-assigned this Apr 17, 2023

ilonatommy requested review from lewing, pavelsavara and kg as code owners April 17, 2023 10:26

pavelsavara reviewed Apr 17, 2023

build-analysis bot mentioned this pull request Apr 17, 2023

WasmTestOnBrowser-System.Text.Json.Tests.WorkItemExecution timing out #84434

Closed

ilonatommy added 5 commits April 17, 2023 15:39

Store exception texts in resource strings.

5785cd2

Added performance tests.

5262338

Merge branch 'main' into hg-starts-ends-with

58ed40e

Fix typo.

0ed62ca

Merge typo.

97723c7

build-analysis bot mentioned this pull request Apr 18, 2023

Tracking issue for CI build timeouts #76454

Closed

Fix IsPrefix test data.

85dda03

build-analysis bot mentioned this pull request Apr 18, 2023

nativeaot/SmokeTests/DwarfDump failing on linux-x64 Debug #84979

Closed

Nit.

27786ae

ilonatommy closed this Apr 19, 2023

build-analysis bot mentioned this pull request Apr 20, 2023

[wasm] interpreter timeouts when WebSocket closes unexpectedly #84101

Closed

ghost locked as resolved and limited conversation to collaborators May 20, 2023
