Optimize storage for IcuLocaleData data #45129

Closed
marek-safar opened this issue Nov 23, 2020 · 4 comments · Fixed by #45296
Labels
area-System.Globalization linkable-framework Issues associated with delivering a linker friendly framework
Milestone

Comments

@marek-safar
Contributor

The massive string which is declared in

costs close to 10 KB on disk. It should be possible to optimize this to reduce the size, as the data is in a very narrow range (lowercase ASCII only).

@eerhardt

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Globalization untriaged New issue has not been triaged by the area owner labels Nov 23, 2020
@ghost

ghost commented Nov 23, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.


@marek-safar marek-safar added the linkable-framework Issues associated with delivering a linker friendly framework label Nov 23, 2020
@tarekgh
Member

tarekgh commented Nov 23, 2020

[edited comment]

@marek-safar do you suggest we compress it and expand it at runtime? I think we can store it as ASCII encoding or store it as compressed binary and expand it at runtime.
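
To make the "compressed binary, expanded at runtime" option concrete, here is a minimal sketch that round-trips a locale-name blob through `DeflateStream`. The `LocaleBlob` helper and the sample input are hypothetical and are not taken from the runtime sources; the sketch shows only the shape of the approach and makes no claim about the achievable size.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of dotnet/runtime.
static class LocaleBlob
{
    // Build-time step: ASCII-encode the names, then deflate them.
    public static byte[] Compress(string names)
    {
        byte[] raw = Encoding.ASCII.GetBytes(names);
        using var output = new MemoryStream();
        using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            deflate.Write(raw, 0, raw.Length);
        } // disposing the DeflateStream flushes the final compressed block
        return output.ToArray();
    }

    // Run-time step: inflate the blob back into the original string.
    public static string Expand(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var inflate = new DeflateStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflate.CopyTo(output);
        return Encoding.ASCII.GetString(output.ToArray());
    }
}
```

Note that expanding this way still materializes the whole string on the heap at run time, so it addresses the on-disk size but not the allocation.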

@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Nov 23, 2020
@tarekgh tarekgh added this to the 6.0.0 milestone Nov 23, 2020
@krwq
Member

krwq commented Nov 24, 2020

@tarekgh I think the suggestion might be to use UTF-8 or ASCII rather than UTF-16; that should remove almost half of the bytes.
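
The size difference can be checked directly: a C# string literal is stored as UTF-16 (two bytes per ASCII character), while an ASCII byte array needs one byte per character. A small sketch, with `EncodingSize` as a hypothetical helper name:

```csharp
using System;
using System.Text;

// Hypothetical helper used only to compare encoded sizes.
static class EncodingSize
{
    // Bytes needed when the text is kept as UTF-16, as a string literal is.
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);

    // Bytes needed when the same text is kept as ASCII bytes.
    public static int AsciiBytes(string s) => Encoding.ASCII.GetByteCount(s);
}

// EncodingSize.Utf16Bytes("az-cyrl-az") returns 20
// EncodingSize.AsciiBytes("az-cyrl-az") returns 10
```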

@marek-safar
Contributor Author

> do you suggest we compress it and expand it at runtime

Yeah, store it more efficiently so we don't have a massive string in metadata and a whole copy of it on the heap. Most apps will use only a few LCID strings anyway. There are numerous ways to do it; one of them could be to replace the s_lcids and s_localeNamesIndices arrays with something like

static ReadOnlySpan<byte> values => new byte[] {
   // "az" or "az-cyrl" or "az-cyrl-az"
   (byte)'a', (byte)'z', (byte)'-', (byte)'c', (byte)'y', (byte)'r', (byte)'l', (byte)'-', (byte)'a', (byte)'z',
};

// LCID values don't match the real ones in this example
static readonly int[] s_lcids = new int[]
{
   // 16-bit LCID (bits 16-31) | index into values (bits 4-15) | string length (bits 0-3)
   // The fields must not overlap, so the LCID is shifted above the index and length.
   0x10 << 16 | 0 << 4 | 7, // az-cyrl
   0xbb << 16 | 0 << 4 | 2, // az
};

There is a dependency from c_localeNames but that could be improved as well
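
A minimal sketch of how such a packed table could be read back, assuming one possible bit layout (LCID in the upper 16 bits, a 12-bit index, a 4-bit length). The `PackedLocales` class, the `Pack` helper, and the LCID values are all hypothetical; this is not the implementation that was merged.

```csharp
using System;
using System.Text;

// Hypothetical sketch; names, layout and LCID values are illustrative only.
static class PackedLocales
{
    // One shared blob: "az" and "az-cyrl" are both prefixes of "az-cyrl-az".
    private static ReadOnlySpan<byte> Values => new byte[]
    {
        (byte)'a', (byte)'z', (byte)'-', (byte)'c', (byte)'y',
        (byte)'r', (byte)'l', (byte)'-', (byte)'a', (byte)'z',
    };

    // Assumed layout: LCID (bits 16-31) | index (bits 4-15) | length (bits 0-3).
    private static uint Pack(uint lcid, uint index, uint length) =>
        (lcid << 16) | (index << 4) | length;

    private static readonly uint[] s_entries =
    {
        Pack(0x0010, 0, 7), // az-cyrl
        Pack(0x00bb, 0, 2), // az
    };

    // Linear scan for brevity; allocates only the one name that is requested.
    public static bool TryGetName(uint lcid, out string name)
    {
        foreach (uint entry in s_entries)
        {
            if ((entry >> 16) != lcid)
                continue;

            int index = (int)((entry >> 4) & 0xFFF);
            int length = (int)(entry & 0xF);
            name = Encoding.ASCII.GetString(Values.Slice(index, length));
            return true;
        }

        name = string.Empty;
        return false;
    }
}
```

With this layout a 4-bit length caps names at 15 characters and a 12-bit index caps the blob at 4096 bytes, so a real table would need to check its field widths against the actual data.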

marek-safar added a commit to marek-safar/runtime that referenced this issue Nov 28, 2020
to reduce SPC size by about 10k and avoid allocating huge string on the heap

Fixes dotnet#45129
marek-safar added a commit that referenced this issue Nov 30, 2020
* Optimize storage of icu locale data

to reduce SPC size by about 10k and avoid allocating huge string on the heap

Fixes #45129

* Formatting fixes
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2020