Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building with full-icu by default #19214

Closed
silverwind opened this issue Mar 7, 2018 · 113 comments
Closed

Building with full-icu by default #19214

silverwind opened this issue Mar 7, 2018 · 113 comments
Assignees
Labels
feature request Issues that request new features to be added to Node.js. i18n-api Issues and PRs related to the i18n implementation.

Comments

@silverwind
Copy link
Contributor

silverwind commented Mar 7, 2018

Currently, users cannot rely on full i18n support to be present cross-platform and even cross-distribution mainly because different package maintainers use different configurations for ICU and if Node.js was built with system-icu one still has to have libicu installed. Browsers on the other hand generally do support full i18n out of the box.

There is the option to use the full-icu package but it is somewhat awkward to use as it requires a environment variable or commandline switch to work.

Building with full-icu is currently a ~40% increase binary size (on macOS, it goes from 35M to 49M). Is this an acceptable tradeoff? I'm thinking that if we build with it, ICU data should be moved in-tree so the build does not rely on external downloads.

cc: @nodejs/intl

@silverwind silverwind added feature request Issues that request new features to be added to Node.js. i18n-api Issues and PRs related to the i18n implementation. labels Mar 7, 2018
@jasnell
Copy link
Member

jasnell commented Mar 7, 2018

The binary size has always been the key issue with ICU, particularly on resource constrained devices where Node.js is expected to run. As long as we still have the ability to produce small-icu builds, my personal preference would be to build with full-icu by default.

@devsnek
Copy link
Member

devsnek commented Mar 8, 2018

+1 to this, 35M to 49M is totally worth full i18n

@mscdex
Copy link
Contributor

mscdex commented Mar 8, 2018

-1

@devsnek
Copy link
Member

devsnek commented Mar 8, 2018

@mscdex can you elaborate? i assume you're -1 because of binary size?

@srl295
Copy link
Member

srl295 commented Mar 8, 2018 via email

@srl295
Copy link
Member

srl295 commented Mar 8, 2018

#3460

@mscdex
Copy link
Contributor

mscdex commented Mar 8, 2018

@devsnek Yes.

@TimothyGu
Copy link
Member

I'm also -1 on this for the same reason as @mscdex. I'd rather find a way to fix #3460 instead.

@bnoordhuis
Copy link
Member

I was strongly against full-icu when ICU was introduced but I've since made a 180 flip. Ease of use and being able to rely on it just being there outweigh the downside of a bigger binary.

The binary size doesn't particularly trouble me anymore. If that was really an issue we would build without debug symbols or make them available as a separate download, but we don't.

Bigger runtime memory footprint is something that should be checked. ICU is pretty parsimonious though, what I know of it.

@silverwind
Copy link
Contributor Author

As long as we still have the ability to produce small-icu builds, my personal preference would be to build with full-icu by default.

Yes, building the smaller variants should still be an option. Thought, I don't see platforms where a few MBs more are an issue as the primary consumers for Node.js.

One option is to get the full-icu package working without options.

That would be an improvement, but it's still an issue that users first have to discover this package, probably after wondering why i18n APIs don't work like they do in browsers.

As a package author, I'm reluctant to include full-icu as dependency because of application size concerns (which must be carried by every application as opposed to once if it were built into Node.js).

@srl295
Copy link
Member

srl295 commented Mar 10, 2018

@silverwind

users first have to discover this package

it could be a checkbox in the installer to install… or a suggested package within ubuntu, etc

which must be carried by every application

couldn't it be deduped via npm?

I was strongly against full-icu when ICU was introduced but I've since made a 180 flip. Ease of use and being able to rely on it just being there outweigh the downside of a bigger binary.

I wasn't trying to do a bait and switch :) full by default I like for many reasons, simplicity, cultural-linguistic correctness, dropping the words 'English only' from the vocabulary , etc.. But I do understand that the size issues are real. Not insurmountable but real.

@devsnek
Copy link
Member

devsnek commented Mar 10, 2018

the way i see this, node has been working with negative size constraints because it was always missing full-icu. i build with full-icu everywhere anyway and like @silverwind said the primary consumers of node are not the people who can't download 40mb binaries. that being said, we can still keep our small-icu builds and just add full-icu builds

@srl295
Copy link
Member

srl295 commented Mar 12, 2018

ICU should be moved in-tree so the build does not rely on external downloads.

  • ICU is already in-tree, but is specially trimmed to just have small data. Part of the size issue was repo size. So, one step could be include full data in the repo (Unless there's some other clever idea - subrepo? git LFS? lossy compression?)
  • Then, make 'full' the default out of the repo when you do configure && make (Ideally, packagers and other users should be specifying --with-intl ( none, full-icu, small-icu ) already and so this only affects the default.

One option as far as download could be to have two sets of downloads (small and full). That means complexity for the build and website project, though.

Some statistics:
full-icu gets 47k downloads a month, the underlying data package icu4c-data 41k a month. In Github, 112 repos and 45 packages depend on full-icu.

@silverwind
Copy link
Contributor Author

ICU is already in-tree, but is specially trimmed to just have small data. Part of the size issue was repo size. So, one step could be include full data in the repo

Yes, that's what I meant, the actual dataset. I don't think we'll need LFS or other nonstandard git extensions. Just need make sure that there are no unneccesary files in the tree.

@mhart
Copy link
Contributor

mhart commented Mar 12, 2018

Two use cases for smaller binaries: docker containers, and serverless bundles (some ppl package the node binary in their Lambda so they don't need to rely on AWS' outdated versions – even more would if the size was smaller, and bundle size is a big factor in startup speed)

As the maintainer of https://github.com/mhart/alpine-node I'm going to be in a bit of bind with this one – I could just force the builds to be small-icu, but then there'll be a slew of issues from users who haven't expected it to diverge from the other Linux builds. I could also offer the option for a small and a full build, but then I'm doubling the work that needs to be done each release.

I'm one of the ppl that has despaired as the binary size has crept up from 18MB over the years 😞

@srl295
Copy link
Member

srl295 commented Mar 12, 2018

@mhart:

I could also offer the option for a small and a full build, but then I'm doubling the work that needs to be done each release.

This is hopefully less than double the work, but you could force the builds to be small-icu, and then have another layer(/tag) that just adds the full datafile (via npm i [-g] full-icu perhaps) and sets NODE_ICU_DATA before launching node. Then it's at least not 2x the full container size to maintain.

@mhart
Copy link
Contributor

mhart commented Mar 12, 2018

@srl295 ooh, that's an interesting option actually. So will the full-icu builds be basically 100% compatible with someone installing the full-icu npm module? In that case... I'm not sure why we even include it in node core...? 🤔

@devsnek
Copy link
Member

devsnek commented Mar 12, 2018

i'm still unsure why we have to stop providing small-icu builds if we add full-icu builds. in my mind we can provide both for every release? all this talk about binary size could even get us on track for streamlining "normal" and "small" node releases with several factors such a debug symbols and icu data etc.

@srl295
Copy link
Member

srl295 commented Mar 12, 2018

@mhart the full-icu npm module just helps you get the right binary data file. Once that's there, node starts up and uses that data, rather than the embedded English-only content. A --with-intl=full-icu build on the other hand builds ICU normally, and doesn't strip out the non-English content.

@srl295
Copy link
Member

srl295 commented Mar 12, 2018

@devsnek

in my mind we can provide both for every release?

that's an option I mentioned in #19214 (comment) — it could certainly be done that way.

@mhart
Copy link
Contributor

mhart commented Mar 12, 2018

@srl295 what I was getting was more: if this is available as a module, then I think it's worth thinking long and hard about adding it to core in the first place – considering that 95-100% of it won't be used by most node applications.

@silverwind are there some compelling arguments besides "browsers have this built-in"? Are there an unusual number of issues that arise from the current level of icu support?

@srl295
Copy link
Member

srl295 commented Mar 12, 2018

@mhart There isn't a 'real' module. It's not modularizable. The full-icu npm module by side effect of installing it causes the data file to be put somewhere, and the end user still has to tell node before startup to go look for it. #3460 mitigates that a little bit, by making node look in a well known place (help needed).

considering that 95-100% of it won't be used by most node applications.

Maybe, but more and more functionality is going through ICU. But as I said above and elsewhere, the pain point of package size is also very real.

There's also the less tangible questions such as, how many developers' experience is impacted by lack of support out of the box for their language, such as #13907

Expected result (from Chrome 58.0.3029.110): '25 июня 2017 г.'
Actual result:

v8.1.2: '2017 M06 25'
v7.9.0: 'June 25, 2017

What if the user just walked away from Node instead of filing a report? some other quotes from the above and linked issues:

Edit: nevermind, from searching around I discovered that the ICU auto-detection thing is sorta bunk and that I need to explicitly point Node at the ICU package.

Thanks for info. But there's a problem with full-icu package: nodejs/full-icu-npm#6 (comment)
Node v8 not auto detects it..

@mhart
Copy link
Contributor

mhart commented Mar 12, 2018

@srl295 guess i'm very late to the party here, and this is probably off-topic, but is there a way for it to be loaded lazily? ie, post startup? Not looking for a solution for the standard distribution of the Node.js binary with this question, but more for my own knowledge to know if users could potentially install a stripped down version of node with no icu (or system icu) and then require a full icu later if they wanted.

@srl295
Copy link
Member

srl295 commented Mar 12, 2018

Define 'late' :) (old context: nodejs/node-v0.x-archive#7676 and nodejs/node-v0.x-archive#6371 )

is there a way for it to be loaded lazily? ie, post startup?

no.

that's the difficulty in #3460 - it can't be implemented as a normal module, or using javascript to do the discovery. everything must be setup before ICU is called first. The only alternative is to shut down everything that is using ICU (including all Intl objects) and unload/reload ICU. All intl objects become unusable. That seems untenable.

@mhart
Copy link
Contributor

mhart commented Mar 12, 2018

@srl295 Is there anything internal that calls ICU in the normal Node.js startup though? So long as it's the first thing a user does in their Node.js script, it should be fine... right?

@mhdawson
Copy link
Member

I see a couple of different issues being discussed

  1. What should be the default? Should full-icu be the default because otherwise, we may lose people because the initial experience is not as good or is it ok to have people do an extra step to download/opt into full icu.

  2. Is download size an issue. If download size is not a problem, but the installation size is, then a larger download binary followed by stripping out of the ICU data file might be an option. If download size does matter, then the only way to make full ICU the default but retain the option for smaller downloads is 2 downloads. That would be more work although we might be able to build/test once and simply package twice by creating one package with the ICU data file and one without.

I don't have enough data to argue for one side or the other on either of the 2 questions yet.

Do we think trying to survey Node.js users through an online survey would provide good enough data to help make the decision? (FYI @dshaw, in case we want help from the user-feedback group to do a survey). I so aww @jasnell has a twitter poll to get some data but something broader might also make sense,

@srl295
Copy link
Member

srl295 commented Aug 23, 2019

building/providing more than 1 binary would be a significant additional overhead on the build/release side

right.

@srl295
Copy link
Member

srl295 commented Aug 23, 2019

OK, this isn't ready for PRime time, but i started a wip at https://github.com/srl295/node/commits/full-icu-default

  • full-icu is the default ( 🎉 )
  • gzipped datafile (10M compressed, vs 3M old english-only file) gets temporarily uncompressed to deps/icu-tmp/icudt64l.dat
  • i think i left in the ability to build small-icu still (though it may require an (unnecessary) re-download of icu

maybe someone likes to make a customized node source tree that still has small icu… also, while this is 'new', it might be helpful to be able to compare both side by side.

@gengjiawen
Copy link
Member

Hopefully full-ICU will work with https://v8.dev/features/intl-numberformat.

@libook
Copy link

libook commented Sep 6, 2019

Current version of Node.js does support locale argument, but only supports English. This is tricky to check.

How about this idea:
For developers who expect to get the same browser-like results on Node.js(I am one of them);
shows a warning message while calling methods with locale argument which is not supported by current build.

Well, full feature support is better.

@srl295
Copy link
Member

srl295 commented Sep 6, 2019

Current version of Node.js does support locale argument, but only supports English. This is tricky to check.

It's better than that. It's not just English. Please see https://nodejs.org/api/intl.html#intl_providing_icu_data_at_runtime

There's no API that tells you what the exact situation is. However, there are internal parameters that will indicate that, as well as results returned by https://www.npmjs.com/package/full-icu ( the package ) or https://github.com/srl295/btest402

How about this idea:
For developers who expect to get the same browser-like results on Node.js(I am one of them);
shows a warning message while calling methods with locale argument which is not supported by current build.

There will always (well, at least for a long long time) be content not supported by the current build. The Torwali language isn't yet supported for example, so trw won't be supported. So I don't think a warning is needed.

I think the best is to make it easier to get 'full icu' to more people. And so I'm working on a PR for this issue.

@mhdawson
Copy link
Member

mhdawson commented Sep 6, 2019

@srl295 thanks and looking forward to the PR.

@srl295
Copy link
Member

srl295 commented Sep 12, 2019

icudt64l.dat compression:

type Size %
none 27,531,792 0%
compress 16,359,873 68%
gzip 11,007,314 60%
bzip2 9,781,482 64%
xz 6,669,432 76%

Looks like bz2 is even in python 2.7 so may be worth while. (1.2Mb)

srl295 added a commit to srl295/node that referenced this issue Sep 30, 2019
Instead of an English-only icudt64l.dat in the repo,
we now have icudt64l.dat.gz with all locales.

- updated READMEs and docs
- shrinker now copies source, and compresses (bzip2) the ICU data file
- configure expects deps/icu-small to be full ICU with a full
compressed data file

Fixes: nodejs#19214
Co-Authored-By: Richard Lau <[email protected]>
Co-Authored-By: Jan Olaf Krems <[email protected]>
Co-Authored-By: James M Snell <[email protected]>
PR-URL: nodejs#29522
@Trott Trott closed this as completed in 1a25e90 Oct 3, 2019
@GitTom
Copy link

GitTom commented Oct 29, 2019

I tested v13.0.1 on Windows and found that Date.toLocaleString (and related) now respect their parameters and produce expected results (they do not on v12), so I created a PR on the repo for MDN's compatibility data.

mdn/browser-compat-data#5068

The MDN compatibility table for these functions currently show "?".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. i18n-api Issues and PRs related to the i18n implementation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.