-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Building with full-icu by default #19214
Comments
The binary size has always been the key issue with ICU, particularly on resource constrained devices where Node.js is expected to run. As long as we still have the ability to produce small-icu builds, my personal preference would be to build with full-icu by default. |
+1 to this, 35M to 49M is totally worth full i18n |
-1 |
@mscdex can you elaborate? i assume you're -1 because of binary size? |
One option is to get the full-icu package working without options. I need
help getting the platform part to work.
|
@devsnek Yes. |
I was strongly against full-icu when ICU was introduced but I've since made a 180 flip. Ease of use and being able to rely on it just being there outweigh the downside of a bigger binary. The binary size doesn't particularly trouble me anymore. If that was really an issue we would build without debug symbols or make them available as a separate download, but we don't. Bigger runtime memory footprint is something that should be checked. ICU is pretty parsimonious though, what I know of it. |
Yes, building the smaller variants should still be an option. Thought, I don't see platforms where a few MBs more are an issue as the primary consumers for Node.js.
That would be an improvement, but it's still an issue that users first have to discover this package, probably after wondering why i18n APIs don't work like they do in browsers. As a package author, I'm reluctant to include |
it could be a checkbox in the installer to install… or a suggested package within ubuntu, etc
couldn't it be deduped via npm?
I wasn't trying to do a bait and switch :) full by default I like for many reasons, simplicity, cultural-linguistic correctness, dropping the words 'English only' from the vocabulary , etc.. But I do understand that the size issues are real. Not insurmountable but real. |
the way i see this, node has been working with negative size constraints because it was always missing full-icu. i build with full-icu everywhere anyway and like @silverwind said the primary consumers of node are not the people who can't download 40mb binaries. that being said, we can still keep our small-icu builds and just add full-icu builds |
One option as far as download could be to have two sets of downloads (small and full). That means complexity for the build and website project, though. Some statistics: |
Yes, that's what I meant, the actual dataset. I don't think we'll need LFS or other nonstandard git extensions. Just need make sure that there are no unneccesary files in the tree. |
Two use cases for smaller binaries: docker containers, and serverless bundles (some ppl package the node binary in their Lambda so they don't need to rely on AWS' outdated versions – even more would if the size was smaller, and bundle size is a big factor in startup speed) As the maintainer of https://github.com/mhart/alpine-node I'm going to be in a bit of bind with this one – I could just force the builds to be small-icu, but then there'll be a slew of issues from users who haven't expected it to diverge from the other Linux builds. I could also offer the option for a small and a full build, but then I'm doubling the work that needs to be done each release. I'm one of the ppl that has despaired as the binary size has crept up from 18MB over the years 😞 |
This is hopefully less than double the work, but you could force the builds to be small-icu, and then have another layer(/tag) that just adds the full datafile (via |
@srl295 ooh, that's an interesting option actually. So will the |
i'm still unsure why we have to stop providing small-icu builds if we add full-icu builds. in my mind we can provide both for every release? all this talk about binary size could even get us on track for streamlining "normal" and "small" node releases with several factors such a debug symbols and icu data etc. |
@mhart the |
that's an option I mentioned in #19214 (comment) — it could certainly be done that way. |
@srl295 what I was getting was more: if this is available as a module, then I think it's worth thinking long and hard about adding it to core in the first place – considering that 95-100% of it won't be used by most node applications. @silverwind are there some compelling arguments besides "browsers have this built-in"? Are there an unusual number of issues that arise from the current level of icu support? |
@mhart There isn't a 'real' module. It's not modularizable. The full-icu npm module by side effect of installing it causes the data file to be put somewhere, and the end user still has to tell node before startup to go look for it. #3460 mitigates that a little bit, by making node look in a well known place (help needed).
Maybe, but more and more functionality is going through ICU. But as I said above and elsewhere, the pain point of package size is also very real. There's also the less tangible questions such as, how many developers' experience is impacted by lack of support out of the box for their language, such as #13907
What if the user just walked away from Node instead of filing a report? some other quotes from the above and linked issues:
|
@srl295 guess i'm very late to the party here, and this is probably off-topic, but is there a way for it to be loaded lazily? ie, post startup? Not looking for a solution for the standard distribution of the Node.js binary with this question, but more for my own knowledge to know if users could potentially install a stripped down version of node with no icu (or system icu) and then require a full icu later if they wanted. |
Define 'late' :) (old context: nodejs/node-v0.x-archive#7676 and nodejs/node-v0.x-archive#6371 )
no. that's the difficulty in #3460 - it can't be implemented as a normal module, or using javascript to do the discovery. everything must be setup before ICU is called first. The only alternative is to shut down everything that is using ICU (including all Intl objects) and unload/reload ICU. All intl objects become unusable. That seems untenable. |
@srl295 Is there anything internal that calls ICU in the normal Node.js startup though? So long as it's the first thing a user does in their Node.js script, it should be fine... right? |
I see a couple of different issues being discussed
I don't have enough data to argue for one side or the other on either of the 2 questions yet. Do we think trying to survey Node.js users through an online survey would provide good enough data to help make the decision? (FYI @dshaw, in case we want help from the user-feedback group to do a survey). I so aww @jasnell has a twitter poll to get some data but something broader might also make sense, |
right. |
OK, this isn't ready for PRime time, but i started a wip at https://github.com/srl295/node/commits/full-icu-default
maybe someone likes to make a customized node source tree that still has small icu… also, while this is 'new', it might be helpful to be able to compare both side by side. |
Hopefully full-ICU will work with https://v8.dev/features/intl-numberformat. |
Current version of Node.js does support How about this idea: Well, full feature support is better. |
It's better than that. It's not just English. Please see https://nodejs.org/api/intl.html#intl_providing_icu_data_at_runtime There's no API that tells you what the exact situation is. However, there are internal parameters that will indicate that, as well as results returned by https://www.npmjs.com/package/full-icu ( the package ) or https://github.com/srl295/btest402
There will always (well, at least for a long long time) be content not supported by the current build. The Torwali language isn't yet supported for example, so I think the best is to make it easier to get 'full icu' to more people. And so I'm working on a PR for this issue. |
@srl295 thanks and looking forward to the PR. |
icudt64l.dat compression:
Looks like bz2 is even in python 2.7 so may be worth while. (1.2Mb) |
Instead of an English-only icudt64l.dat in the repo, we now have icudt64l.dat.gz with all locales. - updated READMEs and docs - shrinker now copies source, and compresses (bzip2) the ICU data file - configure expects deps/icu-small to be full ICU with a full compressed data file Fixes: nodejs#19214 Co-Authored-By: Richard Lau <[email protected]> Co-Authored-By: Jan Olaf Krems <[email protected]> Co-Authored-By: James M Snell <[email protected]> PR-URL: nodejs#29522
I tested v13.0.1 on Windows and found that Date.toLocaleString (and related) now respect their parameters and produce expected results (they do not on v12), so I created a PR on the repo for MDN's compatibility data. The MDN compatibility table for these functions currently show "?". |
Currently, users cannot rely on full i18n support to be present cross-platform and even cross-distribution mainly because different package maintainers use different configurations for ICU and if Node.js was built with
system-icu
one still has to havelibicu
installed. Browsers on the other hand generally do support full i18n out of the box.There is the option to use the
full-icu
package but it is somewhat awkward to use as it requires a environment variable or commandline switch to work.Building with
full-icu
is currently a ~40% increase binary size (on macOS, it goes from 35M to 49M). Is this an acceptable tradeoff? I'm thinking that if we build with it, ICU data should be moved in-tree so the build does not rely on external downloads.cc: @nodejs/intl
The text was updated successfully, but these errors were encountered: