-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Next - Unicode 11 support #24
base: master
Are you sure you want to change the base?
Conversation
BREAKING CHANGES use import GraphemeSplitter from ’grapheme-splitter’ instead of const GraphemeSplitter from ‘grapheme-splitter’
The helper script converts UCD GraphemeBreakProperty into JavaScript code snippet.
BREAKING CHANGES The grapheme break property is aligned with Unicode 11
@@ -12,6 +12,10 @@ | |||
} | |||
], | |||
"main": "index.js", | |||
"files": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we have constrained the files
here, only distributed libraries and LICENSE/README will be distributed. The src
, scripts
, dev infrastructure babel.config.js
will not be included in our npm package. Thus we obtain an optimal node_modules footprint.
Here is the result of npm pack
:
npm notice
npm notice 📦 [email protected]
npm notice === Tarball Contents ===
npm notice 1.1kB package.json
npm notice 839B index.d.ts
npm notice 145.1kB index.js
npm notice 1.1kB LICENSE
npm notice 5.2kB README.md
npm notice === Tarball Details ===
npm notice name: grapheme-splitter
npm notice version: 1.0.4
npm notice filename: grapheme-splitter-1.0.4.tgz
npm notice package size: 32.7 kB
npm notice unpacked size: 153.3 kB
npm notice shasum: c3455c5317b8c40b340d7c7035b78edf11c561e7
npm notice integrity: sha512-p+i2AbQ0PNq/T[...]3C7vTVdWa8gLQ==
npm notice total files: 5
Sorry I am taking so long to review those, Huang. I am moving country and
starting a new job there, so very little time for side projects.
I will get back to you at first opportunity, within several days
…On Sat, Dec 8, 2018 at 07:39 Huáng Jùnliàng ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In package.json
<#24 (comment)>
:
> @@ -12,6 +12,10 @@
}
],
"main": "index.js",
+ "files": [
As we have constrict the files here, only distributed libraries and
LICENSE/README will be distributed. The src, scripts, dev infrastructure
babel.config.js will not be included in our npm package. Thus we obtain
an optimal node_modules footprint.
Here is the result of npm pack:
npm notice
npm notice 📦 ***@***.***
npm notice === Tarball Contents ===
npm notice 1.1kB package.json
npm notice 839B index.d.ts
npm notice 145.1kB index.js
npm notice 1.1kB LICENSE
npm notice 5.2kB README.md
npm notice === Tarball Details ===
npm notice name: grapheme-splitter
npm notice version: 1.0.4
npm notice filename: grapheme-splitter-1.0.4.tgz
npm notice package size: 32.7 kB
npm notice unpacked size: 153.3 kB
npm notice shasum: c3455c5317b8c40b340d7c7035b78edf11c561e7
npm notice integrity: sha512-p+i2AbQ0PNq/T[...]3C7vTVdWa8gLQ==
npm notice total files: 5
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADpbEMUDTU49TBxY8pD6xggFohBkW5AWks5u216ggaJpZM4ZJW4W>
.
|
Hi Orling,
Glad to hear from you. It is a great life milestone immigrating to a new country. I wish you live a good life there. OSS is not meant to sacrifice our personal life and I wholeheartedly comprehend your situation. Happy to see you back in the future and take care.
Regards,
黄俊亮
Huáng Jùnliàng
… On Dec 11, 2018, at 12:49 PM, Orlin Georgiev ***@***.***> wrote:
Sorry I am taking so long to review those, Huang. I am moving country and
starting a new job there, so very little time for side projects.
I will get back to you at first opportunity, within several days
On Sat, Dec 8, 2018 at 07:39 Huáng Jùnliàng ***@***.***>
wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In package.json
> <#24 (comment)>
> :
>
> > @@ -12,6 +12,10 @@
>
> }
>
> ],
>
> "main": "index.js",
>
> + "files": [
>
>
> As we have constrict the files here, only distributed libraries and
> LICENSE/README will be distributed. The src, scripts, dev infrastructure
> babel.config.js will not be included in our npm package. Thus we obtain
> an optimal node_modules footprint.
>
> Here is the result of npm pack:
>
> npm notice
>
> npm notice 📦 ***@***.***
>
> npm notice === Tarball Contents ===
>
> npm notice 1.1kB package.json
>
> npm notice 839B index.d.ts
>
> npm notice 145.1kB index.js
>
> npm notice 1.1kB LICENSE
>
> npm notice 5.2kB README.md
>
> npm notice === Tarball Details ===
>
> npm notice name: grapheme-splitter
>
> npm notice version: 1.0.4
>
> npm notice filename: grapheme-splitter-1.0.4.tgz
>
> npm notice package size: 32.7 kB
>
> npm notice unpacked size: 153.3 kB
>
> npm notice shasum: c3455c5317b8c40b340d7c7035b78edf11c561e7
>
> npm notice integrity: sha512-p+i2AbQ0PNq/T[...]3C7vTVdWa8gLQ==
>
> npm notice total files: 5
>
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#24 (review)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/ADpbEMUDTU49TBxY8pD6xggFohBkW5AWks5u216ggaJpZM4ZJW4W>
> .
>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#24 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADcNdqw9JIMFQLfMLpPE7tdq738ny9UJks5u3zlbgaJpZM4ZJW4W>.
|
Hi @orling, Hoping this could be approved soon |
Better not break APIs, even for minor things. Also this pull request contains several unrelated changes, making it very time-consuming to review and risky to merge |
@orling @JLHwung thanks for the good work done by both of you. I'm trying to improve the unicode segmentation for VS Code/Monaco Editor and find this project already doing most of the work. I made some changes in my own fork https://github.com/rebornix/grapheme-splitter/tree/perf, including
My change is very unlikely to be merged into upstream and I still like to share here, just in case if you are interested. |
@orling Oh it was my code almost a year ago. I can split that into different PRs. |
@rebornix oh nice, you refactored those incredibly long if conditions. Any issues you've found with your version, or is it as accurate as the original? |
@jasonsbarr I didn't run into any weird issue in all my use cases (and it passed the test suites.) |
@orling amazing work on creating this library... it's been extremely useful for us. I was also super impressed with @JLHwung's work (and I personally agree with the recommendation to move to Typescript). I urgently needed Unicode 13 support, so I forked this pull request and have created Graphemer It includes the following:
If you'd like to discuss consolidating those efforts into this library that would work for me or if, life is getting in the way, and |
@mattpauldavies I stupidly went ahead and factored Given that these classes have no actual state and are completely pointless, I feel like these should just be functions. |
Not stupid at all @xorgy. I agree the classes don't provide any real benefit. I kept them as I wanted a direct swap and I was already using How would you feel about refactoring Graphemer to use functions? We could then either release a v2.0 (with breaking changes) or we could map the functions to a sort of proxy class that would provide backwards capability. I want to think about it a bit, but if the functions were split into separate files it would make maintenance easier. Especially updating to new Unicode versions. If that doesn't vibe with you feel free to take the Unicode 13 work and crack on! |
@mattpauldavies For my immediate use case, I ended up writing https://github.com/xorgy/grapheme-iterator from scratch instead. (though I don't suggest anyone use it right now, I'm not 100% confident in the correctness of my state machine right now, and the generator is a bit of a mess since I wrote the state machine directly from reading the spec). I think the approach is pretty good though, my
The other benefit is that none of the symbols in grapheme-iterator need to be preserved when minifying. Overall it ends up about 3500 bytes gzipped with the table, even with no name mangling. |
@mattpauldavies I think maybe I could make a CommonJS version of it (might need this myself anyway, if I want to use it from a CommonJS-based node app), and write a GraphemeSplitter interface emulator on top of that; then new GraphemeSplitter could just depend on the cleaner module. Or somebody else could do that, it's only a couple hundred lines of code, and you'd only really need to touch about a dozen of them. |
Also now instead of being just 2x faster than GraphemeSplitter, grapheme-iterator is about 22x faster. |
Note that The GraphemeSplitter interface const splitter = new GraphemeSplitter();
splitter.splitGraphemes("abcd"); // returns ["a", "b", "c", "d"] can be replaced by const segmenter = new Intl.Segmenter({granularity: "grapheme"});
[...segmenter.segment("abcd")] // returns [{segment: "a", index: 0, input: "abcd"} , ... , {segment: "d", index: 3, input: "abcd"}] @orling Consider leave a note on README and suggest transition to |
There'd still need to be a polyfill, otherwise most websites won't be able to rely on it for about 5-10 years. |
It also seems that P.S. I think grapheme-iterator is working correctly now, need to find a better test suite than the one from the Unicode Consortium (which doesn't even include examples for each GraphemeBreakProperty (!)), but when I find such a test suite I'll put it up as 1.0. |
🤔 |
The change should be a breaking change since
to
or if they are using a legacy environment
Other than that, the API is stable.
Dev Infrastructure Changes:
GraphemeBreakProperty.txt
to JavaScript snippet.emoji-data.txt
to JavaScript snippet.@orling Could you install travis to this repository so that I can setup CI? It would be good to prove that the software works as expected.