Unicode reference version #33965

Closed
cormullion opened this issue Nov 27, 2019 · 8 comments · Fixed by #35282

@cormullion
Contributor

This might be a silly question :) but ... I noticed that the 1.3 release notes mention Unicode version 12.1.0:

Support for Unicode 12.1.0

here

but the only Unicode reference I can find in this repo is version 9.0.0:

$(SRCCACHE)/UnicodeData.txt:
	@mkdir -p "$(SRCCACHE)"
	$(JLDOWNLOAD) "$@" http://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt

from here

Are there two numbering systems? I'm happy to be educated in these arcane Unicode matters, such as what determines which Unicode symbols are in and which aren't...

@PallHaraldsson
Contributor

PallHaraldsson commented Nov 27, 2019

Unicode is handled by this dependency: https://github.com/JuliaStrings/utf8proc, but maybe not only by it, and perhaps we have only partial 9.0 support (and thus, really, only 9.0)? Should the version number in that file simply be updated? There is a http://www.unicode.org/Public/12.1.0/ucd/UnicodeData.txt file. It is about 2% longer, with a change to LATIN SMALL LETTER S WITH HOOK and many additions such as HEBREW YOD TRIANGLE.
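To make "longer, with a change and many additions" concrete: each line of UnicodeData.txt is one semicolon-separated record, with field 0 holding the code point in hex and field 1 the character name. The following is a minimal Python sketch, assuming both files are already read into strings, that diffs two versions record by record. `parse_unicode_data`, `diff_unicode_data`, and the sample records are hypothetical and hand-written for illustration; they are not copied from a real release. Ranged entries such as `<CJK Ideograph, First>` are compared as-is and not expanded.

```python
def parse_unicode_data(text):
    """Map each code point to its full record (list of fields).

    Field 0 is the code point in hex; field 1 is the character name.
    """
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(";")
        records[int(fields[0], 16)] = fields
    return records


def diff_unicode_data(old_text, new_text):
    """Return (added, removed, changed) code points between two versions."""
    old = parse_unicode_data(old_text)
    new = parse_unicode_data(new_text)
    added = sorted(cp for cp in new if cp not in old)
    removed = sorted(cp for cp in old if cp not in new)
    # "changed" means any field differs (name, category, case mappings, ...)
    changed = sorted(cp for cp in old if cp in new and old[cp] != new[cp])
    return added, removed, changed


# Hand-written sample records; field values are illustrative only.
OLD = (
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
    "0282;LATIN SMALL LETTER S WITH HOOK;Ll;0;L;;;;;N;;;;;\n"
)
NEW = (
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
    "0282;LATIN SMALL LETTER S WITH HOOK;Ll;0;L;;;;;N;;;A7C5;;A7C5\n"
    "05EF;HEBREW YOD TRIANGLE;Lo;0;R;;;;;N;;;;;\n"
)

added, removed, changed = diff_unicode_data(OLD, NEW)
names = parse_unicode_data(NEW)
for cp in added:
    print(f"added   U+{cp:04X} {names[cp][1]}")
for cp in changed:
    print(f"changed U+{cp:04X} {names[cp][1]}")
```

Run against the real 9.0.0 and 12.1.0 files, the same diff would list every new and modified record.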

See at: https://github.com/JuliaLang/julia/blob/master/deps/utf8proc.mk

PCRE also handles Unicode, and maybe that's the only other dependency (then there are packages such as ICU.jl and possibly others, such as Scott's).

Possibly that file is only for "tab completion of LaTeX-like abbreviations in the Julia REPL", see here (I didn't check carefully):
https://github.com/JuliaLang/julia/blob/06fed56ea3ac3bd73ca3448f002b0c521eeb1765/doc/src/manual/unicode-input.md

@cormullion
Contributor Author

Hi @PallHaraldsson - thanks for the explanation and links...!

So, if I understand correctly, the REPL completions files (emoji_symbols.jl and latex_symbols.jl) determine which Unicode symbols are looked up in the v9 UnicodeData.txt file. I suppose this currently restricts the Julia 1.3 REPL's emoji support to a few versions behind the current standard? That would mean quite a few 'current' emojis are missing from Julia 1.3 because they were introduced after v9, such as:

🦝🦙🦛🦘🦡🦢🦚🦜🦟🦠🥭🥬🥯🧂🥮🦞🧁🧭🧱🛹🧳🧨🧧🥎🥏🥍🧿🧩🧸♟🧵🧶🥽🥼🥾🥿🧮🧾🧰🧲🧪🧫🧬🧴🧷🧹🧺🧻🧼🧽🧯♾🏴‍☠️🧘🏾‍♀️-🧘🏿‍♀️🦓🦒🦔🦕🦖🦗🥥🥦🥨🥩🥪🥣🥫🥟🥠🥡🥧🥤🥢🛸🛷🥌🧣🧤🧥🧦🧢🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁳󠁣󠁴󠁿🏴󠁧󠁢󠁷󠁬󠁳󠁿🦖 ...

not to mention all the diversity-oriented emojis recently released with v12.0.0 ... (😱)

The v12 file is 2000 lines longer than v9 (many of the additions are characters for new or archaic languages).

The emoji_symbols.jl file appears to use a data file from https://github.com/iamcal/emoji-data. The current version of that file (from earlier this year) supports Unicode v11, so it would also need to be updated, along with your PR to v12, before the REPL completions can be brought up to the current standard.
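As a rough illustration of how a completion table can be generated from such a data file, here is a minimal Python sketch. It assumes each entry has the shape of an item in iamcal/emoji-data's emoji.json: a `"unified"` field with dash-separated hex code points and a `"short_names"` list. Those field names, `build_emoji_completions`, and the sample entries are assumptions for illustration, not a copy of Julia's actual generator. Keys use the `\:name:` form that Julia's REPL accepts for emoji completion.

```python
import json


def build_emoji_completions(entries):
    """Build a table mapping "\\:short_name:" to the emoji string.

    Assumes each entry is shaped like an emoji.json item from
    iamcal/emoji-data: "unified" holds dash-separated hex code points
    and "short_names" lists the shortcut names.
    """
    table = {}
    for entry in entries:
        # Multi-code-point entries (e.g. ZWJ sequences) are joined in order.
        emoji = "".join(chr(int(cp, 16)) for cp in entry["unified"].split("-"))
        for name in entry["short_names"]:
            table["\\:" + name + ":"] = emoji
    return table


# Hypothetical sample in the assumed emoji.json shape.
SAMPLE = """[
  {"unified": "1F996", "short_names": ["t-rex"]},
  {"unified": "1F914", "short_names": ["thinking_face"]},
  {"unified": "1F3F4-200D-2620-FE0F", "short_names": ["pirate_flag"]}
]"""

completions = build_emoji_completions(json.loads(SAMPLE))
print(completions["\\:t-rex:"])
```

Under this assumption, regenerating from a newer emoji-data release would pick up newer emoji automatically, with no separate Unicode data file involved.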

I don't know whether it's the official Julia policy to continuously support all the emojis in the current standard in the REPL, or whether there's any selection process... 🏚🚴‍♀️🖌

It's fun to name your Julia variables 🦖 or 🤔 though... :)

@tk3369
Contributor

tk3369 commented Nov 29, 2019

I happened to be using emoji_symbols.jl when building a fun Slack app. It would be nice to update it to the latest version from https://github.com/iamcal/emoji-data, as I was not able to parse 🦃😆

@wookay
Contributor

wookay commented Nov 29, 2019

There's a package that adds support for the additional emoji symbols in the REPL:
https://github.com/wookay/EmojiSymbols.jl

@cormullion
Contributor Author

@wookay Nice package. Can you use any of your code there to update Julia 1.3 to the latest version?

@wookay
Contributor

wookay commented Nov 29, 2019

@cormullion Well, that package uses the same code as the emoji_symbols.jl file. You could take its generator.jl.

@PallHaraldsson
Contributor

I can confirm the package does add emojis to the REPL, but I wouldn't say that lacking them (or the latest Unicode data) in the REPL means lacking Unicode 12.1 support: C has no REPL by default, and Perl, with its "good" UTF-8 support, has a bad REPL. You can still copy and paste these characters in (or use the package).

I would be most worried about runtime support, e.g. lowercase and uppercase (and I don't think they apply to emojis).

@stevengj
Member

The UnicodeData.txt file in the Makefile is only there to look up the names of the characters produced by LaTeX-like tab-completions in the REPL in order to generate this section of the documentation. All of the current tab-completion characters are present in Unicode 9, so no one bothered to update this data file to a newer version.
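The documentation step described here is essentially a name lookup. Below is a minimal Python sketch of that idea. It assumes the build only needs field 0 (code point) and field 1 (name) of each semicolon-separated record, and that each completion yields a single code point. `lookup_names` and the two-record excerpt are hypothetical, not Julia's actual doc-build code, and ranged entries such as `<CJK Ideograph, First>` are not expanded. A completion character with no record in the file is what would force an upgrade of the data file. Since every current completion character is already in Unicode 9, that upgrade has not been needed.

```python
def lookup_names(unicode_data_text, chars):
    """Look up the UnicodeData.txt name of each single-code-point character.

    Returns (found, missing): found maps character -> name; missing lists
    characters with no record in this version of the file.
    """
    wanted = {ord(c) for c in chars}
    found = {}
    for line in unicode_data_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(";")
        cp = int(fields[0], 16)
        if cp in wanted:
            found[chr(cp)] = fields[1]
    missing = [c for c in chars if c not in found]
    return found, missing


# Hand-written excerpt in UnicodeData.txt format (two records only).
DATA = (
    "03B1;GREEK SMALL LETTER ALPHA;Ll;0;L;;;;;N;;;0391;;0391\n"
    "2200;FOR ALL;Sm;0;ON;;;;;N;;;;;\n"
)

# Results of the \alpha and \forall completions, plus a character
# that is absent from the excerpt.
found, missing = lookup_names(DATA, ["α", "∀", "🦖"])
print(found, missing)
```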

This has nothing to do with the version of Unicode supported by Julia (e.g. for parsing or text processing), which is determined by utf8proc.

The emoji tab completions were added on April Fool's day in #10709, and I don't know whether they have been updated recently. (Note that, as I understand it, the :foo: tab completions for emoji come from GitHub shortcuts, not from the Unicode standard.) It wouldn't hurt to add more recent emoji shortcuts to Base, I guess, though it's hardly essential: just because we don't have a tab completion for a character doesn't mean it's not "supported".

(Most Unicode characters will never have tab completions in the REPL. They are still supported.)

I would suggest closing this issue, as it's really not about Unicode support in Julia. If you want to open another issue to add more emoji tab completions, please go ahead.


5 participants