
More emoji groups and subgroups #7

Open
COM8 opened this issue Mar 23, 2019 · 7 comments

Comments

@COM8
Contributor

COM8 commented Mar 23, 2019

I would suggest splitting all emoji into their groups and subgroups, as is done here.
At the moment we only have All and Basic.

I think I could extend the emoji-importer.html for this.

@COM8
Contributor Author

COM8 commented Mar 24, 2019

I've written a simple parser for the emoji-test.txt files.
Emoji-List-Parser

Once #6 has been merged I will create a PR that updates everything to Unicode 12.0 and adds more (sub)groups.
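For context on the file format: emoji-test.txt opens each section with a `# group:` or `# subgroup:` comment line, and each data line lists the code points, a qualification status, and then the emoji and its name after a `#`. The following is a minimal Python sketch of such a parser. It assumes the Unicode 12.0 line layout, plus the optional `E<version>` tag that later releases insert before the name. It is not the Emoji-List-Parser linked above, and `EmojiEntry` and `parse_emoji_test` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class EmojiEntry:
    code_points: Tuple[int, ...]  # e.g. (0x1F600,)
    status: str                   # e.g. "fully-qualified"
    name: str                     # e.g. "grinning face"
    group: str                    # most recent "# group:" header
    subgroup: str                 # most recent "# subgroup:" header

    @property
    def sequence(self) -> str:
        return "".join(chr(cp) for cp in self.code_points)


# Data line: "1F600 ; fully-qualified # 😀 grinning face"
# Later files add a version tag before the name: "# 😀 E1.0 grinning face"
_DATA_LINE = re.compile(
    r"^(?P<cps>[0-9A-F]+(?: [0-9A-F]+)*)\s*;\s*(?P<status>[a-z-]+)\s*#\s*\S+\s+"
    r"(?:E\d+\.\d+\s+)?(?P<name>.+)$"
)


def parse_emoji_test(lines: Iterable[str]) -> List[EmojiEntry]:
    group = subgroup = ""
    entries: List[EmojiEntry] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("# group:"):
            group = line.split(":", 1)[1].strip()
        elif line.startswith("# subgroup:"):
            subgroup = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            match = _DATA_LINE.match(line)
            if match is None:
                continue  # ignore anything that isn't a recognisable data line
            cps = tuple(int(cp, 16) for cp in match.group("cps").split())
            entries.append(EmojiEntry(cps, match.group("status"),
                                      match.group("name").strip(), group, subgroup))
    return entries
```

The parser keeps the current group and subgroup as running state, so every emoji picks up both values in a single pass over the file.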

@mqudsi
Member

mqudsi commented Mar 25, 2019

That's not a bad idea, but it's going to be a bit more complicated than that. The purpose of the Basic list is to provide emoji that can be displayed in an emoji picker: without variants and, more importantly, filtered for native support (currently under Windows 10).

.NET Core is of course cross-platform, so the choice of Segoe UI as the font that determines whether or not a glyph is supported is no longer a no-brainer. It should probably be factored out to either a property of the font/platform or a separate table specifically for the font.

In any case, we should definitely extend emoji-importer.html rather than start from scratch. It already checks for font support and, more importantly, it can be run without any installed dependencies aside from a web browser. It isn't fully automated or CI-ready yet, though, because it currently requires user intervention.

@COM8
Contributor Author

COM8 commented Mar 25, 2019

I don't really see why the emoji-filtering step is necessary.
People can define their own fonts with symbols for, say, Unicode 13 before MS releases a new version of its font that supports all the new emoji.

Or, if you run a newer version of Windows (Insider Previews/...), you wouldn't have access to all emoji, because they weren't supported on your PC when the list was generated.

It also makes the collection incomplete. I prefer to give the user (the app developer) every option and let them decide in their app whether to keep or show those unsupported emoji.

Regarding emoji-importer.html:
Sure, we should still extend it, but I'm probably not the right person for that.
I tinkered with it for a while. Since I'm not a fan of web development, I decided to write my own parser that supports everything needed for more (sub)groups.

@mqudsi
Member

mqudsi commented Mar 25, 2019

I'm not sure if you realized, but the filtered emoji list is only in addition to the full emoji list.

Think about when it would be necessary to show a list of all emoji (vs looking up an emoji by its unicode sequence or vice versa). The context is a native application providing an interface for a user to enter an emoji into an input by presenting a list of emoji. You would never want to show an emoji that does not render, displays as a tofu, or displays broken as two separate emoji rather than the intended single emoji.

The reason why this is precompiled into the application is that it is resource intensive to determine whether or not an emoji can be correctly rendered as a single glyph in a particular font, and there's no native way of figuring that out at runtime in a cross-platform manner without introducing some serious (unmanaged!) dependencies.

Of course, no one is required, or even asked, to use the filtered list of emoji instead of the full list when developing their application, but the list is there for those who need such a feature.


I have some updates for the importer locally committed that I need to push out. The importer itself doesn't need a lot of work, and updating to a newer version of the Unicode spec is as simple as replacing the emoji-test file with the latest (presuming there aren't any lexical changes needed).

There's also significant logic in the importer to build the list of keywords from emoji names, to convert emoji names into useful, friendly symbol names, etc. All of this was developed only because it was necessary. None of it is there just for show.
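The importer's actual rules live in emoji-importer.html and aren't shown in this thread. As an illustration of this kind of transformation only, here is a Python sketch built from hypothetical helpers:

- accents are folded to ASCII;
- the words of the name are PascalCased into an identifier;
- keywords are the distinct lowercase words of the name;
- clashing identifiers get a numeric suffix, similar in spirit to names like `UpButton2`.

```python
import re
import unicodedata
from typing import Iterable, List


def _ascii_words(name: str) -> List[str]:
    # Fold accents ("Curaçao" -> "Curacao") and keep alphanumeric runs only.
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.findall(r"[A-Za-z0-9]+", folded)


def to_symbol_name(name: str) -> str:
    """Hypothetical rule: 'grinning face' -> 'GrinningFace'."""
    symbol = "".join(w[0].upper() + w[1:].lower() for w in _ascii_words(name))
    # C# identifiers can't start with a digit (e.g. '1st place medal').
    return "_" + symbol if symbol[:1].isdigit() else symbol


def keywords(name: str) -> List[str]:
    """Hypothetical rule: the distinct lowercase words of the name, sorted."""
    return sorted({w.lower() for w in _ascii_words(name)})


def assign_symbol_names(names: Iterable[str]) -> List[str]:
    """Make identifiers unique by appending 2, 3, ... on a clash."""
    used = set()
    result: List[str] = []
    for name in names:
        base = to_symbol_name(name)
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}{n}"
        used.add(candidate)
        result.append(candidate)
    return result
```

Checking the `used` set, rather than merely counting repeats of each base name, guarantees uniqueness even when a name already ends in a digit.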

I'm more than happy to adapt the importer to include the subgroup info; I'm just debating whether or not to introduce separate lists for each subgroup or to include the subgroup as a property of the emoji.
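To make the trade-off concrete: if each emoji carries its group and subgroup as properties, any per-group or per-subgroup list can be derived on demand instead of shipping one hard-coded list per subgroup. A Python sketch, in which `SingleEmoji` is only a stand-in for the C# type and `index_by` is a hypothetical helper:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class SingleEmoji:
    """Stand-in for the C# SingleEmoji type, reduced to the fields used here."""
    sequence: str
    name: str
    group: str
    subgroup: str


def index_by(emoji: Sequence[SingleEmoji], attribute: str) -> Dict[str, List[SingleEmoji]]:
    """Group emoji by 'group' or 'subgroup', preserving their original order."""
    index: Dict[str, List[SingleEmoji]] = defaultdict(list)
    for e in emoji:
        index[getattr(e, attribute)].append(e)
    return dict(index)
```

This keeps a single source of truth per emoji. The cost is computing the grouping when it's needed instead of having it precompiled.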

@COM8
Contributor Author

COM8 commented Mar 25, 2019

Ok thanks! I get it.

Over the course of the day, I worked on extending my parser and added group, subgroup, and skinTone support to my fork of unicode.net.

I also added 10 new lists, one for each group.
Adding a list for every subgroup (~97) would be a bit too much, I think.
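Skin-tone support largely comes down to the five emoji modifier code points U+1F3FB through U+1F3FF (the Fitzpatrick scale), each of which follows the base emoji it modifies. The Python sketch below, with hypothetical helper names, reports the tones present in a sequence and can strip them, for example to collapse variants into a base form for a picker-style list.

```python
from typing import List

# Emoji modifier code points (Fitzpatrick scale) and their CLDR names.
SKIN_TONES = {
    0x1F3FB: "light skin tone",
    0x1F3FC: "medium-light skin tone",
    0x1F3FD: "medium skin tone",
    0x1F3FE: "medium-dark skin tone",
    0x1F3FF: "dark skin tone",
}


def skin_tones(sequence: str) -> List[str]:
    """Names of the skin-tone modifiers in a sequence, in order of appearance."""
    return [SKIN_TONES[ord(ch)] for ch in sequence if ord(ch) in SKIN_TONES]


def strip_skin_tones(sequence: str) -> str:
    """Remove all skin-tone modifiers (an approximation of the base emoji)."""
    return "".join(ch for ch in sequence if ord(ch) not in SKIN_TONES)
```

Stripping is only an approximation. Some bases carry U+FE0F only when unmodified, so the result is not always the fully-qualified base sequence.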

@COM8
Contributor Author

COM8 commented Mar 25, 2019

Btw, I think you should remove the seguiemj.ttf file from the repo. Correct me if I'm wrong 😉, but you probably don't have the rights to publish it here, since you have to buy it if you're not on Windows.

mqudsi added a commit that referenced this issue Mar 25, 2019
@mqudsi
Member

mqudsi commented Mar 25, 2019

Nice. I just pushed some commits with major updates to the parser, including added support for group and subgroup. I haven't updated the C# assets yet.

I have a few concerns about keeping these lists in memory at all times. I've actually been wondering whether it would be better to change the SingleEmoji instances from fields to properties, so that they have ~zero memory overhead until invoked. That leaves an open question: should lists that can be determined at runtime without resorting to reflection (i.e. not the font-filtered list or Emoji.All) be generated dynamically via Linq? The .NET Core team has generally taken to avoiding Linq for performance reasons.

Most likely the best compromise is for them to be dynamically generated on first access and then cached thereafter.
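"Generate on first access, then cache" is the standard lazy-initialisation pattern. In C#, one would likely reach for something like `System.Lazy<T>`. As an illustration, here is a Python sketch using `functools.cached_property`. `EmojiCatalog` and its fields are hypothetical.

```python
from functools import cached_property
from typing import Dict, List, Sequence, Tuple


class EmojiCatalog:
    """Hypothetical container: per-group lists are built lazily, once."""

    def __init__(self, all_emoji: Sequence[Tuple[str, str]]):
        self._all = all_emoji  # (sequence, group) pairs, standing in for Emoji.All
        self.builds = 0        # counts how often the grouping is computed

    @cached_property
    def by_group(self) -> Dict[str, List[str]]:
        # Runs on first access only; the result is then stored on the instance.
        self.builds += 1
        groups: Dict[str, List[str]] = {}
        for sequence, group in self._all:
            groups.setdefault(group, []).append(sequence)
        return groups
```

Nothing is allocated for the grouped lists until someone asks for them, and repeated access reuses the first result. Unlike `Lazy<T>` in its default mode, this sketch makes no thread-safety guarantees.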

mqudsi added a commit that referenced this issue Mar 26, 2019
* Add items previously elided due to name conflicts (e.g. UpButton2)
* Add group and subgroup properties to `SingleEmoji` (see #7)
* Automate code generation (supporting both web and headless node)
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants