Unicode update for Julia, XML, HTML, YAML, CSS and Javascript lexers. #1537
Thanks for going through and adding these! It looks like it must have been a bit of a pain >_<
A concern I have is the human readability of some of these identifiers. I like having Rouge be more correct but I also worry about maintenance. Is it possible to write simpler expressions (even if they may not be completely accurate), or does that cause things to break? For example, by using POSIX bracket expressions or the more descriptive character properties (e.g. \p{Word})?
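As a rough sketch of the three styles being discussed, the snippet below compares an ASCII-only identifier pattern with a POSIX-bracket version and a character-property version. The constant names and sample identifiers are hypothetical and chosen only for illustration; they are not Rouge's actual rules.

```ruby
# Hypothetical identifier patterns, for illustration only (not taken from Rouge).
ASCII_ID = /\A[A-Za-z_][A-Za-z0-9_]*\z/      # ASCII letters, digits, underscore
POSIX_ID = /\A[[:alpha:]_][[:word:]]*\z/     # POSIX bracket expressions
PROP_ID  = /\A[\p{L}_]\p{Word}*\z/           # Unicode character properties

%w[name élément 変数 2fast].each do |s|
  puts format('%-8s ascii=%-5s posix=%-5s prop=%s',
              s, ASCII_ID.match?(s), POSIX_ID.match?(s), PROP_ID.match?(s))
end
```

On UTF-8 strings, Ruby's POSIX brackets and `\p{...}` properties are both Unicode-aware, so the last two patterns accept the non-ASCII identifiers that the ASCII-only one rejects.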
OK, I made a small intermediate commit with a corrected, more accurate version, and the last commit is the simplified one (more permissive on rare characters). I kept the intermediate commit as a kind of marker in case someone one day runs into a problem caused by the simplified version (that is, around \p{Me} and \p{No}, which the more complete, longer regexes do not normally include but which \p{Word} adds)… If you think class subtraction is not too hard to read (it seems Ruby 1.9+ allows it), I can try, but it becomes quite a bit longer… Compare:
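For illustration only, and not the expressions from the PR itself, class subtraction in Ruby can be written as an intersection (`&&`) with a negated class. The sketch below assumes the goal is to exclude `\p{Me}` and `\p{No}` from `\p{Word}`; the constant names and sample characters are hypothetical.

```ruby
# Illustrative patterns only; these are not the expressions used in Rouge.
PERMISSIVE = /\A\p{Word}\z/
# "Subtraction" expressed as an intersection (&&) with a negated class.
STRICT = /\A[\p{Word}&&[^\p{Me}\p{No}]]\z/

samples = {
  'LATIN SMALL LETTER E WITH ACUTE' => "\u00E9",
  'LOW LINE'                        => '_',
  'COMBINING ENCLOSING CIRCLE (Me)' => "\u20DD",
}

samples.each do |name, char|
  puts format('%-32s permissive=%-5s strict=%s',
              name, PERMISSIVE.match?(char), STRICT.match?(char))
end
```

The stricter form is more precise but noticeably harder to scan, which is the trade-off under discussion.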
@BenjaminGalliot Thanks for making those changes. This looks good and I think it reads a bit easier. I don't think the subtraction is necessary (at least not until we hear from people hitting problems caused by it). I'll merge and this will be part of the v3.21.0 release that's scheduled to be pushed to RubyGems on Tuesday 14 July 🎉
@BenjaminGalliot Sorry, in my haste I overlooked that this is missing examples. Could you add some simple examples to the visual samples for each language? These files are in
Added! Thank you, @pyrmont!
@BenjaminGalliot Thanks for all your work on this PR. I've merged it in and it will be part of v3.21.0 of Rouge. That's set for release on Tuesday 14 July. I'm not sure when GitLab will start using it but they're usually pretty good at updating to the latest gem so I don't expect it will take too long. Thanks again! 🎉
…#1537)

Most of Rouge's lexers use rules that only match ASCII characters. This is often not strictly correct as many languages support the use of non-ASCII characters in their identifiers. This commit adds support for non-ASCII characters to the CSS, HTML, JavaScript, Julia, XML and YAML lexers.

The regular expressions used are more permissive than they should be if they were to be completely correct but this is intentional. Ease of maintenance has been prioritised over syntactic correctness.

Co-authored-by: Michael Camilleri <[email protected]>
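To make the effect of ASCII-only rules concrete, here is a minimal sketch using plain `String#scan` rather than Rouge's lexer API. The two patterns and the sample line are hypothetical and only stand in for an identifier rule before and after this kind of change.

```ruby
# Hypothetical stand-ins for an identifier rule; not Rouge's actual lexer code.
ASCII_RULE   = /[A-Za-z_]\w*/          # \w is ASCII-only in Ruby regexes
UNICODE_RULE = /[\p{L}_]\p{Word}*/     # permissive, Unicode-aware

line = 'naïve_value'

p line.scan(ASCII_RULE)    # the identifier is split around the non-ASCII letter
p line.scan(UNICODE_RULE)  # the identifier is matched as a single token
```

With the ASCII-only rule, a highlighter would see two fragments with an unrecognised character between them; the permissive rule keeps the identifier whole.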
Hello,
As you asked in this issue, I am now opening a PR. This was my first time writing Ruby, so I tried to be as conservative as possible and to follow the documentation and recommendations as closely as I could.
Edit: since this had not been accepted yet, I added a minor correction.
Sincerely.
(This fixes #1534.)