Add Parsing for Rule-Based Transliterators #3730
Squashed commit of the following: commit cd4d43e Merge: c5ff913 5c9b605 Author: Niels Saurer <[email protected]> Date: Mon Jul 24 10:33:16 2023 +0000 Merge branch 'main' into unicodeset-consumed-bytes commit c5ff913 Author: Niels Saurer <[email protected]> Date: Sun Jul 23 14:27:25 2023 +0000 fix testcases commit a462938 Author: Niels Saurer <[email protected]> Date: Sun Jul 23 14:19:58 2023 +0000 fmt commit a4a857e Author: Niels Saurer <[email protected]> Date: Sun Jul 23 14:19:44 2023 +0000 add tests commit 6efe9ef Author: Niels Saurer <[email protected]> Date: Sun Jul 23 14:05:31 2023 +0000 remove done TODO commit 290daba Author: Niels Saurer <[email protected]> Date: Sun Jul 23 14:05:12 2023 +0000 return source-length of parsed unicodeset commit 745f07f Merge: f6a0560 ac988fd Author: Niels Saurer <[email protected]> Date: Sun Jul 23 13:41:28 2023 +0000 Merge branch 'main' into unicodeset-consumed-bytes commit f6a0560 Author: Niels Saurer <[email protected]> Date: Wed Jul 19 13:49:08 2023 +0000 switch VariableValue to take strings as Cow commit e581cda Author: Niels Saurer <[email protected]> Date: Wed Jul 19 13:39:42 2023 +0000 fmt commit 60bb2b9 Author: Niels Saurer <[email protected]> Date: Wed Jul 19 13:39:19 2023 +0000 use preciser internal types commit f4e4331 Author: Niels Saurer <[email protected]> Date: Tue Jul 18 14:44:35 2023 +0000 update internal docs commit f776a11 Author: Niels Saurer <[email protected]> Date: Tue Jul 18 14:38:12 2023 +0000 fix docs commit b3a42c1 Author: Niels Saurer <[email protected]> Date: Tue Jul 18 14:36:24 2023 +0000 fix borrow-check errors commit 2961298 Author: Niels Saurer <[email protected]> Date: Tue Jul 18 14:28:14 2023 +0000 fix insert errors commit 8f44ca0 Author: Niels Saurer <[email protected]> Date: Tue Jul 18 14:07:08 2023 +0000 fix VariableMap::insert commit 73e50d2 Author: Niels Saurer <[email protected]> Date: Tue Jul 18 13:55:04 2023 +0000 unwrap insertion error commit 430d2d9 Author: Niels Saurer <[email 
protected]> Date: Tue Jul 18 13:51:26 2023 +0000 remove CharOrString from impls commit 1b13a2c Author: Niels Saurer <[email protected]> Date: Tue Jul 18 13:25:17 2023 +0000 fmt commit f44ab69 Author: Niels Saurer <[email protected]> Date: Tue Jul 18 13:23:44 2023 +0000 remove must_use commit 72bca55 Author: Niels Saurer <[email protected]> Date: Tue Jul 18 13:18:34 2023 +0000 change VariableMap interface commit eb86194 Author: Niels Saurer <[email protected]> Date: Mon Jul 17 15:59:55 2023 +0200 fmt commit 012dc9a Author: Niels Saurer <[email protected]> Date: Mon Jul 17 15:59:44 2023 +0200 add comments for pat_ws commit 68ad0a2 Author: Niels Saurer <[email protected]> Date: Mon Jul 17 15:57:19 2023 +0200 switch away from hardcoded [:Pattern_White_Space:] data commit d854120 Author: Niels Saurer <[email protected]> Date: Mon Jul 17 15:54:36 2023 +0200 clean up API surface commit 8778e4b Author: Niels Saurer <[email protected]> Date: Fri Jul 14 18:37:44 2023 +0200 add docs to variablevalue commit 70e1535 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 18:35:46 2023 +0200 fix doc links commit 6b88a3e Author: Niels Saurer <[email protected]> Date: Fri Jul 14 18:21:27 2023 +0200 fmt commit a6c298f Author: Niels Saurer <[email protected]> Date: Fri Jul 14 18:21:13 2023 +0200 add more tests commit d6f7634 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 18:06:16 2023 +0200 fix clippy tests commit 8bfdc18 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 17:56:28 2023 +0200 fmt commit 3dc34f9 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 17:54:46 2023 +0200 add reference to allocation issue commit 7a027e6 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 17:44:07 2023 +0200 rename multi-codepoints to strings commit 052c1cf Author: Niels Saurer <[email protected]> Date: Fri Jul 14 17:35:08 2023 +0200 simplify lifetimes commit f524bf2 Merge: 641ad38 fa3e3a8 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 17:32:01 2023 
+0200 Merge branch 'main' into unicodeset-new-spec commit 641ad38 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 17:27:12 2023 +0200 fmt commit 3c73fc6 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 17:27:02 2023 +0200 improve error messages commit 67f61e8 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 16:42:14 2023 +0200 add more whitespace tests commit abe8e15 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 16:25:06 2023 +0200 add docs commit 5ad7f49 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 16:19:27 2023 +0200 fix clippy commit 2eb81ac Author: Niels Saurer <[email protected]> Date: Fri Jul 14 16:03:10 2023 +0200 fmt commit b1ffbe9 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 16:02:42 2023 +0200 add multi-escapes commit 116ff5c Author: Niels Saurer <[email protected]> Date: Fri Jul 14 15:03:04 2023 +0200 rename lifetimes commit cca06a7 Author: Niels Saurer <[email protected]> Date: Fri Jul 14 14:58:53 2023 +0200 cleanup lifetimes commit 67b88f0 Author: Niels Saurer <[email protected]> Date: Thu Jul 13 22:47:28 2023 +0200 fix sortedness bug commit 44c9eff Author: Niels Saurer <[email protected]> Date: Thu Jul 13 22:00:15 2023 +0200 remove unused commit 4f525e3 Author: Niels Saurer <[email protected]> Date: Thu Jul 13 21:56:03 2023 +0200 fmt commit 85b5dae Author: Niels Saurer <[email protected]> Date: Thu Jul 13 21:55:36 2023 +0200 move to token-based main parse loop commit 409eb84 Author: Niels Saurer <[email protected]> Date: Thu Jul 13 20:21:16 2023 +0200 update comment commit b65d6cd Author: Niels Saurer <[email protected]> Date: Thu Jul 13 20:03:50 2023 +0200 remove token/lexer/ commit 9b2dd2a Author: Niels Saurer <[email protected]> Date: Thu Jul 13 20:02:25 2023 +0200 extend variable support, add more tests commit 11da89c Author: Niels Saurer <[email protected]> Date: Thu Jul 13 19:24:33 2023 +0200 fix tests completely commit 004addc Author: Niels Saurer <[email protected]> Date: Thu 
Jul 13 15:43:00 2023 +0000 start fixing some tests commit d1264a1 Author: Niels Saurer <[email protected]> Date: Thu Jul 13 15:35:51 2023 +0000 fmt commit 0570af9 Author: Niels Saurer <[email protected]> Date: Thu Jul 13 15:35:44 2023 +0000 pub fns that accept variables commit e1d053c Author: Niels Saurer <[email protected]> Date: Thu Jul 13 15:26:03 2023 +0000 add variables commit 274491c Merge: 8da3859 a5aa861 Author: Niels Saurer <[email protected]> Date: Wed Jul 12 17:01:00 2023 +0000 Merge branch 'remove-unicodesetbuilderoptions' into unicodeset-new-spec commit a5aa861 Author: Niels Saurer <[email protected]> Date: Wed Jul 12 16:57:36 2023 +0000 remove dupe commit 8da3859 Author: Niels Saurer <[email protected]> Date: Wed Jul 12 16:57:23 2023 +0000 wip commit 5889ebe Merge: e2fcb0f 5a31190 Author: Niels Saurer <[email protected]> Date: Wed Jul 12 16:41:20 2023 +0000 Merge branch 'remove-unicodesetbuilderoptions' into unicodeset-new-spec commit 5a31190 Author: Niels Saurer <[email protected]> Date: Wed Jul 12 16:22:03 2023 +0000 fix cargo quick commit 6c68ff9 Author: Niels Saurer <[email protected]> Date: Wed Jul 12 16:17:16 2023 +0000 fmt commit c4895b2 Author: Niels Saurer <[email protected]> Date: Wed Jul 12 16:17:02 2023 +0000 remove UnicodeSetBuilderOptions commit e2fcb0f Author: Niels Saurer <[email protected]> Date: Wed Jul 12 16:06:48 2023 +0000 wip commit 2e3a09e Merge: 37b4adf 6bf559c Author: Niels Saurer <[email protected]> Date: Wed Jul 12 15:47:32 2023 +0000 Merge branch 'main' into unicodeset-new-spec commit 37b4adf Author: Niels Saurer <[email protected]> Date: Tue Jul 4 08:44:52 2023 +0000 wip
It would be nice to split this file into a couple of modules; it is rather large.
I assume you're talking about parse.rs?
I'm not sure how best to do that; do you have any suggestions in particular? IMO we could/should definitely try to factor out the escaping logic here and in unicodeset_parser when polishing, as they're mostly identical, but other than that I'm not sure what's idiomatic in Rust and ICU4X.
I suppose tests and the error type could be in their own files (I'm not convinced), but I don't think the parsing logic should be split up [1], because it all contributes to exactly one entity (transliterator sources), which can't be split up.
Footnotes
[1] With the exception of transliterator IDs (simple_id and related stuff), if we ever need that functionality isolated.
Just some nits which you could address in a follow-up. I'll merge!
Ok(BasicId {
    source,
    target,
    variant: variant_id.unwrap_or("".to_string()),
nit: consider making the variant optional instead of using the empty string
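A minimal sketch of what this nit could look like, not the actual change made in the PR or its follow-up. The `BasicId` struct here is a hypothetical stand-in: the `String` field types and the `make_basic_id` and `describe` helpers are assumptions for illustration only. The idea is to keep the `Option` from the parser in the struct, so "no variant" and "empty variant" are no longer conflated.

```rust
/// Hypothetical stand-in for the PR's `BasicId`; the field types are assumed.
#[derive(Debug, PartialEq)]
pub struct BasicId {
    pub source: String,
    pub target: String,
    /// `None` means the ID had no variant, rather than encoding that as "".
    pub variant: Option<String>,
}

/// Hypothetical constructor mirroring the reviewed snippet: the parsed
/// `Option` is stored directly instead of `unwrap_or("".to_string())`.
pub fn make_basic_id(source: String, target: String, variant_id: Option<String>) -> BasicId {
    BasicId {
        source,
        target,
        variant: variant_id,
    }
}

/// Illustrative consumer: callers must handle the missing variant explicitly.
pub fn describe(id: &BasicId) -> String {
    match &id.variant {
        Some(variant) => format!("{}-{}/{}", id.source, id.target, variant),
        None => format!("{}-{}", id.source, id.target),
    }
}
```

With this shape, a consumer that forgets the no-variant case fails to compile on a non-exhaustive `match`, instead of silently treating an empty string as a real variant.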
done in #3827
^triggering CI
commit c85e861 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:40:53 2023 +0200 borrow SingleID commit 06425a1 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:22:03 2023 +0200 fix comment indentation commit 2f70922 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:09:13 2023 +0200 update comments commit 47444ee Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:06:43 2023 +0200 fmt commit c0de3a0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:03:19 2023 +0200 fix clippy, allow testing of intermediate pass1 values commit 227f738 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:55:53 2023 +0200 fix compile errors by introducing 2 small clones per transliterator commit 512b158 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:49:01 2023 +0200 doesn't compile - missing self deconstruction commit 7848f09 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:40:51 2023 +0200 use rule group aggregation in pass1 commit 93663e4 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:09:29 2023 +0200 add rule group aggregation commit 57666eb Author: Niels Saurer <[email protected]> Date: Wed Aug 9 14:12:19 2023 +0200 Squash of transliterator-compiler commit d1812b4 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:31:53 2023 +0200 fix merge mistake commit f15f6eb Merge: abb91cc a39cfed Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:27:08 2023 +0200 Merge branch 'main' into transliterator-compiler commit abb91cc Author: Niels Saurer <[email protected]> Date: Wed Aug 9 01:12:13 2023 +0200 reformat tests commit f6a10f5 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:30:09 2023 +0200 sizes => counts commit 9ffc2f0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:26:27 2023 +0200 add more docs commit eae5748 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:46:20 2023 +0200 remove TODO commit 6b09689 Author: Niels Saurer <[email 
protected]> Date: Tue Aug 8 23:28:42 2023 +0200 improve docs commit c9b16d5 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:15:23 2023 +0200 clippy commit 020a677 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 22:53:14 2023 +0200 add result aggregation to first pass commit 2d1bfd7 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 16:28:23 2023 +0200 add tests commit 6f35ea5 Author: Niels Saurer <[email protected]> Date: Mon Aug 7 22:25:56 2023 +0200 CI fixes commit c6c4844 Author: Niels Saurer <[email protected]> Date: Sun Aug 6 20:06:31 2023 +0200 first steps commit fb68218 Author: Niels Saurer <[email protected]> Date: Wed Jul 19 16:21:33 2023 +0000 Squash transliterator-parser structure for transliterator parser start parsing ':: ... ;' rules complete ::-rule parsing add more global filter tests add negative tests for '::'-rules, be more restrictive update error docs add comment about static UnicodeSet type alias add variable defs escaping and fix unicodeset handling fix unicodeset tests function calls add variable-inside-unicodesets update tests rewrite parse_section using parse_element fix unquoted literal handling add cursor/placeholder tests add cursor support add allow(unused) for this PR remove unused dependencies add todo about inefficient unicodeset variablemap handling allow usage of UnicodeSet's VariableMap directly in TransliteratorParser avoid one allocation per parsed unicodeset remove done todo about allocation-free unicodeset parser hook avoid allocations for number parsing invalid num err with offset update comment switch to allocation free hex parsing (and support for multi escapes) fix main merge conflict support \p unicodesets remove todo for \p unicodeset parsing turn low-prio todo about avoiding clones into note turn non-memory-safety safety comments into regular comments add issue number to TODOs add transliteration component crate commit a39cfed Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:19:28 
2023 +0200 Add Parsing for Rule-Based Transliterators (unicode-org#3730) commit 57e9d59 Author: Andrew Cupps <[email protected]> Date: Tue Aug 8 18:53:26 2023 -0700 Resolve follow-up comments to unicode-org#3760 (unicode-org#3818) * Docs for `U` and `r` * Delete empty test and add todo * Remove old code and empty era check * Add todo
commit ae14cdc Author: Niels Saurer <[email protected]> Date: Wed Aug 9 21:04:38 2023 +0200 clippy commit 8a14e3e Author: Niels Saurer <[email protected]> Date: Wed Aug 9 21:02:28 2023 +0200 tutorials cargo lock commit 4256873 Merge: 72cff57 f549131 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 20:56:20 2023 +0200 Merge branch 'main' into transliterator-datastruct-generation commit 72cff57 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 20:42:03 2023 +0200 refactor pass2 interface commit 8fa4dfd Author: Niels Saurer <[email protected]> Date: Wed Aug 9 20:31:29 2023 +0200 skip compilation of cursors on source side, anchors on target side commit 54b0542 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 19:09:50 2023 +0200 add comment commit cba53a7 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 19:04:27 2023 +0200 fix clippy warnings commit 2dd2ec8 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 19:01:15 2023 +0200 fmt commit 56774fe Author: Niels Saurer <[email protected]> Date: Wed Aug 9 18:45:22 2023 +0200 refactor MutVarTable commit 6176769 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 18:31:18 2023 +0200 revamp pass2 API commit f8459c9 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 18:22:47 2023 +0200 initial final data struct generation commit d6873b0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:48:41 2023 +0200 Squash of transliterator-ir commit c85e861 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:40:53 2023 +0200 borrow SingleID commit 06425a1 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:22:03 2023 +0200 fix comment indentation commit 2f70922 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:09:13 2023 +0200 update comments commit 47444ee Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:06:43 2023 +0200 fmt commit c0de3a0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:03:19 2023 +0200 fix clippy, allow testing of 
intermediate pass1 values commit 227f738 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:55:53 2023 +0200 fix compile errors by introducing 2 small clones per transliterator commit 512b158 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:49:01 2023 +0200 doesn't compile - missing self deconstruction commit 7848f09 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:40:51 2023 +0200 use rule group aggregation in pass1 commit 93663e4 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:09:29 2023 +0200 add rule group aggregation commit 57666eb Author: Niels Saurer <[email protected]> Date: Wed Aug 9 14:12:19 2023 +0200 Squash of transliterator-compiler commit d1812b4 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:31:53 2023 +0200 fix merge mistake commit f15f6eb Merge: abb91cc a39cfed Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:27:08 2023 +0200 Merge branch 'main' into transliterator-compiler commit abb91cc Author: Niels Saurer <[email protected]> Date: Wed Aug 9 01:12:13 2023 +0200 reformat tests commit f6a10f5 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:30:09 2023 +0200 sizes => counts commit 9ffc2f0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:26:27 2023 +0200 add more docs commit eae5748 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:46:20 2023 +0200 remove TODO commit 6b09689 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:28:42 2023 +0200 improve docs commit c9b16d5 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:15:23 2023 +0200 clippy commit 020a677 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 22:53:14 2023 +0200 add result aggregation to first pass commit 2d1bfd7 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 16:28:23 2023 +0200 add tests commit 6f35ea5 Author: Niels Saurer <[email protected]> Date: Mon Aug 7 22:25:56 2023 +0200 CI fixes commit c6c4844 Author: Niels Saurer <[email protected]> Date: Sun 
Aug 6 20:06:31 2023 +0200 first steps commit fb68218 Author: Niels Saurer <[email protected]> Date: Wed Jul 19 16:21:33 2023 +0000 Squash transliterator-parser structure for transliterator parser start parsing ':: ... ;' rules complete ::-rule parsing add more global filter tests add negative tests for '::'-rules, be more restrictive update error docs add comment about static UnicodeSet type alias add variable defs escaping and fix unicodeset handling fix unicodeset tests function calls add variable-inside-unicodesets update tests rewrite parse_section using parse_element fix unquoted literal handling add cursor/placeholder tests add cursor support add allow(unused) for this PR remove unused dependencies add todo about inefficient unicodeset variablemap handling allow usage of UnicodeSet's VariableMap directly in TransliteratorParser avoid one allocation per parsed unicodeset remove done todo about allocation-free unicodeset parser hook avoid allocations for number parsing invalid num err with offset update comment switch to allocation free hex parsing (and support for multi escapes) fix main merge conflict support \p unicodesets remove todo for \p unicodeset parsing turn low-prio todo about avoiding clones into note turn non-memory-safety safety comments into regular comments add issue number to TODOs add transliteration component crate commit a39cfed Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:19:28 2023 +0200 Add Parsing for Rule-Based Transliterators (unicode-org#3730) commit 57e9d59 Author: Andrew Cupps <[email protected]> Date: Tue Aug 8 18:53:26 2023 -0700 Resolve follow-up comments to unicode-org#3760 (unicode-org#3818) * Docs for `U` and `r` * Delete empty test and add todo * Remove old code and empty era check * Add todo commit c55c641 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 02:36:53 2023 +0200 wip commit c6cbb0a Author: Niels Saurer <[email protected]> Date: Wed Aug 9 01:20:08 2023 +0200 Squash of transliterator-compiler 
commit abb91cc Author: Niels Saurer <[email protected]> Date: Wed Aug 9 01:12:13 2023 +0200 reformat tests commit f6a10f5 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:30:09 2023 +0200 sizes => counts commit 9ffc2f0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:26:27 2023 +0200 add more docs commit eae5748 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:46:20 2023 +0200 remove TODO commit 6b09689 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:28:42 2023 +0200 improve docs commit c9b16d5 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:15:23 2023 +0200 clippy commit 020a677 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 22:53:14 2023 +0200 add result aggregation to first pass commit 2d1bfd7 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 16:28:23 2023 +0200 add tests commit 6f35ea5 Author: Niels Saurer <[email protected]> Date: Mon Aug 7 22:25:56 2023 +0200 CI fixes commit c6c4844 Author: Niels Saurer <[email protected]> Date: Sun Aug 6 20:06:31 2023 +0200 first steps commit fb68218 Author: Niels Saurer <[email protected]> Date: Wed Jul 19 16:21:33 2023 +0000 Squash transliterator-parser structure for transliterator parser start parsing ':: ... 
;' rules complete ::-rule parsing add more global filter tests add negative tests for '::'-rules, be more restrictive update error docs add comment about static UnicodeSet type alias add variable defs escaping and fix unicodeset handling fix unicodeset tests function calls add variable-inside-unicodesets update tests rewrite parse_section using parse_element fix unquoted literal handling add cursor/placeholder tests add cursor support add allow(unused) for this PR remove unused dependencies add todo about inefficient unicodeset variablemap handling allow usage of UnicodeSet's VariableMap directly in TransliteratorParser avoid one allocation per parsed unicodeset remove done todo about allocation-free unicodeset parser hook avoid allocations for number parsing invalid num err with offset update comment switch to allocation free hex parsing (and support for multi escapes) fix main merge conflict support \p unicodesets remove todo for \p unicodeset parsing turn low-prio todo about avoiding clones into note turn non-memory-safety safety comments into regular comments add issue number to TODOs add transliteration component crate
commit 1145a17 Author: Niels Saurer <[email protected]> Date: Thu Aug 10 02:06:46 2023 +0200 Squash merge transliterator-ir commit 9d55038 Author: Niels Saurer <[email protected]> Date: Thu Aug 10 02:03:34 2023 +0200 fix push_front/push_back mixup commit dc8dda7 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 23:02:10 2023 +0200 remove empty line commit bfe5827 Merge: c85e861 f549131 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 20:57:11 2023 +0200 Merge branch 'main' into transliterator-ir commit c85e861 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:40:53 2023 +0200 borrow SingleID commit 06425a1 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:22:03 2023 +0200 fix comment indentation commit 2f70922 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:09:13 2023 +0200 update comments commit 47444ee Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:06:43 2023 +0200 fmt commit c0de3a0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:03:19 2023 +0200 fix clippy, allow testing of intermediate pass1 values commit 227f738 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:55:53 2023 +0200 fix compile errors by introducing 2 small clones per transliterator commit 512b158 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:49:01 2023 +0200 doesn't compile - missing self deconstruction commit 7848f09 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:40:51 2023 +0200 use rule group aggregation in pass1 commit 93663e4 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:09:29 2023 +0200 add rule group aggregation commit 57666eb Author: Niels Saurer <[email protected]> Date: Wed Aug 9 14:12:19 2023 +0200 Squash of transliterator-compiler commit d1812b4 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:31:53 2023 +0200 fix merge mistake commit f15f6eb Merge: abb91cc a39cfed Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:27:08 2023 +0200 Merge branch 
'main' into transliterator-compiler commit abb91cc Author: Niels Saurer <[email protected]> Date: Wed Aug 9 01:12:13 2023 +0200 reformat tests commit f6a10f5 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:30:09 2023 +0200 sizes => counts commit 9ffc2f0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:26:27 2023 +0200 add more docs commit eae5748 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:46:20 2023 +0200 remove TODO commit 6b09689 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:28:42 2023 +0200 improve docs commit c9b16d5 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:15:23 2023 +0200 clippy commit 020a677 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 22:53:14 2023 +0200 add result aggregation to first pass commit 2d1bfd7 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 16:28:23 2023 +0200 add tests commit 6f35ea5 Author: Niels Saurer <[email protected]> Date: Mon Aug 7 22:25:56 2023 +0200 CI fixes commit c6c4844 Author: Niels Saurer <[email protected]> Date: Sun Aug 6 20:06:31 2023 +0200 first steps commit fb68218 Author: Niels Saurer <[email protected]> Date: Wed Jul 19 16:21:33 2023 +0000 Squash transliterator-parser structure for transliterator parser start parsing ':: ... 
;' rules complete ::-rule parsing add more global filter tests add negative tests for '::'-rules, be more restrictive update error docs add comment about static UnicodeSet type alias add variable defs escaping and fix unicodeset handling fix unicodeset tests function calls add variable-inside-unicodesets update tests rewrite parse_section using parse_element fix unquoted literal handling add cursor/placeholder tests add cursor support add allow(unused) for this PR remove unused dependencies add todo about inefficient unicodeset variablemap handling allow usage of UnicodeSet's VariableMap directly in TransliteratorParser avoid one allocation per parsed unicodeset remove done todo about allocation-free unicodeset parser hook avoid allocations for number parsing invalid num err with offset update comment switch to allocation free hex parsing (and support for multi escapes) fix main merge conflict support \p unicodesets remove todo for \p unicodeset parsing turn low-prio todo about avoiding clones into note turn non-memory-safety safety comments into regular comments add issue number to TODOs add transliteration component crate commit 208abd7 Author: Niels Saurer <[email protected]> Date: Thu Aug 10 02:02:23 2023 +0200 add data struct generation tests commit d1f7e7c Author: Niels Saurer <[email protected]> Date: Thu Aug 10 00:58:50 2023 +0200 fix debug_assert bug commit 1f5c8dd Author: Niels Saurer <[email protected]> Date: Wed Aug 9 23:25:17 2023 +0200 refactor pass2 slightly commit ae14cdc Author: Niels Saurer <[email protected]> Date: Wed Aug 9 21:04:38 2023 +0200 clippy commit 8a14e3e Author: Niels Saurer <[email protected]> Date: Wed Aug 9 21:02:28 2023 +0200 tutorials cargo lock commit 4256873 Merge: 72cff57 f549131 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 20:56:20 2023 +0200 Merge branch 'main' into transliterator-datastruct-generation commit 72cff57 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 20:42:03 2023 +0200 refactor pass2 
interface commit 8fa4dfd Author: Niels Saurer <[email protected]> Date: Wed Aug 9 20:31:29 2023 +0200 skip compilation of cursors on source side, anchors on target side commit 54b0542 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 19:09:50 2023 +0200 add comment commit cba53a7 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 19:04:27 2023 +0200 fix clippy warnings commit 2dd2ec8 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 19:01:15 2023 +0200 fmt commit 56774fe Author: Niels Saurer <[email protected]> Date: Wed Aug 9 18:45:22 2023 +0200 refactor MutVarTable commit 6176769 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 18:31:18 2023 +0200 revamp pass2 API commit f8459c9 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 18:22:47 2023 +0200 initial final data struct generation commit d6873b0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:48:41 2023 +0200 Squash of transliterator-ir commit c85e861 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:40:53 2023 +0200 borrow SingleID commit 06425a1 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:22:03 2023 +0200 fix comment indentation commit 2f70922 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:09:13 2023 +0200 update comments commit 47444ee Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:06:43 2023 +0200 fmt commit c0de3a0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 17:03:19 2023 +0200 fix clippy, allow testing of intermediate pass1 values commit 227f738 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:55:53 2023 +0200 fix compile errors by introducing 2 small clones per transliterator commit 512b158 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:49:01 2023 +0200 doesn't compile - missing self deconstruction commit 7848f09 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 16:40:51 2023 +0200 use rule group aggregation in pass1 commit 93663e4 Author: Niels Saurer <[email 
protected]> Date: Wed Aug 9 16:09:29 2023 +0200 add rule group aggregation commit 57666eb Author: Niels Saurer <[email protected]> Date: Wed Aug 9 14:12:19 2023 +0200 Squash of transliterator-compiler commit d1812b4 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:31:53 2023 +0200 fix merge mistake commit f15f6eb Merge: abb91cc a39cfed Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:27:08 2023 +0200 Merge branch 'main' into transliterator-compiler commit abb91cc Author: Niels Saurer <[email protected]> Date: Wed Aug 9 01:12:13 2023 +0200 reformat tests commit f6a10f5 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:30:09 2023 +0200 sizes => counts commit 9ffc2f0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:26:27 2023 +0200 add more docs commit eae5748 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:46:20 2023 +0200 remove TODO commit 6b09689 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:28:42 2023 +0200 improve docs commit c9b16d5 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:15:23 2023 +0200 clippy commit 020a677 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 22:53:14 2023 +0200 add result aggregation to first pass commit 2d1bfd7 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 16:28:23 2023 +0200 add tests commit 6f35ea5 Author: Niels Saurer <[email protected]> Date: Mon Aug 7 22:25:56 2023 +0200 CI fixes commit c6c4844 Author: Niels Saurer <[email protected]> Date: Sun Aug 6 20:06:31 2023 +0200 first steps commit fb68218 Author: Niels Saurer <[email protected]> Date: Wed Jul 19 16:21:33 2023 +0000 Squash transliterator-parser structure for transliterator parser start parsing ':: ... 
;' rules complete ::-rule parsing add more global filter tests add negative tests for '::'-rules, be more restrictive update error docs add comment about static UnicodeSet type alias add variable defs escaping and fix unicodeset handling fix unicodeset tests function calls add variable-inside-unicodesets update tests rewrite parse_section using parse_element fix unquoted literal handling add cursor/placeholder tests add cursor support add allow(unused) for this PR remove unused dependencies add todo about inefficient unicodeset variablemap handling allow usage of UnicodeSet's VariableMap directly in TransliteratorParser avoid one allocation per parsed unicodeset remove done todo about allocation-free unicodeset parser hook avoid allocations for number parsing invalid num err with offset update comment switch to allocation free hex parsing (and support for multi escapes) fix main merge conflict support \p unicodesets remove todo for \p unicodeset parsing turn low-prio todo about avoiding clones into note turn non-memory-safety safety comments into regular comments add issue number to TODOs add transliteration component crate commit a39cfed Author: Niels Saurer <[email protected]> Date: Wed Aug 9 13:19:28 2023 +0200 Add Parsing for Rule-Based Transliterators (unicode-org#3730) commit 57e9d59 Author: Andrew Cupps <[email protected]> Date: Tue Aug 8 18:53:26 2023 -0700 Resolve follow-up comments to unicode-org#3760 (unicode-org#3818) * Docs for `U` and `r` * Delete empty test and add todo * Remove old code and empty era check * Add todo commit c55c641 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 02:36:53 2023 +0200 wip commit c6cbb0a Author: Niels Saurer <[email protected]> Date: Wed Aug 9 01:20:08 2023 +0200 Squash of transliterator-compiler commit abb91cc Author: Niels Saurer <[email protected]> Date: Wed Aug 9 01:12:13 2023 +0200 reformat tests commit f6a10f5 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:30:09 2023 +0200 sizes => counts 
commit 9ffc2f0 Author: Niels Saurer <[email protected]> Date: Wed Aug 9 00:26:27 2023 +0200 add more docs commit eae5748 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:46:20 2023 +0200 remove TODO commit 6b09689 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:28:42 2023 +0200 improve docs commit c9b16d5 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 23:15:23 2023 +0200 clippy commit 020a677 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 22:53:14 2023 +0200 add result aggregation to first pass commit 2d1bfd7 Author: Niels Saurer <[email protected]> Date: Tue Aug 8 16:28:23 2023 +0200 add tests commit 6f35ea5 Author: Niels Saurer <[email protected]> Date: Mon Aug 7 22:25:56 2023 +0200 CI fixes commit c6c4844 Author: Niels Saurer <[email protected]> Date: Sun Aug 6 20:06:31 2023 +0200 first steps commit fb68218 Author: Niels Saurer <[email protected]> Date: Wed Jul 19 16:21:33 2023 +0000 Squash transliterator-parser structure for transliterator parser start parsing ':: ... 
;' rules complete ::-rule parsing add more global filter tests add negative tests for '::'-rules, be more restrictive update error docs add comment about static UnicodeSet type alias add variable defs escaping and fix unicodeset handling fix unicodeset tests function calls add variable-inside-unicodesets update tests rewrite parse_section using parse_element fix unquoted literal handling add cursor/placeholder tests add cursor support add allow(unused) for this PR remove unused dependencies add todo about inefficient unicodeset variablemap handling allow usage of UnicodeSet's VariableMap directly in TransliteratorParser avoid one allocation per parsed unicodeset remove done todo about allocation-free unicodeset parser hook avoid allocations for number parsing invalid num err with offset update comment switch to allocation free hex parsing (and support for multi escapes) fix main merge conflict support \p unicodesets remove todo for \p unicodeset parsing turn low-prio todo about avoiding clones into note turn non-memory-safety safety comments into regular comments add issue number to TODOs add transliteration component crate
Adds parsing for the Transform Rule syntax. Tries to sensibly mimic ICU behavior.
The final, exposed parsing will consist of an "internal parsing > internal compilation" pipeline (see `lib.rs`). This PR adds the "internal parsing" functionality.

Depends on: #3731
(cc @younies)