Add Parsing for Rule-Based Transliterators #3730

Merged
merged 38 commits into from
Aug 9, 2023

skius (Member) commented Jul 24, 2023

Adds parsing for the Transform Rule syntax. Tries to sensibly mimic ICU behavior.

The final, exposed parsing will consist of an "internal parsing > internal compilation" pipeline (see lib.rs). This PR adds the "internal parsing" functionality.
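
For readers unfamiliar with this kind of layout, a two-stage pipeline can be sketched roughly as below. This is only an illustration: `ParsedRules`, `Compiled`, `parse_internal`, `compile_internal`, and `parse` are hypothetical names, not the contents of lib.rs, and the naive split on `;` ignores quoting and escapes that a real transform-rule parser has to handle.

```rust
/// Hypothetical output of the parsing stage: one entry per `;`-terminated rule.
#[derive(Debug, PartialEq)]
struct ParsedRules(Vec<String>);

/// Hypothetical output of the compilation stage.
#[derive(Debug, PartialEq)]
struct Compiled {
    rule_count: usize,
}

/// Stage 1: turn source text into a syntax-level form.
/// Assumption: rules are split naively on `;`, with no quoting or escapes.
fn parse_internal(source: &str) -> Result<ParsedRules, String> {
    let rules: Vec<String> = source
        .split(';')
        .map(str::trim)
        .filter(|rule| !rule.is_empty())
        .map(String::from)
        .collect();
    if rules.is_empty() {
        return Err("no rules found".to_string());
    }
    Ok(ParsedRules(rules))
}

/// Stage 2: turn the parsed form into the final data (here, just a count).
fn compile_internal(parsed: ParsedRules) -> Result<Compiled, String> {
    Ok(Compiled {
        rule_count: parsed.0.len(),
    })
}

/// The exposed entry point chains both stages; an error in either one short-circuits.
fn parse(source: &str) -> Result<Compiled, String> {
    parse_internal(source).and_then(compile_internal)
}
```

Keeping the stages separate means the parsing stage can be reviewed and tested on its own before the compilation stage exists.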

Depends on: #3731

(cc @younies)

skius added 29 commits July 24, 2023 15:35
Squashed commit of the following:

commit cd4d43e
Merge: c5ff913 5c9b605
Author: Niels Saurer <[email protected]>
Date:   Mon Jul 24 10:33:16 2023 +0000

    Merge branch 'main' into unicodeset-consumed-bytes

commit c5ff913
Author: Niels Saurer <[email protected]>
Date:   Sun Jul 23 14:27:25 2023 +0000

    fix testcases

commit a462938
Author: Niels Saurer <[email protected]>
Date:   Sun Jul 23 14:19:58 2023 +0000

    fmt

commit a4a857e
Author: Niels Saurer <[email protected]>
Date:   Sun Jul 23 14:19:44 2023 +0000

    add tests

commit 6efe9ef
Author: Niels Saurer <[email protected]>
Date:   Sun Jul 23 14:05:31 2023 +0000

    remove done TODO

commit 290daba
Author: Niels Saurer <[email protected]>
Date:   Sun Jul 23 14:05:12 2023 +0000

    return source-length of parsed unicodeset

commit 745f07f
Merge: f6a0560 ac988fd
Author: Niels Saurer <[email protected]>
Date:   Sun Jul 23 13:41:28 2023 +0000

    Merge branch 'main' into unicodeset-consumed-bytes

commit f6a0560
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 19 13:49:08 2023 +0000

    switch VariableValue to take strings as Cow

commit e581cda
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 19 13:39:42 2023 +0000

    fmt

commit 60bb2b9
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 19 13:39:19 2023 +0000

    use preciser internal types

commit f4e4331
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 14:44:35 2023 +0000

    update internal docs

commit f776a11
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 14:38:12 2023 +0000

    fix docs

commit b3a42c1
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 14:36:24 2023 +0000

    fix borrow-check errors

commit 2961298
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 14:28:14 2023 +0000

    fix insert errors

commit 8f44ca0
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 14:07:08 2023 +0000

    fix VariableMap::insert

commit 73e50d2
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 13:55:04 2023 +0000

    unwrap insertion error

commit 430d2d9
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 13:51:26 2023 +0000

    remove CharOrString from impls

commit 1b13a2c
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 13:25:17 2023 +0000

    fmt

commit f44ab69
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 13:23:44 2023 +0000

    remove must_use

commit 72bca55
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 18 13:18:34 2023 +0000

    change VariableMap interface

commit eb86194
Author: Niels Saurer <[email protected]>
Date:   Mon Jul 17 15:59:55 2023 +0200

    fmt

commit 012dc9a
Author: Niels Saurer <[email protected]>
Date:   Mon Jul 17 15:59:44 2023 +0200

    add comments for pat_ws

commit 68ad0a2
Author: Niels Saurer <[email protected]>
Date:   Mon Jul 17 15:57:19 2023 +0200

    switch away from hardcoded [:Pattern_White_Space:] data

commit d854120
Author: Niels Saurer <[email protected]>
Date:   Mon Jul 17 15:54:36 2023 +0200

    clean up API surface

commit 8778e4b
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 18:37:44 2023 +0200

    add docs to variablevalue

commit 70e1535
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 18:35:46 2023 +0200

    fix doc links

commit 6b88a3e
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 18:21:27 2023 +0200

    fmt

commit a6c298f
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 18:21:13 2023 +0200

    add more tests

commit d6f7634
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 18:06:16 2023 +0200

    fix clippy tests

commit 8bfdc18
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 17:56:28 2023 +0200

    fmt

commit 3dc34f9
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 17:54:46 2023 +0200

    add reference to allocation issue

commit 7a027e6
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 17:44:07 2023 +0200

    rename multi-codepoints to strings

commit 052c1cf
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 17:35:08 2023 +0200

    simplify lifetimes

commit f524bf2
Merge: 641ad38 fa3e3a8
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 17:32:01 2023 +0200

    Merge branch 'main' into unicodeset-new-spec

commit 641ad38
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 17:27:12 2023 +0200

    fmt

commit 3c73fc6
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 17:27:02 2023 +0200

    improve error messages

commit 67f61e8
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 16:42:14 2023 +0200

    add more whitespace tests

commit abe8e15
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 16:25:06 2023 +0200

    add docs

commit 5ad7f49
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 16:19:27 2023 +0200

    fix clippy

commit 2eb81ac
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 16:03:10 2023 +0200

    fmt

commit b1ffbe9
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 16:02:42 2023 +0200

    add multi-escapes

commit 116ff5c
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 15:03:04 2023 +0200

    rename lifetimes

commit cca06a7
Author: Niels Saurer <[email protected]>
Date:   Fri Jul 14 14:58:53 2023 +0200

    cleanup lifetimes

commit 67b88f0
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 22:47:28 2023 +0200

    fix sortedness bug

commit 44c9eff
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 22:00:15 2023 +0200

    remove unused

commit 4f525e3
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 21:56:03 2023 +0200

    fmt

commit 85b5dae
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 21:55:36 2023 +0200

    move to token-based main parse loop

commit 409eb84
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 20:21:16 2023 +0200

    update comment

commit b65d6cd
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 20:03:50 2023 +0200

    remove token/lexer/

commit 9b2dd2a
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 20:02:25 2023 +0200

    extend variable support, add more tests

commit 11da89c
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 19:24:33 2023 +0200

    fix tests completely

commit 004addc
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 15:43:00 2023 +0000

    start fixing some tests

commit d1264a1
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 15:35:51 2023 +0000

    fmt

commit 0570af9
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 15:35:44 2023 +0000

    pub fns that accept variables

commit e1d053c
Author: Niels Saurer <[email protected]>
Date:   Thu Jul 13 15:26:03 2023 +0000

    add variables

commit 274491c
Merge: 8da3859 a5aa861
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 17:01:00 2023 +0000

    Merge branch 'remove-unicodesetbuilderoptions' into unicodeset-new-spec

commit a5aa861
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 16:57:36 2023 +0000

    remove dupe

commit 8da3859
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 16:57:23 2023 +0000

    wip

commit 5889ebe
Merge: e2fcb0f 5a31190
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 16:41:20 2023 +0000

    Merge branch 'remove-unicodesetbuilderoptions' into unicodeset-new-spec

commit 5a31190
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 16:22:03 2023 +0000

    fix cargo quick

commit 6c68ff9
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 16:17:16 2023 +0000

    fmt

commit c4895b2
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 16:17:02 2023 +0000

    remove UnicodeSetBuilderOptions

commit e2fcb0f
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 16:06:48 2023 +0000

    wip

commit 2e3a09e
Merge: 37b4adf 6bf559c
Author: Niels Saurer <[email protected]>
Date:   Wed Jul 12 15:47:32 2023 +0000

    Merge branch 'main' into unicodeset-new-spec

commit 37b4adf
Author: Niels Saurer <[email protected]>
Date:   Tue Jul 4 08:44:52 2023 +0000

    wip

dpulls bot commented Jul 24, 2023

🎉 All dependencies have been resolved !

@skius skius mentioned this pull request Jul 25, 2023
41 tasks
@sffc sffc requested a review from younies July 25, 2023 15:34
younies (Member) previously approved these changes Aug 7, 2023

It would be nice to split this file into a couple of modules; the file is a bit huge.

Member Author

I assume you're talking about parse.rs?

I'm not sure how best to do that; do you have any suggestions in particular? IMO we could/should definitely try to factor out the escaping logic here and in unicodeset_parser when polishing, as they're mostly identical, but other than that I'm not sure what's idiomatic in Rust and ICU4X.

I suppose tests and the error type could be in their own files (I'm not convinced), but I don't think the parsing logic should be split up [1], because it all contributes to exactly one entity (transliterator sources), which can't be split up.

Footnotes

  1. With the exception of transliterator IDs (simple_id and related stuff), if we ever need that functionality isolated.
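
As a rough illustration of what factoring out shared escaping logic could look like, here is a minimal sketch of a standalone helper that both parsers might call. The function name `parse_u_escape` and its return shape are assumptions made for this example, and it handles only the four-digit `\uXXXX` form, not the full set of escapes either parser supports.

```rust
/// Hypothetical shared helper: parses a `\uXXXX` escape at the start of `input`.
/// Returns the decoded character and the number of bytes consumed, without allocating.
fn parse_u_escape(input: &str) -> Option<(char, usize)> {
    // Require the two-byte `\u` prefix, then exactly four more bytes.
    let hex = input.strip_prefix("\\u")?.get(..4)?;
    // `from_str_radix` would accept a leading `+`, so check the digits explicitly.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let code_point = u32::from_str_radix(hex, 16).ok()?;
    // Surrogate code points are not valid `char`s and yield `None` here.
    let c = char::from_u32(code_point)?;
    Some((c, 6))
}
```

Returning the consumed length lets each caller advance its own cursor, so the helper does not need to know about either parser's state.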

robertbastian (Member) previously approved these changes Aug 8, 2023

robertbastian left a comment

Just some nits which you could address in a follow-up. I'll merge!

Ok(BasicId {
    source,
    target,
    variant: variant_id.unwrap_or("".to_string()),

Member

nit: consider making the variant optional instead of using the empty string
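
A minimal sketch of the suggestion, assuming a simplified `source-target[/variant]` ID shape. The struct here is a stand-in that reuses the field names from the snippet above, and `parse_basic_id` is a hypothetical helper rather than the PR's parser (real IDs allow more forms than this handles).

```rust
/// Stand-in for the reviewed struct; only the field names come from the snippet above.
#[derive(Debug, PartialEq)]
struct BasicId {
    source: String,
    target: String,
    /// `None` means no variant was written, instead of an empty-string sentinel.
    variant: Option<String>,
}

/// Hypothetical helper: parses `source-target` with an optional `/variant` suffix.
fn parse_basic_id(input: &str) -> Result<BasicId, String> {
    // Split off the variant first, if one is present.
    let (pair, variant) = match input.split_once('/') {
        Some((pair, variant)) => (pair, Some(variant.to_string())),
        None => (input, None),
    };
    let (source, target) = pair
        .split_once('-')
        .ok_or_else(|| format!("missing '-' in {input:?}"))?;
    Ok(BasicId {
        source: source.to_string(),
        target: target.to_string(),
        variant,
    })
}
```

With `Option<String>`, callers match on `Some`/`None` rather than comparing against `""`, and the type itself documents that the variant may be absent.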

Member Author

done in #3827

@robertbastian
Member

^triggering CI

@skius skius dismissed stale reviews from robertbastian and younies via 4217778 August 8, 2023 15:51
robertbastian previously approved these changes Aug 8, 2023
@robertbastian robertbastian merged commit a39cfed into unicode-org:main Aug 9, 2023
25 checks passed
skius added a commit to skius/icu4x that referenced this pull request Aug 9, 2023
commit c85e861
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 17:40:53 2023 +0200

    borrow SingleID

commit 06425a1
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 17:22:03 2023 +0200

    fix comment indentation

commit 2f70922
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 17:09:13 2023 +0200

    update comments

commit 47444ee
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 17:06:43 2023 +0200

    fmt

commit c0de3a0
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 17:03:19 2023 +0200

    fix clippy, allow testing of intermediate pass1 values

commit 227f738
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 16:55:53 2023 +0200

    fix compile errors by introducing 2 small clones per transliterator

commit 512b158
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 16:49:01 2023 +0200

    doesn't compile - missing self deconstruction

commit 7848f09
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 16:40:51 2023 +0200

    use rule group aggregation in pass1

commit 93663e4
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 16:09:29 2023 +0200

    add rule group aggregation

commit 57666eb
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 14:12:19 2023 +0200

    Squash of transliterator-compiler

    commit d1812b4
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 13:31:53 2023 +0200

        fix merge mistake

    commit f15f6eb
    Merge: abb91cc a39cfed
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 13:27:08 2023 +0200

        Merge branch 'main' into transliterator-compiler

    commit abb91cc
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 01:12:13 2023 +0200

        reformat tests

    commit f6a10f5
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 00:30:09 2023 +0200

        sizes => counts

    commit 9ffc2f0
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 00:26:27 2023 +0200

        add more docs

    commit eae5748
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 23:46:20 2023 +0200

        remove TODO

    commit 6b09689
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 23:28:42 2023 +0200

        improve docs

    commit c9b16d5
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 23:15:23 2023 +0200

        clippy

    commit 020a677
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 22:53:14 2023 +0200

        add result aggregation to first pass

    commit 2d1bfd7
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 16:28:23 2023 +0200

        add tests

    commit 6f35ea5
    Author: Niels Saurer <[email protected]>
    Date:   Mon Aug 7 22:25:56 2023 +0200

        CI fixes

    commit c6c4844
    Author: Niels Saurer <[email protected]>
    Date:   Sun Aug 6 20:06:31 2023 +0200

        first steps

    commit fb68218
    Author: Niels Saurer <[email protected]>
    Date:   Wed Jul 19 16:21:33 2023 +0000

        Squash transliterator-parser

        structure for transliterator parser

        start parsing ':: ... ;' rules

        complete ::-rule parsing

        add more global filter tests

        add negative tests for '::'-rules, be more restrictive

        update error docs

        add comment about static UnicodeSet type alias

        add variable defs

        escaping and fix unicodeset handling

        fix unicodeset tests

        function calls

        add variable-inside-unicodesets

        update tests

        rewrite parse_section using parse_element

        fix unquoted literal handling

        add cursor/placeholder tests

        add cursor support

        add allow(unused) for this PR

        remove unused dependencies

        add todo about inefficient unicodeset variablemap handling

        allow usage of UnicodeSet's VariableMap directly in TransliteratorParser

        avoid one allocation per parsed unicodeset

        remove done todo about allocation-free unicodeset parser hook

        avoid allocations for number parsing

        invalid num err with offset

        update comment

        switch to allocation free hex parsing (and support for multi escapes)

        fix main merge conflict

        support \p unicodesets

        remove todo for \p unicodeset parsing

        turn low-prio todo about avoiding clones into note

        turn non-memory-safety safety comments into regular comments

        add issue number to TODOs

        add transliteration component crate

commit a39cfed
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 13:19:28 2023 +0200

    Add Parsing for Rule-Based Transliterators (unicode-org#3730)

commit 57e9d59
Author: Andrew Cupps <[email protected]>
Date:   Tue Aug 8 18:53:26 2023 -0700

    Resolve follow-up comments to unicode-org#3760 (unicode-org#3818)

    * Docs for `U` and `r`

    * Delete empty test and add todo

    * Remove old code and empty era check

    * Add todo
skius added a commit to skius/icu4x that referenced this pull request Aug 9, 2023
commit ae14cdc
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 21:04:38 2023 +0200

    clippy

commit 8a14e3e
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 21:02:28 2023 +0200

    tutorials cargo lock

commit 4256873
Merge: 72cff57 f549131
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 20:56:20 2023 +0200

    Merge branch 'main' into transliterator-datastruct-generation

commit 72cff57
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 20:42:03 2023 +0200

    refactor pass2 interface

commit 8fa4dfd
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 20:31:29 2023 +0200

    skip compilation of cursors on source side, anchors on target side

commit 54b0542
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 19:09:50 2023 +0200

    add comment

commit cba53a7
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 19:04:27 2023 +0200

    fix clippy warnings

commit 2dd2ec8
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 19:01:15 2023 +0200

    fmt

commit 56774fe
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 18:45:22 2023 +0200

    refactor MutVarTable

commit 6176769
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 18:31:18 2023 +0200

    revamp pass2 API

commit f8459c9
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 18:22:47 2023 +0200

    initial final data struct generation

commit d6873b0
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 17:48:41 2023 +0200

    Squash of transliterator-ir

commit c55c641
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 02:36:53 2023 +0200

    wip

commit c6cbb0a
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 01:20:08 2023 +0200

    Squash of transliterator-compiler

skius added a commit to skius/icu4x that referenced this pull request Aug 10, 2023
commit 1145a17
Author: Niels Saurer <[email protected]>
Date:   Thu Aug 10 02:06:46 2023 +0200

    Squash merge transliterator-ir

    commit 9d55038
    Author: Niels Saurer <[email protected]>
    Date:   Thu Aug 10 02:03:34 2023 +0200

        fix push_front/push_back mixup

    commit dc8dda7
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 23:02:10 2023 +0200

        remove empty line

    commit bfe5827
    Merge: c85e861 f549131
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 20:57:11 2023 +0200

        Merge branch 'main' into transliterator-ir

commit 208abd7
Author: Niels Saurer <[email protected]>
Date:   Thu Aug 10 02:02:23 2023 +0200

    add data struct generation tests

commit d1f7e7c
Author: Niels Saurer <[email protected]>
Date:   Thu Aug 10 00:58:50 2023 +0200

    fix debug_assert bug

commit 1f5c8dd
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 23:25:17 2023 +0200

    refactor pass2 slightly

commit ae14cdc
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 21:04:38 2023 +0200

    clippy

commit 8a14e3e
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 21:02:28 2023 +0200

    tutorials cargo lock

commit 4256873
Merge: 72cff57 f549131
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 20:56:20 2023 +0200

    Merge branch 'main' into transliterator-datastruct-generation

commit 72cff57
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 20:42:03 2023 +0200

    refactor pass2 interface

commit 8fa4dfd
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 20:31:29 2023 +0200

    skip compilation of cursors on source side, anchors on target side

commit 54b0542
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 19:09:50 2023 +0200

    add comment

commit cba53a7
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 19:04:27 2023 +0200

    fix clippy warnings

commit 2dd2ec8
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 19:01:15 2023 +0200

    fmt

commit 56774fe
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 18:45:22 2023 +0200

    refactor MutVarTable

commit 6176769
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 18:31:18 2023 +0200

    revamp pass2 API

commit f8459c9
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 18:22:47 2023 +0200

    initial final data struct generation

commit d6873b0
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 17:48:41 2023 +0200

    Squash of transliterator-ir

    commit c85e861
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 17:40:53 2023 +0200

        borrow SingleID

    commit 06425a1
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 17:22:03 2023 +0200

        fix comment indentation

    commit 2f70922
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 17:09:13 2023 +0200

        update comments

    commit 47444ee
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 17:06:43 2023 +0200

        fmt

    commit c0de3a0
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 17:03:19 2023 +0200

        fix clippy, allow testing of intermediate pass1 values

    commit 227f738
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 16:55:53 2023 +0200

        fix compile errors by introducing 2 small clones per transliterator

    commit 512b158
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 16:49:01 2023 +0200

        doesn't compile - missing self deconstruction

    commit 7848f09
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 16:40:51 2023 +0200

        use rule group aggregation in pass1

    commit 93663e4
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 16:09:29 2023 +0200

        add rule group aggregation

    commit 57666eb
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 14:12:19 2023 +0200

        Squash of transliterator-compiler

        commit d1812b4
        Author: Niels Saurer <[email protected]>
        Date:   Wed Aug 9 13:31:53 2023 +0200

            fix merge mistake

        commit f15f6eb
        Merge: abb91cc a39cfed
        Author: Niels Saurer <[email protected]>
        Date:   Wed Aug 9 13:27:08 2023 +0200

            Merge branch 'main' into transliterator-compiler

        commit abb91cc
        Author: Niels Saurer <[email protected]>
        Date:   Wed Aug 9 01:12:13 2023 +0200

            reformat tests

        commit f6a10f5
        Author: Niels Saurer <[email protected]>
        Date:   Wed Aug 9 00:30:09 2023 +0200

            sizes => counts

        commit 9ffc2f0
        Author: Niels Saurer <[email protected]>
        Date:   Wed Aug 9 00:26:27 2023 +0200

            add more docs

        commit eae5748
        Author: Niels Saurer <[email protected]>
        Date:   Tue Aug 8 23:46:20 2023 +0200

            remove TODO

        commit 6b09689
        Author: Niels Saurer <[email protected]>
        Date:   Tue Aug 8 23:28:42 2023 +0200

            improve docs

        commit c9b16d5
        Author: Niels Saurer <[email protected]>
        Date:   Tue Aug 8 23:15:23 2023 +0200

            clippy

        commit 020a677
        Author: Niels Saurer <[email protected]>
        Date:   Tue Aug 8 22:53:14 2023 +0200

            add result aggregation to first pass

        commit 2d1bfd7
        Author: Niels Saurer <[email protected]>
        Date:   Tue Aug 8 16:28:23 2023 +0200

            add tests

        commit 6f35ea5
        Author: Niels Saurer <[email protected]>
        Date:   Mon Aug 7 22:25:56 2023 +0200

            CI fixes

        commit c6c4844
        Author: Niels Saurer <[email protected]>
        Date:   Sun Aug 6 20:06:31 2023 +0200

            first steps

        commit fb68218
        Author: Niels Saurer <[email protected]>
        Date:   Wed Jul 19 16:21:33 2023 +0000

            Squash transliterator-parser

            structure for transliterator parser

            start parsing ':: ... ;' rules

            complete ::-rule parsing

            add more global filter tests

            add negative tests for '::'-rules, be more restrictive

            update error docs

            add comment about static UnicodeSet type alias

            add variable defs

            escaping and fix unicodeset handling

            fix unicodeset tests

            function calls

            add variable-inside-unicodesets

            update tests

            rewrite parse_section using parse_element

            fix unquoted literal handling

            add cursor/placeholder tests

            add cursor support

            add allow(unused) for this PR

            remove unused dependencies

            add todo about inefficient unicodeset variablemap handling

            allow usage of UnicodeSet's VariableMap directly in TransliteratorParser

            avoid one allocation per parsed unicodeset

            remove done todo about allocation-free unicodeset parser hook

            avoid allocations for number parsing

            invalid num err with offset

            update comment

            switch to allocation free hex parsing (and support for multi escapes)

            fix main merge conflict

            support \p unicodesets

            remove todo for \p unicodeset parsing

            turn low-prio todo about avoiding clones into note

            turn non-memory-safety safety comments into regular comments

            add issue number to TODOs

            add transliteration component crate

    commit a39cfed
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 13:19:28 2023 +0200

        Add Parsing for Rule-Based Transliterators (unicode-org#3730)

    commit 57e9d59
    Author: Andrew Cupps <[email protected]>
    Date:   Tue Aug 8 18:53:26 2023 -0700

        Resolve follow-up comments to unicode-org#3760 (unicode-org#3818)

        * Docs for `U` and `r`

        * Delete empty test and add todo

        * Remove old code and empty era check

        * Add todo

commit c55c641
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 02:36:53 2023 +0200

    wip

commit c6cbb0a
Author: Niels Saurer <[email protected]>
Date:   Wed Aug 9 01:20:08 2023 +0200

    Squash of transliterator-compiler

    commit abb91cc
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 01:12:13 2023 +0200

        reformat tests

    commit f6a10f5
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 00:30:09 2023 +0200

        sizes => counts

    commit 9ffc2f0
    Author: Niels Saurer <[email protected]>
    Date:   Wed Aug 9 00:26:27 2023 +0200

        add more docs

    commit eae5748
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 23:46:20 2023 +0200

        remove TODO

    commit 6b09689
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 23:28:42 2023 +0200

        improve docs

    commit c9b16d5
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 23:15:23 2023 +0200

        clippy

    commit 020a677
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 22:53:14 2023 +0200

        add result aggregation to first pass

    commit 2d1bfd7
    Author: Niels Saurer <[email protected]>
    Date:   Tue Aug 8 16:28:23 2023 +0200

        add tests

    commit 6f35ea5
    Author: Niels Saurer <[email protected]>
    Date:   Mon Aug 7 22:25:56 2023 +0200

        CI fixes

    commit c6c4844
    Author: Niels Saurer <[email protected]>
    Date:   Sun Aug 6 20:06:31 2023 +0200

        first steps

    commit fb68218
    Author: Niels Saurer <[email protected]>
    Date:   Wed Jul 19 16:21:33 2023 +0000

        Squash transliterator-parser

        structure for transliterator parser

        start parsing ':: ... ;' rules

        complete ::-rule parsing

        add more global filter tests

        add negative tests for '::'-rules, be more restrictive

        update error docs

        add comment about static UnicodeSet type alias

        add variable defs

        escaping and fix unicodeset handling

        fix unicodeset tests

        function calls

        add variable-inside-unicodesets

        update tests

        rewrite parse_section using parse_element

        fix unquoted literal handling

        add cursor/placeholder tests

        add cursor support

        add allow(unused) for this PR

        remove unused dependencies

        add todo about inefficient unicodeset variablemap handling

        allow usage of UnicodeSet's VariableMap directly in TransliteratorParser

        avoid one allocation per parsed unicodeset

        remove done todo about allocation-free unicodeset parser hook

        avoid allocations for number parsing

        invalid num err with offset

        update comment

        switch to allocation free hex parsing (and support for multi escapes)

        fix main merge conflict

        support \p unicodesets

        remove todo for \p unicodeset parsing

        turn low-prio todo about avoiding clones into note

        turn non-memory-safety safety comments into regular comments

        add issue number to TODOs

        add transliteration component crate