Add Parsing for Rule-Based Transliterators #3730

Merged (38 commits), Aug 9, 2023
Commits:
62d390a  UnicodeSet return consumed bytes (skius, Jul 24, 2023)
934db5d  structure for transliterator parser (skius, Jul 19, 2023)
1213df3  start parsing ':: ... ;' rules (skius, Jul 19, 2023)
db3e57b  complete ::-rule parsing (skius, Jul 20, 2023)
66373e8  add more global filter tests (skius, Jul 20, 2023)
723735c  add negative tests for '::'-rules, be more restrictive (skius, Jul 20, 2023)
ef50763  update error docs (skius, Jul 20, 2023)
a5c84d5  add comment about static UnicodeSet type alias (skius, Jul 20, 2023)
e2ee0e0  add variable defs (skius, Jul 20, 2023)
e97697e  escaping and fix unicodeset handling (skius, Jul 21, 2023)
ee67510  fix unicodeset tests (skius, Jul 21, 2023)
fbdbe8b  function calls (skius, Jul 21, 2023)
3656229  add variable-inside-unicodesets (skius, Jul 21, 2023)
3f1752a  update tests (skius, Jul 21, 2023)
dde0ed5  rewrite parse_section using parse_element (skius, Jul 21, 2023)
ad81bc6  fix unquoted literal handling (skius, Jul 21, 2023)
df14c25  add cursor/placeholder tests (skius, Jul 21, 2023)
d120594  add cursor support (skius, Jul 21, 2023)
99494e9  add allow(unused) for this PR (skius, Jul 21, 2023)
2ad5b49  remove unused dependencies (skius, Jul 21, 2023)
6c7fa9b  add todo about inefficient unicodeset variablemap handling (skius, Jul 21, 2023)
d12981b  allow usage of UnicodeSet's VariableMap directly in TransliteratorParser (skius, Jul 22, 2023)
e01d8f9  avoid one allocation per parsed unicodeset (skius, Jul 22, 2023)
c2f5b03  remove done todo about allocation-free unicodeset parser hook (skius, Jul 22, 2023)
9089707  avoid allocations for number parsing (skius, Jul 22, 2023)
425e80a  invalid num err with offset (skius, Jul 22, 2023)
b099035  update comment (skius, Jul 22, 2023)
44c9cef  switch to allocation free hex parsing (and support for multi escapes) (skius, Jul 24, 2023)
ece691e  fix main merge conflict (skius, Jul 24, 2023)
d29a806  support \p unicodesets (skius, Jul 24, 2023)
a8045bf  remove todo for \p unicodeset parsing (skius, Jul 24, 2023)
16eac30  turn low-prio todo about avoiding clones into note (skius, Jul 24, 2023)
553f7ef  turn non-memory-safety safety comments into regular comments (skius, Jul 24, 2023)
0a7faad  Merge branch 'main' into transliterator-parser (skius, Jul 25, 2023)
206c69c  add issue number to TODOs (skius, Jul 25, 2023)
87f5dc8  Merge branch 'main' into transliterator-parser (skius, Aug 8, 2023)
4217778  doc fixes (skius, Aug 8, 2023)
a7903a8  Merge branch 'main' into transliterator-parser (skius, Aug 9, 2023)

10 changes: 10 additions & 0 deletions Cargo.lock

1 change: 1 addition & 0 deletions Cargo.toml
@@ -42,6 +42,7 @@ members = [
    "experimental/ixdtf",
    "experimental/relativetime",
    "experimental/relativetime/data",
    "experimental/transliterator_parser",
    "experimental/transliteration",
    "experimental/unicodeset_parser",
    "ffi/capi_cdylib",
1 change: 1 addition & 0 deletions docs/tutorials/Cargo.lock

34 changes: 34 additions & 0 deletions experimental/transliterator_parser/Cargo.toml
@@ -0,0 +1,34 @@
# This file is part of ICU4X. For terms of use, please see the file
# called LICENSE at the top level of the ICU4X source tree
# (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

[package]
name = "icu_transliterator_parser"
description = "API to parse transform rules into transliterators as defined in UTS35"
version = "0.0.0"
authors = ["The ICU4X Project Developers"]
edition = "2021"
readme = "README.md"
repository = "https://github.com/unicode-org/icu4x"
license = "Unicode-DFS-2016"
categories = ["internationalization"]
# Keep this in sync with other crates unless there are exceptions
include = [
    "src/**/*",
    "tests/**/*",
    "Cargo.toml",
    "LICENSE",
    "README.md"
]

[package.metadata.docs.rs]
all-features = true

[dependencies]
icu_collections = { path = "../../components/collections" }
icu_properties = { path = "../../components/properties", default-features = false }
icu_provider = { path = "../../provider/core" }
icu_unicodeset_parser = { path = "../unicodeset_parser" }

[features]
compiled_data = ["icu_properties/compiled_data"]
51 changes: 51 additions & 0 deletions experimental/transliterator_parser/LICENSE
@@ -0,0 +1,51 @@
UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE

See Terms of Use <https://www.unicode.org/copyright.html>
for definitions of Unicode Inc.’s Data Files and Software.

NOTICE TO USER: Carefully read the following legal agreement.
BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S
DATA FILES ("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"),
YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE
TERMS AND CONDITIONS OF THIS AGREEMENT.
IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE
THE DATA FILES OR SOFTWARE.

COPYRIGHT AND PERMISSION NOTICE

Copyright © 1991-2022 Unicode, Inc. All rights reserved.
Distributed under the Terms of Use in https://www.unicode.org/copyright.html.

Permission is hereby granted, free of charge, to any person obtaining
a copy of the Unicode data files and any associated documentation
(the "Data Files") or Unicode software and any associated documentation
(the "Software") to deal in the Data Files or Software
without restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, and/or sell copies of
the Data Files or Software, and to permit persons to whom the Data Files
or Software are furnished to do so, provided that either
(a) this copyright and permission notice appear with all copies
of the Data Files or Software, or
(b) this copyright and permission notice appear in associated
Documentation.

THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT OF THIRD PARTY RIGHTS.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS
NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THE DATA FILES OR SOFTWARE.

Except as contained in this notice, the name of a copyright holder
shall not be used in advertising or otherwise to promote the sale,
use or other dealings in these Data Files or Software without prior
written authorization of the copyright holder.


Portions of ICU4X may have been adapted from ICU4C and/or ICU4J.
ICU 1.8.1 to ICU 57.1 © 1995-2016 International Business Machines Corporation and others.
13 changes: 13 additions & 0 deletions experimental/transliterator_parser/README.md

9 changes: 9 additions & 0 deletions experimental/transliterator_parser/src/compile.rs
@@ -0,0 +1,9 @@
// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

pub(crate) fn compile(
    _rules: Vec<crate::parse::Rule>,
) -> Result<super::TransliteratorDataStruct, crate::ParseError> {
    todo!()
}
116 changes: 116 additions & 0 deletions experimental/transliterator_parser/src/lib.rs
@@ -0,0 +1,116 @@
// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

//! `icu_transliterator_parser` is a utility crate of the [`ICU4X`] project.
//!
//! This crate provides parsing functionality for [UTS #35 - Transliterators](https://unicode.org/reports/tr35/tr35-general.html#Transforms).
//!
//! See [`parse`](crate::parse()) for more information.
//!
//! [`ICU4X`]: ../icu/index.html

// https://github.com/unicode-org/icu4x/blob/main/docs/process/boilerplate.md#library-annotations
#![cfg_attr(
    not(test),
    deny(
        clippy::indexing_slicing,
        clippy::unwrap_used,
        clippy::expect_used,
        clippy::panic,
        clippy::exhaustive_structs,
        clippy::exhaustive_enums,
        missing_debug_implementations,
    )
)]
#![warn(missing_docs)]

use icu_properties::provider::*;
use icu_provider::prelude::*;

mod compile;
mod parse;

pub use parse::ParseError;
pub use parse::ParseErrorKind;

/// Stand-in for <https://github.com/skius/icu4x/blob/transliterator/experimental/transliteration/src/datastruct_design.rs>.
/// Will live in the runtime `icu_transliteration` crate.
#[derive(Debug)]
#[non_exhaustive]
pub struct TransliteratorDataStruct;

/// Parse a rule-based transliterator definition into a `TransliteratorDataStruct`.
///
/// See [UTS #35 - Transliterators](https://unicode.org/reports/tr35/tr35-general.html#Transforms) for more information.
#[cfg(feature = "compiled_data")]
pub fn parse(source: &str) -> Result<TransliteratorDataStruct, parse::ParseError> {
    parse_unstable(source, &icu_properties::provider::Baked)
}

#[doc = icu_provider::gen_any_buffer_unstable_docs!(UNSTABLE, parse())]
pub fn parse_unstable<P>(
    source: &str,
    provider: &P,
) -> Result<TransliteratorDataStruct, parse::ParseError>
where
    P: ?Sized
        + DataProvider<AsciiHexDigitV1Marker>
        + DataProvider<AlphabeticV1Marker>
        + DataProvider<BidiControlV1Marker>
        + DataProvider<BidiMirroredV1Marker>
        + DataProvider<CaseIgnorableV1Marker>
        + DataProvider<CasedV1Marker>
        + DataProvider<ChangesWhenCasefoldedV1Marker>
        + DataProvider<ChangesWhenCasemappedV1Marker>
        + DataProvider<ChangesWhenLowercasedV1Marker>
        + DataProvider<ChangesWhenNfkcCasefoldedV1Marker>
        + DataProvider<ChangesWhenTitlecasedV1Marker>
        + DataProvider<ChangesWhenUppercasedV1Marker>
        + DataProvider<DashV1Marker>
        + DataProvider<DefaultIgnorableCodePointV1Marker>
        + DataProvider<DeprecatedV1Marker>
        + DataProvider<DiacriticV1Marker>
        + DataProvider<EmojiV1Marker>
        + DataProvider<EmojiComponentV1Marker>
        + DataProvider<EmojiModifierV1Marker>
        + DataProvider<EmojiModifierBaseV1Marker>
        + DataProvider<EmojiPresentationV1Marker>
        + DataProvider<ExtendedPictographicV1Marker>
        + DataProvider<ExtenderV1Marker>
        + DataProvider<GraphemeBaseV1Marker>
        + DataProvider<GraphemeExtendV1Marker>
        + DataProvider<HexDigitV1Marker>
        + DataProvider<IdsBinaryOperatorV1Marker>
        + DataProvider<IdsTrinaryOperatorV1Marker>
        + DataProvider<IdContinueV1Marker>
        + DataProvider<IdStartV1Marker>
        + DataProvider<IdeographicV1Marker>
        + DataProvider<JoinControlV1Marker>
        + DataProvider<LogicalOrderExceptionV1Marker>
        + DataProvider<LowercaseV1Marker>
        + DataProvider<MathV1Marker>
        + DataProvider<NoncharacterCodePointV1Marker>
        + DataProvider<PatternSyntaxV1Marker>
        + DataProvider<PatternWhiteSpaceV1Marker>
        + DataProvider<QuotationMarkV1Marker>
        + DataProvider<RadicalV1Marker>
        + DataProvider<RegionalIndicatorV1Marker>
        + DataProvider<SentenceTerminalV1Marker>
        + DataProvider<SoftDottedV1Marker>
        + DataProvider<TerminalPunctuationV1Marker>
        + DataProvider<UnifiedIdeographV1Marker>
        + DataProvider<UppercaseV1Marker>
        + DataProvider<VariationSelectorV1Marker>
        + DataProvider<WhiteSpaceV1Marker>
        + DataProvider<XidContinueV1Marker>
        + DataProvider<GeneralCategoryMaskNameToValueV1Marker>
        + DataProvider<GeneralCategoryV1Marker>
        + DataProvider<ScriptNameToValueV1Marker>
        + DataProvider<ScriptV1Marker>
        + DataProvider<ScriptWithExtensionsPropertyV1Marker>
        + DataProvider<XidStartV1Marker>,
{
    let parsed = parse::parse_unstable(source, provider)?;
    compile::compile(parsed)
}
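
Note on the two-stage shape of `parse_unstable` above: it first parses the source into rules, forwards any parse error with `?`, and then hands the rules to `compile`. The following is a minimal standalone sketch of that shape, not the crate's code. `Rule`, `ParseError`, `Compiled`, `parse_rules`, and the simplified handling of `:: ... ;` rules are all hypothetical stand-ins invented for illustration; the real parser in `src/parse.rs` is not shown in this excerpt and covers far more syntax.

```rust
// Hypothetical stand-ins; these are NOT the crate's real types.
#[derive(Debug, PartialEq)]
enum Rule {
    // A `:: ... ;` rule, holding the trimmed text between the markers.
    Transform(String),
}

// Hypothetical error type carrying the byte offset of the offending rule.
#[derive(Debug, PartialEq)]
struct ParseError {
    offset: usize,
}

// Hypothetical compiled output; it only records how many rules were seen.
#[derive(Debug, PartialEq)]
struct Compiled {
    rule_count: usize,
}

// Stage 1: split the source after each ';' and accept only `:: ... ;` rules.
fn parse_rules(source: &str) -> Result<Vec<Rule>, ParseError> {
    let mut rules = Vec::new();
    let mut offset = 0; // byte offset at which the current chunk starts
    for chunk in source.split_inclusive(';') {
        let trimmed = chunk.trim();
        if !trimmed.is_empty() {
            let body = trimmed
                .strip_prefix("::")
                .and_then(|rest| rest.strip_suffix(';'))
                .ok_or(ParseError { offset })?;
            rules.push(Rule::Transform(body.trim().to_string()));
        }
        offset += chunk.len();
    }
    Ok(rules)
}

// Stage 2: turn parsed rules into the output structure.
fn compile(rules: Vec<Rule>) -> Result<Compiled, ParseError> {
    Ok(Compiled { rule_count: rules.len() })
}

// Mirrors the control flow of `parse_unstable`: parse, propagate errors, compile.
fn parse(source: &str) -> Result<Compiled, ParseError> {
    let rules = parse_rules(source)?;
    compile(rules)
}
```

In this sketch, `parse(":: NFD ; :: NFC ;")` yields a `Compiled` with two rules, while `parse(":: NFD ; a > b ;")` fails at the second rule because the sketch only understands `::` rules. One difference worth keeping in mind: in the revision shown in this diff, `compile` is still `todo!()`, so a call to the crate's `parse` or `parse_unstable` that gets past the parsing stage would panic rather than return a value.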