Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* upper cased date support
  Signed-off-by: ekmb <[email protected]>
* update whitelist, change roman weights
  Signed-off-by: ekmb <[email protected]>
* docstrings, space fix, init file
  Signed-off-by: ekmb <[email protected]>
* lgtm
  Signed-off-by: ekmb <[email protected]>
* fraction with measure class
  Signed-off-by: ekmb <[email protected]>

Signed-off-by: Mike Chrzanowski <[email protected]>
Showing 29 changed files with 474 additions and 81 deletions.
13 changes: 13 additions & 0 deletions
nemo_text_processing/text_normalization/data/roman/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. |
49 changes: 49 additions & 0 deletions
nemo_text_processing/text_normalization/data/roman/digit_teen.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
i 1 | ||
ii 2 | ||
iii 3 | ||
iv 4 | ||
v 5 | ||
vi 6 | ||
vii 7 | ||
viii 8 | ||
ix 9 | ||
x 10 | ||
xi 11 | ||
xii 12 | ||
xiii 13 | ||
xiv 14 | ||
xv 15 | ||
xvi 16 | ||
xvii 17 | ||
xviii 18 | ||
xix 19 | ||
xx 20 | ||
xxi 21 | ||
xxii 22 | ||
xxiii 23 | ||
xxiv 24 | ||
xxv 25 | ||
xxvi 26 | ||
xxvii 27 | ||
xxviii 28 | ||
xxix 29 | ||
xxx 30 | ||
xxxi 31 | ||
xxxii 32 | ||
xxxiii 33 | ||
xxxiv 34 | ||
xxxv 35 | ||
xxxvi 36 | ||
xxxvii 37 | ||
xxxviii 38 | ||
xxxix 39 | ||
xl 40 | ||
xli 41 | ||
xlii 42 | ||
xliii 43 | ||
xliv 44 | ||
xlv 45 | ||
xlvi 46 | ||
xlvii 47 | ||
xlviii 48 | ||
xlix 49 |
9 changes: 9 additions & 0 deletions
nemo_text_processing/text_normalization/data/roman/hundreds.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
c 100 | ||
cc 200 | ||
ccc 300 | ||
cd 400 | ||
d 500 | ||
dc 600 | ||
dcc 700 | ||
dccc 800 | ||
cm 900 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
l 50 | ||
lx 60 | ||
lxx 70 | ||
lxxx 80 | ||
xc 90 |
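The three Roman numeral tables above split a numeral into a hundreds part (100 to 900), a tens part (50 to 90), and a remainder (1 to 49). The following is a minimal plain-Python sketch of how such tables could be combined into one lookup. It is an illustration only, not the grammar this commit builds: the helper names `parse_tsv` and `roman_to_int` are hypothetical, the table contents are inlined rather than read from the `.tsv` files, and the greedy prefix matching does not strictly validate numerals (for example, it would accept some non-canonical spellings).

```python
def parse_tsv(text):
    """Parse 'numeral<TAB>value' lines into a dict (hypothetical helper)."""
    table = {}
    for line in text.strip().splitlines():
        numeral, value = line.split("\t")
        table[numeral] = int(value)
    return table


# Contents of hundreds.tsv and of the 50-90 table, inlined for the sketch.
HUNDREDS = parse_tsv(
    "c\t100\ncc\t200\nccc\t300\ncd\t400\nd\t500\ndc\t600\ndcc\t700\ndccc\t800\ncm\t900"
)
TENS = parse_tsv("l\t50\nlx\t60\nlxx\t70\nlxxx\t80\nxc\t90")

# digit_teen.tsv covers i..xlix (1..49). To keep the sketch short, the same
# 49 pairs are generated here instead of being listed one by one.
_UNITS = ["", "i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix"]
_LOW_TENS = ["", "x", "xx", "xxx", "xl"]
DIGIT_TEEN = {
    t + u: 10 * ti + ui
    for ti, t in enumerate(_LOW_TENS)
    for ui, u in enumerate(_UNITS)
    if t + u
}


def _longest_prefix(text, table):
    """Return the longest table key that prefixes text, and its value."""
    best = ""
    for numeral in table:
        if text.startswith(numeral) and len(numeral) > len(best):
            best = numeral
    return best, table.get(best, 0)


def roman_to_int(numeral):
    """Convert a Roman numeral (either case) to an int, or None if unmatched."""
    rest = numeral.lower()
    total = 0
    # Peel off an optional hundreds part, then an optional 50-90 part.
    for table in (HUNDREDS, TENS):
        prefix, value = _longest_prefix(rest, table)
        total += value
        rest = rest[len(prefix):]
    # Whatever remains must be an exact entry of the 1-49 table.
    if rest:
        if rest not in DIGIT_TEEN:
            return None
        total += DIGIT_TEEN[rest]
    return total if total > 0 else None
```

Under these assumptions, `roman_to_int("XCIX")` combines `xc` (90) with `ix` (9), and a numeral such as `mcm` returns `None` because none of the tables shown here contains a thousands entry.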
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,9 @@ | ||
Ph.D. p h d | ||
Hon. honorable | ||
& and | ||
Mt. Mount | ||
Maj. Major | ||
Rev. Reverend | ||
# hash | ||
Gov. governor | ||
7-eleven seven eleven | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -11,4 +11,7 @@ Mrs. Misses | |
Ms. Miss | ||
Mr Mister | ||
Mrs Misses | ||
Ms Miss | ||
Ms Miss | ||
&Co. and Co. | ||
§ section | ||
= equals |
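Assuming the two hunks above are the whitelist updates mentioned in the commit message, each row maps a written form to its spoken form. The sketch below shows that idea as a token-level dictionary lookup in plain Python. The function name `apply_whitelist` is hypothetical, the dictionary holds only a subset of the rows visible above, and whitespace tokenization is a simplification chosen for the example rather than how the repository's grammars apply these files.

```python
# A subset of the written -> spoken pairs visible in the hunks above.
WHITELIST = {
    "Ph.D.": "p h d",
    "Hon.": "honorable",
    "&": "and",
    "Mt.": "Mount",
    "Maj.": "Major",
    "Rev.": "Reverend",
    "#": "hash",
    "Gov.": "governor",
    "7-eleven": "seven eleven",
    "§": "section",
    "=": "equals",
}


def apply_whitelist(text, whitelist=WHITELIST):
    """Replace each whitespace-separated token that has a whitelist entry."""
    return " ".join(whitelist.get(token, token) for token in text.split())


if __name__ == "__main__":
    print(apply_whitelist("Rev. Smith & Gov. Jones"))
```

Tokens without an entry pass through unchanged, so the lookup only affects exact, case-sensitive matches.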