Commit
1 parent 02ff660, commit 607103a
Showing 1 changed file with 6 additions and 3 deletions.
@@ -1,14 +1,17 @@
 # Kurdish-G2P-dataset
 Datasets for evaluation of Central Kurdish Grapheme-to-Phoneme Conversion systems
 
-## Datasets
+## Format
+Central Kurdish words in Standard Arabic script and its corresponding phoneme string separated by tab character. Syllable start is indicated by full stop. For example:
+`ئازادی .ʔa.za.dî`
 
+## Datasets
 ### AsoSoft Kurdish Corpus most frequent tokens
-Manually transliterated First 5000 most frequent words of AsoSoft Kurdish Corpus presented by:
+Manually converted First 5000 most frequent words of AsoSoft Kurdish Corpus presented by:
 
 Veisi, H., MohammadAmini, M., & Hosseini, H. (2019). “Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus”. Digital Scholarship in the Humanities.
 
 ### Wergor dataset
-Manually transliterated 5041 unique words of document presented by: https://github.com/sinaahmadi/wergor
+Manually converted 5041 unique words of document presented by: https://github.com/sinaahmadi/wergor
 
 Ahmadi, S. (2019). “A Rule-Based Kurdish Text Transliteration System”. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(2), 18.
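
The entry format described in the new "Format" section (word, tab character, phoneme string with a full stop before each syllable) could be read roughly as follows. This is only a minimal sketch: `parse_entry` is a hypothetical helper name that does not appear in the repository, and it assumes each line holds exactly one tab-separated pair, as the README text states.

```python
from typing import List, Tuple


def parse_entry(line: str) -> Tuple[str, List[str]]:
    """Split one dataset line into (word, list of syllables).

    Assumes the layout described in the README: the word in Arabic
    script, a tab character, then the phoneme string in which every
    syllable is preceded by a full stop.
    """
    word, phonemes = line.rstrip("\n").split("\t", 1)
    # The phoneme string starts with a full stop, so splitting on "."
    # yields an empty first piece; drop empty pieces.
    syllables = [s for s in phonemes.strip().split(".") if s]
    return word.strip(), syllables


# Example using the entry quoted in the README, with an explicit tab.
word, syllables = parse_entry("ئازادی\t.ʔa.za.dî")
print(word, syllables)
```

Keeping the syllables as a list leaves both views available: joining them with an empty string gives the plain phoneme string, and the list length gives the syllable count.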