Update README.md
aso-mehmudi authored Apr 21, 2019
1 parent 02ff660 commit 607103a
Showing 1 changed file with 6 additions and 3 deletions.
9 changes: 6 additions & 3 deletions README.md
@@ -1,14 +1,17 @@
# Kurdish-G2P-dataset
Datasets for evaluation of Central Kurdish Grapheme-to-Phoneme Conversion systems

## Datasets
## Format
Each line contains a Central Kurdish word in standard Arabic script and its corresponding phoneme string, separated by a tab character. The start of each syllable is indicated by a full stop. For example:
`ئازادی .ʔa.za.dî`
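
A minimal sketch of reading one entry in this format, assuming the two fields are separated by a literal tab as described above. The function name `parse_entry` is hypothetical and not part of the dataset.

```python
def parse_entry(line):
    """Split one dataset line into (word, phoneme string, syllable list).

    Assumes the fields are separated by a single tab character and that
    every syllable in the phoneme string is preceded by a full stop.
    """
    word, phonemes = line.rstrip("\n").split("\t", 1)
    # The leading full stop produces an empty first item, so drop empties.
    syllables = [s for s in phonemes.split(".") if s]
    return word, phonemes, syllables


# The example entry from this README, with an explicit tab separator.
word, phonemes, syllables = parse_entry("ئازادی\t.ʔa.za.dî")
print(word, phonemes, syllables)
```

For the example entry this yields three syllables. Keeping the raw phoneme string alongside the syllable list lets an evaluation compare either form.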

## Datasets
### AsoSoft Kurdish Corpus most frequent tokens
Manually transliterated first 5000 most frequent words of the AsoSoft Kurdish Corpus, presented by:
Manually converted first 5000 most frequent words of the AsoSoft Kurdish Corpus, presented by:

Veisi, H., MohammadAmini, M., & Hosseini, H. (2019). “Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus”. Digital Scholarship in the Humanities.

### Wergor dataset
Manually transliterated 5041 unique words of the document presented at: https://github.com/sinaahmadi/wergor
Manually converted 5041 unique words of the document presented at: https://github.com/sinaahmadi/wergor

Ahmadi, S. (2019). “A Rule-Based Kurdish Text Transliteration System”. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(2), 18.
