Syllables for Icelandic

This module adds syllabification and stress labeling to phonetic transcriptions of Icelandic. You can use it to enrich existing pronunciation dictionaries with syllable and stress labels, or as a component of a TTS pipeline that adds these labels to transcribed input text.

Data

Regardless of the use case, you need to know your phone set. The module provides the necessary data for the SAMPA and IPA phonetic alphabets. If you are using a different alphabet, you have to create your own data/cons_clusters_your_alphabet.txt and data/vowels_your_alphabet.txt. All filenames are defined in src/dictionaries.py; adjust them to your setup.
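This README does not describe how the module uses these files internally. As a rough illustration of why a vowel list and a consonant-cluster list are enough to syllabify a transcription, here is a minimal sketch of a maximal-onset syllabifier: every vowel is a syllable nucleus, and the consonants between two vowels are split so that the following syllable gets the longest cluster listed as a legal onset. This is a stand-in technique, not necessarily the module's own algorithm. The function syllabify and the sets VOWELS and ONSETS are hypothetical; the sets are small made-up SAMPA subsets, not the contents of the data files.

```python
# Hypothetical, incomplete inventories for illustration only; the module reads
# its real ones from data/vowels_*.txt and data/cons_clusters_*.txt.
VOWELS = {"a", "a:", "i", "I", "E", "E:", "O", "Y", "u:"}
ONSETS = {"p", "t", "s", "n", "r", "v", "h", "T", "k_h", "k_h v"}


def syllabify(phones, vowels, onsets):
    """Split a phone list into syllables using the maximal onset principle."""
    nuclei = [i for i, p in enumerate(phones) if p in vowels]
    if not nuclei:
        return [list(phones)]  # no vowel found: keep the word as one syllable
    starts = [0]  # index at which each syllable begins
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = phones[prev + 1:nxt]  # consonants between two nuclei
        split = len(cluster)  # fallback: the next syllable gets an empty onset
        # Try the longest suffix of the cluster first, then shorter ones.
        for k in range(len(cluster)):
            if " ".join(cluster[k:]) in onsets:
                split = k
                break
        starts.append(prev + 1 + split)
    starts.append(len(phones))
    return [phones[a:b] for a, b in zip(starts, starts[1:])]


# "r t n" and "t n" are not in ONSETS, so only "n" moves to the second syllable.
print(syllabify("k_h v E r t n I G".split(), VOWELS, ONSETS))
```

With these toy sets the sketch reproduces the syllable boundaries shown in the examples in this README, for instance splitting "a p a t i s I n" into a / p a / t i / s I n. Which clusters count as legal onsets is exactly the information a consonant-cluster file would have to supply.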

Enrich a dictionary

To produce a syllabified and stress-labeled pronunciation dictionary, call syllabify_and_label_dict(your_dictionary), where your_dictionary is the filename of a pronunciation dictionary in plain text format: one entry per line, word and transcription separated by a tab (\t), and phones separated by a space:

aaron	a: r O n
abbadísin	a p a t i s I n
abbas	a p a s
...
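To make the expected input format concrete, the sketch below parses lines in this format into (word, phone list) pairs. read_pron_dict is a hypothetical helper for checking your own files, not part of the module, and UTF-8 encoding is an assumption (entries contain characters such as í).

```python
def read_pron_dict(lines):
    """Parse 'word<TAB>phone phone ...' lines into (word, [phones]) pairs."""
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        word, transcription = line.split("\t", 1)  # a tab separates word and transcription
        entries.append((word, transcription.split()))  # phones are space-separated
    return entries


# Reading a dictionary file (UTF-8 assumed):
# with open("your_dictionary.txt", encoding="utf-8") as f:
#     entries = read_pron_dict(f)
```

A line without a tab raises a ValueError during unpacking, which is a quick way to find malformed entries before passing the file to the module.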

Two output formats are implemented. The syllable structure, however, makes it easy to adapt the output to your needs.

CMU format

("aaron" nil (((a: ) 1) ((r O n ) 0)))
("abbadísin" nil (((a ) 1) ((p a ) 0) ((t i ) 1) ((s I n ) 0)))
("abbas" nil (((a ) 1) ((p a s ) 0)))
...

Plain syllable format (no stress labels)

aaron	a:.r O n
abbadísin	a.p a.t i.s I n
abbas	a.p a s
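In this format a syllable boundary replaces the space between two phones, so the dot is never surrounded by spaces. A minimal sketch of that rendering, assuming syllables are given as lists of phones (to_plain is a hypothetical name, not the module's API):

```python
def to_plain(word, syllables):
    """Join phones with spaces and syllables with '.', prefixed by the word and a tab."""
    return word + "\t" + ".".join(" ".join(phones) for phones in syllables)


# Prints "abbadísin", a tab, then "a.p a.t i.s I n".
print(to_plain("abbadísin", [["a"], ["p", "a"], ["t", "i"], ["s", "I", "n"]]))
```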

Label TTS input

To label phonetic transcriptions in a TTS pipeline, the module needs two lists: a list of words and a list of their transcriptions, with matching indices. That is, the transcription for the word at word_list[n] is found at transcriptions_list[n].

Example:

# Input:
['hvernig', 'hefur', 'þú', 'það']
['k_h v E r t n I G', 'h E: v Y r', 'T u:', 'T a: D']
# Output, syllables only:
['k_h v E r t.n I G', 'h E:.v Y r', 'T u:', 'T a: D']
# Output, syllables and stress:
['k_h v E1 r t.n I0 G', 'h E:1.v Y0 r', 'T u:1', 'T a:1 D']
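The two outputs differ only in the stress digit appended to the vowel of each syllable. The sketch below derives the second form from the first, given one stress label per syllable. Deciding which syllable is stressed is the module's job and is not modeled here; add_stress, the stress lists and the VOWELS subset are hypothetical, and the code assumes one vowel phone per syllable and that no phone symbol contains a dot.

```python
def add_stress(syllabified, stresses, vowels):
    """Append a stress digit to each vowel, e.g. 'h E:.v Y r' + [1, 0] -> 'h E:1.v Y0 r'."""
    syllables = syllabified.split(".")  # assumes '.' only marks syllable boundaries
    if len(syllables) != len(stresses):
        raise ValueError("need exactly one stress label per syllable")
    labeled = []
    for syllable, stress in zip(syllables, stresses):
        phones = [p + str(stress) if p in vowels else p for p in syllable.split()]
        labeled.append(" ".join(phones))
    return ".".join(labeled)


VOWELS = {"E", "E:", "I", "Y", "u:", "a:"}  # hypothetical subset of the vowel list
syllabified = ["k_h v E r t.n I G", "h E:.v Y r", "T u:", "T a: D"]
stresses = [[1, 0], [1, 0], [1], [1]]  # hypothetical labels, one list per word
labeled = [add_stress(s, st, VOWELS) for s, st in zip(syllabified, stresses)]
print(labeled)
# ['k_h v E1 r t.n I0 G', 'h E:1.v Y0 r', 'T u:1', 'T a:1 D']
```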

Troubleshooting & inquiries

This application is still in development. If you encounter any errors, feel free to open an issue in the issue tracker. You can also contact us via email.

Contributing

You can contribute to this project by forking it, creating a private branch and opening a new pull request.

License

This software is developed under the auspices of the Icelandic Government 5-Year Language Technology Program, described here and here (English).

This software is licensed under the Apache License.

grammatek/syllables

Folders and files

Latest commit

History

Repository files navigation

Syllables for Icelandic

Data

Enrich a dictionary

Label TTS input

Trouble shooting & inquiries

Contributing

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages