A module to enrich phonetic transcriptions of Icelandic with syllables and stress labels

grammatek/syllables

Syllables for Icelandic

This module adds syllabification and stress labeling to phonetic transcriptions of Icelandic. You can use it to enrich existing dictionaries, that is, to produce dictionaries with syllable and stress labels, or as a component in a TTS pipeline to add those labels to transcribed input text.

Data

Regardless of the use case, you have to know your phone set. The module provides the necessary data for the SAMPA and IPA phonetic alphabets. If you are using a different alphabet, you have to create your own data/cons_clusters_your_alphabet.txt and data/vowels_your_alphabet.txt. All filenames are defined in src/dictionaries.py; adjust them according to your setup.

Enrich a dictionary

To produce a pronunciation dictionary, call syllabify_and_label_dict(your_dictionary), where your_dictionary is the filename of a pronunciation dictionary in plain text format, with one entry per line, the word and its transcription separated by a tab (\t), and the phones separated by spaces:

aaron	a: r O n
abbadísin	a p a t i s I n
abbas	a p a s
...
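To check that a dictionary file matches this layout before passing it to the module, a small reader along the following lines may help. This is only a sketch: read_pron_dict is a hypothetical helper, not part of the module, and it assumes exactly one tab per entry.

```python
from typing import Dict, Iterable, List


def read_pron_dict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'word<TAB>phone phone ...' lines into {word: [phones]}.

    Hypothetical helper for validating input; not part of the module.
    """
    entries: Dict[str, List[str]] = {}
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'word<TAB>transcription', got {line!r}")
        word, transcription = parts
        phones = transcription.split()  # phones are separated by spaces
        if not phones:
            raise ValueError(f"line {line_no}: empty transcription for {word!r}")
        entries[word] = phones
    return entries


sample = ["aaron\ta: r O n\n", "abbadísin\ta p a t i s I n\n", "abbas\ta p a s\n"]
entries = read_pron_dict(sample)
print(entries["aaron"])
```

For a real file, the same function can be fed an open file handle, for example `read_pron_dict(open(path, encoding="utf-8"))`, since it only iterates over lines.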

Two output formats are implemented. The syllable structure, however, makes it easy to adapt the output to your needs.

CMU-format

("aaron" nil (((a: ) 1) ((r O n ) 0)))
("abbadísin" nil (((a ) 1) ((p a ) 0) ((t i ) 1) ((s I n ) 0)))
("abbas" nil (((a ) 1) ((p a s ) 0)))
...

Plain syllable format (no stress labels)

aaron	a:.r O n
abbadísin	a.p a.t i.s I n
abbas	a.p a s
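To illustrate how both formats can be derived from one syllable structure, here is a minimal sketch. It assumes a syllable is represented as a (phones, stress) pair. That representation and the names to_cmu and to_plain are hypothetical, and the module's internal structure may differ.

```python
from typing import List, Tuple

# Assumed representation: one syllable = (list of phones, stress label).
Syllable = Tuple[List[str], int]


def to_cmu(word: str, syllables: List[Syllable]) -> str:
    """Format an entry like the CMU-format lines shown above."""
    body = " ".join(
        "((" + " ".join(phones) + " ) " + str(stress) + ")"
        for phones, stress in syllables
    )
    return '("' + word + '" nil (' + body + "))"


def to_plain(word: str, syllables: List[Syllable]) -> str:
    """Format an entry with '.' between syllables and no stress labels."""
    transcription = ".".join(" ".join(phones) for phones, _ in syllables)
    return word + "\t" + transcription


aaron = [(["a:"], 1), (["r", "O", "n"], 0)]
print(to_cmu("aaron", aaron))
print(to_plain("aaron", aaron))
```

A further output format would then only need another small formatting function over the same list of syllables.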

Label TTS input

To label phonetic transcriptions in a TTS pipeline, the module needs two lists: a list of words and a list of their transcriptions, with corresponding indices. That is, the transcription for the word at word_list[n] is found at transcriptions_list[n].

Example:

# Input:
['hvernig', 'hefur', 'þú', 'það']
['k_h v E r t n I G', 'h E: v Y r', 'T u:', 'T a: D']
# Output, syllables only:
['k_h v E r t.n I G', 'h E:.v Y r', 'T u:', 'T a: D']
# Output, syllables and stress:
['k_h v E1 r t.n I0 G', 'h E:1.v Y0 r', 'T u:1', 'T a:1 D']
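The relation between the two outputs above can be sketched as follows: the stress label of each syllable is appended to that syllable's vowel. This is an illustration only. attach_stress and the small vowel set are hypothetical, the stress values are passed in rather than computed, and the sketch assumes one vowel per syllable. The module itself reads its vowels from the data files.

```python
from typing import Iterable, List

# Illustrative subset of SAMPA vowels, just enough for the example above.
ILLUSTRATIVE_VOWELS = {"a", "a:", "E", "E:", "I", "Y", "u:"}


def attach_stress(syllabified: str, stresses: List[int],
                  vowels: Iterable[str] = ILLUSTRATIVE_VOWELS) -> str:
    """Append each syllable's stress label to its vowel phone."""
    vowel_set = set(vowels)
    syllables = syllabified.split(".")  # '.' marks syllable boundaries
    if len(syllables) != len(stresses):
        raise ValueError("need exactly one stress label per syllable")
    labeled = []
    for syllable, stress in zip(syllables, stresses):
        phones = [p + str(stress) if p in vowel_set else p for p in syllable.split()]
        labeled.append(" ".join(phones))
    return ".".join(labeled)


word_list = ["hvernig", "hefur", "þú", "það"]
syllabified_list = ["k_h v E r t.n I G", "h E:.v Y r", "T u:", "T a: D"]
stress_list = [[1, 0], [1, 0], [1], [1]]  # assumed labels, taken from the example

# The lists are parallel: index n in each list refers to the same word.
labeled_list = [attach_stress(t, s) for t, s in zip(syllabified_list, stress_list)]
for word, labeled in zip(word_list, labeled_list):
    print(word, labeled)
```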

Troubleshooting & inquiries

This application is still in development. If you encounter any errors, feel free to open an issue in the issue tracker. You can also contact us via email.

Contributing

You can contribute to this project by forking it, creating a private branch and opening a new pull request.

License

Grammatek

Copyright © 2020, 2021 Grammatek ehf.

This software is developed under the auspices of the Icelandic Government 5-Year Language Technology Program, described here and here (English).

This software is licensed under the Apache License.
