Speech recognition models

ASR datasets (testing)

ASR results

WER results

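| Dataset | Vosk 0.5 | Vosk Small 0.5 | Vosk 0.42 | Vosk Small 0.42 | Nemo FC | Neuro FC | Neuro Whisper | Manifoldix XLS-R |
|---|---|---|---|---|---|---|---|---|
| Common Voice | 29.7 | 31.2 | 16.7 | 23.4 | 16.2 | 24.6 | 23.5 | 29.5 |
| Meetings | 53.1 | 54.4 | 37.9 | 43.6 | 60.6 | 51.2 | 45.4 | 46.0 |
| Fleurs | 25.1 | 26.2 | 11.1 | 14.0 | 43.8 | 23.7 | 25.3 | 24.5 |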

CER results

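| Dataset | Vosk 0.5 | Vosk Small 0.5 | Vosk 0.42 | Vosk Small 0.42 | Nemo FC | Neuro FC | Neuro Whisper | Manifoldix XLS-R |
|---|---|---|---|---|---|---|---|---|
| Common Voice | 10.0 | 10.7 | 5.7 | 8.7 | 3.3 | 7.6 | 6.8 | 7.7 |
| Meetings | 27.4 | 27.8 | 18.7 | 22.4 | 40.1 | 26.0 | 23.0 | 17.9 |
| Fleurs | 7.0 | 7.5 | 4.0 | 5.1 | 27.2 | 9.0 | 6.7 | 6.5 |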

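Nemo FC appears to be overtrained on Common Voice: it has the lowest error rates there (16.2 WER, 3.3 CER) but the highest on Meetings and Fleurs.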
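
For readers who want to reproduce this kind of comparison, WER and CER are conventionally the Levenshtein edit distance between reference and hypothesis, divided by the reference length and expressed as a percentage. The sketch below is only an illustration under stated assumptions: whitespace tokenization for WER, spaces ignored for CER, and no text normalization. The scoring script actually used for the numbers above is not specified here, and the function names are made up for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference item
                cur[j - 1] + 1,          # insertion of a hypothesis item
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = cur
    return prev[-1]


def error_rate(ref_items, hyp_items):
    """Edit distance as a percentage of the reference length."""
    if not ref_items:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_items, hyp_items) / len(ref_items)


def wer(ref, hyp):
    # Assumption: words are separated by whitespace.
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    # Assumption: spaces are not counted as characters.
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))
```

Because insertions are counted, both rates can exceed 100 on very poor output.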

Text to speech models

Dictionaries and G2P

G2P converters

Online resources

Notes

Overall, the Persian phoneset is more or less stable across packages. Some subtle differences:

  • The Tihu dictionary contains a few rare phones, such as '_' and '^', whose purpose is unknown.
  • The glottal stop affects prosody but has no real acoustic realization in fast speech, so it is removed in HMM models (Vosk). It is still beneficial to keep it in neural models (TTS).
  • The PersianG2P predictor is RNN-based and prone to hallucinations, so it is far from perfect. Also, the Tihu dataset is small. A WFST predictor such as Phonetisaurus is much more accurate when trained on a bigger dataset, even though the algorithm is simpler.
  • Overall, a more advanced model (a transformer) trained on Kaamel would likely perform much better.
  • The Kaamel dictionary uses the IPA character 'ɡ' (U+0261) as a separate phoneme alongside the plain 'g'. It is not very frequent and its purpose is unknown.
  • The Kaamel dictionary is the most complete, but it also has questionable entries.
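
When merging entries from several of these dictionaries, the differences listed above can be smoothed out with a small normalization pass. This is a minimal sketch, not something any of the packages ship. It assumes pronunciations are held as lists of phone strings, that the glottal stop is written as 'ʔ' (U+0294), and that dropping the unexplained '_' and '^' phones is acceptable. All names are hypothetical.

```python
SCRIPT_G = "\u0261"        # IPA 'ɡ' as found in Kaamel
GLOTTAL_STOP = "\u0294"    # 'ʔ'; assumed symbol, check the dictionary in use
RARE_PHONES = {"_", "^"}   # Tihu phones of unknown purpose (dropping is an assumption)


def normalize_phones(phones, keep_glottal_stop=True):
    """Return a copy of `phones` mapped onto a common phoneset."""
    out = []
    for phone in phones:
        # Fold the IPA script g into the plain ASCII 'g'.
        phone = phone.replace(SCRIPT_G, "g")
        if phone in RARE_PHONES:
            continue
        if phone == GLOTTAL_STOP and not keep_glottal_stop:
            continue
        out.append(phone)
    return out
```

Following the note on the glottal stop, one would pass `keep_glottal_stop=False` when preparing a lexicon for an HMM model and leave the default when preparing data for TTS.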

Normalization

There is no established code for this yet, but here are some variants:

Others

Notes

There is also "Colloquial Persian", which is somewhat different from spoken Persian. No package for converting between them has been found yet.

Literary Persian uses the zero-width non-joiner (U+200C) in many words, and the patterns for applying it are irregular. Some people expect it to be present in recognized text. Web texts, of course, do not have it.
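
Because a reference and a hypothesis may differ only in their use of the zero-width non-joiner, one option when scoring or preparing training text is to normalize it away on both sides. The sketch below shows two possible policies: deleting the character or replacing it with a space. Which one suits a given corpus is a judgment call, and the function name is hypothetical.

```python
ZWNJ = "\u200c"  # zero-width non-joiner


def strip_zwnj(text, replacement=""):
    """Replace every ZWNJ with `replacement` and collapse runs of whitespace."""
    return " ".join(text.replace(ZWNJ, replacement).split())
```

For example, `strip_zwnj("می\u200cروم")` joins the two parts into one token, while `strip_zwnj("می\u200cروم", " ")` splits them into two words, which changes the word count used for WER.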

Other awesome lists for Persian