Align Open.Bible data
Language | Passing | Failing | Unknown | Notes | Aligned Sample |
---|---|---|---|---|---|
Yoruba | 💚 | | | | Psalm 119 |
Ewe | 💚 | | | | Psalm 119 |
Lingala | 💚 | | | | Psalm 119 |
Asante Twi | 💚 | | | | |
Akuapem Twi | 💚 | | | | |
Chichewa | ❤️🩹 | | | Passing with bad alignments | Psalm 119 |
Hausa | | 💔 | | | |
Luo | | 💔 | | | |
Luganda | | 💔 | | | |
Kikuyu | | 💔 | | | |
Arabic | | | ❓ | | |
Kurdi Sorani | | | ❓ | | |
Polish | | | ❓ | | |
Vietnamese | | | ❓ | | |
$ git clone https://github.com/coqui-ai/open-bible-scripts.git
The first alignment approach uses the Montreal Forced Aligner (MFA) to train a new acoustic model from scratch and align the data.
You need to install a couple things on your own:
Use the language name as defined in `open-bible-scripts/data/*.txt`. Use the language code as expected by `covo`. E.g., for Yoruba use `yoruba` and `yo`, for Ewe use `ewe` and `ee`, for Luganda use `luganda` and `lg`, and so on.
$ cd open-bible-scripts
open-bible-scripts$ ./run-pre-alignment.sh yoruba yo
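If you are preparing several languages, the name/code pairing can be kept in one place. The following is a minimal sketch, not part of the repository: the function name `pre_alignment_commands` is hypothetical, and the pairs are only the three examples given above. It prints the commands rather than running them, so you can review them first.

```shell
#!/bin/sh
# Sketch: print the pre-alignment command for each "<language name> <language code>" pair.
# The function name is hypothetical; the pairs are the examples from this README.
pre_alignment_commands() {
  while read -r name code; do
    # Skip blank lines in the list.
    [ -n "$name" ] || continue
    echo "./run-pre-alignment.sh $name $code"
  done <<'EOF'
yoruba yo
ewe ee
luganda lg
EOF
}

pre_alignment_commands
```

To actually run the commands, pipe the output to `sh` from inside `open-bible-scripts`.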
Generate alignments with mfa train
$ docker run -it --mount "type=bind,src=/home/ubuntu/open-bible-scripts,dst=/mnt" mmcauliffe/montreal-forced-aligner
(base) root@d8095c794d5f:/# conda activate aligner
(aligner) root@d8095c794d5f:/# mfa train --clean --num_jobs `nproc` --temp_directory /mnt/yoruba/data/mfa-tmp-dir --config_path /mnt/MFA_CONFIG /mnt/yoruba/data /mnt/yoruba/dict.txt /mnt/yoruba/data/mfa-output &> /mnt/yoruba/data/LOG &
# At this point, alignment will take a while,
# so you might want to detach from the docker container
# with `Ctrl-P followed by Ctrl-Q`
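The `mfa train` invocation above hard-codes `yoruba` in every path. As a rough sketch, the same command can be generated for another language. This assumes the other languages follow the same `/mnt/<language>/...` layout as the Yoruba example after pre-alignment, which is not confirmed here; the helper name `mfa_train_command` is hypothetical. It only prints the command (with `$(nproc)` in place of the equivalent backtick form) and uses no flags beyond those shown above.

```shell
#!/bin/sh
# Sketch: print the `mfa train` command from this README for a given language name.
# Assumption: every language uses the same /mnt/<language>/ layout as the Yoruba example.
mfa_train_command() {
  lang="$1"
  data="/mnt/${lang}/data"
  # "$(nproc)" is kept literal here so that it is evaluated inside the container.
  printf 'mfa train --clean --num_jobs "$(nproc)" --temp_directory %s/mfa-tmp-dir --config_path /mnt/MFA_CONFIG %s /mnt/%s/dict.txt %s/mfa-output\n' \
    "$data" "$data" "$lang" "$data"
}

mfa_train_command yoruba
```

Append the log redirection from the original command (`&> /mnt/<language>/data/LOG &`) when pasting the result into the container shell.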
Use the language name as defined in `open-bible-scripts/data/*.txt`. E.g., for Yoruba use `yoruba`, for Ewe use `ewe`, for Luganda use `luganda`, and so on.
$ cd open-bible-scripts
open-bible-scripts$ ./run-post-alignment.sh yoruba yo
This approach works only for Lingala, Akuapem Twi, and Asante Twi.
Install sox on your OS. For Linux (Debian/Ubuntu), see the installation commands below:
sudo apt-get install sox
sudo apt-get install libsox-fmt-mp3
sox --version
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install pandas
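Before running the split scripts, it can help to confirm that the tools installed above are actually on your `PATH`. The helper below is a hypothetical sketch (`check_commands` is not part of the repository). It checks only that the commands exist, not that sox has MP3 support or that pandas is importable.

```shell
#!/bin/sh
# Sketch: return 0 if every named command is on PATH;
# otherwise report each missing one on stderr and return 1.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_commands sox python3 || echo "Install the missing tools before running the split scripts."
```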
Execute the `run-biblica-splits-*.sh` script from the root dir, for example with Lingala:
./run-biblica-splits-lingala.sh
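To see which split scripts are present in your checkout, a small listing helper can be used. This is a sketch under assumptions: `list_split_scripts` is a hypothetical name, and only the Lingala script name is confirmed above, so the helper relies on the `run-biblica-splits-*.sh` pattern rather than guessing the other file names.

```shell
#!/bin/sh
# Sketch: list the run-biblica-splits-*.sh scripts found in a directory
# (defaults to the current directory). Prints nothing if none are found.
list_split_scripts() {
  dir="${1:-.}"
  for script in "$dir"/run-biblica-splits-*.sh; do
    # When nothing matches, the shell leaves the pattern unexpanded; skip it.
    [ -e "$script" ] || continue
    echo "$script"
  done
}

list_split_scripts .
```

Run it from the root dir of the repository, then execute the scripts it lists one at a time.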