Add doc about installation and usage #7
Changes from 3 commits
@@ -1 +1,84 @@

**Removed:**

Working in progress.

**Added:**

# Table of Contents

- [Installation](#installation)
  * [Install k2](#install-k2)
  * [Install lhotse](#install-lhotse)
  * [Install icefall](#install-icefall)
- [Run recipes](#run-recipes)

## Installation

`icefall` depends on [k2][k2] for FSA operations and [lhotse][lhotse] for
data preparation. To use `icefall`, you have to install its dependencies first.
The following subsections describe how to set up the environment.

CAUTION: There are various ways to set up the environment. What we describe
here is just one alternative.
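
For example, before installing anything you may want to work inside an isolated Python environment; below is a minimal sketch using `venv` (a conda environment works just as well, and the environment path is only an example):

```bash
# Create and activate an isolated environment (the path is an example).
python3 -m venv $HOME/icefall-venv
source $HOME/icefall-venv/bin/activate
pip install --upgrade pip
```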

### Install k2

Please refer to [k2's installation documentation][k2-install] to install k2.
If you have any issues installing k2, please open an issue at
<https://github.com/k2-fsa/k2/issues>.

The following shows the minimal commands needed to install k2 from source:

```bash
mkdir $HOME/open-source
cd $HOME/open-source
git clone https://github.com/k2-fsa/k2.git
cd k2
mkdir build_release
cd build_release
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j _k2
export PYTHONPATH=$HOME/open-source/k2/k2/python:$PYTHONPATH
export PYTHONPATH=$HOME/open-source/k2/build_release/lib:$PYTHONPATH
```

To check that k2 is installed successfully, please run

```bash
python3 -m k2.version
```

It should show information about the environment in which
k2 was built.
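
Note that the two `export PYTHONPATH=...` lines above only affect the current shell. If you want them to persist across sessions, one option (assuming a bash shell and the `$HOME/open-source` layout used above) is to append them to your `~/.bashrc`:

```bash
# Persist the PYTHONPATH entries for future shells.
cat >> $HOME/.bashrc <<'EOF'
export PYTHONPATH=$HOME/open-source/k2/k2/python:$PYTHONPATH
export PYTHONPATH=$HOME/open-source/k2/build_release/lib:$PYTHONPATH
EOF
```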

### Install lhotse

Please refer to [lhotse's installation documentation][lhotse-install] to install
lhotse.
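
For a quick start, lhotse can typically be installed with pip; the commands below cover only the common case, so treat lhotse's own documentation as authoritative:

```bash
pip install lhotse
# Check that it can be imported and print its version.
python3 -c "import lhotse; print(lhotse.__version__)"
```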

### Install icefall

`icefall` is a set of Python scripts, so all you need to do is set
the environment variable `PYTHONPATH`:

```bash
cd $HOME/open-source
git clone https://github.com/k2-fsa/icefall
cd icefall
pip install -r requirements.txt
export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATH
```

To verify that `icefall` was installed successfully, you can run:

```bash
python3 -c "import icefall; print(icefall.__file__)"
```

It should print the path to `icefall`.
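
Since k2 and icefall are picked up via `PYTHONPATH` rather than installed into site-packages, a combined import check can catch a mis-set variable early; this is only a small sanity check, assuming the exports above are active in the current shell:

```bash
# All three imports should succeed without errors.
python3 -c "import k2; import lhotse; import icefall; print('k2, lhotse and icefall are importable')"
```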

## Run recipes

At present, only the LibriSpeech recipe is provided. Please
follow [egs/librispeech/ASR/README.md][LibriSpeech] to run it.
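
For orientation, a typical session looks roughly like the sketch below; the script names are taken from the LibriSpeech recipe, and its README is authoritative for the actual steps and options:

```bash
cd $HOME/open-source/icefall/egs/librispeech/ASR
./prepare.sh                 # data preparation (features, lexicon, LMs, ...)
./tdnn_lstm_ctc/train.py     # training; see --help for options
./tdnn_lstm_ctc/decode.py    # decoding; see --help for options
```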

[LibriSpeech]: egs/librispeech/ASR/README.md
[k2-install]: https://k2.readthedocs.io/en/latest/installation/index.html#
[k2]: https://github.com/k2-fsa/k2
[lhotse]: https://github.com/lhotse-speech/lhotse
[lhotse-install]: https://lhotse.readthedocs.io/en/latest/getting-started.html#installation

@@ -1,121 +1,64 @@

**Removed:**

Run `./prepare.sh` to prepare the data.

Run `./xxx_train.py` (to be added) to train a model.

## Conformer-CTC

Results of the pre-trained model from
<https://huggingface.co/GuoLiyong/snowfall_bpe_model/tree/main/exp-duration-200-feat_batchnorm-bpe-lrfactor5.0-conformer-512-8-noam>
are given below.

### HLG - no LM rescoring

(output beam size is 8)

#### 1-best decoding

```
[test-clean-no_rescore] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore] %WER 7.03% [3682 / 52343, 220 ins, 1024 del, 2438 sub ]
```

#### n-best decoding

For n=100,

```
[test-clean-no_rescore-100] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore-100] %WER 7.14% [3737 / 52343, 275 ins, 1020 del, 2442 sub ]
```

For n=200,

```
[test-clean-no_rescore-200] %WER 3.16% [1660 / 52576, 125 ins, 378 del, 1157 sub ]
[test-other-no_rescore-200] %WER 7.04% [3684 / 52343, 228 ins, 1012 del, 2444 sub ]
```

### HLG - with LM rescoring

#### Whole lattice rescoring

```
[test-clean-lm_scale_0.8] %WER 2.77% [1456 / 52576, 150 ins, 210 del, 1096 sub ]
[test-other-lm_scale_0.8] %WER 6.23% [3262 / 52343, 246 ins, 635 del, 2381 sub ]
```

WERs of different LM scales are:

```
For test-clean, WER of different settings are:
lm_scale_0.8 2.77 best for test-clean
lm_scale_0.9 2.87
lm_scale_1.0 3.06
lm_scale_1.1 3.34
lm_scale_1.2 3.71
lm_scale_1.3 4.18
lm_scale_1.4 4.8
lm_scale_1.5 5.48
lm_scale_1.6 6.08
lm_scale_1.7 6.79
lm_scale_1.8 7.49
lm_scale_1.9 8.14
lm_scale_2.0 8.82

For test-other, WER of different settings are:
lm_scale_0.8 6.23 best for test-other
lm_scale_0.9 6.37
lm_scale_1.0 6.62
lm_scale_1.1 6.99
lm_scale_1.2 7.46
lm_scale_1.3 8.13
lm_scale_1.4 8.84
lm_scale_1.5 9.61
lm_scale_1.6 10.32
lm_scale_1.7 11.17
lm_scale_1.8 12.12
lm_scale_1.9 12.93
lm_scale_2.0 13.77
```

#### n-best LM rescoring

n = 100

```
[test-clean-lm_scale_0.8] %WER 2.79% [1469 / 52576, 149 ins, 212 del, 1108 sub ]
[test-other-lm_scale_0.8] %WER 6.36% [3329 / 52343, 259 ins, 666 del, 2404 sub ]
```

WERs of different LM scales are:

```
For test-clean, WER of different settings are:
lm_scale_0.8 2.79 best for test-clean
lm_scale_0.9 2.89
lm_scale_1.0 3.03
lm_scale_1.1 3.28
lm_scale_1.2 3.52
lm_scale_1.3 3.78
lm_scale_1.4 4.04
lm_scale_1.5 4.24
lm_scale_1.6 4.45
lm_scale_1.7 4.58
lm_scale_1.8 4.7
lm_scale_1.9 4.8
lm_scale_2.0 4.92

For test-other, WER of different settings are:
lm_scale_0.8 6.36 best for test-other
lm_scale_0.9 6.45
lm_scale_1.0 6.64
lm_scale_1.1 6.92
lm_scale_1.2 7.25
lm_scale_1.3 7.59
lm_scale_1.4 7.88
lm_scale_1.5 8.13
lm_scale_1.6 8.36
lm_scale_1.7 8.54
lm_scale_1.8 8.71
lm_scale_1.9 8.88
lm_scale_2.0 9.02
```

**Added:**

## Data preparation

If you want to use `./prepare.sh` to download everything for you,
you can just run

```
./prepare.sh
```

If you have pre-downloaded the LibriSpeech dataset, please
read `./prepare.sh` and modify it to point to the location
of your dataset so that it won't re-download it. After modification,
please run

```
./prepare.sh
```

The script `./prepare.sh` prepares features, lexicon, LMs, etc.
All generated files are saved in the folder `./data`.

HINT: `./prepare.sh` supports the options `--stage` and `--stop-stage`.

## TDNN-LSTM CTC training

The folder `tdnn_lstm_ctc` contains scripts for CTC training
with TDNN-LSTM models.

Pre-configured parameters for training and decoding are set in the function
`get_params()` within `tdnn_lstm_ctc/train.py`
and `tdnn_lstm_ctc/decode.py`.

Parameters that can be passed from the command line can be found by running

```
./tdnn_lstm_ctc/train.py --help
./tdnn_lstm_ctc/decode.py --help
```

If you have 4 GPUs on a machine and want to use GPUs 0, 2 and 3 for
multi-GPU training, you can run

```
export CUDA_VISIBLE_DEVICES="0,2,3"
./tdnn_lstm_ctc/train.py \
  --master-port 12345 \
  --world-size 3
```

If you want to decode by averaging the checkpoints `epoch-8.pt`,
`epoch-9.pt` and `epoch-10.pt`, you can run

```
./tdnn_lstm_ctc/decode.py \
  --epoch 10 \
  --avg 3
```

## Conformer CTC training

The folder `conformer_ctc` contains scripts for CTC training
with conformer models. The steps for running the training and
decoding are similar to `tdnn_lstm_ctc`.
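
Since the data preparation above is organized in stages, `--stage` and `--stop-stage` let you re-run only part of it; the example below is a hedged sketch in which the stage numbers are placeholders (check `./prepare.sh` for the actual stage list):

```bash
# Re-run only stages 3 through 5 of data preparation
# (stage numbers are illustrative; see ./prepare.sh).
./prepare.sh --stage 3 --stop-stage 5
```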

```diff
@@ -16,6 +16,7 @@
 from conformer import Conformer
 from lhotse.utils import fix_random_seed
 from torch.nn.parallel import DistributedDataParallel as DDP
+from torch.nn.utils import clip_grad_norm_
 from torch.utils.tensorboard import SummaryWriter
 from transformer import Noam

@@ -114,7 +115,9 @@ def get_params() -> AttributeDict:
 - log_interval: Print training loss if batch_idx % log_interval` is 0

-- valid_interval: Run validation if batch_idx % valid_interval` is 0
+- valid_interval: Run validation if batch_idx % valid_interval is 0
+
+- reset_interval: Reset statistics if batch_idx % reset_interval is 0

 - beam_size: It is used in k2.ctc_loss

@@ -124,19 +127,20 @@ def get_params() -> AttributeDict:
 """
 params = AttributeDict(
 {
-"exp_dir": Path("conformer_ctc/exp"),
+"exp_dir": Path("conformer_ctc/exp_new"),
 "lang_dir": Path("data/lang_bpe"),
 "feature_dim": 80,
-"weight_decay": 0.0,
+"weight_decay": 1e-6,
 "subsampling_factor": 4,
 "start_epoch": 0,
-"num_epochs": 50,
+"num_epochs": 20,
 "best_train_loss": float("inf"),
 "best_valid_loss": float("inf"),
 "best_train_epoch": -1,
 "best_valid_epoch": -1,
 "batch_idx_train": 0,
 "log_interval": 10,
+"reset_interval": 200,
 "valid_interval": 3000,
 "beam_size": 10,
 "reduction": "sum",
```

Review comment on the `exp_dir` change:

> I would suggest to put the exp_dir to the parent folder, e.g.

Reply:

> mm, it was my idea to co-locate the models and code like this. The idea is, it's easier to keep them in sync, and given a model, it would generally be fairly clear which code generated it. And we'd add the code to git (at least that was my idea.. although of course that would mean the code would disappear once you switched branches, which might not be ideal.)

```diff
@@ -440,6 +444,8 @@ def train_one_epoch(
 tot_att_loss = 0.0

 tot_frames = 0.0  # sum of frames over all batches
+params.tot_loss = 0.0
+params.tot_frames = 0.0
 for batch_idx, batch in enumerate(train_dl):
 params.batch_idx_train += 1
 batch_size = len(batch["supervisions"]["text"])

@@ -457,6 +463,7 @@ def train_one_epoch(

 optimizer.zero_grad()
 loss.backward()
+clip_grad_norm_(model.parameters(), 5.0, 2.0)
 optimizer.step()

 loss_cpu = loss.detach().cpu().item()

@@ -468,6 +475,9 @@ def train_one_epoch(
 tot_ctc_loss += ctc_loss_cpu
 tot_att_loss += att_loss_cpu

+params.tot_frames += params.train_frames
+params.tot_loss += loss_cpu
+
 tot_avg_loss = tot_loss / tot_frames
 tot_avg_ctc_loss = tot_ctc_loss / tot_frames
 tot_avg_att_loss = tot_att_loss / tot_frames

@@ -516,6 +526,12 @@ def train_one_epoch(
 tot_avg_loss,
 params.batch_idx_train,
 )
+if batch_idx > 0 and batch_idx % params.reset_interval == 0:
+    tot_loss = 0.0  # sum of losses over all batches
+    tot_ctc_loss = 0.0
+    tot_att_loss = 0.0
+
+    tot_frames = 0.0  # sum of frames over all batches

 if batch_idx > 0 and batch_idx % params.valid_interval == 0:
 compute_validation_loss(

@@ -551,7 +567,7 @@ def train_one_epoch(
 params.batch_idx_train,
 )

-params.train_loss = tot_loss / tot_frames
+params.train_loss = params.tot_loss / params.tot_frames

 if params.train_loss < params.best_train_loss:
 params.best_train_epoch = params.cur_epoch
```
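
The training script imports `SummaryWriter`, so training curves are presumably logged for TensorBoard; one way to monitor them, assuming the event files end up somewhere under the experiment directory (`conformer_ctc/exp_new` in this diff; the exact log subdirectory is not shown):

```bash
# TensorBoard searches the given directory recursively for event files.
tensorboard --logdir conformer_ctc/exp_new --port 6006
```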

Review comment:

> It would be more friendly to put a conda or pip installation guide here rather than building from source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I just removed this kind of information and refer the reader to the documentation of k2 for installation instructions.