Add doc about installation and usage #7

Merged: 4 commits merged on Aug 12, 2021

Changes from 3 commits

85 changes: 84 additions & 1 deletion README.md
The file previously contained only the placeholder line "Working in progress."; this PR replaces it with the content below.

# Table of Contents

- [Installation](#installation)
* [Install k2](#install-k2)
* [Install lhotse](#install-lhotse)
* [Install icefall](#install-icefall)
- [Run recipes](#run-recipes)

## Installation

`icefall` depends on [k2][k2] for FSA operations and [lhotse][lhotse] for
data preparation. To use `icefall`, you have to install its dependencies first.
The following subsections describe how to set up the environment.

CAUTION: There are various ways to set up the environment. What we describe
here is just one of them.

### Install k2

Please refer to [k2's installation documentation][k2-install] to install k2.
If you have any issues installing k2, please open an issue at
<https://github.com/k2-fsa/k2/issues>.

Collaborator commented: It would be friendlier to put a conda or pip installation guide here rather than building from source.

@csukuangfj (author) replied on Aug 12, 2021: Thanks. I have removed this kind of information and now refer the reader to k2's documentation for installation instructions.

The following shows the minimal commands needed to build k2 from source and make it visible to Python:

```bash
# Build k2 from source.
mkdir -p $HOME/open-source
cd $HOME/open-source
git clone https://github.com/k2-fsa/k2.git
cd k2
mkdir build_release
cd build_release
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j _k2  # build only the _k2 Python extension

# Make the k2 Python package and the compiled extension importable.
export PYTHONPATH=$HOME/open-source/k2/k2/python:$PYTHONPATH
export PYTHONPATH=$HOME/open-source/k2/build_release/lib:$PYTHONPATH
```
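
If the build fails, it is often because a prerequisite is missing. As a quick, hedged sanity check (the authoritative list of requirements is in k2's installation documentation), you can verify that the usual build tools are available:

```bash
# Check the toolchain typically needed to build k2 from source.
python3 -c "import torch; print(torch.__version__)"  # PyTorch must be installed beforehand
cmake --version
gcc --version
```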

To check that k2 is installed successfully, please run

```bash
python3 -m k2.version
```

It should print information about the environment in which k2 was built.
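
As an extra check (optional; `python3 -m k2.version` above is the recommended way), you can confirm that Python imports k2 from the paths added to `PYTHONPATH`:

```bash
python3 -c "import k2; print(k2.__file__)"
```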

### Install lhotse

Please refer to [lhotse's installation documentation][lhotse-install] to install
lhotse.
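
For reference, lhotse can usually be installed with pip; the exact, up-to-date instructions are in the linked documentation. A minimal sketch:

```bash
pip install lhotse

# Quick sanity check: print where lhotse is imported from.
python3 -c "import lhotse; print(lhotse.__file__)"
```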

### Install icefall

`icefall` is a collection of Python scripts. All you need to do is clone the
repository, install its requirements, and set the environment variable `PYTHONPATH`:

```bash
cd $HOME/open-source
git clone https://github.com/k2-fsa/icefall
cd icefall
pip install -r requirements.txt

# Make icefall importable from anywhere.
export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATH
```

To verify `icefall` was installed successfully, you can run:

```bash
python3 -c "import icefall; print(icefall.__file__)"
```

It should print the path to `icefall`.
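
Note that the `export` commands above only affect the current shell. If you want the settings to persist across sessions, you can append them to your shell startup file, for example (assuming bash and the `$HOME/open-source` layout used above):

```bash
cat >> ~/.bashrc <<'EOF'
export PYTHONPATH=$HOME/open-source/k2/k2/python:$PYTHONPATH
export PYTHONPATH=$HOME/open-source/k2/build_release/lib:$PYTHONPATH
export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATH
EOF
```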

## Run recipes

At present, only the LibriSpeech recipe is provided. Please
follow [egs/librispeech/ASR/README.md][LibriSpeech] to run it.
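
For example, a typical first step looks like the following (assuming the `$HOME/open-source` layout from the installation section; see the recipe's README for details):

```bash
cd $HOME/open-source/icefall/egs/librispeech/ASR
./prepare.sh
```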

[LibriSpeech]: egs/librispeech/ASR/README.md
[k2-install]: https://k2.readthedocs.io/en/latest/installation/index.html#
[k2]: https://github.com/k2-fsa/k2
[lhotse]: https://github.com/lhotse-speech/lhotse
[lhotse-install]: https://lhotse.readthedocs.io/en/latest/getting-started.html#installation
133 changes: 38 additions & 95 deletions egs/librispeech/ASR/README.md
This PR replaces the old contents of the file.

Removed (the previous contents, which gave results of a pre-trained model):

Run `./prepare.sh` to prepare the data.

Run `./xxx_train.py` (to be added) to train a model.

## Conformer-CTC

Results of the pre-trained model from
<https://huggingface.co/GuoLiyong/snowfall_bpe_model/tree/main/exp-duration-200-feat_batchnorm-bpe-lrfactor5.0-conformer-512-8-noam>
are given below.

### HLG - no LM rescoring

(output beam size is 8)

#### 1-best decoding

```
[test-clean-no_rescore] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore] %WER 7.03% [3682 / 52343, 220 ins, 1024 del, 2438 sub ]
```

#### n-best decoding

For n=100,

```
[test-clean-no_rescore-100] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore-100] %WER 7.14% [3737 / 52343, 275 ins, 1020 del, 2442 sub ]
```

For n=200,

```
[test-clean-no_rescore-200] %WER 3.16% [1660 / 52576, 125 ins, 378 del, 1157 sub ]
[test-other-no_rescore-200] %WER 7.04% [3684 / 52343, 228 ins, 1012 del, 2444 sub ]
```

### HLG - with LM rescoring

#### Whole lattice rescoring

```
[test-clean-lm_scale_0.8] %WER 2.77% [1456 / 52576, 150 ins, 210 del, 1096 sub ]
[test-other-lm_scale_0.8] %WER 6.23% [3262 / 52343, 246 ins, 635 del, 2381 sub ]
```

WERs of different LM scales are:

```
For test-clean, WER of different settings are:
lm_scale_0.8 2.77 best for test-clean
lm_scale_0.9 2.87
lm_scale_1.0 3.06
lm_scale_1.1 3.34
lm_scale_1.2 3.71
lm_scale_1.3 4.18
lm_scale_1.4 4.8
lm_scale_1.5 5.48
lm_scale_1.6 6.08
lm_scale_1.7 6.79
lm_scale_1.8 7.49
lm_scale_1.9 8.14
lm_scale_2.0 8.82

For test-other, WER of different settings are:
lm_scale_0.8 6.23 best for test-other
lm_scale_0.9 6.37
lm_scale_1.0 6.62
lm_scale_1.1 6.99
lm_scale_1.2 7.46
lm_scale_1.3 8.13
lm_scale_1.4 8.84
lm_scale_1.5 9.61
lm_scale_1.6 10.32
lm_scale_1.7 11.17
lm_scale_1.8 12.12
lm_scale_1.9 12.93
lm_scale_2.0 13.77
```

#### n-best LM rescoring

n = 100

```
[test-clean-lm_scale_0.8] %WER 2.79% [1469 / 52576, 149 ins, 212 del, 1108 sub ]
[test-other-lm_scale_0.8] %WER 6.36% [3329 / 52343, 259 ins, 666 del, 2404 sub ]
```

WERs of different LM scales are:

```
For test-clean, WER of different settings are:
lm_scale_0.8 2.79 best for test-clean
lm_scale_0.9 2.89
lm_scale_1.0 3.03
lm_scale_1.1 3.28
lm_scale_1.2 3.52
lm_scale_1.3 3.78
lm_scale_1.4 4.04
lm_scale_1.5 4.24
lm_scale_1.6 4.45
lm_scale_1.7 4.58
lm_scale_1.8 4.7
lm_scale_1.9 4.8
lm_scale_2.0 4.92

For test-other, WER of different settings are:
lm_scale_0.8 6.36 best for test-other
lm_scale_0.9 6.45
lm_scale_1.0 6.64
lm_scale_1.1 6.92
lm_scale_1.2 7.25
lm_scale_1.3 7.59
lm_scale_1.4 7.88
lm_scale_1.5 8.13
lm_scale_1.6 8.36
lm_scale_1.7 8.54
lm_scale_1.8 8.71
lm_scale_1.9 8.88
lm_scale_2.0 9.02
```

Added (the new contents):

## Data preparation

If you want to use `./prepare.sh` to download everything for you,
you can just run

```
./prepare.sh
```

If you have pre-downloaded the LibriSpeech dataset, please
read `./prepare.sh` and modify it to point to the location
of your dataset so that it won't re-download it. After the modification,
please run

```
./prepare.sh
```

The script `./prepare.sh` prepares features, the lexicon, LMs, etc.
All generated files are saved in the folder `./data`.

HINT: `./prepare.sh` supports the options `--stage` and `--stop-stage`.

## TDNN-LSTM CTC training

The folder `tdnn_lstm_ctc` contains scripts for CTC training
with TDNN-LSTM models.

Pre-configured parameters for training and decoding are set in the function
`get_params()` within `tdnn_lstm_ctc/train.py`
and `tdnn_lstm_ctc/decode.py`.

Parameters that can be passed from the command line can be listed with

```
./tdnn_lstm_ctc/train.py --help
./tdnn_lstm_ctc/decode.py --help
```

If you have 4 GPUs on a machine and want to use GPUs 0, 2, and 3 for
multi-GPU training, you can run

```
export CUDA_VISIBLE_DEVICES="0,2,3"
./tdnn_lstm_ctc/train.py \
  --master-port 12345 \
  --world-size 3
```

If you want to decode by averaging the checkpoints `epoch-8.pt`,
`epoch-9.pt`, and `epoch-10.pt`, you can run

```
./tdnn_lstm_ctc/decode.py \
  --epoch 10 \
  --avg 3
```

## Conformer CTC training

The folder `conformer_ctc` contains scripts for CTC training
with conformer models. The steps for running training and
decoding are similar to those for `tdnn_lstm_ctc`.
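
For readers curious what the `--avg` option shown above does conceptually: decoding averages the parameters of the last few checkpoints. The sketch below is a simplified illustration, assuming each `epoch-*.pt` file stores the model weights under a `"model"` key; see icefall's checkpoint utilities for the actual implementation.

```python
# Simplified sketch of checkpoint averaging (illustration only).
from typing import Dict, List

import torch


def average_checkpoints(filenames: List[str]) -> Dict[str, torch.Tensor]:
    """Average the "model" state dicts stored in the given checkpoint files."""
    n = len(filenames)
    avg = torch.load(filenames[0], map_location="cpu")["model"]
    for f in filenames[1:]:
        state = torch.load(f, map_location="cpu")["model"]
        for k in avg:
            avg[k] += state[k]
    for k in avg:
        if avg[k].is_floating_point():
            avg[k] /= n
    return avg


# What `--epoch 10 --avg 3` roughly corresponds to:
# averaged = average_checkpoints(["epoch-8.pt", "epoch-9.pt", "epoch-10.pt"])
```
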
26 changes: 21 additions & 5 deletions egs/librispeech/ASR/conformer_ctc/train.py
@@ -16,6 +16,7 @@
 from conformer import Conformer
 from lhotse.utils import fix_random_seed
 from torch.nn.parallel import DistributedDataParallel as DDP
+from torch.nn.utils import clip_grad_norm_
 from torch.utils.tensorboard import SummaryWriter
 from transformer import Noam

@@ -114,7 +115,9 @@ def get_params() -> AttributeDict:

         - log_interval: Print training loss if batch_idx % log_interval` is 0

-        - valid_interval: Run validation if batch_idx % valid_interval` is 0
+        - valid_interval: Run validation if batch_idx % valid_interval is 0
+
+        - reset_interval: Reset statistics if batch_idx % reset_interval is 0

         - beam_size: It is used in k2.ctc_loss

@@ -124,19 +127,20 @@ def get_params() -> AttributeDict:
     """
     params = AttributeDict(
         {
-            "exp_dir": Path("conformer_ctc/exp"),
+            "exp_dir": Path("conformer_ctc/exp_new"),

Collaborator commented: I would suggest putting exp_dir in a parent folder, e.g. "exp_dir": Path("exp/conformer_ctc"), Path("exp/tdnn_lstm"), so we do not have to copy the exp data when we need a new version of a model, say conformer_ctc_v2 or conformer_ctc_v3. It is better to keep only source code in the model directory.

Another collaborator replied: mm, it was my idea to co-locate the models and code like this. The idea is that it's easier to keep them in sync, and given a model it would generally be fairly clear which code generated it. And we'd add the code to git (at least that was my idea, although of course that would mean the code would disappear once you switched branches, which might not be ideal).

             "lang_dir": Path("data/lang_bpe"),
             "feature_dim": 80,
-            "weight_decay": 0.0,
+            "weight_decay": 1e-6,
             "subsampling_factor": 4,
             "start_epoch": 0,
-            "num_epochs": 50,
+            "num_epochs": 20,
             "best_train_loss": float("inf"),
             "best_valid_loss": float("inf"),
             "best_train_epoch": -1,
             "best_valid_epoch": -1,
             "batch_idx_train": 0,
             "log_interval": 10,
+            "reset_interval": 200,
             "valid_interval": 3000,
             "beam_size": 10,
             "reduction": "sum",

@@ -440,6 +444,8 @@ def train_one_epoch(
     tot_att_loss = 0.0

     tot_frames = 0.0  # sum of frames over all batches
+    params.tot_loss = 0.0
+    params.tot_frames = 0.0
     for batch_idx, batch in enumerate(train_dl):
         params.batch_idx_train += 1
         batch_size = len(batch["supervisions"]["text"])

@@ -457,6 +463,7 @@

         optimizer.zero_grad()
         loss.backward()
+        clip_grad_norm_(model.parameters(), 5.0, 2.0)
         optimizer.step()

         loss_cpu = loss.detach().cpu().item()

@@ -468,6 +475,9 @@
         tot_ctc_loss += ctc_loss_cpu
         tot_att_loss += att_loss_cpu

+        params.tot_frames += params.train_frames
+        params.tot_loss += loss_cpu
+
         tot_avg_loss = tot_loss / tot_frames
         tot_avg_ctc_loss = tot_ctc_loss / tot_frames
         tot_avg_att_loss = tot_att_loss / tot_frames

@@ -516,6 +526,12 @@ def train_one_epoch(
                     tot_avg_loss,
                     params.batch_idx_train,
                 )
+            if batch_idx > 0 and batch_idx % params.reset_interval == 0:
+                tot_loss = 0.0  # sum of losses over all batches
+                tot_ctc_loss = 0.0
+                tot_att_loss = 0.0
+
+                tot_frames = 0.0  # sum of frames over all batches

         if batch_idx > 0 and batch_idx % params.valid_interval == 0:
             compute_validation_loss(

@@ -551,7 +567,7 @@ def train_one_epoch(
                 params.batch_idx_train,
             )

-    params.train_loss = tot_loss / tot_frames
+    params.train_loss = params.tot_loss / params.tot_frames

     if params.train_loss < params.best_train_loss:
         params.best_train_epoch = params.cur_epoch
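
A note on the `clip_grad_norm_(model.parameters(), 5.0, 2.0)` line added above: it rescales the gradients in place so that their global 2-norm does not exceed 5.0 (the third argument is the norm type). The snippet below is a self-contained illustration with a placeholder model, not icefall code:

```python
import torch
from torch.nn.utils import clip_grad_norm_

# Placeholder model, optimizer, and loss for illustration.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 10)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Scale all gradients down if their combined 2-norm exceeds 5.0.
clip_grad_norm_(model.parameters(), max_norm=5.0, norm_type=2.0)
optimizer.step()
```
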
19 changes: 10 additions & 9 deletions egs/librispeech/ASR/conformer_ctc/transformer.py
@@ -4,12 +4,9 @@
 import math
 from typing import Dict, List, Optional, Tuple

-import k2
 import torch
 import torch.nn as nn
 from subsampling import Conv2dSubsampling, VggSubsampling
-
-from icefall.utils import get_texts
 from torch.nn.utils.rnn import pad_sequence

 # Note: TorchScript requires Dict/List/etc. to be fully typed.

@@ -274,9 +271,11 @@ def decoder_forward(
             device
         )

-        # TODO: Use eos_id as ignore_id.
-        # tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
-        tgt_key_padding_mask = decoder_padding_mask(ys_in_pad)
+        tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
+        # TODO: Use length information to create the decoder padding mask
+        # We set the first column to False since the first column in ys_in_pad
+        # contains sos_id, which is the same as eos_id in our current setting.
+        tgt_key_padding_mask[:, 0] = False

         tgt = self.decoder_embed(ys_in_pad)  # (N, T) -> (N, T, C)
         tgt = self.decoder_pos(tgt)

@@ -339,9 +338,11 @@ def decoder_nll(
             device
         )

-        # TODO: Use eos_id as ignore_id.
-        # tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
-        tgt_key_padding_mask = decoder_padding_mask(ys_in_pad)
+        tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
+        # TODO: Use length information to create the decoder padding mask
+        # We set the first column to False since the first column in ys_in_pad
+        # contains sos_id, which is the same as eos_id in our current setting.
+        tgt_key_padding_mask[:, 0] = False

         tgt = self.decoder_embed(ys_in_pad)  # (B, T) -> (B, T, F)
         tgt = self.decoder_pos(tgt)
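
For context on the `decoder_padding_mask(ys_in_pad, ignore_id=eos_id)` change above: a key padding mask marks padded positions so that attention ignores them, and the first column is unmasked because it holds `sos_id`, which equals `eos_id` in this setup. The snippet below is a hedged, self-contained sketch of the idea, not the actual `decoder_padding_mask` implementation:

```python
import torch


def padding_mask(ys_in_pad: torch.Tensor, ignore_id: int) -> torch.Tensor:
    """Return a bool mask of shape (N, T) that is True at positions equal to ignore_id."""
    mask = ys_in_pad == ignore_id
    # Keep the first column unmasked: it holds sos_id, which equals eos_id here.
    mask[:, 0] = False
    return mask


# Example: 1 plays the role of sos/eos/padding.
ys_in_pad = torch.tensor([[1, 5, 7, 1, 1],
                          [1, 9, 1, 1, 1]])
print(padding_mask(ys_in_pad, ignore_id=1))
```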