This repository is a fork of ESPnet and contains the code for the paper E2E-SincNet: Toward Fully End-to-End Speech Recognition, which will be presented at ICASSP 2020. E2E-SincNet is only partially integrated into ESPnet because its input data pipeline differs substantially from ESPnet's. E2E-SincNet will also be part of the SpeechBrain toolkit. The provided code makes it possible to reproduce the results reported in the paper E2E-SincNet: Toward Fully End-to-End Speech Recognition.
- TIMIT recipe: ready to use.
This repository is an extended fork of ESPnet. The installation procedure is therefore the same as ESPnet's, which makes it easy to deploy SincNet in existing setups.
The current version of E2E-SincNet provides an ASR recipe for the TIMIT dataset: the script `run_sincnet.sh`, available in `egs/timit/asr1`, reproduces the results reported in the paper E2E-SincNet: Toward Fully End-to-End Speech Recognition. Note that the steps described in this README can be transposed to any recipe of the ESPnet toolkit. For instance, the WSJ results can be reproduced by following the same steps and modifying the corresponding `run.sh` script.
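For scripted runs, the launch step can be wrapped as in the minimal Python sketch below. It assumes that `run_sincnet.sh` follows the usual ESPnet `run.sh` layout, i.e. it sources `./path.sh` and `./cmd.sh` and must therefore be started from inside `egs/<corpus>/asr1`. The helpers `recipe_command` and `launch` are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path


def recipe_command(espnet_root, corpus="timit", script="run_sincnet.sh"):
    """Return (working_dir, argv) for an ESPnet-style recipe script.

    Assumption: like ESPnet's run.sh, the script sources ./path.sh and
    ./cmd.sh, so it has to be started from inside egs/<corpus>/asr1.
    """
    recipe_dir = Path(espnet_root) / "egs" / corpus / "asr1"
    return recipe_dir, [f"./{script}"]


def launch(espnet_root, corpus="timit", script="run_sincnet.sh", dry_run=False):
    """Run the recipe, or only return the command when dry_run is True."""
    cwd, argv = recipe_command(espnet_root, corpus, script)
    if dry_run:
        return cwd, argv
    if not (cwd / script).is_file():
        raise FileNotFoundError(f"{cwd / script} not found; check espnet_root")
    # check=True raises CalledProcessError if the recipe exits with an error.
    subprocess.run(argv, cwd=cwd, check=True)
    return cwd, argv
```

For WSJ, the same helper would be called as `launch(root, corpus="wsj", script="run.sh")` once `run.sh` has been adapted as described above.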
The proposed integration of SincNet into ESPnet relies on a bridge between the input feature preparation of PyTorch-Kaldi and the standard ESPnet recipes. Taking the TIMIT experiment as an example, four steps are needed:
- Run the standard Kaldi TIMIT recipe up to DNN training (there is no need to go further). This creates all the files needed by the feature pre-processing script.
- Go to `egs/timit/local` and open `convert_sph_to_wav_kaldiscp_timit.py`. Modify all the paths in this script to match your setup, then run `python convert_sph_to_wav_kaldiscp_timit.py`. This script converts all the TIMIT .WAV files to the format required for further processing. PLEASE NOTE THAT THIS SCRIPT DUPLICATES THE WAV FILES, SO YOU NEED WRITE ACCESS.
- Go to `egs/timit/local` and open `save_raw_fea.py`. Modify all the paths in this script to match your setup. More precisely, you need to edit and run this script three times, once each to generate the train, dev, and test raw input features (`python save_raw_fea.py`). This script comes from PyTorch-Kaldi and will soon be modified so that the first step (running the Kaldi recipe) becomes unnecessary.
- Finally, launch the ESPnet recipe (`run_sincnet.sh` in `egs/timit/asr1`)!
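TIMIT stores its audio as NIST SPHERE files, which is why a conversion step precedes feature extraction. The stdlib-only Python sketch below illustrates what such a conversion involves, assuming uncompressed 16-bit PCM SPHERE input (shorten-compressed files are rejected): it parses the SPHERE header, rewrites the samples as a RIFF WAV file, and writes a Kaldi-style `wav.scp` listing. It is not the code of `convert_sph_to_wav_kaldiscp_timit.py`; the function names are hypothetical and the repository script may work differently.

```python
import wave
from pathlib import Path


def read_sphere(path):
    """Parse an uncompressed NIST SPHERE file such as a TIMIT .WAV file.

    Returns (fields, pcm): `fields` maps header names to values (ints for
    '-i' fields, strings otherwise) and `pcm` holds the raw sample bytes.
    """
    data = Path(path).read_bytes()
    if not data.startswith(b"NIST_1A"):
        raise ValueError(f"{path} is not a NIST SPHERE file")
    # The second header line holds the total header size in bytes.
    header_size = int(data.split(b"\n", 2)[1].decode("ascii").strip())
    header = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in header.splitlines()[2:]:
        parts = line.split(None, 2)  # "<name> <-type> <value>"
        if parts and parts[0] == "end_head":
            break
        if len(parts) == 3:
            name, kind, value = parts
            fields[name] = int(value) if kind == "-i" else value
    return fields, data[header_size:]


def sphere_to_wav(sph_path, wav_path):
    """Rewrite an uncompressed PCM SPHERE file as a standard RIFF WAV file."""
    fields, pcm = read_sphere(sph_path)
    if fields.get("sample_coding", "pcm") != "pcm":
        raise ValueError("compressed SPHERE files are not supported in this sketch")
    width = fields.get("sample_n_bytes", 2)
    channels = fields.get("channel_count", 1)
    rate = fields.get("sample_rate", 16000)
    frame = width * channels
    n_frames = fields.get("sample_count", len(pcm) // frame)
    pcm = pcm[: n_frames * frame]  # drop any trailing bytes
    if fields.get("sample_byte_format") == "10" and width == 2:
        # Big-endian 16-bit samples: swap byte pairs, since WAV PCM is little-endian.
        swapped = bytearray(pcm)
        swapped[0::2], swapped[1::2] = pcm[1::2], pcm[0::2]
        pcm = bytes(swapped)
    with wave.open(str(wav_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(pcm)


def write_wav_scp(entries, scp_path):
    """Write '<utt_id> <wav_path>' lines sorted by utterance ID, as Kaldi expects."""
    with open(scp_path, "w") as f:
        for utt_id, wav_path in sorted(entries.items()):
            f.write(f"{utt_id} {wav_path}\n")
```

To convert a whole corpus, one could iterate over `Path(timit_root).rglob("*.WAV")`, write each output into a separate directory (on case-insensitive file systems `X.WAV` and `X.wav` would collide), and collect the `utt_id -> path` pairs for `write_wav_scp`. The utterance IDs must match those produced by the Kaldi TIMIT recipe.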
All the configuration files can be customized in the same way as in ESPnet.