End-to-end (E2E) automatic speech recognition (ASR) models are implemented in PyTorch.
We use the KsponSpeech dataset for training and Hydra to manage all training configurations.
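Hydra builds each run's configuration from config-group selections (e.g. `model=deepspeech2`) plus dot-list overrides (e.g. `train.dataset_path=...`) given on the command line. As a rough illustration of how such dot-list overrides land in a nested config, here is a minimal pure-Python sketch; `apply_overrides` is a hypothetical helper, not part of this project or of Hydra itself:

```python
def apply_overrides(cfg: dict, overrides: list[str]) -> dict:
    """Apply Hydra-style dot-list overrides (e.g. 'train.batch_size=32') to a nested dict."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for key in parents:
            node = node.setdefault(key, {})  # walk (or create) intermediate dicts
        node[leaf] = value
    return cfg

# Mimics passing `train.dataset_path=/data/ksponspeech` on the command line
cfg = apply_overrides({"train": {"batch_size": "16"}},
                      ["train.dataset_path=/data/ksponspeech"])
```

In the real project, Hydra performs this composition for you and hands the merged config object to `main.py`.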
$ pip install -e .
You can download the dataset from AI-Hub; anyone can obtain it simply by submitting an application. The KsponSpeech dataset then needs to be preprocessed as described here.
You can choose from several models and training options.
- Deep Speech2 Training
$ python main.py \
model=deepspeech2 \
train=deepspeech2_train \
train.dataset_path=$DATASET_PATH \
train.audio_path=$AUDIO_PATH \
train.label_path=$LABEL_PATH
- Listen, Attend and Spell Training
$ python main.py \
model=las train=las_train \
train.dataset_path=$DATASET_PATH \
train.audio_path=$AUDIO_PATH \
train.label_path=$LABEL_PATH
- Joint CTC-Attention Listen, Attend and Spell Training
$ python main.py \
model=joint_ctc_attention_las \
train=las_train \
train.dataset_path=$DATASET_PATH \
train.audio_path=$AUDIO_PATH \
train.label_path=$LABEL_PATH
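The Joint CTC-Attention variant is trained on a weighted sum of the CTC loss and the attention decoder's cross-entropy loss. A minimal sketch of that interpolation follows; the function name, parameter name, and default weight are illustrative assumptions, not this project's actual config keys or settings:

```python
def joint_ctc_attention_loss(ctc_loss: float, attention_loss: float,
                             ctc_weight: float = 0.2) -> float:
    """Interpolate the two losses: w * L_ctc + (1 - w) * L_att.

    ctc_weight=0.2 is a hypothetical default, not this repo's setting.
    """
    assert 0.0 <= ctc_weight <= 1.0
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss
```

The CTC term encourages monotonic alignment between audio frames and labels, while the attention term lets the decoder model label dependencies; the weight trades the two off.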
- Evaluation
$ python eval.py \
eval.dataset_path=$DATASET_PATH \
eval.audio_path=$AUDIO_PATH \
eval.label_path=$LABEL_PATH \
eval.model_path=$MODEL_PATH
MIT License
Copyright (c) 2021 Sangchun Ha
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions: