
Adding Module Models #446

Closed
tomassosorio opened this issue Feb 26, 2020 · 13 comments

Comments

@tomassosorio
Contributor

🚀 Feature

Similar to torchvision, it would be interesting to have the architectures of the most important models available, such as Wav2Letter, DeepSpeech, and DeepSpeech2, among others...

Motivation

When trying to train a model, most of the needed features are already available, such as datasets and data transformations; however, no architecture of any well-known model is available.

Pitch

Add a module with the architectures of the best-known models.

@vincentqb
Contributor

vincentqb commented Mar 2, 2020

This is a good idea. Would you like to contribute some?

EDIT: We need a way of validating a model implementation before being able to add it to torchaudio.

@tomassosorio
Contributor Author

Sure!

I will take inspiration from the torchvision structure and start by adding Wav2Letter.

At the moment I cannot offer the model with trained weights, but I can talk with my company to see if we could offer this to torchaudio in the future.

@vincentqb
Contributor

Let's start with just the model class, untrained. Hosting trained models (and datasets) is a separate discussion :)
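For reference, an untrained model class in that style can be as small as a stack of 1D convolutions over the waveform. The sketch below is purely illustrative (the class name, layer widths, and kernel sizes are made up for this example, assuming plain PyTorch; it is not the implementation that was later merged):

```python
import torch
from torch import nn


class TinyWav2Letter(nn.Module):
    """Illustrative, simplified Wav2Letter-style acoustic model.

    A sketch only: a few 1D convolutions ending in per-frame
    log-probabilities over the output classes.
    """

    def __init__(self, num_classes: int = 40, num_features: int = 1):
        super().__init__()
        self.acoustic_model = nn.Sequential(
            # Strided front-end conv halves the time resolution.
            nn.Conv1d(num_features, 250, kernel_size=48, stride=2, padding=23),
            nn.ReLU(inplace=True),
            nn.Conv1d(250, 250, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            # 1x1 conv maps channels to the output classes.
            nn.Conv1d(250, num_classes, kernel_size=1),
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, num_features, time) -> (batch, num_classes, time // 2)
        x = self.acoustic_model(waveform)
        return nn.functional.log_softmax(x, dim=1)


model = TinyWav2Letter()
out = model(torch.randn(1, 1, 1600))
print(out.shape)  # -> torch.Size([1, 40, 800])
```

The per-frame log-probabilities make the output directly usable with a CTC loss, which is how Wav2Letter-style models are typically trained.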

@vincentqb
Contributor

@tomassosorio -- thanks again for merging wav2letter! Which others do you believe would be good candidates to request help implementing?

  • Wav2Letter
  • DeepSpeech
  • DeepSpeech2

@tomassosorio
Contributor Author

tomassosorio commented May 5, 2020

Yes, I feel both of them would be good options as future implementations, as would Wav2Letter++. In this link there are also some other models that could be interesting to implement :)

@koukyo1994

koukyo1994 commented May 15, 2020

How about PANNs? They come with a rich collection of pre-trained audio tagging / sound event detection (SED) models.

@limazix

limazix commented May 15, 2020

Hi guys, I'd like to help with this task. Where can I start?

This was referenced Jun 17, 2020
@Edresson

@vincentqb I think it would be interesting to implement Jasper too, as it reported state-of-the-art results on LibriSpeech :).

It is available on OpenSeq2Seq

What do you think about converting the checkpoints of these models to PyTorch and using them to initialize model weights?

@vincentqb
Contributor

@vincentqb I think it would be interesting to implement Jasper too, as it reported state-of-the-art results on LibriSpeech :).

It is available on OpenSeq2Seq

Thanks for suggesting Jasper here, and there :) Do ping me when you open a pull request for it :)

What do you think about converting the checkpoints of these models to PyTorch and using them to initialize model weights?

We do plan on offering some pre-trained models for those we offer. Is that what you mean?

@vincentqb
Contributor

@Edresson -- thanks also for suggesting wav2vec -- it is on our radar, and we would love to have an example pipeline for it, such as wav2letter :) we can talk more about this if it is of interest to you

@Edresson

@vincentqb I think it would be interesting to implement Jasper too, as it reported state-of-the-art results on LibriSpeech :).
It is available on OpenSeq2Seq

Thanks for suggesting jasper here, and there :) Do ping me when you open a pull request for it :)

What do you think about converting the checkpoints of these models to PyTorch and using them to initialize model weights?

We do plan on offering some pre-trained models for those we offer. Is that what you mean?

Yes, I thought about maybe converting the OpenSeq2Seq checkpoints. But training and providing a checkpoint may be a better option.

mthrok pushed a commit to mthrok/audio that referenced this issue Feb 26, 2021
* Fix CUDNN error in spatial_transformer_tutorial.py

* Better checking

* disable audio_classifier_tutorial.py

* try to fix spatial_transformer_tutorial.py

* fix path bug
discort added a commit to discort/audio that referenced this issue Mar 18, 2021
@discort
Contributor

discort commented Mar 18, 2021

@vincentqb

I'd like to add the vanilla DeepSpeech model.

vincentqb pushed a commit to discort/audio that referenced this issue May 11, 2021
@mthrok
Collaborator

mthrok commented Aug 3, 2021

Closing the issue, as torchaudio now has a models submodule.

For adding a new model, please propose it by opening a new issue.

@mthrok mthrok closed this as completed Aug 3, 2021

7 participants