
Adding Module Models #446

Closed
tomassosorio opened this issue Feb 26, 2020 · 13 comments

Comments

@tomassosorio
Contributor

🚀 Feature

Similar to torchvision, it would be interesting to have the architectures of the most important models available, such as Wav2Letter, DeepSpeech, and DeepSpeech2, among others...

Motivation

When trying to train a model, most of the needed features are already available, such as datasets and data transformations; however, no architecture of any well-known model is available.

Pitch

Add a module with the architectures of the best-known models.

@vincentqb
Contributor

vincentqb commented Mar 2, 2020

This is a good idea. Would you like to contribute some?

EDIT: We need a way of validating a model implementation before being able to add it to torchaudio.

@tomassosorio
Contributor Author

Sure!

I will take inspiration from the torchvision structure and start by adding Wav2Letter.

At the moment I cannot offer the model with trained weights, but I can talk with my company to see if we could offer this to torchaudio in the future.

@vincentqb
Contributor

Let's start with just the model class, untrained. Hosting trained models (and datasets) is a separate discussion :)
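For reference, an untrained model class in that style can be as small as a stack of 1D convolutions over the waveform. The sketch below is purely illustrative (the class name, layer widths, and kernel sizes are made up for this example, assuming plain PyTorch; it is not the implementation that was later merged):

```python
import torch
from torch import nn


class TinyWav2Letter(nn.Module):
    """Illustrative, simplified Wav2Letter-style acoustic model.

    A sketch only: a few 1D convolutions ending in per-frame
    log-probabilities over the output classes.
    """

    def __init__(self, num_classes: int = 40, num_features: int = 1):
        super().__init__()
        self.acoustic_model = nn.Sequential(
            # Strided front-end conv halves the time resolution.
            nn.Conv1d(num_features, 250, kernel_size=48, stride=2, padding=23),
            nn.ReLU(inplace=True),
            nn.Conv1d(250, 250, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            # 1x1 conv maps channels to the output classes.
            nn.Conv1d(250, num_classes, kernel_size=1),
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, num_features, time) -> (batch, num_classes, time // 2)
        x = self.acoustic_model(waveform)
        return nn.functional.log_softmax(x, dim=1)


model = TinyWav2Letter()
out = model(torch.randn(1, 1, 1600))
print(out.shape)  # -> torch.Size([1, 40, 800])
```

The per-frame log-probabilities make the output directly usable with a CTC loss, which is how Wav2Letter-style models are typically trained.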

@vincentqb
Contributor

@tomassosorio -- thanks again for merging wav2letter! Which others do you believe would be good candidates to request help implementing?

  • Wav2Letter
  • DeepSpeech
  • DeepSpeech2

@tomassosorio
Contributor Author

tomassosorio commented May 5, 2020

Yes, I feel both of them would be good options as future implementations, as would Wav2Letter++. In this link there are also some other models that could be interesting to implement :)

@koukyo1994

koukyo1994 commented May 15, 2020

How about PANNs? They come with a rich collection of pre-trained audio tagging / sound event detection (SED) models.

@limazix

limazix commented May 15, 2020

Hi guys, I'd like to help with this task. Where can I start?

This was referenced Jun 17, 2020
@Edresson

@vincentqb I think it would be interesting to implement Jasper too, as it reported state-of-the-art results on LibriSpeech :).

It is available on OpenSeq2Seq

What do you think about converting the checkpoints of these models to PyTorch and using them to initialize model weights?

@vincentqb
Contributor

@vincentqb I think it would be interesting to implement Jasper too, as it reported state-of-the-art results on LibriSpeech :).

It is available on OpenSeq2Seq

Thanks for suggesting Jasper here, and there :) Do ping me when you open a pull request for it :)

What do you think about converting the checkpoints of these models to PyTorch and using them to initialize model weights?

We do plan on offering some pre-trained models for those we offer. Is that what you mean?

@vincentqb
Contributor

@Edresson -- thanks also for suggesting wav2vec -- it is on our radar, and we would love to have an example pipeline for it, such as wav2letter :) we can talk more about this if it is of interest to you

@Edresson

@vincentqb I think it would be interesting to implement Jasper too, as it reported state-of-the-art results on LibriSpeech :).
It is available on OpenSeq2Seq

Thanks for suggesting jasper here, and there :) Do ping me when you open a pull request for it :)

What do you think about converting the checkpoints of these models to PyTorch and using them to initialize model weights?

We do plan on offering some pre-trained models for those we offer. Is that what you mean?

Yes, I thought about maybe converting the OpenSeq2Seq checkpoints. But training and providing a checkpoint may be a better option.

mthrok pushed a commit to mthrok/audio that referenced this issue Feb 26, 2021
* Fix CUDNN error in spatial_transformer_tutorial.py

* Better checking

* disable audio_classifier_tutorial.py

* try to fix spatial_transformer_tutorial.py

* fix path bug
discort added a commit to discort/audio that referenced this issue Mar 18, 2021
@discort
Contributor

discort commented Mar 18, 2021

@vincentqb

I'd like to add the vanilla DeepSpeech model.

vincentqb pushed a commit to discort/audio that referenced this issue May 11, 2021
@mthrok
Collaborator

mthrok commented Aug 3, 2021

Closing the issue, as torchaudio now has a models submodule.

For adding a new model, please propose it by opening a new issue.

@mthrok mthrok closed this as completed Aug 3, 2021

7 participants