Adding Module Models #446
This is a good idea. Would you like to contribute some? EDIT: We need a way of validating a model implementation before being able to add it to torchaudio:
Sure! I will take inspiration from the torchvision structure and add Wav2Letter first. At the moment I cannot offer the model with trained weights, but I can talk with my company to see if we could offer this to torchaudio in the future.
Let's start with just the model class, untrained. Hosting trained models (and datasets) is a separate discussion :)
@tomassosorio -- thanks again for merging wav2letter! Which other ones do you believe would be a good idea to request help to implement?
Yes, I feel both of them would be good options as future implementations, as would Wav2Letter++; the linked page also lists some other models that could be interesting to implement :)
How about PANNs? It has a rich collection of pre-trained audio tagging / SED models.
Hi guys, I'd like to help with this task. Where can I start?
@vincentqb I think it would be interesting to implement Jasper too, as it reported state-of-the-art results on LibriSpeech :). It is available in OpenSeq2Seq. What do you think about converting the checkpoints of these models to PyTorch and using them to initialize the model weights?
Thanks for suggesting jasper here, and there :) Do ping me when you open a pull request for it :)
We do plan on offering pre-trained weights for the models we provide. Is that what you mean?
@Edresson -- thanks also for suggesting wav2vec -- it is on our radar, and we would love to have an example pipeline for it, such as the one for wav2letter :) We can talk more about this if it is of interest to you.
Yes, I thought about maybe converting the OpenSeq2Seq checkpoints. But training and providing a checkpoint may be a better option.
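Converting a checkpoint from another framework mostly comes down to renaming parameters so they match the PyTorch module's `state_dict` keys. A minimal, framework-agnostic sketch of that step (the key names and mapping table below are hypothetical examples, not actual OpenSeq2Seq or torchaudio names):

```python
def remap_checkpoint_keys(src_state, key_map):
    """Remap parameter names from a source-framework checkpoint to the
    names a PyTorch module expects.

    `key_map` is a {source_name: torch_name} table built by inspecting
    both checkpoints; unmapped source keys are returned separately so
    nothing is silently dropped.
    """
    remapped, unmatched = {}, []
    for name, tensor in src_state.items():
        if name in key_map:
            remapped[key_map[name]] = tensor
        else:
            unmatched.append(name)
    return remapped, unmatched


# Hypothetical usage: rename TF-style variable names to nn.Module keys,
# then pass the result to model.load_state_dict(remapped).
source = {"conv1/kernel": 1, "conv1/bias": 2, "global_step": 3}
table = {"conv1/kernel": "conv1.weight", "conv1/bias": "conv1.bias"}
remapped, unmatched = remap_checkpoint_keys(source, table)
```

In practice the tensors may also need transposing (e.g. TF stores conv kernels channel-last), so a real converter would do per-key transforms in addition to renaming.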
I'd like to add a vanilla DeepSpeech model
Closing the issue as torchaudio now has a models submodule. To add a new model, please propose it by opening a new issue.
🚀 Feature
Similar to torchvision, it would be interesting to have the architectures of the most important models available, such as Wav2Letter, DeepSpeech, and DeepSpeech2, among others.
Motivation
When trying to train a model, most of the needed pieces are already available, such as datasets and data transformations; however, no architecture of a well-known model is available.
Pitch
Adding a module with the architectures of the most well-known models.
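As a rough illustration of what such a module could look like, here is a minimal sketch of a fully-convolutional acoustic model in the spirit of Wav2Letter. The class name, layer sizes, and default dimensions are illustrative assumptions only, not the architecture torchaudio ultimately shipped:

```python
import torch
from torch import nn


class Wav2LetterSketch(nn.Module):
    """Hypothetical, simplified Wav2Letter-style model: a stack of 1-D
    convolutions mapping acoustic features to per-frame letter scores."""

    def __init__(self, num_features: int = 40, num_classes: int = 29):
        super().__init__()
        self.acoustic = nn.Sequential(
            # Striding layer compresses the time axis.
            nn.Conv1d(num_features, 250, kernel_size=48, stride=2, padding=23),
            nn.ReLU(inplace=True),
            nn.Conv1d(250, 250, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv1d(250, 2000, kernel_size=32, padding=16),
            nn.ReLU(inplace=True),
            # 1x1 convolution projects to the letter alphabet.
            nn.Conv1d(2000, num_classes, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features, time) -> (batch, num_classes, time')
        return torch.nn.functional.log_softmax(self.acoustic(x), dim=1)
```

The log-probabilities over letters per frame are what a CTC loss would consume during training, which is how Wav2Letter-style models are typically optimized.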