Feature request - Ukrainian model #30

Closed
egorsmkv opened this issue Nov 5, 2020 · 6 comments
Labels: enhancement (New feature or request)

Comments

egorsmkv commented Nov 5, 2020

🚀 Feature

We would like to have a Ukrainian model for the task of Speech-to-Text.

Motivation

Ukraine has a large population, and there are many Speech-to-Text tasks in the country.

Additional context

Our Telegram-based group ( https://t.me/speech_recognition_uk ) has collected a dataset of Ukrainian public speeches and interviews in audio and text formats, available here: https://mega.nz/folder/T34DQSCL#Q1O8vcrX_8Qnp27Ge56_4A/folder/O3hzlKIJ

We think this dataset will be helpful in the training process.

egorsmkv added the enhancement label Nov 5, 2020

egorsmkv (Author) commented Nov 5, 2020

We also have our own repository where we are collecting links to datasets: https://github.com/egorsmkv/speech-recognition-uk

snakers4 (Owner) commented Nov 5, 2020

Hi,

This is exactly the kind of effort I would expect from the community for low-resource languages.
For now, I will just fit a model on your data as is and share it via silero-models.
Then, when the V3 compact models arrive for all languages, I will consider tuning a Russian model on your corpus.

https://github.com/egorsmkv/speech-recognition-uk

Just a few ideas on how to make your repo better:

  • Add a table with overall statistics
  • Add some commands (maybe a CLI to download your files)
  • Direct links are always nice
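
As a rough illustration of the CLI idea in the list above, here is a minimal Python sketch. Everything in it is an assumption rather than something from the repository: the links file format (one direct URL per line, `#` for comments), the function names, and the `--out` option are all hypothetical. It also assumes plain HTTP(S) direct links, so it would not work with a folder-sharing page as is.

```python
import argparse
import os
import urllib.request
from urllib.parse import urlparse, unquote


def parse_links(text):
    """Return the URLs in text, one per line, skipping blanks and '#' comments."""
    links = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            links.append(line)
    return links


def target_path(url, out_dir):
    """Derive a local file path from the last segment of the URL path."""
    name = os.path.basename(unquote(urlparse(url).path))
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return os.path.join(out_dir, name)


def download_all(urls, out_dir):
    """Download each URL into out_dir, skipping files that already exist."""
    os.makedirs(out_dir, exist_ok=True)
    for url in urls:
        path = target_path(url, out_dir)
        if os.path.exists(path):
            print(f"skip  {path}")
            continue
        print(f"fetch {url}")
        urllib.request.urlretrieve(url, path)


def main(argv=None):
    # Hypothetical interface: a links file plus an optional output directory.
    parser = argparse.ArgumentParser(
        description="Download dataset files from a list of direct links."
    )
    parser.add_argument("links_file", help="text file with one direct URL per line")
    parser.add_argument("--out", default="data", help="output directory")
    args = parser.parse_args(argv)
    with open(args.links_file, encoding="utf-8") as f:
        urls = parse_links(f.read())
    download_all(urls, args.out)
```

Calling `main(["links.txt", "--out", "data"])` would fetch every listed file into `data/`, where `links.txt` is a placeholder file name. Skipping files that already exist keeps reruns cheap after an interrupted download.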

egorsmkv (Author) commented Nov 5, 2020

Thanks for your suggestions!

snakers4 mentioned this issue Nov 6, 2020

snakers4 (Owner) commented Nov 6, 2020

Please see #20 (comment).

snakers4 (Owner) commented Nov 7, 2020

I updated the models and purged the CDN cache.

snakers4 (Owner) commented Nov 7, 2020

It is unlikely that much will change soon, so if everything works let's close the ticket.

egorsmkv closed this as completed Nov 7, 2020