-
Hello, dear developer. First of all, I would like to thank you for everything you do for the community. It is very cool! I am wondering if it is possible to implement the function of classifying multilingual streams without transcribing them? In order to save time on processing the entire incoming volume of data and work only with those languages that are a priority? I imagine it this way: multilingual streams are received at the input (implemented by a script with sequential initialization of whisper for each file in the directory). Each time Whisper is initialized with the -classification function, we get a file passport in the format at the *.json, which contains the full path to the file, the identified language and the probability of this, while the program does not transcribe itself. This will reduce a huge amount of time when working with multilingual streams, since most often you have to manually select the language of interest from the general data set. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment
-
Hello! Thank you so much for your kind words.
I guess you are talking about the language detection, it's possible. How much do you offer for this implementation? |
Beta Was this translation helpful? Give feedback.
Hello! Thank you so much for your kind words.
I guess you are talking about the language detection, it's possible. How much do you offer for this implementation?
You can contact me at: [email protected]