Replies: 3 comments
-
This could be combined with the existing "speech" audio detection label to efficiently determine which clips should have speech-to-text applied.
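A rough sketch of that filtering idea, assuming each clip record carries the set of audio labels detected in it. The `Clip` class, its fields, and the file names are hypothetical and are not Frigate's actual data model:

```python
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical record: a clip path plus the audio labels detected in it.
    path: str
    audio_labels: set[str]


def clips_needing_transcription(clips, label="speech"):
    """Return only the clips whose detected audio labels include `label`."""
    return [clip for clip in clips if label in clip.audio_labels]


# Made-up example data for illustration only.
clips = [
    Clip("front_door_001.mp4", {"speech", "bark"}),
    Clip("driveway_002.mp4", {"car"}),
    Clip("porch_003.mp4", {"speech"}),
]

to_transcribe = clips_needing_transcription(clips)
```

Only the clips in `to_transcribe` would then be handed to the speech-to-text step, so recordings without detected speech are skipped.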
-
This could also be helpful: https://github.com/Carleslc/AudioToText
I would really like this feature!
-
Looks like this one is small, fast, and available as ONNX models. I wonder if real-time word triggers could be a thing as part of audio detection?
https://www.reddit.com/r/LocalLLaMA/comments/1hh5y87/moonshine_web_realtime_inbrowser_speech/
-
Using faster-whisper or something similar, process complete video segments and either add a caption track (probably easier to play back later) or a caption file that can then be added to Frigate search.
It might be possible to leverage existing projects to add this capability.
https://github.com/McCloudS/subgen
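As a minimal sketch of the caption-file route, the snippet below turns timed transcript segments into SRT text. The helper names (`srt_timestamp`, `to_srt`, `transcribe_to_srt`) are made up for illustration, the model size and CPU/int8 settings are arbitrary assumptions, and nothing here reflects how Frigate or subgen actually do it:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(media_path: str, model_size: str = "small") -> str:
    """Transcribe a recording with faster-whisper and return SRT text.

    Assumes the faster-whisper package is installed; imported lazily so the
    formatting helpers above work without it.
    """
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(media_path)
    return to_srt((seg.start, seg.end, seg.text) for seg in segments)
```

The resulting text could be saved next to the recording as a `.srt` file, and the plain segment text could be what gets indexed for search.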