After running the model for ASR, some content is often missing from the transcription.
Audio link: https://share-github.tos-cn-beijing.volces.com/test.mp3
import whisperx
from faster_whisper import WhisperModel

mp3_audio = whisperx.load_audio('test.mp3')
prompt = ' 新闻今日谈 林秀芹 李炜 时事评论员 '
language = 'zh'
# Offset in seconds added to every timestamp. It was not defined in the
# posted snippet; 0.0 is assumed here so the code runs on its own.
add_time = 0.0

asr_model = WhisperModel("large-v2", device='cuda', compute_type='float16')
segments, info = asr_model.transcribe(
    mp3_audio,
    beam_size=5,
    vad_filter=True,
    language=language,
    initial_prompt=prompt,
    hotwords=prompt,
)

tmp_segments = []
for segment in segments:
    simplified_text = segment.text
    if hasattr(segment, 'words') and segment.words:
        tmp_segments.append(
            {"start": add_time + segment.start, "end": add_time + segment.end,
             "text": simplified_text, "words": segment.words})
    else:
        tmp_segments.append(
            {"start": add_time + segment.start, "end": add_time + segment.end,
             "text": simplified_text})  # , "words": segment.words

asr_result = {'segments': tmp_segments, 'language': language}
current output:
{
    'language': 'zh',
    'segments': [
        {'end': 21.89, 'start': 17.49, 'text': '我是林秀芹 首先联合话题关注的是中德关系的新的进展'},
        ......
        {'end': 755.53, 'start': 748.93, 'text': '当然 谢谢李伟先生带来的分析 我们先休息下来 但关注的是世界经济论坛非洲峰会的相关话题 稍后再见'},
        {'end': 787.29, 'start': 781.09, 'text': '谈非洲峰会呢 六号在南非闭幕 这一次的非洲峰会可以说是吸引全世界一个关注目光'},
        ... ...
    ]
}
correct output:
{
    'language': 'zh',
    'segments': [
        {'end': 17.49, 'start': 14.8, 'text': '大家好 欢迎收看今天的 新闻今日谈'},  # lost content
        {'end': 21.89, 'start': 17.49, 'text': '我是林秀芹 首先联合话题关注的是中德关系的新的进展'},
        ......
        {'end': 755.53, 'start': 748.93, 'text': '当然 谢谢李伟先生带来的分析 我们先休息下来 但关注的是世界经济论坛非洲峰会的相关话题 稍后再见'},
        {'end': 781, 'start': 778, 'text': '欢迎回来 世界经济论坛'},  # lost content
        {'end': 787.29, 'start': 781.09, 'text': '非洲峰会呢 六号在南非闭幕 这一次的非洲峰会可以说是吸引全世界一个关注目光'},
        ... ...
    ]
}
env:
faster-whisper 1.1.0
How can I adjust the parameters or modify the code so that the output is complete? Please help.
Check whether VAD cut off those missing segments.
How do I check the VAD? Sorry, I'm a beginner.
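One way to check this, as a minimal sketch: run the VAD that ships with faster-whisper on its own and print the spans it keeps. This assumes `faster_whisper.audio.decode_audio` and `faster_whisper.vad.get_speech_timestamps` behave as they do in 1.1.0 (spans returned as sample offsets at 16 kHz, default options matching what `vad_filter=True` uses). `format_timestamp` and `print_vad_spans` are hypothetical helper names, not part of the library.

```python
SAMPLING_RATE = 16000  # decode_audio resamples to 16 kHz mono by default


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    return f"{minutes:02d}:{ms / 1000:06.3f}"


def print_vad_spans(path: str) -> None:
    """Print the speech spans the VAD keeps for the given audio file."""
    # Imported lazily so format_timestamp can be used without the library.
    from faster_whisper.audio import decode_audio
    from faster_whisper.vad import get_speech_timestamps

    audio = decode_audio(path, sampling_rate=SAMPLING_RATE)
    # Default VAD options (assumed to be the same ones vad_filter=True applies).
    for span in get_speech_timestamps(audio):
        start = span["start"] / SAMPLING_RATE
        end = span["end"] / SAMPLING_RATE
        print(f"[{format_timestamp(start)} -> {format_timestamp(end)}]")
```

Calling `print_vad_spans('test.mp3')` should list the kept spans; any stretch of audio that falls outside all of them is never passed to the model.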
Does this prove that the missing audio was discarded by VAD? How should I optimize this? @Purfview
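If VAD does turn out to be the cause, one thing that could be tried is loosening the VAD settings through the `vad_parameters` argument of `transcribe`. The sketch below only builds the keyword arguments. The values are hypothetical starting points rather than recommendations from this thread, and the defaults mentioned in the comments are what faster-whisper is believed to use, so they are worth verifying against the installed version.

```python
# Hypothetical starting values; tune them against the audio in question.
vad_parameters = {
    "threshold": 0.3,      # believed default 0.5; lower keeps quieter speech
    "speech_pad_ms": 800,  # believed default 400; padding around each kept span
}

transcribe_kwargs = {
    "beam_size": 5,
    "language": "zh",
    "vad_filter": True,
    "vad_parameters": vad_parameters,
}

# Usage with the model and audio from the original snippet:
# segments, info = asr_model.transcribe(mp3_audio, **transcribe_kwargs)
```

Setting `vad_filter=False` for a single run is another way to compare: if the missing sentences come back, the VAD was dropping them.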
The audio is probably too quiet for correct speaker recognition.
-> VAD result: [00:17.424 -> 12:36.176], [13:02.288 -> 25:36.400]
-> VAD result [00:14.864 -> 12:36.144], [13:01.264 -> 25:35.088]
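To relate these spans to the two segments marked as lost in the issue, the overlap can be computed directly. This is a small sketch in plain Python. `to_seconds` and `covered_fraction` are hypothetical helpers, and the lost-segment times are the approximate ones given in the expected output.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'MM:SS.mmm' timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def covered_fraction(start: float, end: float, spans) -> float:
    """Fraction of [start, end] lying inside the (non-overlapping) VAD spans."""
    overlap = sum(
        max(0.0, min(end, s_end) - max(start, s_start))
        for s_start, s_end in spans
    )
    return overlap / (end - start)


# The two VAD results quoted above, converted to seconds.
vad_a = [(to_seconds("00:17.424"), to_seconds("12:36.176")),
         (to_seconds("13:02.288"), to_seconds("25:36.400"))]
vad_b = [(to_seconds("00:14.864"), to_seconds("12:36.144")),
         (to_seconds("13:01.264"), to_seconds("25:35.088"))]

# Segments the issue marks as "lost content" (start, end in seconds).
lost = [(14.8, 17.49), (778.0, 781.0)]

for name, spans in (("first", vad_a), ("second", vad_b)):
    for start, end in lost:
        frac = covered_fraction(start, end, spans)
        print(f"{name} VAD result covers {frac:.0%} of {start}-{end}s")
```

By this arithmetic, the first result leaves the opening segment almost entirely outside the kept spans, while the second covers nearly all of it. The 778-781 s segment falls in the gap between the two spans in both results, which is consistent with VAD removing it.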