What causes the volume to fluctuate between loud and soft? #45
Comments
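At its quietest, the audio is almost inaudible. |
Can the maintainers fix the problem of the volume fluctuating between loud and soft? |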
Please just amplify the volume or use a post-processing normalization technique. |
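I need to hook this up to a large language model, and the speech volume fluctuating between loud and soft makes for a poor experience. I hope this can be improved. |
Judging from the logs, the text appears to be split up and synthesized one sentence at a time, so the volume of the individual sentences inevitably ends up misaligned. According to @Zengyi-Qin's reply, the suggested fix is post-processing such as normalization. |
@Zengyi-Qin, yes, and if that is the case, the tool loses its value in some application scenarios.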
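One way to check the sentence-by-sentence explanation above is to measure the level of each synthesized segment and look at the spread. The following is only a rough sketch under stated assumptions: the segments are assumed to be available as lists of float samples in the range -1.0 to 1.0, the helper names (`rms_dbfs`, `level_spread_db`, `tone`) are hypothetical, and plain RMS is used as a simple stand-in for perceived loudness.

```python
import math

def rms_dbfs(samples):
    """Return the RMS level of float samples (range -1.0..1.0) in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # pure silence has no finite level
    return 10.0 * math.log10(mean_square)

def level_spread_db(segments):
    """Return (per-segment levels, difference between loudest and quietest)."""
    levels = [rms_dbfs(seg) for seg in segments]
    finite = [lv for lv in levels if math.isfinite(lv)]
    spread = max(finite) - min(finite) if finite else 0.0
    return levels, spread

def tone(amplitude, n=1000):
    """Hypothetical stand-in for one synthesized sentence: a sine at a given gain."""
    return [amplitude * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

# Three "sentences" with identical content but very different gain
segments = [tone(0.5), tone(0.05), tone(0.25)]
levels, spread = level_spread_db(segments)
print([round(lv, 1) for lv in levels], "spread:", round(spread, 1), "dB")
```

A spread of many dB between adjacent sentences would be consistent with the per-sentence explanation. A small spread would suggest the fluctuation comes from somewhere else.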
|
WARNING: Input file had loudness range of 10.1. This is larger than the loudness range target (7.0). Normalization will revert to dynamic mode. Well, normalization does not solve the issue. The dynamic range remains too wide, with the volume fluctuating randomly between loud and soft. |
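Processing the finished file directly definitely won't work: the volume of the whole recording goes up or down together, so it sounds no different from the original output. The processing has to happen at the point where the individual segments are output. |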
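The per-segment idea above could look roughly like the sketch below. It is not the project's own code. It assumes the per-sentence audio can be intercepted as lists of float samples in the range -1.0 to 1.0 before they are joined. The function names, the -20 dBFS target, and the 0.99 peak ceiling are all hypothetical choices, and RMS gain matching is a simplification of true loudness (LUFS) normalization.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_segment(samples, target_dbfs=-20.0, peak_ceiling=0.99):
    """Scale one segment so its RMS level matches target_dbfs, without clipping."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # silent segment: nothing to scale
    gain = 10.0 ** (target_dbfs / 20.0) / level
    peak = max(abs(s) for s in samples)
    gain = min(gain, peak_ceiling / peak)  # cap the gain so peaks stay below the ceiling
    return [s * gain for s in samples]

def normalize_and_join(segments, target_dbfs=-20.0):
    """Normalize each segment on its own, then concatenate them in order."""
    joined = []
    for seg in segments:
        joined.extend(normalize_segment(seg, target_dbfs))
    return joined

def tone(amplitude, n=1000):
    """Hypothetical stand-in for one synthesized sentence."""
    return [amplitude * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

loud, quiet = tone(0.5), tone(0.05)
audio = normalize_and_join([loud, quiet])
print(round(rms(audio[:1000]), 3), round(rms(audio[1000:]), 3))
```

Because each segment gets its own gain, a quiet sentence is raised and a loud one is lowered. A single gain applied to the whole file cannot do that.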
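pip install pyloudnorm

```python
import soundfile as sf      # provides sf.read / sf.write
import pyloudnorm as pyln

# Load the audio file
data, rate = sf.read(r"D:\Downloads\output_v2_zh.wav")

# Peak-normalize to -1 dB
peak_normalized_audio = pyln.normalize.peak(data, -1.0)

# Measure loudness
meter = pyln.Meter(rate)
loudness = meter.integrated_loudness(data)

# Loudness-normalize to -12 dB LUFS
loudness_normalized_audio = pyln.normalize.loudness(data, loudness, -12.0)

sf.write("./normalized_audio.wav", loudness_normalized_audio, rate)
```
|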
What causes the volume of generated Chinese speech to fluctuate between loud and soft, especially with long texts?