What causes the volume to fluctuate between loud and quiet? #45

Open
skyliwq opened this issue Mar 2, 2024 · 9 comments

@skyliwq

skyliwq commented Mar 2, 2024

What causes the generated Chinese speech to fluctuate between loud and quiet, especially with long texts?
[Image: 微信图片_20240302180653]

@skyliwq
Author

skyliwq commented Mar 2, 2024

At its quietest, the audio is almost inaudible.

@skyliwq
Author

skyliwq commented Mar 6, 2024

Can the volume fluctuation problem be fixed?

@Zengyi-Qin
Contributor

Please just amplify the volume or use a post-processing normalization technique.

@skyliwq
Author

skyliwq commented Mar 7, 2024

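> Please just amplify the volume or use a post-processing normalization technique.

I need to hook this up to a large language model, and the fluctuating volume makes for a poor experience. I hope this can be improved.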

@luobotaxinghu

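Judging from the logs, the text is split up and synthesized one sentence at a time, so the volume of the individual sentences inevitably ends up misaligned. @Zengyi-Qin's reply suggests post-processing such as normalization.
This really ought to be handled inside the project rather than left to users. Once users receive the full concatenated audio, they can no longer fix it. The intermediate code that processes the per-sentence audio needs to be changed.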
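
The per-sentence handling suggested here could look roughly like the following. This is only a minimal sketch, not the project's actual code: `normalize_segment` and `concat_normalized` are hypothetical helpers, each sentence is assumed to arrive as a mono float NumPy array in [-1, 1], and plain RMS matching stands in for a proper loudness measure such as LUFS.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square level of a mono float signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def normalize_segment(x: np.ndarray, target_rms: float = 0.1,
                      peak_limit: float = 0.99) -> np.ndarray:
    """Scale one sentence so its RMS matches target_rms, without clipping."""
    level = rms(x)
    if level < 1e-8:
        # Silent segment: leave it untouched to avoid dividing by ~0.
        return x
    gain = target_rms / level
    peak = float(np.max(np.abs(x)))
    # Reduce the gain if it would push the loudest sample past peak_limit.
    gain = min(gain, peak_limit / peak)
    return x * gain


def concat_normalized(segments, target_rms: float = 0.1) -> np.ndarray:
    """Normalize every sentence on its own, then join them."""
    return np.concatenate([normalize_segment(s, target_rms) for s in segments])


# Two synthetic "sentences" at very different levels (assumed test data).
sr = 22050
t = np.arange(sr) / sr
loud = 0.8 * np.sin(2 * np.pi * 220 * t)
quiet = 0.05 * np.sin(2 * np.pi * 220 * t)

out = concat_normalized([loud, quiet])
first, second = out[:sr], out[sr:]
print(rms(first), rms(second))
```

After this step both segments sit at the same RMS level, so the joined audio no longer jumps between sentences. A real integration would apply the same idea to each synthesized sentence before the pieces are concatenated.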

@skyliwq
Author

skyliwq commented Mar 8, 2024

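@Zengyi-Qin, yes. If that is the case, the project loses its usefulness in some application scenarios.

> Judging from the logs, the text is split up and synthesized one sentence at a time, so the volume of the individual sentences inevitably ends up misaligned. @Zengyi-Qin's reply suggests post-processing such as normalization. This really ought to be handled inside the project rather than left to users. Once users receive the full concatenated audio, they can no longer fix it. The intermediate code that processes the per-sentence audio needs to be changed.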

@MissingTwins

MissingTwins commented Mar 8, 2024

pip install ffmpeg-normalize
ffmpeg-normalize input.wav -c:a libopus -b:a 128k -o output.oga -f

WARNING: Input file had loudness range of 10.1. This is larger than the loudness range target (7.0). Normalization will revert to dynamic mode.

Well, normalization does not solve the issue. The dynamic range remains too wide, with the volume fluctuating randomly between loud and soft.

@andyweiqiu

> pip install ffmpeg-normalize
> ffmpeg-normalize input.wav -c:a libopus -b:a 128k -o output.oga -f
>
> WARNING: Input file had loudness range of 10.1. This is larger than the loudness range target (7.0). Normalization will revert to dynamic mode.
>
> Well, normalization does not solve the issue. The dynamic range remains too wide, with the volume fluctuating randomly between loud and soft.

Processing the finished file directly definitely won't work: the volume of the whole audio is raised or lowered together, so it sounds no different from the original output. The processing has to happen at the point where the individual segments are output.
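
To see why a single gain applied to the whole file leaves the fluctuation intact, consider this small sketch. The two segments are synthetic stand-ins for a loud and a quiet sentence (assumed data, not model output), and `level_db` is a hypothetical helper that reports the RMS level in dB.

```python
import numpy as np


def level_db(x: np.ndarray) -> float:
    """RMS level of a signal in dB relative to full scale."""
    return float(20 * np.log10(np.sqrt(np.mean(np.square(x)))))


sr = 22050
t = np.arange(sr) / sr
loud = 0.5 * np.sin(2 * np.pi * 220 * t)     # stand-in for a loud sentence
quiet = 0.05 * np.sin(2 * np.pi * 220 * t)   # stand-in for a quiet sentence
audio = np.concatenate([loud, quiet])

# Whole-file peak normalization: one gain shared by every sample.
gain = 0.9 / np.max(np.abs(audio))
scaled = audio * gain

# Level difference between the two sentences, before and after scaling.
gap_before = level_db(audio[:sr]) - level_db(audio[sr:])
gap_after = level_db(scaled[:sr]) - level_db(scaled[sr:])
print(f"gap before: {gap_before:.1f} dB, gap after: {gap_after:.1f} dB")
```

Both gaps come out at 20.0 dB. The quiet sentence is still 20 dB below the loud one; the file as a whole is merely louder.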

@v3ucn

v3ucn commented May 3, 2024

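pip install pyloudnorm soundfile

import soundfile as sf
import pyloudnorm as pyln

# Load the audio file
data, rate = sf.read(r"D:\Downloads\output_v2_zh.wav")

# Peak-normalize to -1 dB (an alternative; this result is not written out below)
peak_normalized_audio = pyln.normalize.peak(data, -1.0)

# Measure the integrated loudness
meter = pyln.Meter(rate)
loudness = meter.integrated_loudness(data)

# Loudness-normalize to -12 dB LUFS
loudness_normalized_audio = pyln.normalize.loudness(data, loudness, -12.0)

# Save the loudness-normalized audio
sf.write("./normalized_audio.wav", loudness_normalized_audio, rate)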
