❓ Questions / Help / Support #595

Open
bhswallow opened this issue Jan 16, 2025 · 1 comment
@bhswallow

bhswallow commented Jan 16, 2025

❓ Questions and Help

We have a wiki available for our users. Please make sure you have checked it out first.
I ran VAD on an audio file and the last segment came back with SpeechEndAt = 0, which confused me. Sometimes the audio continues for a long time after the last SpeechStartAt, yet SpeechEndAt is still 0, so I cannot tell whether the end of the audio is silence or speech. The output is as follows:
[
  {
    "SpeechStartAt": 0.994,
    "SpeechEndAt": 6.142
  },
  {
    "SpeechStartAt": 6.242,
    "SpeechEndAt": 0 // why is this 0?
  }
]
The original sound file is as follows: http://cyber-milestone.oss-cn-shanghai.aliyuncs.com/uploads/2025/01/16/audio_tmp_a8e5341d230effa9fd7084d4f4aca3a5.wav

code:
detector, err := speech.NewDetector(speech.DetectorConfig{
	ModelPath:            "./sources/vad/silero/silero_vad_16k_op15.onnx",
	SampleRate:           16000,
	Threshold:            0.2,
	MinSilenceDurationMs: 100,
	SpeechPadMs:          30,
})
if err != nil {
	return nil, nil, 0.0, fmt.Errorf("failed to create speech detector: %s", err)
}
defer func() {
	_ = detector.Destroy()
}()

_, err = file.Seek(0, io.SeekStart)
if err != nil {
	return nil, nil, 0.0, fmt.Errorf("failed to seek file: %s", err)
}

dec := wav.NewDecoder(file)
if ok := dec.IsValidFile(); !ok {
	return nil, nil, 0.0, fmt.Errorf("invalid WAV file")
}

buf, err := dec.FullPCMBuffer()
if err != nil {
	return nil, nil, 0.0, fmt.Errorf("failed to get PCM buffer: %s", err) 
}

pcmBuf := buf.AsFloat32Buffer()

speechSegments, err := detector.Detect(pcmBuf.Data)
if err != nil {
	return nil, nil, 0.0, fmt.Errorf("failed to detect speech segments 2: %s", err)
}

wsLogger.WithFields(logrus.Fields{
	"speechSegments": speechSegments,
}).Info("")


audioDuration := float64(len(pcmBuf.Data)) / 16000
silenceSegments := DetectSilence(speechSegments, audioDuration)

wsLogger.WithFields(logrus.Fields{
	"silenceSegments": silenceSegments,
}).Info("")
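
One possible way to handle the zero value is sketched below. It assumes (this is not confirmed anywhere in this thread) that the wrapper leaves SpeechEndAt at 0 when the audio runs out while speech is still active, so a trailing segment with SpeechEndAt == 0 can be treated as "speech until the end of the file". The Segment type and the closeOpenSegment helper are hypothetical stand-ins written for illustration; they are not part of the wrapper's API.

```go
package main

import "fmt"

// Segment is a local stand-in that mirrors the two fields shown in the
// output above. It is NOT the wrapper's own type.
type Segment struct {
	SpeechStartAt float64
	SpeechEndAt   float64
}

// closeOpenSegment returns a copy of segs in which a trailing segment with
// SpeechEndAt == 0 is closed at audioDuration.
// Assumption: 0 means "speech had not ended when the audio ran out".
func closeOpenSegment(segs []Segment, audioDuration float64) []Segment {
	out := make([]Segment, len(segs))
	copy(out, segs)
	if n := len(out); n > 0 && out[n-1].SpeechEndAt == 0 {
		out[n-1].SpeechEndAt = audioDuration
	}
	return out
}

func main() {
	segs := []Segment{
		{SpeechStartAt: 0.994, SpeechEndAt: 6.142},
		{SpeechStartAt: 6.242, SpeechEndAt: 0},
	}
	// Illustrative duration only; in the code above it would be
	// float64(len(pcmBuf.Data)) / 16000.
	audioDuration := 10.08
	for _, s := range closeOpenSegment(segs, audioDuration) {
		fmt.Printf("%.3f -> %.3f\n", s.SpeechStartAt, s.SpeechEndAt)
	}
}
```

With the segments normalized this way, a silence computation such as DetectSilence would see the tail of the file as speech rather than as a segment ending at time 0. Whether that matches the wrapper's actual behavior should be checked against its documentation or source.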
@bhswallow bhswallow added the help wanted Extra attention is needed label Jan 16, 2025
@snakers4
Owner

For this particular audio, if I use the onnx-runtime version, I get the following:

code invocation
import torch
from pprint import pprint

SAMPLING_RATE = 16000  # not shown in the original snippet; assumed to be 16 kHz
USE_PIP = False  # download model using pip package or torch.hub
USE_ONNX = True  # set to True to test the ONNX model
if USE_ONNX:
  !pip install -q onnxruntime
if USE_PIP:
  !pip install -q silero-vad
  from silero_vad import (load_silero_vad,
                          read_audio,
                          get_speech_timestamps,
                          save_audio,
                          VADIterator,
                          collect_chunks)
  model = load_silero_vad(onnx=USE_ONNX)
else:
  model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                                model='silero_vad',
                                force_reload=True,
                                onnx=USE_ONNX)

  (get_speech_timestamps,
  save_audio,
  read_audio,
  VADIterator,
  collect_chunks) = utils

wav = read_audio('audio_tmp_a8e5341d230effa9fd7084d4f4aca3a5.wav', sampling_rate=SAMPLING_RATE)
# get speech timestamps from full audio file
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE, visualize_probs=True)
pprint(speech_timestamps)
the result
[{'end': 38880, 'start': 15904},
 {'end': 74208, 'start': 40480},
 {'end': 91616, 'start': 76320},
 {'end': 161280, 'start': 99872}]


I suppose the problem is with the wrapper you are using.
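
To compare the two outputs, note that the Python result is in sample offsets while the Go output in the question is in seconds. A minimal sketch of the conversion, assuming a 16 kHz sample rate (the SampleRate in the Go config; the Python snippet does not show its SAMPLING_RATE value). The samplesToSeconds helper is hypothetical and only illustrates the arithmetic.

```go
package main

import "fmt"

// samplesToSeconds converts a sample offset to seconds at the given rate.
func samplesToSeconds(sample, sampleRate int) float64 {
	return float64(sample) / float64(sampleRate)
}

func main() {
	const sampleRate = 16000 // assumed; matches SampleRate in the Go config

	// {start, end} pairs copied from the Python result above.
	stamps := [][2]int{
		{15904, 38880},
		{40480, 74208},
		{76320, 91616},
		{99872, 161280},
	}
	for _, s := range stamps {
		fmt.Printf("%.3f s -> %.3f s\n",
			samplesToSeconds(s[0], sampleRate),
			samplesToSeconds(s[1], sampleRate))
	}
}
```

At 16 kHz, 15904 samples is 0.994 s and 99872 samples is 6.242 s, which are the two SpeechStartAt values from the question. The last Python segment has a non-zero end (161280 samples), so the segment that the Go wrapper reports with SpeechEndAt 0 does contain speech in the Python run. The segment boundaries in between differ, possibly because the Go code sets Threshold to 0.2 while the Python call uses its defaults.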
