❓ Questions / Help / Support #595

bhswallow · 2025-01-16T13:02:42Z

❓ Questions and Help

We have a wiki available for our users. Please make sure you have checked it out first.
I ran VAD on an audio file and got a value of 0 for the last SpeechEndAt, which confused me. Sometimes the audio continues for a long time after the last SpeechStartAt, yet SpeechEndAt is still 0, so I cannot tell whether the final stretch of audio is silence or speech. The data are as follows:
[
{
"SpeechStartAt": 0.994,
"SpeechEndAt": 6.142
},
{
"SpeechStartAt": 6.242,
"SpeechEndAt": 0 //// this is 0 ?
}
]
The original sound file is as follows: http://cyber-milestone.oss-cn-shanghai.aliyuncs.com/uploads/2025/01/16/audio_tmp_a8e5341d230effa9fd7084d4f4aca3a5.wav

code:
detector, err := speech.NewDetector(speech.DetectorConfig{
ModelPath: "./sources/vad/silero/silero_vad_16k_op15.onnx",
SampleRate: 16000,
Threshold: 0.2,
MinSilenceDurationMs: 100,
SpeechPadMs: 30,
})
if err != nil {
return nil, nil, 0.0, fmt.Errorf("failed to create speech detector: %s", err)
}
defer func() {
_ = detector.Destroy()
}()

_, err = file.Seek(0, io.SeekStart)
if err != nil {
	return nil, nil, 0.0, fmt.Errorf("failed to seek file: %s", err)
}

dec := wav.NewDecoder(file)
if ok := dec.IsValidFile(); !ok {
	return nil, nil, 0.0, fmt.Errorf("invalid WAV file")
}

buf, err := dec.FullPCMBuffer()
if err != nil {
	return nil, nil, 0.0, fmt.Errorf("failed to get PCM buffer: %s", err) 
}

pcmBuf := buf.AsFloat32Buffer()

speechSegments, err := detector.Detect(pcmBuf.Data)
if err != nil {
	return nil, nil, 0.0, fmt.Errorf("failed to detect speech segments 2: %s", err)
}

wsLogger.WithFields(logrus.Fields{
	"speechSegments": speechSegments,
}).Info("")


audioDuration := float64(len(pcmBuf.Data)) / 16000
silenceSegments := DetectSilence(speechSegments, audioDuration)

wsLogger.WithFields(logrus.Fields{
	"silenceSegments": silenceSegments,
}).Info("")


snakers4 · 2025-01-16T15:25:20Z

For this particular audio, if I use the onnx-runtime version, I get the following:

code invocation

import torch
from pprint import pprint

SAMPLING_RATE = 16000  # 16 kHz, the sample rate used in the original post

USE_PIP = False # download model using pip package or torch.hub
USE_ONNX = True # set to True to test the onnx model
if USE_ONNX:
    !pip install -q onnxruntime
if USE_PIP:
  !pip install -q silero-vad
  from silero_vad import (load_silero_vad,
                          read_audio,
                          get_speech_timestamps,
                          save_audio,
                          VADIterator,
                          collect_chunks)
  model = load_silero_vad(onnx=USE_ONNX)
else:
  model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                                model='silero_vad',
                                force_reload=True,
                                onnx=USE_ONNX)

  (get_speech_timestamps,
  save_audio,
  read_audio,
  VADIterator,
  collect_chunks) = utils

wav = read_audio('audio_tmp_a8e5341d230effa9fd7084d4f4aca3a5.wav', sampling_rate=SAMPLING_RATE)
# get speech timestamps from full audio file
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE, visualize_probs=True)
pprint(speech_timestamps)

the result

[{'end': 38880, 'start': 15904},
 {'end': 74208, 'start': 40480},
 {'end': 91616, 'start': 76320},
 {'end': 161280, 'start': 99872}]
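
These timestamps are sample indices, while the wrapper output in the original post is in seconds. To compare the two, a minimal sketch of the conversion follows, assuming the 16 kHz sample rate used throughout this thread. The helper name `to_seconds` is only illustrative and is not part of silero-vad.

```python
SAMPLING_RATE = 16000  # assumed: 16 kHz, as in the original post

# Sample-index timestamps copied from the result above.
speech_timestamps = [
    {'end': 38880, 'start': 15904},
    {'end': 74208, 'start': 40480},
    {'end': 91616, 'start': 76320},
    {'end': 161280, 'start': 99872},
]


def to_seconds(segments, sampling_rate=SAMPLING_RATE):
    """Convert sample-index timestamps to seconds (illustrative helper)."""
    return [
        {'start': seg['start'] / sampling_rate, 'end': seg['end'] / sampling_rate}
        for seg in segments
    ]


for seg in to_seconds(speech_timestamps):
    print(f"{seg['start']:.3f} -> {seg['end']:.3f}")
```

The first and last start values come out as 0.994 s and 6.242 s, the same as the two SpeechStartAt values reported by the wrapper, and the last segment here has a non-zero end (161280 samples, i.e. 10.08 s).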

I suppose the problem is with the wrapper you are using.
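
Until the wrapper behaviour is confirmed, one possible caller-side workaround is sketched below. It rests on an assumption that this thread does not confirm: that the wrapper leaves the final SpeechEndAt at 0 when speech is still in progress at the end of the input. The function name `close_open_segment` and the duration value are hypothetical.

```python
def close_open_segment(segments, audio_duration):
    """Return a copy of `segments` with an open-ended final segment closed.

    Assumption (not confirmed in this thread): a final segment whose end is
    not after its start, for example 0, means speech was still in progress
    when the input ran out, so it is treated as ending at `audio_duration`.
    """
    closed = [dict(seg) for seg in segments]  # copy so the input is untouched
    if closed and closed[-1]["SpeechEndAt"] <= closed[-1]["SpeechStartAt"]:
        closed[-1]["SpeechEndAt"] = audio_duration
    return closed


# Segments as reported in the original post.
segments = [
    {"SpeechStartAt": 0.994, "SpeechEndAt": 6.142},
    {"SpeechStartAt": 6.242, "SpeechEndAt": 0},
]

# Hypothetical length in seconds; in practice use len(samples) / sample_rate.
AUDIO_DURATION = 10.5

for seg in close_open_segment(segments, AUDIO_DURATION):
    print(seg)
```

With the last segment closed this way, any remaining gap between its end and the end of the audio can be treated as silence. If the assumption is wrong, the trailing audio would be misclassified as speech, so it is worth checking against the wrapper's documentation first.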

bhswallow added the help wanted Extra attention is needed label Jan 16, 2025

bhswallow assigned snakers4 Jan 16, 2025
