[C++] Questions Why python and c++ time stamps are different? #533
Comments
Hi,
The next logical step would be to compare the raw probabilities output by the Python code and the C++ code. If they are the same, the difference comes from post-processing. If not, it comes from onnx_runtime. You see, the C++ example is community-contributed; we did not debug it.
Also, a standard suggestion: plot the probabilities for both implementations side by side with the audio envelope, and probably with some marker for the speech segments. That would help with debugging.
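For anyone following along, here is a minimal sketch of that comparison in plain Python. It assumes you have already dumped one probability per frame from each implementation into a list; `compare_probs` and `frame_envelope` are hypothetical helper names, not part of silero-vad, and the 512-sample frame size is an assumption for 16 kHz audio.

```python
from typing import List, Optional, Sequence, Tuple


def compare_probs(
    probs_a: Sequence[float],
    probs_b: Sequence[float],
    tol: float = 1e-3,
) -> Tuple[float, Optional[int]]:
    """Return (max absolute difference, index of the first frame that differs by more than tol)."""
    if len(probs_a) != len(probs_b):
        raise ValueError(f"frame counts differ: {len(probs_a)} vs {len(probs_b)}")
    max_diff = 0.0
    first_bad = None
    for i, (a, b) in enumerate(zip(probs_a, probs_b)):
        diff = abs(a - b)
        max_diff = max(max_diff, diff)
        if first_bad is None and diff > tol:
            first_bad = i
    return max_diff, first_bad


def frame_envelope(samples: Sequence[float], frame_size: int = 512) -> List[float]:
    """Peak absolute amplitude per frame: a crude envelope to show next to the probabilities."""
    return [
        max(abs(s) for s in samples[start:start + frame_size])
        for start in range(0, len(samples), frame_size)
    ]


if __name__ == "__main__":
    # The single pair of first-frame values reported later in this thread.
    python_probs = [0.01201203465461731]
    cpp_probs = [0.0442627]
    print(compare_probs(python_probs, cpp_probs))
```

If the first differing frame is frame 0, the raw model inputs or initial state are the likely suspects rather than the post-processing.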
Oh, I see. I thought your open C++ code was guaranteed to work.
All examples are community-generated.
Not yet.
I have found that with the same input, for example all zeros, the ONNX model output after reset_states() is different from the PyTorch model output.
JIT and ONNX have slightly different input formats; most likely this is the reason.
I use the same parameters, but the VAD results can differ a lot. There may be a few more segments.
@smallsheep666
I have found why there is a huge difference in probabilities between C++ and PyTorch.
Looks like a C++ wrapper for a previous version of the model (
I compared the output probabilities from three setups: torch (Python), ONNX (Python), and ONNX Runtime (C++). My test is below, located in silero-vad-master/src/silero_vad.
1. torch (Python)
from utils_vad import read_audio
model = torch.jit.load('data/silero_vad.jit')
audio = read_audio('/DB/SD/wespeaker/voxconverse_data/dev/audio/afjiv.wav')
audio_length_samples = len(audio)
speech_probs = []
2. ONNX (Python)
from utils_vad import OnnxWrapper
model = OnnxWrapper('data/silero_vad.onnx',force_onnx_cpu=True)
audio_length_samples = len(audio)
speech_probs = []
3. ONNX (C++) probs
The probabilities from 1 and 2 are the same (first column); those from 3 are in the second column.
0.01201203465461731 0.0442627
To get the right probabilities, I wrote C++ source code based on libtorch (TorchScript).
This is a method from the ONNX wrapper, where states are reset manually.
The first inference does not require 100 ms. It requires a zero state, zero padding, and the audio chunk itself.
See silero-vad/src/silero_vad/utils_vad.py, lines 63 to 80 in 46f94b7.
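As a rough illustration of what "zero state, zero padding and the audio chunk itself" means for the first call, here is a plain-Python sketch. The sizes are assumptions (512-sample chunks with 64 samples of context at 16 kHz, 256 and 32 at 8 kHz, and a state of shape 2 x 1 x 128), and `zero_state` and `build_model_input` are hypothetical names; please verify all of them against utils_vad.py for your model version.

```python
from typing import List, Sequence, Tuple

# Assumed sizes; check utils_vad.py for the model version you actually use.
CHUNK_SAMPLES = {16000: 512, 8000: 256}
CONTEXT_SAMPLES = {16000: 64, 8000: 32}
STATE_SHAPE = (2, 1, 128)  # assumed (layers, batch, hidden)


def zero_state() -> List[List[List[float]]]:
    """All-zero recurrent state, as it should look right after a reset."""
    layers, batch, hidden = STATE_SHAPE
    return [[[0.0] * hidden for _ in range(batch)] for _ in range(layers)]


def build_model_input(
    chunk: Sequence[float],
    context: Sequence[float],
    sr: int = 16000,
) -> Tuple[List[float], List[float]]:
    """Prepend the context to the chunk; return (model_input, context_for_next_call)."""
    if len(chunk) != CHUNK_SAMPLES[sr]:
        raise ValueError(f"expected {CHUNK_SAMPLES[sr]} samples, got {len(chunk)}")
    if len(context) == 0:
        # First call after a reset: the context is zero padding.
        context = [0.0] * CONTEXT_SAMPLES[sr]
    model_input = list(context) + list(chunk)
    # The tail of this input becomes the context of the next chunk.
    next_context = model_input[-CONTEXT_SAMPLES[sr]:]
    return model_input, next_context
```

A C++ port would need the same two pieces of bookkeeping per stream: the recurrent state returned by the model and the trailing samples carried over as context.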
Oh, sorry, I missed the reset call on your JIT model: model.run_method("reset_states");
To be consistent with Python, I added the following at the beginning of the predict function in C++:
I debugged carefully and found that there are three detailed differences between the C++ code and the Python code:
Based on the above three points, I designed two vectors:
If you understand the above points, you can modify the C++ code so that its detection results are consistent with those of Python.
This should be done for a In any case, a
This function resets states in two places. One is "inside" the model: since we cannot do it directly inside the ONNX model, we carry this state along in the interface.
See silero-vad/src/silero_vad/utils_vad.py, lines 46 to 50 in 46f94b7.
This function or its counterpart in the C++ code should be invoked:
Hi @snakers4
We used to have quantized models a long time ago, but there were many complaints that they did not run on some platforms. So we decided not to bother anymore, since the models are small.
Thank you for your answer.
Hi snakers4! Please check my PR below. Thank you.
❓ Questions and Help
Hi silero team!
When I use silero-vad from Python, the results look good.
But when I use silero-vad from C++, I get quite different results from Python.
I prepared silero-vad 5.1 (pip) and a C++ build (silero-vad-master, downloaded on 2024-08-26), respectively.
#Test sample file. Voxconverse data
[asr1@k-atc12 cpp]$ sox --i voxconverse_data/dev/audio/afjiv.wav
Input File : 'voxconverse_data/dev/audio/afjiv.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:02:31.25 = 2419968 samples ~ 11343.6 CDDA sectors
File Size : 4.84M
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
sha256sum ~/miniconda3/envs/wespeaker/lib/python3.9/site-packages/silero_vad/data/silero_vad.onnx
2623a2953f6ff3d2c1e61740c6cdb7168133479b267dfef114a4a3cc5bdd788f miniconda3/envs/wespeaker/lib/python3.9/site-packages/silero_vad/data/silero_vad.onnx
#in Python.
#in C++ (built from the silero-vad source; I downloaded 'silero-vad-master' on 2024-08-26)
I changed some parameters in 'silero-vad-master/examples/cpp/silero-vad-onnx.cpp':
float Threshold = 0.5,
int min_silence_duration_ms = 100,
int speech_pad_ms = 30,
int min_speech_duration_ms = 250,
#These values are taken from '~/miniconda3/envs/wespeaker/lib/python3.9/site-packages/silero_vad/utils_vad.py'
sha256sum "../../src/silero_vad/data/silero_vad.onnx"
2623a2953f6ff3d2c1e61740c6cdb7168133479b267dfef114a4a3cc5bdd788f
./test
[asr1@k-atc12 cpp]$ ./test
num_channel_ :1
sample_rate_ :16000
bits_per_sample_:16
num_samples :2419968
num_data_size :4839936
{start:00019456,end:00200192}
{start:00202752,end:00258048}
{start:00261120,end:00400384}
{start:00403456,end:00473600}
{start:00477184,end:00506880}
{start:00510976,end:00548864}
{start:00555520,end:00637952}
{start:00642560,end:00686592}
{start:00689152,end:00727552}
{start:00729600,end:00787456}
{start:00790016,end:00826880}
{start:00829952,end:00846848}
{start:00849920,end:00858112}
{start:00863232,end:01068032}
{start:01071616,end:01083904}
{start:01088000,end:01289216}
{start:01295360,end:01311744}
{start:01314816,end:01324032}
{start:01326592,end:01340928}
{start:01357824,end:01378816}
{start:01394688,end:01408512}
{start:01420288,end:01427968}
{start:01432576,end:01484800}
{start:01491456,end:01510912}
{start:01521152,end:01569280}
{start:01578496,end:01609216}
{start:01619456,end:01625088}
{start:01627648,end:01650176}
{start:01655296,end:01676288}
{start:01687040,end:01710080}
{start:01716224,end:01724928}
{start:01731072,end:01750528}
{start:01754112,end:01762304}
{start:01765888,end:01772544}
{start:01777664,end:01790976}
{start:01796608,end:01813504}
{start:01821184,end:01859072}
{start:01873408,end:01906176}
{start:01910272,end:01923072}
{start:01926144,end:01959936}
{start:01967616,end:01989120}
{start:02003968,end:02050048}
{start:02058752,end:02076160}
{start:02094592,end:02114048}
{start:02116608,end:02131968}
{start:02170880,end:02191872}
{start:02195456,end:02211840}
{start:02223104,end:02244096}
{start:02250240,end:02267648}
{start:02272256,end:02303488}
{start:02314752,end:02327552}
I checked the checksums of both ONNX model files. They are the same.
Any clues?
Thank you.
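To make listings like the one above easier to compare against the Python output, here is a small sketch that parses the `{start:...,end:...}` lines into sample indices and converts them to seconds. `parse_segments` and `to_seconds` are hypothetical helper names, and the line format is assumed to be exactly what the C++ example printed above.

```python
import re
from typing import List, Tuple

# Matches lines such as "{start:00019456,end:00200192}" (format assumed from the output above).
SEGMENT_RE = re.compile(r"\{start:(\d+),end:(\d+)\}")


def parse_segments(text: str) -> List[Tuple[int, int]]:
    """Extract (start_sample, end_sample) pairs from the C++ example's printed output."""
    return [(int(start), int(end)) for start, end in SEGMENT_RE.findall(text)]


def to_seconds(
    segments: List[Tuple[int, int]],
    sample_rate: int = 16000,
) -> List[Tuple[float, float]]:
    """Convert sample indices to seconds, rounded to milliseconds."""
    return [
        (round(start / sample_rate, 3), round(end / sample_rate, 3))
        for start, end in segments
    ]


if __name__ == "__main__":
    cpp_output = "{start:00019456,end:00200192}\n{start:00202752,end:00258048}"
    for start_s, end_s in to_seconds(parse_segments(cpp_output)):
        print(f"{start_s:.3f} s to {end_s:.3f} s")
```

With both outputs in the same form, differences in segment count or boundaries can be spotted with a plain diff.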