Output Inconstancy of Feature set #46
Comments
Yes, you are right. Here is a minimal example that reproduces it using only audiofile, numpy, and opensmile:
import audiofile
import numpy as np
import opensmile
np.random.seed(0)
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
    verbose=True,
)
sampling_rate = 16000
signal = np.random.normal(size=(1, sampling_rate))
audiofile.write('test.wav', signal, sampling_rate)
f1 = smile.process_file('test.wav')
f2 = smile.process_signal(signal, sampling_rate)
and then
>>> f1['audspec_lengthL1norm_sma_maxPos']
file start end
test.wav 0 days 0 days 00:00:01 0.430108
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
>>> f2['audspec_lengthL1norm_sma_maxPos']
start end
0 days 0 days 00:00:01 0.0
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
Hi, does someone have an answer for that? I would like to use process_signal as it is faster than process_file.
Hi, I also encountered the above issue! My workaround is:
import torchaudio
# load the WAV data and convert to 32-bit float (WAV_PATH is the path to your file)
arr, fs = torchaudio.load(WAV_PATH, normalize=True)
# convert the tensor to a flat numpy array
arr = arr.numpy().ravel()
I also noticed significant differences in feature values when resampling the signal! E.g., in the visualization below I used the raw legend:
It seems that the GeMAPS F0semitone is more robustly extracted in the 16 kHz variant (less 60 peaks)? Is this behavior normal?
If you are using the Python version, then the WAV parsing of opensmile is not used, as the file is read on the Python side.
The code that reproduces the error here at #46 (comment) returns different results because I did not normalize the magnitude of the audio.
import audiofile
import numpy as np
import opensmile
np.random.seed(0)
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
    verbose=True,
)
sampling_rate = 16000
signal = np.random.normal(size=(1, sampling_rate))
# scale the signal so its peak magnitude is just below 1
signal = signal / (np.max(np.abs(signal)) + 1e-9)
audiofile.write('test.wav', signal, sampling_rate)
f1 = smile.process_file('test.wav')
f2 = smile.process_signal(signal, sampling_rate)
and then
>>> f1['audspec_lengthL1norm_sma_maxPos']
file start end
test.wav 0 days 0 days 00:00:01 0.430108
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
>>> f2['audspec_lengthL1norm_sma_maxPos']
start end
0 days 0 days 00:00:01 0.430108
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
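To illustrate why the normalization matters, here is a minimal standard-library sketch. It assumes that out-of-range samples are clipped to [-1, 1] when written as 16-bit PCM; whether audiofile does exactly this internally is an assumption, not something confirmed in this thread. The helpers `write_pcm16`, `read_pcm16`, and `round_trip_error` are hypothetical names made up for this example.

```python
import os
import random
import struct
import tempfile
import wave


def write_pcm16(path, samples, sampling_rate):
    """Write mono float samples as 16-bit PCM, clipping to [-1, 1] first (assumed behavior)."""
    frames = bytearray()
    for x in samples:
        clipped = max(-1.0, min(1.0, x))
        frames += struct.pack('<h', int(round(clipped * 32767)))
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sampling_rate)
        f.writeframes(bytes(frames))


def read_pcm16(path):
    """Read a mono 16-bit PCM file back as floats in [-1, 1]."""
    with wave.open(path, 'rb') as f:
        raw = f.readframes(f.getnframes())
    count = len(raw) // 2
    return [v / 32767 for v in struct.unpack('<%dh' % count, raw)]


def round_trip_error(samples, sampling_rate=16000):
    """Largest absolute difference between the samples and their file round trip."""
    fd, path = tempfile.mkstemp(suffix='.wav')
    os.close(fd)
    try:
        write_pcm16(path, samples, sampling_rate)
        restored = read_pcm16(path)
    finally:
        os.remove(path)
    return max(abs(a - b) for a, b in zip(samples, restored))


random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(16000)]
peak = max(abs(x) for x in signal)
normalized = [x / (peak + 1e-9) for x in signal]

raw_error = round_trip_error(signal)             # large: samples beyond +-1 were clipped
normalized_error = round_trip_error(normalized)  # small: only quantization noise remains
print(raw_error, normalized_error)
```

Under that assumption, the unnormalized signal in memory is no longer the signal stored in the file, so process_file and process_signal see different inputs.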
The problem with librosa is that librosa.load resamples to 22050 Hz by default:
>>> import librosa
>>> signal, sampling_rate = librosa.load('test.wav')
>>> sampling_rate
22050
If I then execute opensmile, I get a different result:
>>> f3 = smile.process_signal(signal, sampling_rate)
>>> f3['audspec_lengthL1norm_sma_maxPos']
start end
0 days 0 days 00:00:01 0.434783
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
To avoid this you have to tell librosa to keep the file's native sampling rate by passing sr=None:
>>> signal, sampling_rate = librosa.load('test.wav', sr=None)
>>> sampling_rate
16000
If you then use opensmile, you get the desired result:
>>> f4 = smile.process_signal(signal, sampling_rate)
>>> f4['audspec_lengthL1norm_sma_maxPos']
start end
0 days 0 days 00:00:01 0.430108
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
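One way to guard against silent resampling is to compare the rate a loader reports with the rate stored in the file header before passing the signal on. The sketch below uses only the standard-library `wave` module, so it handles plain PCM WAV files only. `native_sampling_rate` and `check_sampling_rate` are hypothetical helper names, not part of opensmile or librosa.

```python
import os
import tempfile
import wave


def native_sampling_rate(path):
    """Return the sampling rate stored in the header of a PCM WAV file."""
    with wave.open(path, 'rb') as f:
        return f.getframerate()


def check_sampling_rate(path, sampling_rate):
    """Raise ValueError if a loader-reported rate differs from the file's native rate."""
    native = native_sampling_rate(path)
    if sampling_rate != native:
        raise ValueError(
            f'signal was resampled: loader returned {sampling_rate} Hz, '
            f'file header says {native} Hz'
        )
    return native


# Demo: write one second of silence at 16 kHz, then simulate a loader
# that returned 22050 Hz for the same file.
fd, demo_path = tempfile.mkstemp(suffix='.wav')
os.close(fd)
with wave.open(demo_path, 'wb') as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(16000)
    f.writeframes(b'\x00\x00' * 16000)

demo_rate = native_sampling_rate(demo_path)
try:
    check_sampling_rate(demo_path, 22050)
    resampling_detected = False
except ValueError:
    resampling_detected = True
os.remove(demo_path)
print(demo_rate, resampling_detected)
```

In practice you would call `check_sampling_rate('test.wav', sampling_rate)` right after loading the signal and before calling `smile.process_signal(signal, sampling_rate)`.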
For reference, I'm using the |
Hi, I am using the Python library to interact with opensmile.
When I use smile.process_file, or smile.process_signal with a file opened through librosa, the output features are different. I don't understand how this is possible: it is the same file, so the extraction should give the same result whether I pass a file or a signal. I am using librosa because the signal is also used elsewhere in my code.
What do you advise?