Use ffmpeg to send input to opensmile to get features? #35

Open
aniketzz opened this issue Dec 9, 2021 · 9 comments

Comments

@aniketzz

aniketzz commented Dec 9, 2021

I want to use FFmpeg to send input to openSMILE and generate features with the eGeMAPS, prosody, or MFCC configs.
I have been able to modify the config files to take live input, but now I want to take the input from a video source, extract the audio with FFmpeg, and send it to openSMILE.

@chausner-audeering
Contributor

There is a cFFmpegSource component but it only supports input from a file. If you want to use FFmpeg for live audio recording, you will need to do the recording outside of openSMILE and pass the data via SMILEapi and cExternalAudioSource to openSMILE. For more information, see https://audeering.github.io/opensmile/reference.html#smileapi-c-api-and-wrappers.

@aniketzz
Author

aniketzz commented Dec 9, 2021

Can you please elaborate? I am having trouble understanding what to change and where.
For example, when I looked at SMILEapi, I could not tell where the input was coming from.
How do I call cExternalAudioSource? To use the local device microphone, I am using the following in the config:

[waveIn:cPortaudioSource]
writer.dmLevel=wave
monoMixdown = 0
 ; -1 is the default device, set listDevices=1 to see a device list
device = -1
listDevices = 0
sampleRate = 16000
 ; if your soundcard only supports stereo (2-channel) recording, 
 ; use channels=2 and set monoMixdown=1
channels = 1
nBits = 16
audioBuffersize_sec = 0.050000
buffersize_sec=2.0

@chausner-audeering
Contributor

Documentation on SMILEapi is unfortunately rather sparse. Basically, it boils down to:

  • Replacing cPortaudioSource in the config with cExternalAudioSource
  • Using SMILEapi to load and run the config file
  • Passing audio data via SMILEapi to the cExternalAudioSource component

SMILEapi is a C API for maximum compatibility with other languages. openSMILE includes a Python wrapper which is recommended if you are working in Python.

You might also want to take a look at the implementation of https://github.com/audeering/opensmile-python which under the hood uses SMILEapi via the Python wrapper.
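
The three steps above can be sketched roughly as follows. This is a minimal sketch, not the SMILEapi wrapper itself: `run_engine`, `write_chunk`, and `signal_end` are hypothetical callables standing in for whatever wrapper calls run the loaded config, write audio to the cExternalAudioSource component, and mark the end of input. It also assumes that the run call blocks until the input ends, which is why it is started in a background thread; check the actual method names and behaviour in SMILEapi.py.

```python
import threading
from typing import Callable, Iterable


def run_with_external_audio(
    run_engine: Callable[[], None],
    write_chunk: Callable[[bytes], None],
    signal_end: Callable[[], None],
    chunks: Iterable[bytes],
) -> int:
    """Run the engine in the background while feeding it audio chunks.

    All three callables are hypothetical stand-ins for SMILEapi wrapper
    calls. Returns the number of bytes handed to write_chunk.
    """
    # Assumption: the run call blocks until the input has ended.
    worker = threading.Thread(target=run_engine, daemon=True)
    worker.start()
    total = 0
    try:
        for chunk in chunks:
            write_chunk(chunk)
            total += len(chunk)
    finally:
        # Tell the engine that no more audio will arrive, then wait for it.
        signal_end()
        worker.join()
    return total
```

Passing the calls in as plain callables keeps the sketch independent of the exact wrapper API and makes it easy to try out with stand-in functions first.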

@aniketzz
Author

aniketzz commented Dec 10, 2021

Is there any way to get the data per frameTime in real time for prosody, MFCC, and eGeMAPS in openSMILE?
I have been able to configure the API to generate the features for prosody, MFCC, and eGeMAPS.
The current input is a file. How do I get the features in real time using the API? Currently, it generates the data as a series in one go.

Also, what is the way to use FFmpeg with the API? Do I have to pass the data (an audio file) generated by FFmpeg, or can I stream data via FFmpeg and pass it on?

@chausner-audeering
Contributor

When using SMILEapi in combination with cExternalSink, you will get the features in real time as soon as they are generated.
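
As a rough illustration of consuming frames as they arrive, here is a small callable that could be registered as the sink callback. The class name `FrameQueue` is hypothetical, and the sketch assumes the callback receives one frame of numeric feature values per call; the exact callback signature should be checked in SMILEapi.py.

```python
import queue
import time
from typing import List, Sequence, Tuple


class FrameQueue:
    """Callable that queues each feature frame the moment it is delivered."""

    def __init__(self) -> None:
        self._frames: "queue.Queue[Tuple[float, List[float]]]" = queue.Queue()

    def __call__(self, frame: Sequence[float]) -> None:
        # Copy the values, since the caller may reuse its buffer afterwards.
        self._frames.put((time.monotonic(), [float(v) for v in frame]))

    def get(self, timeout: float = 1.0) -> Tuple[float, List[float]]:
        """Block until the next frame arrives; raises queue.Empty on timeout."""
        return self._frames.get(timeout=timeout)
```

A consumer thread can then call `get()` in a loop and handle each frame individually instead of waiting for the whole series.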

Also, What will be the way to use ffmpeg with the api?

You can stream audio in real-time from FFmpeg to openSMILE. You'll need to set up the audio recording with FFmpeg, and then pass each individual buffer of audio received from FFmpeg to openSMILE via the SMILEapi function smile_extaudiosource_write_data.
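
A minimal sketch of the buffer-passing loop, assuming FFmpeg writes raw PCM to a pipe. `write_chunk` is a hypothetical callable that would wrap the wrapper's counterpart of smile_extaudiosource_write_data; `chunk_bytes` and `pump` are illustrative helper names, not part of openSMILE.

```python
import io
from typing import BinaryIO, Callable


def chunk_bytes(sample_rate: int, channels: int, bits: int, seconds: float) -> int:
    """Size in bytes of `seconds` of interleaved PCM audio."""
    frame_bytes = channels * (bits // 8)
    return round(sample_rate * seconds) * frame_bytes


def pump(stream: BinaryIO, write_chunk: Callable[[bytes], None], size: int) -> int:
    """Read `size`-byte blocks from `stream` until EOF and forward each one.

    Returns the total number of bytes forwarded. The last block may be
    shorter than `size`.
    """
    total = 0
    while True:
        block = stream.read(size)
        if not block:
            break
        write_chunk(block)
        total += len(block)
    return total
```

With the values from the microphone config earlier in the thread (16000 Hz, 1 channel, 16 bits, 0.05 s), `chunk_bytes` gives 1600 bytes per buffer. The audio format FFmpeg produces has to match what the config declares for the external audio source.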

@aniketzz
Author

What is the way to use FFmpeg with the Python API?
How do I get the features in real time using the Python API?
I have changed the config to:

[waveIn:cFFmpegSource]
writer.dmLevel = wave
blocksize_sec = 1.0
filename = \cm[inputfile(I){test.wav}:name of input file]
monoMixdown = 1.0
outFieldName = pcm

However, this takes input from a file, whereas I want to take input from a port.
For example, I will be sending an audio file over port 8000, and I want to pass this input to the openSMILE Python API.

@chausner-audeering
Contributor

cFFmpegSource only supports input from files. If you need to receive an audio stream via the network and want to decode it using FFmpeg, I suggest asking in the FFmpeg forums or on StackOverflow for help. I can help you with passing the audio via the SMILEapi interface to openSMILE.

To get started with SMILEapi, see the API definition and comments in https://github.com/audeering/opensmile/blob/master/progsrc/smileapi/python/opensmile/SMILEapi.py. See also the help in the openSMILE documentation on components cExternalAudioSource and cExternalSink.
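
For the decoding side, one possible approach is to run the ffmpeg command-line tool as a subprocess and read raw PCM from its standard output. The sketch below is an assumption-laden example rather than a recommended setup: the helper names are hypothetical, the URL passed in is up to the caller, and whether FFmpeg can probe the incoming stream depends on how it is sent.

```python
import subprocess
from typing import List


def ffmpeg_pcm_command(url: str, sample_rate: int = 16000, channels: int = 1) -> List[str]:
    """Build an ffmpeg command that decodes `url` to raw 16-bit PCM on stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",    # keep stderr quiet except for errors
        "-i", url,               # input, e.g. a UDP URL
        "-vn",                   # drop any video stream
        "-f", "s16le",           # raw signed 16-bit little-endian samples
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),
        "-ac", str(channels),
        "pipe:1",                # write to standard output
    ]


def start_ffmpeg(url: str) -> subprocess.Popen:
    """Start ffmpeg; the caller reads decoded audio from `.stdout`."""
    return subprocess.Popen(ffmpeg_pcm_command(url), stdout=subprocess.PIPE)
```

For example, `start_ffmpeg("udp://127.0.0.1:8000")` (a hypothetical address) would give a process whose `stdout` can be read in fixed-size blocks and handed to openSMILE through SMILEapi.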

@aniketzz
Author

We have an ffmpeg command ready to decode the audio coming from the UDP port, but how do we integrate the command into the openSMILE Python API?

@aniketzz
Author

We have an ffmpeg command ready to decode the audio coming from the UDP port, but how do we integrate the command into the openSMILE Python API?

Can anyone help me with the above query?


2 participants