Backchannel Detection:

Backchanneling during a conversation occurs when one participant is speaking and another participant gives a response to the speaker. It can consist of verbal cues (“uh-huh”, “hmm”), visual cues (nodding, facial expressions), or both; these cues are the input modalities used for backchannel detection. A backchannel has the important function of encouraging the current speaker to hold their turn and continue to speak, which enables smooth conversation. Predicting a backchannel can be beneficial for building human-like conversational agents or robots.

Refer to the base report for a full understanding of our experiment: https://drive.google.com/file/d/1XLZRns5FUpb33731iPfz5u1UrFK_jzQ2/view?usp=sharing

The Python notebook in this repository and the report linked above serve as the base for further experiments. We carried out further experiments with different fusion techniques in transformers for the same task.

We experiment with the following fusion techniques (minimal sketches of two of them follow below):

  • One-stream
  • One-to-one stream
  • One-to-two stream
  • Two-to-one stream
  • Cross attention
  • Cross-to-one stream
After extensive evaluation we find that the simple one-stream approach serves the purpose for backchannel detection.
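For contrast with the one-stream winner, below is an equally hypothetical sketch of a cross-attention variant, in which each modality attends to the other before pooling. Again, the module layout and dimensions are illustrative assumptions only.

```python
# Hypothetical cross-attention fusion sketch: two streams, each modality
# attending to the other. Dimensions are illustrative, not the paper's.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Audio queries attend to visual keys/values and vice versa; the two
    attended streams are pooled and concatenated for classification."""
    def __init__(self, audio_dim=128, visual_dim=256, d_model=256, n_heads=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        self.audio_to_visual = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * d_model, 2)

    def forward(self, audio, visual):
        a = self.audio_proj(audio)    # (batch, T_a, d_model)
        v = self.visual_proj(visual)  # (batch, T_v, d_model)
        # Each stream uses the other modality as keys and values.
        a_attended, _ = self.audio_to_visual(query=a, key=v, value=v)
        v_attended, _ = self.visual_to_audio(query=v, key=a, value=a)
        pooled = torch.cat([a_attended.mean(dim=1), v_attended.mean(dim=1)], dim=-1)
        return self.classifier(pooled)

model = CrossAttentionFusion()
logits = model(torch.randn(8, 50, 128), torch.randn(8, 50, 256))
print(logits.shape)  # torch.Size([8, 2])
```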

Check out the leaderboard for this competition: Leaderboard

This project (backchannel detection), along with a related project on backchannel estimation, was submitted to IJCNN'23 and accepted: Official Paper

Refer to our official code, where we experiment with the different fusion techniques mentioned in the published paper: CodeRepo
