Le Thien Phuc Nguyen*, Zhuoran Yu*, Yong Jae Lee (* Equal Contribution)
| Model | Weight | Trained on |
|---|---|---|
| LoCoNet + LASER | Checkpoint | AVA-ActiveSpeaker |
| TalkNCE + LASER | Checkpoint | AVA-ActiveSpeaker |
Start by building the conda environment:
```bash
conda env create -f landmark_loconet_environmet.yml
conda activate landmark_loconet
```
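As a quick sanity check that the environment is usable (a sketch that assumes the repo's usual PyTorch + GPU setup; adjust if your machine differs):

```python
# Hypothetical sanity check: confirm PyTorch and CUDA are visible in the
# freshly created conda environment.
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```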
We follow TalkNet's data preparation script to download and prepare the AVA dataset:

```bash
python train.py --dataPathAVA AVADataPath --download
```
If you cannot run the above, you can equivalently run

```bash
python downloadAVA.py
```

but you will need to manually modify `savePath` in that file. `AVADataPath` is the folder where the AVA dataset and its preprocessing outputs are saved; the details can be found here. Please read them carefully.
After the AVA dataset is downloaded, change the `DATA.dataPathAVA` entry in `configs/multi.yaml`.
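For reference, the relevant part of the config might look like the following (a minimal sketch; only `DATA.dataPathAVA` is confirmed by this README, and the surrounding layout and path are illustrative):

```yaml
# configs/multi.yaml (excerpt)
DATA:
  dataPathAVA: /path/to/AVADataPath   # point this at your AVA dataset root
```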
For Talkies and ASW, please refer to https://github.com/fuankarion/maas and https://github.com/clovaai/lookwhostalking, respectively. Make sure the folder structure and CSV files of these two datasets match AVA's so that all scripts run without modification.
Modify the dataset path in `shift_audio.py`, then run the file to create the shifted-audio version of the dataset.
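Conceptually, the shifted-audio version offsets each waveform by a fixed amount of time. The sketch below illustrates the idea with `numpy` and `soundfile`; the actual logic, paths, and I/O layout in `shift_audio.py` may differ:

```python
# Illustrative sketch only: shift a waveform by shift_ms milliseconds.
# File paths and the circular-shift choice are assumptions, not the
# repo's confirmed implementation.
import numpy as np
import soundfile as sf

def shift_wav(in_path, out_path, shift_ms):
    wav, sr = sf.read(in_path)              # wav: (num_samples,) or (num_samples, channels)
    offset = int(sr * shift_ms / 1000)      # milliseconds -> samples
    shifted = np.roll(wav, offset, axis=0)  # circular shift along the time axis
    sf.write(out_path, shifted, sr)

shift_wav("demo/audio.wav", "demo/audio_shifted.wav", shift_ms=500)  # hypothetical paths
```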
To train LoCoNet:

```bash
python -W ignore::UserWarning train.py --cfg configs/multi.yaml OUTPUT_DIR <output directory>
```

Please modify the hyperparameters in `configs/multi.yaml` accordingly before training.

To train LASER (landmark LoCoNet):

```bash
python -W ignore::UserWarning train_landmark_loconet.py --cfg configs/multi.yaml OUTPUT_DIR <output directory>
```
Install MediaPipe through pip and modify the `dataPath` before running:

```bash
python create_landmark.py
```
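For reference, landmark extraction with MediaPipe's FaceMesh looks roughly like the sketch below; `create_landmark.py` may differ in detail, and the image path here is hypothetical:

```python
# Rough sketch of per-frame face-landmark extraction with MediaPipe's
# legacy solutions API. Not the repo's exact code.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1)
img = cv2.imread("frames/face_crop.jpg")  # hypothetical path; assumes the image exists
results = face_mesh.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
if results.multi_face_landmarks:
    # 468 normalized (x, y, z) landmarks for the detected face
    coords = [(p.x, p.y, p.z) for p in results.multi_face_landmarks[0].landmark]
    print(len(coords))
face_mesh.close()
```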
Modify the path to the model weights, the model hyperparameters, and the `dataPath` in `configs/multi.yaml` before evaluating.

For multi-GPU evaluation of our model (LASER), run:

```bash
python test_mulicard_landmark.py --cfg configs/multi.yaml
```

For multi-GPU evaluation of LoCoNet, run:

```bash
python test_mulicard.py --cfg configs/multi.yaml
```

For single-GPU evaluation of our model, run:

```bash
python test_landmark_loconet.py --cfg configs/multi.yaml
```

For single-GPU evaluation of LoCoNet, run:

```bash
python test.py --cfg configs/multi.yaml
```
After running the evaluation script, run

```bash
python utils/get_ava_active_speaker_performance_no_map.py -g <path to modified groundtruth csv file> -p <path to our result csv file>
```

because `get_ava_active_speaker_performance.py` calculates mAP based on the number of positive examples, which is not possible on our shifted dataset.
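Conceptually, the scoring step reduces to average precision over per-frame speaking scores. A minimal sketch follows; the column names, label convention, and row alignment are assumptions, so check the repo's CSV format:

```python
# Hedged sketch: compute average precision from groundtruth/prediction CSVs.
# Assumes rows in the two files are aligned and uses AVA-style labels.
import pandas as pd
from sklearn.metrics import average_precision_score

gt = pd.read_csv("gt_modified.csv")    # hypothetical filename
pred = pd.read_csv("results.csv")      # hypothetical filename
y_true = (gt["label"] == "SPEAKING_AUDIBLE").astype(int)  # assumed label convention
print("AP:", average_precision_score(y_true, pred["score"]))
```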
Based on TalkNet's codebase, we have created demo code for both LoCoNet and LASER, stored in `demoLoCoNet.py` and `demoLoCoNet_landmark.py`. Instructions for using the demos:
- Create a `demo` folder and put the video you want in there.
- The demo script supports three modes:
  - normal, synced audio
  - audio shifted by some number of seconds
  - swapping the first and second halves of the audio
- You can also pair the video with audio from a different video; just specify which video's audio to use:

  ```python
  parser.add_argument('--videoName', type=str, default="demo_6_our", help='Demo video name')
  parser.add_argument('--audioName', type=str, default="demo_6_our", help='Demo audio name')
  ```

- On lines 515 and 516, change `shift_ms` to shift the audio by some amount (1 s = 1000 ms), or set `swap = True` to swap the audio halves. Setting `shift_ms = 0` and `swap = False` runs the normal, synced demo (see the sketch below).
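For example, the relevant settings might look like this (a sketch; the exact surrounding code on those lines may differ):

```python
shift_ms = 500   # shift the audio by 0.5 s; set to 0 for synced audio
swap = False     # True swaps the first and second halves of the audio
```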
Please cite the following if our paper or code is helpful to your research.
TODO: insert later
The codebase of this project is built on TalkNet and LoCoNet, which are easy-to-use ASD pipelines.
The contrastive loss function was obtained through communication with the authors of TalkNCE.