Here we present our framework for deepfake general audio detection, which aims to identify whether an audio clip is genuine or deepfake and to localize the deepfake regions. Specifically, we:
- Propose the task of deepfake general audio detection and establish a benchmark for its evaluation.
- Design an audio manipulation pipeline that regenerates key regions, producing a large quantity of convincingly realistic deepfake general audio.
- Provide a dataset, FakeSound, for training and evaluating deepfake general audio detection models.
- Propose a deepfake detection model that outperforms state-of-the-art models from previous speech deepfake competitions as well as human listeners.
Install dependencies:

```shell
git clone https://github.com/FakeSoundData/FakeSound
conda install --yes --file requirements.txt
```
Install the pre-trained EAT model into the models/ directory:

```shell
cd models
mkdir EAT
cd EAT
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
cd ..  # return to models/EAT before cloning the EAT repository
git clone https://github.com/cwx-worst-one/EAT
```
Due to copyright restrictions, we are unable to provide the original AudioCaps audio data; you can download the raw audio from AudioCaps. The manipulated audio can be downloaded from (1) HuggingfaceDataset or (2) FakeSound (extraction code: "fake").

We provide the results of the grounding model for key region detection. You can also reproduce the FakeSound dataset by regenerating key regions based on these grounding results, using the audio generation models AudioLDM/AudioLDM2 and the super-resolution model AudioSR.
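The regeneration step above amounts to splicing model-generated audio into the grounded key region of the original waveform. Below is a minimal sketch of that splice; the `generate_region` placeholder and the 16 kHz sample rate are illustrative assumptions, and in the actual pipeline the regenerated samples would come from AudioLDM/AudioLDM2 followed by AudioSR.

```python
# Sketch: replace a grounded key region [onset, offset) of a waveform
# with regenerated samples of equal length. The waveform is a plain
# list of float samples for illustration.

SAMPLE_RATE = 16000  # assumed sample rate


def generate_region(num_samples):
    """Placeholder for AudioLDM/AudioLDM2 + AudioSR; returns silence here."""
    return [0.0] * num_samples


def regenerate(waveform, onset_sec, offset_sec, sample_rate=SAMPLE_RATE):
    """Splice regenerated audio into [onset_sec, offset_sec) of the waveform."""
    start = int(onset_sec * sample_rate)
    end = int(offset_sec * sample_rate)
    fake_region = generate_region(end - start)
    return waveform[:start] + fake_region + waveform[end:]
```

Because the regenerated region has exactly the same length as the original, the output clip keeps its duration and the (onset, offset) annotation remains valid for training the detector.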
The metadata for the training and test sets is contained in the files "deepfake_data/{}.json", where
- the "audio_id" format is {AudioCaps_id}_{onset}_{offset} for manipulated clips or {AudioCaps_id} for genuine ones,
- the "label" is "0" for deepfake audio, with the reconstructed region indicated as "onset_offset".
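A metadata entry can be parsed as sketched below. This is an illustrative sketch, not the repository's own loader: it assumes the underscore delimiter described above, and splits from the right so that underscores inside the AudioCaps id itself are preserved.

```python
def parse_audio_id(audio_id):
    """Split "{AudioCaps_id}_{onset}_{offset}" into (id, onset, offset).

    Genuine clips use the bare AudioCaps id, so when no trailing
    onset/offset pair is found we return (audio_id, None, None).
    The underscore delimiter is an assumption based on the metadata
    format described in this README.
    """
    parts = audio_id.rsplit("_", 2)
    if len(parts) == 3:
        base, onset, offset = parts
        try:
            return base, float(onset), float(offset)
        except ValueError:
            pass  # trailing parts were not numeric: treat as a bare id
    return audio_id, None, None
```

For example, an id like "Yabc_123_1.5_3.0" would parse to the base id "Yabc_123" with the deepfake region spanning 1.5 s to 3.0 s.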
The training and testing scripts are train.py and inference.py, respectively. Modify the WORKSPACE_PATH inside them to match your own directory path.

```shell
python train.py --train_file FakeSound/meta_data/train.json
python inference.py
```
Our code builds on the DKU speech deepfake detection system, EAT, AudioLDM, and AudioLDM2. We appreciate the authors' open-sourcing of their code.