FakeSound: Deepfake General Audio Detection

Links: arXiv · github.io · Hugging Face data

Here we present our framework for Deepfake General Audio Detection, which aims to identify whether an audio clip is genuine or deepfake and to locate the deepfake regions. Specifically, we:

  • Propose the task of deepfake general audio detection and establish a benchmark for its evaluation.
  • Design an audio manipulation pipeline that regenerates key regions, yielding a large quantity of convincingly realistic deepfake general audio.
  • Provide a dataset, FakeSound, for training and evaluating deepfake general audio detection.
  • Propose a deepfake detection model that outperforms both the state-of-the-art models from previous speech deepfake competitions and human listeners.

Install dependencies

Clone the repository and install the required packages:

git clone https://github.com/FakeSoundData/FakeSound
cd FakeSound
conda install --yes --file requirements.txt

Install the pre-trained EAT model into the models/ directory:

cd models
mkdir EAT
cd EAT
# EAT builds on fairseq; install fairseq in editable mode first
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
# then fetch the EAT code
git clone https://github.com/cwx-worst-one/EAT

Data Preparation

Due to copyright issues, we are unable to provide the original AudioCaps audio data. You can download the raw audio from AudioCaps. The manipulated audio can be downloaded from (1) the Hugging Face dataset or (2) FakeSound, with the extraction code "fake".

We provide the grounding model's results for key region detection. You can also reproduce the FakeSound dataset by regenerating the key regions indicated by these grounding results, using the audio generation models AudioLDM/AudioLDM2 and the super-resolution model AudioSR; a rough sketch of this splice-and-regenerate step follows.
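
The sketch below is only an illustration of the splice-and-regenerate idea, not the exact pipeline code: it cuts the grounded key region out of a genuine clip and splices a regenerated segment back in. The helper regenerate_segment is a hypothetical placeholder for the actual AudioLDM/AudioLDM2 generation and AudioSR super-resolution calls, whose real APIs live in the respective repositories.

import torch
import torchaudio

def regenerate_segment(region: torch.Tensor, sr: int, caption: str) -> torch.Tensor:
    # Hypothetical placeholder: generate a replacement segment with
    # AudioLDM/AudioLDM2 conditioned on the caption, then enhance it
    # with AudioSR, and return the result.
    raise NotImplementedError

def splice_fake_region(in_path, onset, offset, caption, out_path, sr=16000):
    # Load the genuine AudioCaps clip and resample to a common rate.
    wav, file_sr = torchaudio.load(in_path)
    wav = torchaudio.functional.resample(wav, file_sr, sr)

    a, b = int(onset * sr), int(offset * sr)

    # Regenerate only the grounded key region, then splice it back in,
    # trimming to the original length so onset/offset stay valid.
    fake = regenerate_segment(wav[:, a:b], sr, caption)[:, : b - a]
    out = torch.cat([wav[:, :a], fake, wav[:, b:]], dim=1)
    torchaudio.save(out_path, out, sr)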
The metadata for the training and test sets is contained in the file "deepfake_data/{}.json" (a parsing sketch follows the list below), where

  • the "audio_id" format is {AudioCaps_id}_{onset}_{offset} or {AudioCaps_id},
  • the "label" is "0" for deepfake audio, with reconstructed regions indicated as "{onset}_{offset}".
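
A minimal parsing sketch, assuming the JSON holds a list of records with "audio_id" and "label" keys; the concrete filename train.json and the numeric onset/offset values are assumptions based on the field descriptions above:

import json

def parse_audio_id(audio_id: str):
    # Split "{AudioCaps_id}_{onset}_{offset}" into its parts; genuine
    # clips use the bare "{AudioCaps_id}" form and carry no region.
    parts = audio_id.rsplit("_", 2)
    if len(parts) == 3:
        clip_id, onset, offset = parts
        return clip_id, float(onset), float(offset)
    return audio_id, None, None

with open("deepfake_data/train.json") as f:
    records = json.load(f)

for rec in records:
    clip_id, onset, offset = parse_audio_id(rec["audio_id"])
    if rec["label"] == "0":  # "0" marks deepfake audio
        print(f"{clip_id}: regenerated region {onset}-{offset} s")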

Train & Inference

The training and testing scripts are train.py and inference.py, respectively. You need to modify the WORKSPACE_PATH inside them to match your own directory path.

  python train.py --train_file FakeSound/meta_data/train.json
  python inference.py 

Acknowledgement

Our code builds on the DKU speech deepfake detection system, EAT, AudioLDM, and AudioLDM2. We appreciate the authors' open-sourcing of their code.
