The purpose of this benchmark is to evaluate LLM detectors, in particular their robustness against evasion attacks. So far, the benchmark is based on the detection of short LLM-generated news articles, but it can be extended to cover other detection tasks.
The main design goal is to make the benchmark easy to extend with new datasets, detectors and evasion attacks.
| Documentation | Paper | Old repository |
- Generating an (adversarial) benchmark with a specific configuration, used for testing detectors
- Benchmarking detectors and watermark detection (adversarial + non-adversarial)
- Modularity: new datasets, detectors, attacks and watermarking schemes can be added without much effort
See the documentation for the full details.
- Create a conda environment (highly recommended to avoid compatibility issues) and activate it
```bash
conda create -n "llm_detector" python=3.10.12 ipython
conda activate llm_detector
```
- Install PyTorch with a version compatible with your CUDA driver

For CUDA version 11.8 (check your version with `nvidia-smi` and see PyTorch's website):

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
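To quickly verify that the installed build can see your GPU, you can run a one-liner check, for example:

```bash
# Should print the CUDA version of the build and "True" if PyTorch can access the GPU
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```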
- Clone and install the package
```bash
git clone git@github.com:marluxiaboss/benchmark_ai_news_detection.git
cd benchmark_ai_news_detection
pip install -e .
```
attack="generation_base"
watermark_scheme="watermark_base"
create_dataset generation=$attack watermark=$watermark_scheme
detector="fast_detect_gpt"
test_detector detection=$detector
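The two steps can also be chained into a single script. Below is a minimal sketch using only the default config names shown above; any other names must correspond to config files under `detector_benchmark/conf/`:

```bash
#!/bin/bash
# End-to-end sketch: generate the benchmark dataset, then evaluate a detector on it.
attack="generation_base"          # base (non-adversarial) generation config
watermark_scheme="watermark_base" # default watermarking config
detector="fast_detect_gpt"        # detector to evaluate

create_dataset generation=$attack watermark=$watermark_scheme
test_detector detection=$detector
```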
- `detector_benchmark/dataset_loader`: folder for the dataset loader classes
- `detector_benchmark/detector`: folder for the detector classes and the detector loader
- `detector_benchmark/generation`: folder for the base generation + adversarial generation classes, and also the generator loader
- `detector_benchmark/pipeline`: folder for the pipeline classes (text generation, testing detection and evaluating generated text quality)
- `detector_benchmark/text_quality_evaluation`: folder for the text quality evaluator classes
- `detector_benchmark/watermark`: folder for the watermark classes (general classes and the different watermarking schemes)
Configuration files (Hydra configuration) are located in `detector_benchmark/conf/` for the detection, generation, pipeline and watermark configurations.
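Since the configuration is managed with Hydra, individual fields can in principle also be overridden from the command line, in addition to selecting whole config groups. The example below is only a sketch: `batch_size` is a hypothetical key, so check the files under `detector_benchmark/conf/` for the actual field names.

```bash
# Select the fast_detect_gpt detection config and override a single (hypothetical) field.
test_detector detection=fast_detect_gpt detection.batch_size=8
```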
- `detector_benchmark/create_dataset.py`: script to create a dataset using a dataset loader and a generation config
- `detector_benchmark/test_detector.py`: script to test a detector on a dataset created using the script above
- `detector_benchmark/test_text_quality.py`: script to run basic text quality evaluation on generated text (non-watermarked, watermarked or even human-written)
Bash scripts for running the different experiments can be found under `bash_scripts`. It contains the following subfolders, corresponding to the different experiment types:

- `bash_scripts/create_envs` for creating the two conda environments (see the environment installation part).
- `bash_scripts/big_gen_bench` for running the evaluation from BiGGen-Bench with different watermarking schemes (one bash script per watermarking scheme).
- `bash_scripts/bigcode_eval` for running the evaluation from bigcode-evaluation-harness on selected tasks with different watermarking schemes. There is one subfolder `bash_scripts/bigcode_eval/generation` for generating the text to be evaluated and one subfolder `bash_scripts/bigcode_eval/evaluation` for running the evaluations. The latter actually runs the code generated by the LLM, so it is advised to launch it inside a sandbox.
- `bash_scripts/generating_datasets` for generating the different datasets using the different watermarking schemes.
- `bash_scripts/lm_harness` for running the evaluation from lm-evaluation-harness on selected tasks with different watermarking schemes (one bash script for all the watermarking schemes).
- `bash_scripts/test_detectors` for running the detection evaluation scripts with the detectors/watermark detectors on the different generated datasets. There is one subfolder `bash_scripts/test_detectors/test_watermark_detectors` for testing the watermark detectors on the text generated with the corresponding watermark and one subfolder `bash_scripts/test_detectors/test_zero_shot_detectors` for testing zero-shot detectors.
- `bash_scripts/text_quality_pipeline` for running basic text quality evaluation such as computing perplexity. It currently only has the subfolder `bash_scripts/text_quality_pipeline/ppl_scorer` for computing the perplexity of the text generated by the different watermarking schemes, as well as of human text.
`detector_benchmark/data/generated_datasets` contains the adversarial and non-adversarial datasets generated with the LLMs, with or without watermarking. The folder structure is as follows:
```
generated_datasets
│
└───{source_dataset} (e.g. cnn_dailymail)
    │
    └───{adversarial_attack} (e.g. no_attack)
        │
        └───{watermarking_scheme} (e.g. no_watermark)
            │
            ├───log
            │     log.txt (terminal logs)
            │
            └───{generator_name}_{experiment_name} (actual generated Hugging Face dataset)
```
where `source_dataset` is the name of the dataset used to obtain the true human-written samples and the prefixes for the fake AI-written samples, `adversarial_attack` and `watermarking_scheme` are the respective attack and watermarking scheme used to generate the dataset, and `experiment_name` is the name used for the specific generation run.
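Assuming the generated splits are stored as Hugging Face datasets saved to disk (an assumption, check the generation pipeline for the exact format), a generated dataset can then be inspected like this; the generator and experiment names in the path are placeholders:

```bash
# Inspect a generated dataset (the <...> parts are placeholders for an actual run).
DATASET_DIR="detector_benchmark/data/generated_datasets/cnn_dailymail/no_attack/no_watermark/<generator_name>_<experiment_name>"
python -c "from datasets import load_from_disk; print(load_from_disk('$DATASET_DIR'))"
```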
The following folders are used to save the results from the scripts/benchmarks:

- `detector_benchmark/detection_test_results`: contains the results from running the detection script `detector_benchmark/test_detector.py`.
- `detector_benchmark/text_quality_eval_results`: contains the results from running the text quality evaluation script `detector_benchmark/test_text_quality.py`.
- TODO: add info about the respective results folders for the external libraries