LLM-generated news detection benchmark 🔍 📰

The purpose of this benchmark is to evaluate LLM detectors, especially against evasion attacks. So far, the benchmark is based on the detection of short LLM-generated news articles, but it can be extended to cover other detection tasks.
The main design goal is to make the benchmark easy to extend with new datasets, detectors and evasion attacks.

| Documentation | Paper | Old repository |

Features

  • Generating an (adversarial) benchmark with a specific configuration, used for testing detectors.
  • Benchmarking of detectors and watermark detection (adversarial and non-adversarial)
  • Modularity: possible to add new datasets, detectors, attacks and watermarking schemes without much effort

Table of contents

Getting started

Click here for the full doc.

Installation

  1. Create a conda environment (highly recommended to avoid compatibility issues) and activate it
conda create -n "llm_detector" python=3.10.12 ipython
conda activate llm_detector
  2. Install PyTorch with a version compatible with your CUDA driver

For CUDA version 11.8 (check your version with nvidia-smi and see PyTorch’s website):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  3. Clone and install the package
git clone git@github.com:marluxiaboss/benchmark_ai_news_detection.git
cd benchmark_ai_news_detection
pip install -e .
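
As a quick sanity check (optional, not part of the original instructions), you can verify that the environment picked up a CUDA-enabled PyTorch build:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"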

Generate the benchmark

attack="generation_base"
watermark_scheme="watermark_base"

create_dataset generation=$attack watermark=$watermark_scheme
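
The values passed to generation and watermark refer to Hydra config names under detector_benchmark/conf/. As a sketch, several attack/watermark combinations can be generated in one loop; other_attack and other_watermark below are placeholders and must be replaced with configs that actually exist under detector_benchmark/conf/:

# sketch only: placeholder config names, replace with real ones
for attack in generation_base other_attack; do
    for watermark_scheme in watermark_base other_watermark; do
        create_dataset generation=$attack watermark=$watermark_scheme
    done
done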

Test a detector on the created benchmark

detector="fast_detect_gpt"

test_detector detection=$detector 
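
The same benchmark can be tested with several detectors; in the sketch below, other_detector is a placeholder for another detection config under detector_benchmark/conf/. Results are saved under detector_benchmark/detection_test_results (see the results folders section below).

for detector in fast_detect_gpt other_detector; do
    test_detector detection=$detector
done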

Repo structure

0. Class folders

  • detector_benchmark/dataset_loader: folder for the dataset_loader classes
  • detector_benchmark/detector: folder for the detector classes and detector loader
  • detector_benchmark/generation: folder for the base generation and adversarial generation classes, as well as the generator loader
  • detector_benchmark/pipeline: folder for the pipeline classes (text generation, testing detection and evaluating generated text quality)
  • detector_benchmark/text_quality_evaluation: folder for text quality evaluator classes
  • detector_benchmark/watermark: folder for watermark classes (general and different watermark schemes)

1. Config files

Configuration files (hydra configuration) are located in:

  • detector_benchmark/conf/ for the detection, generation, pipeline and watermark configurations.
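
Assuming create_dataset and test_detector are standard Hydra entry points (which the key=value overrides above suggest), the available config groups and the resolved configuration can be inspected from the command line:

create_dataset --help      # lists the available config groups and defaults
create_dataset --cfg job   # prints the resolved configuration without running the job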

2. Python scripts

  • detector_benchmark/create_dataset.py: script to create a dataset using a dataset loader and a generation config
  • detector_benchmark/test_detector.py: script to test a detector on a dataset created using the script above
  • detector_benchmark/test_text_quality.py: script to run basic text quality evaluation on generated text (non-watermarked, watermarked or even human written)

3. Bash scripts

Bash scripts for running the different experiments can be found under bash_scripts. It contains the following subfolders, corresponding to the different experiment types:

  • bash_scripts/create_envs for creating the two conda environments (see the installation section).
  • bash_scripts/big_gen_bench for running the evaluation from BiGGen-Bench with different watermarking schemes (one bash script per watermarking scheme).
  • bash_scripts/bigcode_eval for running the evaluation from bigcode-evaluation-harness on selected tasks with different watermarking schemes. There are two subfolders: bash_scripts/bigcode_eval/generation for generating the text to be evaluated and bash_scripts/bigcode_eval/evaluation for running the evaluations. The latter actually runs the code generated by the LLM, so it is advised to launch it inside a sandbox.
  • bash_scripts/generating_datasets for generating the different datasets using the different watermarking schemes.
  • bash_scripts/lm_harness for running the evaluation from lm-evaluation-harness on selected tasks with different watermarking schemes (one bash script for all the watermarking schemes).
  • bash_scripts/test_detectors for running the detection evaluation scripts with the detectors/watermark detectors on the different generated datasets. There are two subfolders: bash_scripts/test_detectors/test_watermark_detectors for testing the watermark detectors on the text generated with the corresponding watermark, and bash_scripts/test_detectors/test_zero_shot_detectors for testing zero-shot detectors.
  • bash_scripts/text_quality_pipeline for running basic text quality evaluation such as computing the perplexity. Currently it only has the subfolder bash_scripts/text_quality_pipeline/ppl_scorer for computing the perplexity of the text generated with the different watermarking schemes and of human-written text.

4. Data folders

  • detector_benchmark/data/generated_datasets for the adversarial and non-adversarial datasets generated with the LLMs, with or without watermarking. The directory structure is as follows:
generated_datasets  
│
└───{source_dataset} (e.g. cnn_dailymail)
    │
    └───{adversarial_attack} (e.g. no_attack)
        │ 
        └───{watermarking_scheme} (e.g. no_watermark)
            │  
            ├───log
            │       log.txt (terminal logs)
            │
            └───{generator_name}_{experiment_name} (actual generated Hugging Face dataset)

Here, source_dataset is the name of the dataset used to obtain the true human-written samples and the prefixes for the fake AI-written samples, while adversarial_attack and watermarking_scheme are the attack and watermarking scheme used to generate the dataset. experiment_name is the name used for the specific generation run.
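
Since the generated dataset is stored as a Hugging Face dataset on disk, it can presumably be inspected with datasets.load_from_disk (assuming it was written with save_to_disk); the path below simply follows the layout above, with <generator_name>_<experiment_name> as a placeholder:

python -c "from datasets import load_from_disk; print(load_from_disk('detector_benchmark/data/generated_datasets/cnn_dailymail/no_attack/no_watermark/<generator_name>_<experiment_name>'))"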

5. Results folders

The following folders are used to save the results from the scripts/benchmarks:

  • detector_benchmark/detection_test_results containing the results when running the detection script detector_benchmark/test_detector.py.
  • detector_benchmark/text_quality_eval_results containing the results when running the text quality evaluation script detector_benchmark/test_text_quality.py.
  • TODO: add info about the respective results folder for the external libraries

Future work
