The purpose of this benchmark is to evaluate LLM detectors, in particular their robustness against evasion attacks. So far, the benchmark is based on the detection of short LLM-generated news articles, but it can be extended to cover other detection tasks.
The main design goal is to make the benchmark easy to extend with new datasets, detectors and evasion attacks.
| Documentation | Paper | Old repository |
- Generating an (adversarial) benchmark with a specific configuration, used for testing detectors
- Benchmarking detectors and watermark detection (adversarial + non-adversarial)
- Modularity: new datasets, detectors, attacks and watermarking schemes can be added without much effort
See the documentation for the full details.
- Create a conda environment (highly recommended to avoid compatibility issues) and activate it
```bash
conda create -n "llm_detector" python=3.10.12 ipython
conda activate llm_detector
```
- Install PyTorch with a version compatible with your CUDA driver

For CUDA version 11.8 (check your version with `nvidia-smi` and see PyTorch's website):

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
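To quickly verify that the installed build can see your GPU, you can run a one-liner check, for example:

```bash
# Should print the CUDA version of the build and "True" if PyTorch can access the GPU
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```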
- Clone and install the package
```bash
git clone git@github.com:marluxiaboss/benchmark_ai_news_detection.git
cd benchmark_ai_news_detection
pip install -e .
```
attack="generation_base"
watermark_scheme="watermark_base"
create_dataset generation=$attack watermark=$watermark_scheme
detector="fast_detect_gpt"
test_detector detection=$detector
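The two steps can also be chained into a single script. Below is a minimal sketch using only the default config names shown above; any other names must correspond to config files under `detector_benchmark/conf/`:

```bash
#!/bin/bash
# End-to-end sketch: generate the benchmark dataset, then evaluate a detector on it.
attack="generation_base"          # base (non-adversarial) generation config
watermark_scheme="watermark_base" # default watermarking config
detector="fast_detect_gpt"        # detector to evaluate

create_dataset generation=$attack watermark=$watermark_scheme
test_detector detection=$detector
```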
- `detector_benchmark/dataset_loader`: folder for the dataset loader classes
- `detector_benchmark/detector`: folder for the detector classes and the detector loader
- `detector_benchmark/generation`: folder for the base generation + adversarial generation classes, and also the generator loader
- `detector_benchmark/pipeline`: folder for the pipeline classes (text generation, testing detection and evaluating generated text quality)
- `detector_benchmark/text_quality_evaluation`: folder for the text quality evaluator classes
- `detector_benchmark/watermark`: folder for the watermark classes (general classes and the different watermarking schemes)
Configuration files (Hydra configuration) are located in `detector_benchmark/conf/` for the detection, generation, pipeline and watermark configurations.
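Since the configuration is managed with Hydra, individual fields can in principle also be overridden from the command line, in addition to selecting whole config groups. The example below is only a sketch: `batch_size` is a hypothetical key, so check the files under `detector_benchmark/conf/` for the actual field names.

```bash
# Select the fast_detect_gpt detection config and override a single (hypothetical) field.
test_detector detection=fast_detect_gpt detection.batch_size=8
```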
- `detector_benchmark/create_dataset.py`: script to create a dataset using a dataset loader and a generation config
- `detector_benchmark/test_detector.py`: script to test a detector on a dataset created using the script above
- `detector_benchmark/test_text_quality.py`: script to run basic text quality evaluation on generated text (non-watermarked, watermarked or even human-written)
Bash scripts for running the different experiments can be found under `bash_scripts`. It contains the following subfolders, corresponding to the different experiment types:

- `bash_scripts/create_envs` for creating the two conda environments (see the environment installation part).
- `bash_scripts/big_gen_bench` for running the evaluation from BiGGen-Bench with different watermarking schemes (one bash script per watermarking scheme).
- `bash_scripts/bigcode_eval` for running the evaluation from bigcode-evaluation-harness on selected tasks with different watermarking schemes. There is one subfolder `bash_scripts/bigcode_eval/generation` for generating the text to be evaluated and one subfolder `bash_scripts/bigcode_eval/evaluation` for running the evaluations. The latter actually runs the code generated by the LLM, so it is advised to launch it inside a sandbox.
- `bash_scripts/generating_datasets` for generating the different datasets using the different watermarking schemes.
- `bash_scripts/lm_harness` for running the evaluation from lm-evaluation-harness on selected tasks with different watermarking schemes (one bash script for all the watermarking schemes).
- `bash_scripts/test_detectors` for running the detection evaluation scripts with the detectors/watermark detectors on the different generated datasets. There is one subfolder `bash_scripts/test_detectors/test_watermark_detectors` for testing the watermark detectors on the text generated with the corresponding watermark and one subfolder `bash_scripts/test_detectors/test_zero_shot_detectors` for testing zero-shot detectors.
- `bash_scripts/text_quality_pipeline` for running basic text quality evaluation such as computing perplexity. It currently only has the subfolder `bash_scripts/text_quality_pipeline/ppl_scorer` for computing the perplexity of the text generated by the different watermarking schemes, as well as of human text.
`detector_benchmark/data/generated_datasets` contains the adversarial and non-adversarial datasets generated with the LLMs, with or without watermarking. The folder structure is as follows:
```
generated_datasets
│
└───{source_dataset} (e.g. cnn_dailymail)
    │
    └───{adversarial_attack} (e.g. no_attack)
        │
        └───{watermarking_scheme} (e.g. no_watermark)
            │
            ├───log
            │     log.txt (terminal logs)
            │
            └───{generator_name}_{experiment_name} (actual generated Hugging Face dataset)
```
where `source_dataset` is the name of the dataset used to obtain the true human-written samples and the prefixes for the fake AI-written samples, `adversarial_attack` and `watermarking_scheme` are the respective attack and watermarking scheme used to generate the dataset, and `experiment_name` is the name used for the specific generation run.
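Assuming the generated splits are stored as Hugging Face datasets saved to disk (an assumption, check the generation pipeline for the exact format), a generated dataset can then be inspected like this; the generator and experiment names in the path are placeholders:

```bash
# Inspect a generated dataset (the <...> parts are placeholders for an actual run).
DATASET_DIR="detector_benchmark/data/generated_datasets/cnn_dailymail/no_attack/no_watermark/<generator_name>_<experiment_name>"
python -c "from datasets import load_from_disk; print(load_from_disk('$DATASET_DIR'))"
```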
The following folders are used to save the results from the scripts/benchmarks:

- `detector_benchmark/detection_test_results`: contains the results from running the detection script `detector_benchmark/test_detector.py`.
- `detector_benchmark/text_quality_eval_results`: contains the results from running the text quality evaluation script `detector_benchmark/test_text_quality.py`.
- TODO: add info about the respective results folders for the external libraries