AGB-DE is a legal NLP corpus for the automated detection of potentially void clauses in German standard form consumer contracts. It consists of 3,764 clauses that have been legally assessed by experts and annotated as potentially void (1) or valid (0). Additionally, each clause is annotated with a topic label. This repository contains the corpus itself, code that was uses to anonymize the data, code that was used to train and evaluate baseline models, and the results of the baseline evaluation itself.
@inproceedings{braun-matthes-2024-agb,
title = "AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts",
author = "Braun, Daniel and Matthes, Florian",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2024",
publisher = "Association for Computational Linguistics"
}
The dataset consists of 3,764 clauses from 93 German consumer standard form contracts. Each clause has been annotated with its topic(s) and whether the clause is valid (0) or potentially void (1). The table below shows the distribution of topics among the clauses and the share of potentially void clauses per topic. For more details about the data please take a look at the paper and the datasheet.
topic | number of clauses | share_void |
---|---|---|
age | 20 | 0 |
applicability | 148 | 2.03 |
applicableLaw | 87 | 3.45 |
arbitration | 97 | 1.03 |
changes | 9 | 11.11 |
codeOfConduct | 29 | 0 |
conclusionOfContract | 557 | 5.92 |
contractLanguage | 41 | 0 |
delivery | 475 | 7.16 |
description | 46 | 0 |
disposal | 36 | 0 |
intellectualProperty | 39 | 0 |
language | 9 | 11.11 |
liability | 211 | 9 |
party | 0 | 0 |
payment | 642 | 6.07 |
personalData | 115 | 0.87 |
placeOfJurisdiction | 53 | 1.89 |
prices | 147 | 1.36 |
retentionOfTitle | 125 | 2.4 |
severability | 35 | 11.43 |
textStorage | 57 | 1.75 |
warranty | 314 | 6.37 |
withdrawal | 506 | 3.75 |
Total lvl 1 | 3798 | 4.8 |
Model | Precision | Recall | F1-Score |
---|---|---|---|
svm |
0.37 | 0.27 | 0.31 |
bert-base-german-cased |
0.50 | 0.27 | 0.35 |
xlm-roberta-base |
0.00 | 0.00 | 0.00 |
gerpt2 |
0.71 | 0.14 | 0.23 |
gpt-3.5-turbo-0125 |
0.06 | 0.92 | 0.11 |
Model | Precision | Recall | F1-Score |
---|---|---|---|
svm |
0.40 | 0.32 | 0.36 |
bert-base-german-cased |
0.51 | 0.57 | 0.54 |
xlm-roberta-base |
0.75 | 0.08 | 0.15 |
gerpt2 |
0.64 | 0.43 | 0.52 |
gpt-3.5-turbo-0125 |
0.13 | 0.92 | 0.22 |
The easiest way to use the corpus is through the dataset and baseline model provided on 🤗 Huggingface. See also example.py.
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import pipeline
# load corpus
ds = load_dataset("d4br4/agb-de")
# load model
modelname = "d4br4/AGBert"
model = AutoModelForSequenceClassification.from_pretrained(modelname)
tokenizer = AutoTokenizer.from_pretrained(modelname)
# create classification pipeline
clf = pipeline("text-classification", model, tokenizer=tokenizer, max_length=512, truncation=True)
# classify clause text
prediction = clf.predict(ds["test"][0]["text"])
# check classification output
if prediction[0]["label"] == "valid":
print("This clause is valid.")
else:
print("This clause is void.")
If you have any question, please contact:
Daniel Braun (University of Twente)
[email protected]
The data collection and annotation was supported by funds of the Federal Ministry of Justice and Consumer Protection (BMJV) based on a decision of the Parliament of the Federal Republic of Germany via the Federal Office for Agriculture and Food (BLE) under the innovation support programme.