The code for "TransGEC: Improving Grammatical Error Correction with Translationese". Our models were trained on NVIDIA Tesla V100 32G and A100 40G GPUs.
@inproceedings{fang-etal-2023-transgec,
title = "{T}rans{GEC}: Improving Grammatical Error Correction with Translationese",
author = "Fang, Tao and
Liu, Xuebo and
Wong, Derek F. and
Zhan, Runzhe and
Ding, Liang and
Chao, Lidia S. and
Tao, Dacheng and
Zhang, Min",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-acl.223",
pages = "3614--3633",
}
We release the translationese GEC models (TransGEC), fine-tuned from the (m)T5-large pre-trained language models. To quickly try out our work, follow the steps below.
- Step 1: Requirements and Installation
This implementation is based on huggingface/transformers (v4.13.0).
- PyTorch version >= 1.3.1
- Python version >= 3.6
git clone https://github.com/NLP2CT/Trans4GEC.git
cd transformers
pip install .
pip install -r requirements.txt
- Step 2: Download Translationese (m)T5-GEC Models and Data
| Lang. | Model | Description | Model-Download | Data-Download |
|-------|-------|-------------|----------------|---------------|
| En | TransGEC | Fine-tuned with cLang8-en and translationese | TransGEC.en.model | data.en |
| De | TransGEC | Fine-tuned with cLang8-de and translationese | TransGEC.de.model | data.de |
| Ru | TransGEC | Fine-tuned with cLang8-ru and translationese | TransGEC.ru.model | data.ru |
| Zh | TransGEC | Fine-tuned with Lang8-zh and translationese | TransGEC.zh.model | data.zh |

The downloaded data directory has the following layout:
data_xx/
   |--train
      |--translationese.tsv
      |--train-translationese.json
   |--dev
      |--dev.xx.json
   |--test
      |--test.xx.json
      |--test.xx.M2
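Before running any script, it can help to confirm that a downloaded data directory matches this layout. The following is a minimal sketch, not part of the repository: the function name `check_transgec_data` is hypothetical, and it assumes that `xx` in the file names is the language code (en, de, ru, zh) and that the files are nested under `train`, `dev`, and `test` as shown above.

```shell
# Hypothetical helper (not shipped with the repository).
# Usage: check_transgec_data <data_dir> <lang>
# Prints one line per missing file and returns non-zero if any file is missing.
check_transgec_data() {
    data_dir=$1
    lang=$2
    missing=0
    # Assumed layout: "xx" in the README listing is replaced by the language code.
    for rel in \
        "train/translationese.tsv" \
        "train/train-translationese.json" \
        "dev/dev.${lang}.json" \
        "test/test.${lang}.json" \
        "test/test.${lang}.M2"
    do
        if [ ! -f "${data_dir}/${rel}" ]; then
            echo "missing: ${data_dir}/${rel}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_transgec_data data_en en` prints nothing and returns success when all five files are present.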
- Step 3: Generation and Evaluation
To generate and evaluate with the downloaded TransGEC models, refer to the script transgec_generate.sh for details.
To fine-tune the (m)T5-large pre-trained language models from scratch using translationese, follow the instructions below.
sh /shell_finetune-T5/train_en.sh
sh /shell_finetune-T5/train_de.sh
sh /shell_finetune-T5/train_ru.sh
sh /shell_finetune-T5/train_zh.sh
sh /shell_finetune-T5/Generate_evaluate_en.sh
sh /shell_finetune-T5/Generate_evaluate_de.sh
sh /shell_finetune-T5/Generate_evaluate_ru.sh
sh /shell_finetune-T5/Generate_evaluate_zh.sh
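The eight commands above follow one naming pattern per language. Judging by the file names, the `train_*` scripts are for fine-tuning and the `Generate_evaluate_*` scripts are for generation and evaluation. As a small sketch under that assumption, the helper below prints the pair of commands for a given language code. The function name `transgec_commands` is hypothetical, and the default script directory is simply the path as written above; pass a second argument if the directory lives elsewhere in your checkout.

```shell
# Hypothetical helper (not shipped with the repository).
# Usage: transgec_commands <lang> [script_dir]
# Prints the fine-tuning command, then the generation/evaluation command.
transgec_commands() {
    lang=$1
    # Default taken verbatim from the commands listed in this README.
    script_dir=${2:-/shell_finetune-T5}
    case "$lang" in
        en|de|ru|zh) ;;
        *) echo "unsupported language: $lang" >&2; return 1 ;;
    esac
    echo "sh ${script_dir}/train_${lang}.sh"
    echo "sh ${script_dir}/Generate_evaluate_${lang}.sh"
}
```

The function only prints the commands, so the output can be reviewed first and then piped to `sh` once the paths look right.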
Please refer to the following instructions for more information on our work: