A faster and simpler implementation of GECToR – Grammatical Error Correction: Tag, Not Rewrite, with AMP and distributed training support via DeepSpeed. To make it faster and more readable, we removed the AllenNLP dependencies and rewrote the related code.
NOTE: the project is now maintained by cofe-ai; updates and issue fixes will land at https://github.com/cofe-ai/fast-gector . Please check there.
- Install PyTorch with CUDA support

  conda create -n gector_env python=3.7.6 -y
  conda activate gector_env
  conda install pytorch=1.10.1 cudatoolkit -c pytorch
- Install NVIDIA Apex (for using AMP with DeepSpeed)

  git clone https://github.com/NVIDIA/apex
  cd apex
  pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
- Install the following packages via conda/pip

  python==3.7.6
  transformers==4.14.1
  scikit-learn==1.0.2
  numpy==1.21.2
  deepspeed==0.5.10
- Tokenize your data (one sentence per line, words separated by spaces)
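Any tokenizer that yields space-separated tokens works here. As a minimal sketch, a naive regex split can produce the required format (for real data, a proper tokenizer such as spaCy is preferable):

```python
import re

def tokenize_line(line: str) -> str:
    # Naive tokenization: \w+ keeps words and numbers together,
    # [^\w\s] splits punctuation off into separate tokens.
    tokens = re.findall(r"\w+|[^\w\s]", line.strip())
    return " ".join(tokens)

print(tokenize_line("He go to school yesterday."))
# -> He go to school yesterday .
```

Applied line by line to a raw text file, this yields the one-sentence-per-line, space-separated format expected by the preprocessing step.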
- Generate edits from parallel sentences

  python utils/preprocess_data.py -s source_file -t target_file -o output_edit_file
-
*(Optional) Define your own target vocab (data/vocabulary/labels.txt)
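For illustration, a custom labels file could look like the following. The entries are hypothetical and follow the GECToR tag scheme from the paper, where $KEEP, $DELETE, $APPEND_{token}, and $REPLACE_{token} are the basic edit tags (one tag per line):

```
$KEEP
$DELETE
$APPEND_the
$APPEND_,
$REPLACE_the
$REPLACE_is
```

The actual tag inventory should be derived from the edits generated by the preprocessing step on your own data.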
- Edit deepspeed_config.json according to your config params. Note that the lr and batch_size options will be overridden by command-line args

  bash scripts/train.sh
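A minimal deepspeed_config.json sketch, using standard DeepSpeed config fields; the values are placeholders, and per the note above the lr and batch size set here are overridden by the script's args:

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 1,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 1e-5
    }
  },
  "fp16": {
    "enabled": true
  }
}
```

The "fp16" section enables AMP-style mixed-precision training, which is why Apex is listed as a dependency above.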
- Edit deepspeed_config.json according to your config params

  bash scripts/predict.sh
[1] Omelianchuk, K., Atrasevych, V., Chernodub, A., & Skurzhanskyi, O. (2020). GECToR -- Grammatical Error Correction: Tag, Not Rewrite. arXiv:2005.12592 [cs]. http://arxiv.org/abs/2005.12592