Code for our IEEE ICWS2021 short paper "Efficient Grammatical Error Correction with Hierarchical Error Detections and Correction".
Clone this repository and change into its directory.
Create a Python 3.7 virtual environment and run the following command:
pip install -r requirements.txt
All datasets used in the paper can be found here.
Files in M2 format should be converted to TSV files with one source/target sentence pair per line. This can be done with utils/m2_to_tsv.py.
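To show what this conversion involves, here is a minimal Python sketch of M2-to-TSV conversion. It is not the repository's utils/m2_to_tsv.py. The function names (`apply_edits`, `m2_to_pairs`, `pairs_to_tsv`) are hypothetical. The sketch assumes the standard M2 layout: an `S` line holds the tokenized source, and each `A start end|||type|||correction|||...|||annotator` line holds one edit. It keeps a single annotator's edits and skips `noop` and `UNK` edits.

```python
from typing import List, Tuple

# One edit: (start token index, end token index, replacement tokens).
Edit = Tuple[int, int, List[str]]


def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply non-overlapping token-span edits to a tokenized sentence."""
    out = list(tokens)
    # Apply right-to-left so earlier spans keep their original indices.
    for start, end, replacement in sorted(edits, key=lambda e: (e[0], e[1]), reverse=True):
        out[start:end] = replacement
    return out


def m2_to_pairs(m2_text: str, annotator: int = 0) -> List[Tuple[str, str]]:
    """Turn M2 blocks into (source, target) sentence pairs for one annotator."""
    pairs = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if not lines[0].startswith("S "):
            continue  # skip empty or malformed blocks
        source = lines[0][2:].split()
        edits = []
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = (int(x) for x in fields[0].split())
            err_type, correction = fields[1], fields[2]
            # The last field is the annotator id; noop/UNK edits change nothing.
            if int(fields[-1]) != annotator or err_type in ("noop", "UNK") or start < 0:
                continue
            # "-NONE-" marks a pure deletion.
            replacement = [] if correction == "-NONE-" else correction.split()
            edits.append((start, end, replacement))
        pairs.append((" ".join(source), " ".join(apply_edits(source, edits))))
    return pairs


def pairs_to_tsv(pairs: List[Tuple[str, str]]) -> str:
    """Write one tab-separated source/target pair per line."""
    return "".join(f"{src}\t{tgt}\n" for src, tgt in pairs)
```

Edits are applied from right to left. Each replacement can change the sentence length, so going right to left keeps the indices of the remaining edits valid without any offset bookkeeping.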
Our trained model can be downloaded here.
- Download BERT or SpanBERT from here.
- Prepare the training and development datasets in TSV format.
- Run the following command:
python train.py --bert_dir [BERT_DIR] \
--train_file [TRAIN_FILE/DIR] \
--valid_file [DEV_FILE/DIR] \
--output_dir [OUTPUT_DIR] \
--gpus 0 \
--truncate 50 \
--epoch 3 \
--batch_size 128 \
--lr 3e-5
- The trained model is saved to [OUTPUT_DIR]/model/epoch-[3].
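When passing a checkpoint to --model_dir, a small helper can locate the latest epoch directory. The sketch below assumes that train.py writes one `epoch-N` directory per epoch under [OUTPUT_DIR]/model, following the path shown above. The function name `latest_epoch_dir` is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def latest_epoch_dir(output_dir: str) -> Optional[Path]:
    """Return [OUTPUT_DIR]/model/epoch-N with the largest N, or None if absent."""
    model_root = Path(output_dir) / "model"
    if not model_root.is_dir():
        return None
    best, best_epoch = None, -1
    for child in model_root.iterdir():
        match = re.fullmatch(r"epoch-(\d+)", child.name)
        # Compare epoch numbers as integers so epoch-10 beats epoch-2.
        if child.is_dir() and match and int(match.group(1)) > best_epoch:
            best, best_epoch = child, int(match.group(1))
    return best
```

The numeric comparison is deliberate. A plain lexicographic sort of directory names would rank `epoch-2` above `epoch-10`.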
The default discriminating threshold is 0.5. A better value can be found by grid search on the development set.
- Set model_dir and valid_file in grid.sh.
- Run:
bash grid.sh
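The sketch below illustrates what a threshold search does: it sweeps candidate thresholds and keeps the one with the best development-set score. It is not grid.sh, which may instead rerun prediction for each threshold and score the corrected output. It makes three assumptions:

- The model produces one error probability per item.
- An item is flagged when its probability is at least the threshold.
- The selection criterion is detection F0.5, the usual GEC metric, which weights precision more than recall.

All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def detection_score(probs: Sequence[float], gold: Sequence[bool], threshold: float) -> float:
    """Flag items with prob >= threshold and score the flags against gold labels."""
    tp = fp = fn = 0
    for prob, is_error in zip(probs, gold):
        flagged = prob >= threshold
        if flagged and is_error:
            tp += 1
        elif flagged:
            fp += 1
        elif is_error:
            fn += 1
    return f_beta(tp, fp, fn)


def grid_search(probs: Sequence[float], gold: Sequence[bool],
                thresholds: List[float]) -> Tuple[Optional[float], float]:
    """Return (best threshold, best score); ties keep the earliest threshold."""
    best_t, best_score = None, -1.0
    for t in thresholds:
        score = detection_score(probs, gold, t)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    probs = [0.9, 0.7, 0.4, 0.3, 0.1]
    gold = [True, True, True, False, False]
    print(grid_search(probs, gold, [round(0.05 * i, 2) for i in range(1, 20)]))
```

With these toy values, the search returns (0.35, 1.0). The default 0.5 scores about 0.91 because it misses the error with probability 0.4.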
- Predict on a test file with the following command:
python predict.py --model_dir [TRAINED_MODEL_DIR] \
--output_dir [OUTPUT_DIR] \
--test_file [TEST_FILE] \
--discriminating_threshold [0.5] \
--batch_size 16 \
--gpu 0
- Start the gRPC server with the following command:
python grpc_server.py --model_dir [TRAINED_MODEL_DIR]
- Call the API as shown in grpc_client.py.