Hierarchical-GEC

Code for our IEEE ICWS2021 short paper "Efficient Grammatical Error Correction with Hierarchical Error Detections and Correction".

Installation

Clone this repository and enter its directory.

Create a Python 3.7 virtual environment and run the following command:

pip install -r requirements.txt

Datasets and Trained Model

All datasets used in the paper can be found here. M2-format files should be converted to TSV files with one source/target sentence pair per line, which can be done with utils/m2_to_tsv.py.
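For readers unfamiliar with the M2 format, the conversion can be sketched roughly as follows. This is an illustrative sketch, not the code in utils/m2_to_tsv.py: the function names `m2_to_pairs` and `pairs_to_tsv` are hypothetical, and it assumes that the edits of a single annotator are applied, that edits are listed in left-to-right order, and that the TSV has the source in the first column and the target in the second.

```python
def m2_to_pairs(m2_text, annotator_id=0):
    """Return (source, target) sentence pairs from M2-formatted text.

    Assumption: edits for one annotator are sorted by start offset,
    as is usual in M2 files.
    """
    pairs = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if not lines or not lines[0].startswith("S "):
            continue
        source_tokens = lines[0][2:].split()
        target_tokens = list(source_tokens)
        offset = 0  # shift caused by edits already applied
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            # A <start> <end>|||<type>|||<correction>|||...|||<annotator>
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            error_type, correction = fields[1], fields[2]
            annotator = int(fields[-1])
            # Skip other annotators and "no edit" markers.
            if annotator != annotator_id or error_type == "noop" or start < 0:
                continue
            replacement = correction.split()  # empty list means deletion
            target_tokens[start + offset:end + offset] = replacement
            offset += len(replacement) - (end - start)
        pairs.append((" ".join(source_tokens), " ".join(target_tokens)))
    return pairs


def pairs_to_tsv(pairs):
    """Format pairs as TSV text, one 'source<TAB>target' pair per line."""
    return "\n".join(f"{src}\t{tgt}" for src, tgt in pairs)


if __name__ == "__main__":
    sample = (
        "S This are a sentence .\n"
        "A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0\n"
        "\n"
        "S I like apples .\n"
        "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0\n"
    )
    print(pairs_to_tsv(m2_to_pairs(sample)))
```

Sentences whose only annotation is a `noop` edit are emitted with identical source and target, so error-free sentences stay in the data.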

Our trained model can be downloaded here.

Train Model

Download BERT or SpanBERT from here.
Prepare train and dev datasets.
Run the following command:

python train.py --bert_dir [BERT_DIR] \
                --train_file [TRAIN_FILE/DIR] \
                --valid_file [DEV_FILE/DIR] \
                --output_dir [OUTPUT_DIR] \
                --gpus 0 \
                --truncate 50 \
                --epoch 3 \
                --batch_size 128 \
                --lr 3e-5

The trained model is saved in [OUTPUT_DIR]/model/epoch-[3].

Predict

Choose Threshold

The default threshold is 0.5; you can find a better one by grid search on the development set.

Set the model_dir and valid_file in grid.sh
Run bash grid.sh
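The idea behind the grid search can be sketched as below. This is a generic illustration and not the logic of grid.py: it assumes the threshold is compared against a per-item detection probability, and it uses F0.5 over hypothetical `probs` and `labels` lists as a stand-in for whatever metric the script actually reports. The names `f_beta` and `grid_search_threshold` are made up for this example.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta score from counts; beta=0.5 weights precision over recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def grid_search_threshold(probs, labels, thresholds, beta=0.5):
    """Return (best_threshold, best_score) over the candidate thresholds.

    probs:  hypothetical detection probabilities on the development set
    labels: gold labels, 1 = erroneous, 0 = correct
    """
    best_threshold, best_score = None, -1.0
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probs]
        tp = sum(1 for y, z in zip(labels, preds) if y == 1 and z == 1)
        fp = sum(1 for y, z in zip(labels, preds) if y == 0 and z == 1)
        fn = sum(1 for y, z in zip(labels, preds) if y == 1 and z == 0)
        score = f_beta(tp, fp, fn, beta)
        if score > best_score:  # ties keep the earlier (lower) threshold
            best_threshold, best_score = t, score
    return best_threshold, best_score


if __name__ == "__main__":
    # Toy development data, for illustration only.
    probs = [0.1, 0.4, 0.6, 0.9]
    labels = [0, 1, 0, 1]
    candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    print(grid_search_threshold(probs, labels, candidates))
```

The threshold selected this way can then be passed to predict.py through --discriminating_threshold.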

Predict File

python predict.py --model_dir [TRAINED_MODEL_DIR] \
                  --output_dir [OUTPUT_DIR] \
                  --test_file [TEST_FILE] \
                  --discriminating_threshold [0.5] \
                  --batch_size 16 \
                  --gpu 0

Use gRPC

Start the gRPC server with the following command:

python grpc_server.py --model_dir [TRAINED_MODEL_DIR]

Call the API as shown in grpc_client.py.
