GEC Information

Policy

This repository aims to collect and categorize GEC (Grammatical Error Correction) papers.
Unlike NLP-progress, GEC-Info does not consider performance on benchmarks.
- Authors and conferences are also not considered.
The papers are limited to refereed papers in international conferences for now.
- This is not the case for survey papers.

Contributing

Pull Requests for adding papers are accepted. Please make a commit that changes only the lines related to the added papers (and take care that auto-formatting does not modify other lines).
You can also request the addition of papers by opening an issue.

This list can also be viewed on GitHub Pages.

Overview

Surveys

Title	Year	Page
"Automated Grammatical Error Correction: A Comprehensive Review"	2017	[paper]
"A Comprehensive Survey of Grammar Error Correction"	2020	[paper]
"Recent Trends in the Use of Deep Learning Models for Grammar Error Handling"	2020	[paper]
"Grammatical Error Correction: A Survey of the State of the Art"	2022	[paper]

Shared Tasks

Name	Year	Paper	Note
HOO 2011	2011	[paper]	[website]
HOO 2012	2012	[paper]	[website]
CoNLL-2013	2013	[paper]	[website]
CoNLL-2014	2014	[paper]	[website] [system outputs]
BEA-2019	2019	[paper]	[website] [system outputs]

Datasets

For Training (Real Data)

Name	Year	Paper	Note
EFCamDat	2014	[Automatic Linguistic Annotation of Large Scale L2 Databases: The EF-Cambridge Open Language Database (EFCamDat)] [The EF Cambridge Open Language Database (efcamdat) Information for Users]	[download v2]
GitHub Typo Corpus	2019	[GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors]	[download]
W&I+LOCNESS on BEA2019 Shared Task	2019	[Developing an Automated Writing Placement System for ESL Learners]	[direct download]
FCE	2011	[A New Dataset and Method for Automatically Grading ESOL Texts]	[direct download]
NUCLE	2013	[Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English]	[download]
ICNALE	2013	[The ICNALE and Sophisticated Contrastive Interlanguage Analysis of Asian Learners of English]	[download]
Lang-8	2011	[Mining Revision Log of Language Learning SNS for Automated Japanese Error Correction of Second Language Learners]	[website] [download: Fill this form] Related tools are useful; see [Other Tools] for details.

For Training (Pseudo/Synthetic Data)

Name	Year	Paper	Note
PIE-synthetic	2019	[Parallel Iterative Edit Models for Local Sequence Transduction]	[download]

For Evaluation

Name	Year	Paper	Note
KJ	2011	[Creating a manually error-tagged and shallow-parsed learner corpus]	[download]
CoNLL-2013	2013	[The CoNLL-2013 Shared Task on Grammatical Error Correction]	[direct download]
CoNLL-2014	2014	[The CoNLL-2014 Shared Task on Grammatical Error Correction]	[direct download]
10 additional annotations for the CoNLL14	2015	[How Far are We from Fully Automatic High Quality Grammatical Error Correction?]	[direct download]
8 additional annotations for the CoNLL14	2016	[Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality]	[download]
JFLEG	2017	[JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction]	[download]
GMEG-Data	2019	[Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses]	[code]
CWEB	2020	[Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses]	[download]
ErAConD	2021	[ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction]	[data] A training dataset is also included.
RobustGEC	2023	RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation	[code]
CSW Lang-8 Dataset	2024	Grammatical Error Correction for Code-Switched Sentences by Learners of English	[code/data]

Performance measures

Reference-based

Name	Year	Paper	Note
M^2 Scorer	2012	[Better Evaluation for Grammatical Error Correction]	[code] It is often used to evaluate CoNLL-2013 and CoNLL-2014.
GLEU	2015	[Ground Truth for Grammatical Error Correction Metrics] [GLEU Without Tuning]	[code] It is often used to evaluate JFLEG.
I-measure	2015	[Towards a standard evaluation method for grammatical error detection and correction]	[code] Code is available only for Python 2.x.
ERRANT	2016	[Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments] [Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction]	[code] It is often used to evaluate BEA-2019.
GMEG-Metric	2019	[Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses]	[code] Ridge regression using existing metrics (e.g. ERRANT, GLEU) as features.
GoToScorer	2019	[Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation]	[code] It evaluates systems while taking error correction difficulty into account.
PT-M2	2022	Revisiting Grammatical Error Correction Evaluation and Beyond	[code]
CLEME	2023	CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction	[code]
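
The M^2 Scorer and ERRANT both report edit-level precision, recall and F0.5. As a rough illustration of how that score is derived from counts, here is a minimal sketch. The function name `edit_f_beta` and the handling of empty denominators are assumptions made for this example, not code from either tool. How edits are extracted and matched against (multiple) references is out of scope here.

```python
from typing import NamedTuple


class EditScore(NamedTuple):
    precision: float
    recall: float
    f_beta: float


def edit_f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> EditScore:
    """Precision, recall and F_beta from edit-level counts.

    tp: system edits that match a reference edit
    fp: system edits that match no reference edit
    fn: reference edits that the system missed
    """
    # Assumption: an empty denominator scores 1.0
    # (nothing was proposed / there was nothing to find).
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f = (1 + b2) * precision * recall / denom if denom else 0.0
    return EditScore(precision, recall, f)


# Hypothetical counts: 3 correct edits, 1 spurious edit, 3 missed edits.
score = edit_f_beta(tp=3, fp=1, fn=3)
print(f"P={score.precision:.4f} R={score.recall:.4f} F0.5={score.f_beta:.4f}")
# prints: P=0.7500 R=0.5000 F0.5=0.6818
```

With beta = 0.5, precision is weighted twice as much as recall, so a system that proposes fewer but more reliable edits is favoured.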

Reference-less

Keywords / Overview	Year	Paper	Note
Scoring by counting the errors	2016	[There’s No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction]	[code]
Fluency + grammaticality + meaning preservation	2017	[Reference-based Metrics can be Replaced with Reference-less Metrics in Evaluating Grammatical Error Correction Systems]
USim	2018	[Reference-less Measure of Faithfulness for Grammatical Error Correction]	[code]
SOME	2020	[SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction]	[code]
Scribendi Score	2021	[Is this the end of the gold standard? A straightforward reference-less grammatical error correction metric]	[Unofficial code]
IMPARA	2022	IMPARA: Impact-Based Metric for GEC Using Parallel Data	[code]
	2024	Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction

Meta-evaluation

Keywords / Overview	Year	Paper	Note
Re-rank the CoNLL14 systems by human evaluation	2015	Human Evaluation of Grammatical Error Correction Systems	[code]
Reassess M^2, I-measure, GLEU by comparing human evaluation	2018	[A Reassessment of Reference-Based Grammatical Error Correction Metrics]	[code]
MAEGE	2018	Automatic Metric Validation for Grammatical Error Correction	[code]
SEEDA	2024	Revisiting Meta-evaluation for Grammatical Error Correction	[code]

Quality Estimation

Keywords / Overview	Year	Paper	Note
	2022	Proficiency Matters Quality Estimation in Grammatical Error Correction

Models / Architectures

Supervised

Keywords / Overview	Year	Paper	Note
	2006	Correcting ESL Errors Using Phrasal SMT Techniques
	2009	Using First and Second Language Models to Correct Preposition Errors in Second Language Authoring
	2010	Generating Confusion Sets for Context-Sensitive Error Correction
	2011	Correcting Semantic Collocation Errors with L1-induced Paraphrases
	2012	Tense and Aspect Error Correction for ESL Learners Using Global Context
	2012	Exploring Grammatical Error Correction with Not-So-Crummy Machine Translation
	2014	Grammatical error correction using hybrid systems and type filtering	CoNLL2014: CAMB
	2014	The AMU System in the CoNLL-2014 Shared Task: Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation	CoNLL2014: AMU
	2014	The Illinois-Columbia System in the CoNLL-2014 Shared Task	CoNLL2014: CUUI
	2014	RACAI GEC – A hybrid approach to Grammatical Error Correction	CoNLL2014: RAC
	2014	Grammatical Error Detection Using Tagger Disagreement	CoNLL2014: UFC
	2014	CoNLL 2014 Shared Task: Grammatical Error Correction with a Syntactic N-gram Language Model from a Big Corpora	CoNLL2014: IPN
	2014	Tuning a Grammar Correction System for Increased Precision	CoNLL2014: IITB
	2014	POSTECH Grammatical Error Correction System in the CoNLL-2014 Shared Task	CoNLL2014: POST
	2014	Grammatical Error Detection and Correction using a Single Maximum Entropy Model	CoNLL2014: SJTU
	2014	Factored Statistical Machine Translation for Grammatical Error Correction	CoNLL2014: UMC
	2014	NTHU at the CoNLL-2014 Shared Task	CoNLL2014: NTHU
	2014	A Unified Framework for Grammar Error Correction	CoNLL2014: PKU
	2016	Exploiting N-Best Hypotheses to Improve an SMT Approach to Grammatical Error Correction
	2016	Adapting Grammatical Error Correction Based on the Native Language of Writers with Neural Network Joint Models
Phrase-based SMT	2016	[Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction]	[code]
Neural reinforcement learning	2017	[Grammatical Error Correction with Neural Reinforcement Learning]	[code]
Word-level SMT enhanced NNJMs + char-based SMT	2017	[Connecting the Dots: Towards Human-Level Grammatical Error Correction]	[code]
First NMT-based approach	2016	[Grammatical error correction using neural machine translation]
	2016	Neural Network Translation Models for Grammatical Error Correction
SMEG	2017	[Systematically Adapting Machine Translation for Grammatical Error Correction]	[code]
A nested attention (word and char attention)	2017	[A Nested Attention Neural Hybrid Model for Grammatical Error Correction]
Re-ranking N-best sentence (by SMT) with LSTM-based GED	2017	[Neural Sequence-Labelling Models for Grammatical Error Correction]
CNN-based Encoder-Decoder approach	2018	[A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction]	[code]
Fluency boosting learning	2018	[Fluency Boost Learning and Inference for Neural Grammatical Error Correction]	[code] ACL2018
Fluency boosting learning (added round-way error correction)	2018	[Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study]	[code] Microsoft Research Technical Report
Hybrid SMT and NMT	2018	[Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation]
Copy-Augmented Architecture	2019	[Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data]	[code]
Consider a few previous sentences	2019	[Cross-Sentence Grammatical Error Correction]	[code]
PIE	2019	[Parallel Iterative Edit Models for Local Sequence Transduction]	[code]
LaserTagger	2019	[Encode, Tag, Realize: High-Precision Text Editing]	[code]
Pretrain by DAE + sequential transfer learning	2019	[A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning]	[code] BEA-2019: Kakao&Brain
Use sentence-level error detection	2019	[The AIP-Tohoku System at the BEA-2019 Shared Task]	BEA-2019: AIP-Tohoku
Four CNN + eight Transformer	2019	[The LAIX Systems in the BEA-2019 GEC Shared Task]	BEA-2019: LAIX
Combine Transformer+CNN with FST + Re-ranking	2019	[Neural and FST-based approaches to grammatical error correction]	BEA-2019: CAMB-CLED
Transformer seq2seq + BERT re-ranker	2019	[TMU Transformer System Using BERT for Re-ranking at BEA 2019 Grammatical Error Correction on Restricted Track]	BEA-2019: TMU
Apply noisy channel with BERT and GPT-2 as LM	2019	[Noisy Channel for Low Resource Grammatical Error Correction]	BEA-2019: Siteimprove
Use Finite State Transducers	2019	[Neural Grammatical Error Correction with Finite State Transducers]
GECToR	2020	[GECToR – Grammatical Error Correction: Tag, Not Rewrite]	[code]
BERT-fuse	2020	[Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction]	[code]
Adversarial approach (G:seq2seq D:sentence-pair classification)	2020	[Adversarial Grammatical Error Correction]
Erroneous span correction and detection	2020	[Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction]
Document-level approach	2020	[Document-level grammatical error correction]	[code]
Seq2Edits	2020	[Seq2Edits: Sequence Transduction Using Span-level Edit Operations]	[code]
Beam search considering copy probability	2020	[Generating Diverse Corrections with Local Beam Search for Grammatical Error Correction]
BART-based	2020	[Stronger Baselines for Grammatical Error Correction Using a Pretrained Encoder-Decoder Model]	[code]
VERNet	2021	[Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction]	[code]
Shallow Aggressive Decoding	2021	[Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding]	[code]
T5-based	2021	[A Simple Recipe for Multilingual Grammatical Error Correction]	[code]
GAN-like sequence labeling	2021	[Grammatical Error Correction as GAN-like Sequence Labeling]
Use multiclass GED for Transformer seq2seq and reranking	2021	[Multi-Class Grammatical Error Detection for Correction: A Tale of Two Systems]
GEC for writing improvement model adapted to the writer’s L1	2021	[Beyond Grammatical Error Correction: Improving L1-influenced research writing in English using pre-trained encoder-decoder models]	[code]
Contrastive Learning approach	2021	[Grammatical Error Correction with Contrastive Learning in Low Error Density Domains]	[code]
Sequence Span Rewriting	2021	[Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting]
Dependent Self-Attention (DSA)	2021	[Grammatical Error Correction with Dependency Distance]
	2021	Efficient Grammatical Error Correction with Hierarchical Error Detections and Correction	[code]
A GEC model using only 11.6MB	2021	An efficient system for grammatical error correction on mobile devices
	2022	Interpretability for Language Learners Using Example-Based Grammatical Error Correction	[code]
	2022	Type-Driven Multi-Turn Corrections for Grammatical Error Correction	[code]
GECToR Large	2022	Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction	[code] [Author's Master Thesis]
	2022	Position Offset Label Prediction for Grammatical Error Correction
SynGEC	2022	SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser	[code]
	2022	Improved grammatical error correction by ranking elementary edits	[code]
EdiT5	2022	EdiT5: Semi-Autoregressive Text Editing with T5 Warm-Start	[code]
GEC-DePenD	2023	GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding	[code]
TemplateGEC	2023	TemplateGEC: Improving Grammatical Error Correction with Detection Template	[code]
LET	2023	LET: Leveraging Error Type Information for Grammatical Error Correction
	2023	Leveraging Denoised Abstract Meaning Representation for Grammatical Error Correction
Use speech information	2023	Improving Grammatical Error Correction with Multimodal Feature Integration	[code]
	2023	Improving Autoregressive Grammatical Error Correction with Non-autoregressive Models
	2023	Reducing Sequence Length by Predicting Edit Spans with Large Language Models
	2024	No Error Left Behind: Multilingual Grammatical Error Correction with Pre-trained Translation Models
EDU Copy Mechanism	2024	Improving Copy-oriented Text Generation via EDU Copy Mechanism
	2024	Improving Grammatical Error Correction by Correction Acceptability Discrimination
mEdIT	2024	mEdIT: Multilingual Text Editing via Instruction Tuning	[code]
	2024	Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction	[code]
	2024	Detection-Correction Structure via General Language Model for Grammatical Error Correction	[code]

Unsupervised

Keywords / Overview	Year	Paper	Note
5-gram LM based approach	2018	[Language Model Based Grammatical Error Correction without Annotated Training Data]	[code]
Train GRU models for each of five error types	2018	[A Simple but Effective Classification Model for Grammatical Error Correction]
Use Finite State Transducers	2019	[Neural Grammatical Error Correction with Finite State Transducers]
LSTM tagger for word choice task	2019	[Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems]	[code]
Use LM (BERT, GPT-1,2)	2019	[The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction]
Create erroneous data from monolingual data	2019	[Minimally-Augmented Grammatical Error Correction]	Supervised setting is also performed
LM-Critic	2021	[LM-Critic: Language Models for Unsupervised Grammatical Error Correction]	[code] Supervised setting is also performed
	2023	Unsupervised Grammatical Error Correction Rivaling Supervised Methods	[code]

Ensemble Methods

Keywords / Overview	Year	Paper	Note
Use MENT	2014	System Combination for Grammatical Error Correction
	2016	Grammatical Error Correction: Machine Translation and Classifiers
	2019	[Learning to combine Grammatical Error Corrections]	[code]
Diversity-Driven Combination (DDC)	2021	[Diversity-Driven Combination for Grammatical Error Correction]	[code]
Select a system for each error type with IP	2021	[System Combination for Grammatical Error Correction Based on Integer Programming]	[code]
	2022	Frustratingly Easy System Combination for Grammatical Error Correction	[code]
GRECO	2023	System Combination via Quality Estimation for Grammatical Error Correction	[code]

Strategies

Keywords / Overview	Year	Paper	Note
	2012	A Beam-Search Decoder for Grammatical Error Correction
	2016	Discriminative Reranking for Grammatical Error Correction with Statistical Machine Translation
	2016	Candidate re-ranking for SMT-based grammatical error correction
Some methods that can be adapted from neural MT	2018	[Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task]	[code]
Iterative decoding	2018	[Weakly Supervised Grammatical Error Correction using Iterative Decoding]
	2019	Controlling Grammatical Error Correction Using Word Edit Rate
Add adversarial examples continually	2020	[Improving Grammatical Error Correction Models with Purpose-Built Adversarial Examples]
Cross-lingual Transfer Learning	2020	[Cross-lingual Transfer Learning for Grammatical Error Correction]
Data Weighted Training Strategies	2020	[Data Weighted Training Strategies for Grammatical Error Correction]
Align-and-Predict Decoding	2022	Adjusting the Precision-Recall Trade-Off with Align-and-Predict Decoding for Grammatical Error Correction	[code]
	2023	Mitigating Exposure Bias in Grammatical Error Correction with Data Augmentation and Reweighting	[code]
	2023	An Extended Sequence Tagging Vocabulary for Grammatical Error Correction	[code]
BTR	2023	Bidirectional Transformer Reranker for Grammatical Error Correction	[code]
	2023	Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule
MainGEC	2023	Grammatical Error Correction via Mixed-Grained Weighted Training
	2023	Improving Seq2Seq Grammatical Error Correction via Decoding Interventions	[code]

Data Augmentation

Keywords / Overview	Year	Paper	Note
Make artificial errors in a probabilistic manner	2014	[Generating artificial errors for grammatical error correction]
Back translation	2016	[Improving Neural Machine Translation Models with Monolingual Data]
SMT based MT + pattern extraction	2017	[Artificial Error Generation with Machine Translation and Syntactic Patterns]
Diverse back translation with noisy beam search	2018	[Noising and Denoising Natural Language: Diverse Backtranslation for Grammar Correction]
DirectNoise	2019	[Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data]	The method was first called "DirectNoise" by [kiyono+ 2019]?
Substituting words using confusion sets	2019	[Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data]	[synthetic data] BEA-2019: UEDIN-MS
Error+Context Dictionary	2019	[Improving Precision of Grammatical Error Correction with a Cheat Sheet]	BEA-2019: Buffalo
Use Google Translate for making pseudo data	2019	[(Almost) Unsupervised Grammatical Error Correction using a Synthetic Comparable Corpus]	BEA-2019: TMU in Low Resource
Inverted Spellchecker + Patterns+POS	2019	[A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction]
Methods for erroneous data generation	2019	[Erroneous data generation for Grammatical Error Correction]	BEA-2019: Shuyao
Wikipedia revision & Wikipedia round-trip translation	2019	[Corpora Generation for Grammatical Error Correction]
Create confusion sets by edit distance, word embeddings, spell-breaking	2019	[Minimally-Augmented Grammatical Error Correction]	Supervised setting is also performed
Explore methods to make pseudo data, seed corpus, training settings	2019	[An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction]	[code]
	2020	[Massive Exploration of Pseudo Data for Grammatical Error Correction]
Control error rates and error types by rule-based corruption and filtered back-translation	2020	[Controllable Data Synthesis Method for Grammatical Error Correction]
Use machine translation pairs	2020	[Improving Grammatical Error Correction with Machine Translation Pairs]
Edit latent representation	2020	[Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation]
Consider learner’s error tendency	2020	[Grammatical Error Correction Using Pseudo Learner Corpus Considering Learner’s Error Tendency]
Tagged corruption	2021	[Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models]	[code]
Use 188 modules	2021	[Various Errors Improve Neural Grammatical Error Correction]	[code]
Use real error patterns and linguistic knowledge	2021	[Data Augmentation of Incorporating Real Error Patterns and Linguistic Knowledge for Grammatical Error Correction]
Divide non-English sentence into chunks → translate to English for each of them → concatenate	2021	[Grammatical Error Generation Based on Translated Fragments]
	2023	Grammatical Error Correction through Round-Trip Machine Translation
TransGEC	2023	TransGEC: Improving Grammatical Error Correction with Translationese	[code]
Focus on gender bias	2023	Gender-Inclusive Grammatical Error Correction through Augmentation	[code]
	2023	Training for Grammatical Error Correction Without Human-Annotated L2 Learners’ Corpora
MixEdit	2023	MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction	[code]
	2024	Synthetic Data Generation for Low-resource Grammatical Error Correction with Tagged Corruption Models
	2024	Improving Grammatical Error Correction via Contextual Data Augmentation	[code]

Data Cleaning

Keywords / Overview	Year	Paper	Note
A Self-Refinement Strategy for Noise Reduction	2020	[A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction]
cLang8 (Cleaned Lang-8)	2021	[A Simple Recipe for Multilingual Grammatical Error Correction]	[code]

Analyses

Keywords / Overview	Year	Paper	Note
	2011	Algorithm Selection and Model Adaptation for ESL Correction Tasks
	2012	The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings
	2015	[How Far are We from Fully Automatic High Quality Grammatical Error Correction?]
Human annotation focused on fluency	2016	[Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality]	[code]
	2017	[GEC into the future: Where are we going and how do we get there?]
	2018	[Inherent Biases in Reference-based Evaluation for Grammatical Error Correction]	[code]
	2018	[Assessing Grammatical Correctness in Language Learning]
Quality estimation (and re-ranking using estimated score)	2018	[Neural Quality Estimation of Grammatical Error Correction]	[code]
Evaluate four systems (SMT, CNN, LSTM, Transformer) for six corpora (CoNLL13&14, FCE, JFLEG, KJ, ICNALE)	2019	[Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough?]
Compare CNN, Transformer, PRPN, ON-LSTM as back-translation models	2019	[The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction]
GEC for post-processing	2019	Automatic Grammatical Error Correction for Sequence-to-sequence Text Generation: An Empirical Study
CGOP	2020	[Comparison of the Evaluation Metrics for Neural Grammatical Error Correction With Overcorrection]	Metric considering overcorrection
Create new gold data by post-editing system outputs	2021	[How Good (really) are Grammatical Error Correction Systems?]
Explore whether models have grammatical knowledge with Known-setting and Unknown-setting	2021	[Do Grammatical Error Correction Models Realize Grammatical Generalization?]
Compare CNN, LSTM, transformer or combinations of them as BT models	2021	[Comparison of Grammatical Error Correction Using Back-Translation Models]
	2022	Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models
	2022	Grammatical Error Correction: Are We There Yet?
	2022	Grammatical Error Correction Systems for Automated Assessment: Are They Susceptible to Universal Adversarial Attacks?	[code]
	2023	ChatBack: Investigating Methods of Providing Grammatical Error Feedback in a GUI-based Language Learning Chatbot
	2023	Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods
	2023	A Closer Look at k-Nearest Neighbors Grammatical Error Correction
	2023	Grammatical Error Correction for Sentence-level Assessment in Language Learning
	2023	Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
	2024	Evaluating Prompting Strategies for Grammatical Error Correction Based on Language Proficiency
	2024	GPT-3.5 for Grammatical Error Correction	Target languages: CZ, DE, EN, RU, SV, UA
	2024	[Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models]	[code]
	2024	Likelihood-based Mitigation of Evaluation Bias in Large Language Models
	2024	Prompting open-source and commercial language models for grammatical error correction of English learner text	[code]

Spoken Domain

Keywords / Overview	Year	Paper
	2019	Automatic Grammatical Error Detection of Non-native Spoken Learner English
	2020	Grammatical error detection in transcriptions of spoken English
Disfluency detection (DD) model	2020	Spoken Language ‘Grammatical Error Correction’
	2022	On Assessing and Developing Spoken ’Grammatical Error Correction’ Systems

Applications

Name	Year	Paper	Note
GECko+		[GECko+: a Grammatical and Discourse Error Correction Tool]	[website] [code] An English writing assistance tool that corrects grammatical errors and reorders sentences automatically.
MiSS	2021	[MiSS: An Assistant for Multi-Style Simultaneous Translation]	[website] [demo video]
ALLECS	2023	ALLECS: A Lightweight Language Error Correction System	[website] [code]
	2023	Doolittle: Benchmarks and Corpora for Academic Writing Formalization	[code]

Projects

Name	Website
GramFormer	[GitHub]

Other Tools

Name	Code	Note
Lang8-NAIST-extractor	[code]	Scripts for extracting error-correct pairs from the Lang-8 Corpus.
M2Converter	[code]	Scripts for converting m2 file into source file and target file.
EFCamDat-Preprocess	[code]
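
For readers unfamiliar with the M2 format that these tools handle, the following is a minimal sketch of turning M2 annotations into source/target sentence pairs. It is not the code of M2Converter. The function name `m2_to_parallel` and the sample annotations are made up for this example, and it assumes that the edits of one annotator are non-overlapping and sorted by start position.

```python
from typing import List, Tuple


def m2_to_parallel(m2_text: str, annotator: int = 0) -> List[Tuple[str, str]]:
    """Apply one annotator's edits to each M2 block and return (source, target) pairs."""
    pairs = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if not lines[0].startswith("S "):
            continue
        source = lines[0][2:].split()
        target = list(source)
        offset = 0  # how far token positions have shifted after earlier edits
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            error_type, correction = fields[1], fields[2]
            if int(fields[-1]) != annotator:
                continue
            if error_type == "noop" or start < 0:
                continue  # "no change" marker
            replacement = [] if correction == "-NONE-" else correction.split()
            target[start + offset:end + offset] = replacement
            offset += len(replacement) - (end - start)
        pairs.append((" ".join(source), " ".join(target)))
    return pairs


# Hypothetical sample: one replacement, one deletion, and one unchanged sentence.
SAMPLE = "\n".join([
    "S She go to the school yesterday .",
    "A 1 2|||R:VERB:TENSE|||went|||REQUIRED|||-NONE-|||0",
    "A 3 4|||U:DET||||||REQUIRED|||-NONE-|||0",
    "",
    "S This is fine .",
    "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0",
])

for src, tgt in m2_to_parallel(SAMPLE):
    print(src, "->", tgt)
```

Each `S` line holds the tokenized source sentence, and each `A` line holds a token span, an error type, the correction, and the annotator ID as the last field, which is why the sketch filters on `fields[-1]`.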

Other materials

Name	Paper	Note
NLP-progress		[website] The performance ranking on some datasets.
A Crash Course in Automatic Grammatical Error Correction	[paper]	[materials] The tutorial about GEC in COLING2020.
Chunngai/gec-papers		[github] The papers appear to have been compiled around 2019-2020.

Related Tasks

Grammatical Error Detection

Keywords / Overview	Year	Paper	Note
	2003	Automatic Error Detection in the Japanese Learners’ English Spoken Data
	2006	Detecting errors in English article usage by non-native speakers
	2008	The Ups and Downs of Preposition Error Detection in ESL Writing
	2010	Evaluating performance of grammatical error detection to maximize learning effect
A weighted measure according to crowdsourcing results (for GED)	2011	[They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems]
	2014	Detecting Learner Errors in the Choice of Content Words Using Compositional Distributional Semantics
	2016	Compositional Sequence Labeling Models for Error Detection in Learner Writing
	2017	Grammatical Error Detection Using Error- and Grammaticality-Specific Word Embeddings	[code]
	2018	[Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection]	[code]
Bi-LSTM with contextual word embeddings	2019	[Context is Key: Grammatical Error Detection with Contextual Word Representations]
Multi-head and multi-layer attention	2019	[Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection]
	2021	[Exploring the Capacity of a Large-scale Masked Language Model to Recognize Grammatical Errors]
	2022	Probing for targeted syntactic knowledge through grammatical error detection	[code]

Feedback Comment Generation

Keywords / Overview	Year	Paper	Note
	2014	[Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages]
English grammar checker with feedback in Japanese	2018	[Grammatical Error Checker for Japanese Learners of English]	This is not research on feedback comment generation itself, but it is classified here for now.
	2019	[Toward a Task of Feedback Comment Generation for Writing Learning]
	2020	[Creating Corpora for Research in Feedback Comment Generation]
	2021	[Shared Task on Feedback Comment Generation for Language Learners]
	2023	Template-guided Grammatical Error Feedback Comment Generation

Explainable Grammatical Error Correction

Studies to explain the reasons for and intentions of error correction.

Keywords / Overview	Year	Paper	Note
EXPECT	2023	Enhancing Grammatical Error Correction Systems with Explanations	[code]
XGEC dataset	2024	Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction	[data]
GEE	2024	GEE! Grammar Error Explanation with Large Language Models	[code]

Document-level Revision

Keywords / Overview	Year	Paper	Note
TETRA	2024	Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond	[code]

Other Languages

Arabic

Keywords / Overview	Year	Paper	Note
Arabic Learner Corpus	2013	[Arabic Learner Corpus v1: A New Resource for Arabic Language Research]	[website]
QALB	2014	[Large Scale Arabic Error Annotation: Guidelines and Framework]	[QALB Project Website]
QALB 2014 Shared Task	2014	[The First QALB Shared Task on Automatic Text Correction for Arabic]	[website]
QALB 2015 Shared Task	2015	[The Second QALB Shared Task on Automatic Text Correction for Arabic]
ARETA	2021	[Automatic Error Type Annotation for Arabic]	[code]
	2023	Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation	[code]
	2023	Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction	[code]

Bangla

Keywords / Overview	Year	Paper	Note
	2021	[Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation]

Chinese

Keywords / Overview	Year	Paper	Note
	2013	Chinese Spelling Checker Based on Statistical Machine Translation
	2014	Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners
	2015	Improving Chinese Grammatical Error Correction with Corpus Augmentation and Hierarchical Phrase-based Statistical Machine Translation
NLPCC-2018 Shared Task	2018	[Overview of the NLPCC 2018 Shared Task: Grammatical Error Correction]	[data]
Two-stage: Spell checker → seq2seq	2019	[A Two-Stage Model for Chinese Grammatical Error Correction]
CNN-based seq2seq	2019	[Chinese Grammatical Error Correction Based on Convolutional Sequence to Sequence Model]
MaskGEC	2020	[MaskGEC: Improving Neural Grammatical Error Correction via Dynamic Masking]
	2020	[Chinese Grammatical Error Detection Based on BERT Model]
	2020	[BERT Enhanced Neural Machine Translation and Sequence Tagging Model for Chinese Grammatical Error Diagnosis]
	2020	[Heterogeneous Recycle Generation for Chinese Grammatical Error Correction]
NLPTEA-2020 Shared Task	2020	[Overview of NLPTEA-2020 Shared Task for Chinese Grammatical Error Diagnosis]
Tail-to-Tail Non-Autoregressive Sequence Prediction	2021	[Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction]
	2021	"Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction
	2022	Pre-Training-Based Grammatical Error Correction Model for the Written Language of Chinese Hearing Impaired Students
	2022	MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction	[code]
	2022	Improving Chinese Grammatical Error Detection via Data augmentation by Conditional Error Generation	[code]
	2022	String Editing Based Chinese Grammatical Error Diagnosis
CLG	2022	Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction	[code]
	2022	From Spelling to Grammar: A New Framework for Chinese Grammatical Error Correction
FCGEC	2022	FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction	[code]
	2023	Are Pre-trained Language Models Useful for Model Ensemble in Chinese Grammatical Error Correction?	[code]
	2023	Focal Training and Tagger Decouple for Grammatical Error Correction
NaSGEC	2023	NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts	[code]
TLM	2023	TLM: Token-Level Masking for Transformers	[code]
	2024	LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction	[code]
Alirector	2024	Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector	[code]
	2024	Towards Better Utilization of Multi-Reference Training Data for Chinese Grammatical Error Correction	[code]

Czech

Keywords / Overview	Year	Paper	Note
AKCES-GEC dataset	2019	[Grammatical Error Correction in Low-Resource Scenarios]	[data]
Grammar Error Correction Corpus for Czech (GECCC)	2022	Czech Grammar Error Correction with a Large and Diverse Corpus	[data]

Finnish

Keywords / Overview	Year	Paper	Note
	2024	Correcting Challenging Finnish Learner Texts With Claude, GPT-3.5 and GPT-4 Large Language Models

Greek

Keywords / Overview	Year	Paper	Note
Greek Learner Corpus	2018	[Stand-off annotation in learner corpora: compiling the Greek Learner Corpus (GLC)]
ELERRANT	2021	[ELERRANT: Automatic Grammatical Error Type Classification for Greek]	[code]

German

Keywords / Overview	Year	Paper	Note
Falko-MERLIN dataset	2018	[Using Wikipedia Edits in Low Resource Grammatical Error Correction]	[data]

Hindi

Keywords / Overview	Year	Paper	Note
	2014	[Detection and correction of non word spelling errors in Hindi language]
HiWikiEd dataset	2020	[Generating Inflectional Errors for Grammatical Error Correction in Hindi]	[data]

Icelandic

Keywords / Overview	Year	Paper	Note
Byte-level approach	2023	Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora	[code]

Japanese

Keywords / Overview	Year	Paper	Note
Character-level RNN-based seq2seq	2018	[Automatic Error Correction on Japanese Functional Expressions Using Character-based Neural Machine Translation]
Constructing retrieval system for Japanese GEC	2019	[Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second Language]
TMU Evaluation Corpus for Japanese Learners	2020	[Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language]	[data: Fill this form]
Non-Autoregressive approach	2020	[Non-Autoregressive Grammatical Error Correction Toward a Writing Support System]
	2022	Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction

Korean

Keywords / Overview	Year	Paper	Note
KAGAS	2023	Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation	[code] [data request form]

Lithuanian

Keywords / Overview	Year	Paper	Note
	2022	Towards Lithuanian grammatical error correction	[code]

Romanian

Keywords / Overview	Year	Paper	Note
	2020	[Neural Grammatical Error Correction for Romanian]	[code]

Russian

Keywords / Overview	Year	Paper	Note
RULEC-GEC dataset	2019	[Grammar Error Correction in Morphologically Rich Languages: The Case of Russian]	[data]
RU-Lang8 dataset	2021	[New Dataset and Strong Baselines for the Grammatical Error Correction of Russian]	[data]
Additional annotations for RULEC and RU-Lang8	2024	Multi-Reference Benchmarks for Russian Grammatical Error Correction	[RULEC] [RU-Lang8]
	2024	Universal Dependencies for Learner Russian	[code]

Spanish

Keywords / Overview	Year	Paper	Note
COWS-L2H	2020	[Developing NLP Tools with a New Corpus of Learner Spanish]	[data]

Swedish

Keywords / Overview	Year	Paper	Note
	2024	Evaluation of Really Good Grammatical Error Correction	[code]

Turkish

Keywords / Overview	Year	Paper	Note
ERRANT-TR	2023	Towards Automatic Grammatical Error Type Classification for Turkish	[code]

Ukrainian

Keywords / Overview	Year	Paper	Note
UA-GEC	2023	[UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language]	[data]
UNLP 2023 Shared Task	2023	The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian
	2023	Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction	UNLP-2023: Pravopysnyk
	2023	A Low-Resource Approach to the Grammatical Error Correction of Ukrainian	UNLP-2023: QC-NLP
	2023	RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans	UNLP-2023: WebSpellChecker
