- This repository aims to collect and categorize GEC (Grammatical Error Correction) papers.
- Unlike NLP-progress, GEC-Info does not consider performance on benchmarks.
- Authors and conferences are also not considered.
- For now, only refereed papers from international conferences are listed.
- Survey papers are exempt from this restriction.
- Pull requests that add papers are welcome. Please commit only the lines that add papers, and take care that auto-formatting does not change other lines.
- You can also request to add papers as an issue.
It can also be viewed on GitHub Pages
- Surveys
- Shared Tasks
- Datasets
- Performance Measures
- Quality Estimation
- Models / Methods
- Ensemble Methods
- Strategies
- Data Augmentation
- Analyses
- Other Tools
- Spoken Domain
- Applications
- Projects
- Other Materials
- Related Tasks
- Other Languages
Title | Year | Page | Note |
---|---|---|---|
"Automated Grammatical Error Correction: A Comprehensive Review" | 2017 | [paper] | |
"A Comprehensive Survey of Grammar Error Correction" | 2020 | [paper] | |
"Recent Trends in the Use of Deep Learning Models for Grammar Error Handling" | 2020 | [paper] | |
"Grammatical Error Correction: A Survey of the State of the Art" | 2022 | [paper] |
Name | Year | Paper | Note |
---|---|---|---|
HOO 2011 | 2011 | [paper] | [website] |
HOO 2012 | 2012 | [paper] | [website] |
CoNLL-2013 | 2013 | [paper] | [website] |
CoNLL-2014 | 2014 | [paper] | [website] [system outputs] |
BEA-2019 | 2019 | [paper] | [website] [system outputs] |
Name | Year | Paper | Note |
---|---|---|---|
PIE-synthetic | 2019 | [Parallel Iterative Edit Models for Local Sequence Transduction] | [download] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Scoring by counting the errors | 2016 | [There’s No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction] | [code] |
Fluency + grammaticality + meaning preservation | 2017 | [Reference-based Metrics can be Replaced with Reference-less Metrics in Evaluating Grammatical Error Correction Systems] | |
USim | 2018 | [Reference-less Measure of Faithfulness for Grammatical Error Correction] | [code] |
SOME | 2020 | [SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction] | [code] |
Scribendi Score | 2021 | [Is this the end of the gold standard? A straightforward reference-less grammatical error correction metric] | [Unofficial code] |
IMPARA | 2022 | IMPARA: Impact-Based Metric for GEC Using Parallel Data | [code] |
| | 2024 | Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction | |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Re-rank the CoNLL14 systems by human evaluation | 2015 | Human Evaluation of Grammatical Error Correction Systems | [code] |
Reassess M^2, I-measure, GLEU by comparing human evaluation | 2018 | [A Reassessment of Reference-Based Grammatical Error Correction Metrics] | [code] |
MAEGE | 2018 | Automatic Metric Validation for Grammatical Error Correction | [code] |
SEEDA | 2024 | Revisiting Meta-evaluation for Grammatical Error Correction | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2022 | Proficiency Matters Quality Estimation in Grammatical Error Correction | |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
5-gram LM based approach | 2018 | [Language Model Based Grammatical Error Correction without Annotated Training Data] | [code] |
Train GRU models for each of five error types | 2018 | [A Simple but Effective Classification Model for Grammatical Error Correction] | |
Use Finite State Transducers | 2019 | [Neural Grammatical Error Correction with Finite State Transducers] | |
LSTM tagger for word choice task | 2019 | [Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems] | [code] |
Use LM (BERT, GPT-1,2) | 2019 | [The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction] | |
Create erroneous data from monolingual data | 2019 | [Minimally-Augmented Grammatical Error Correction] | Also evaluated in a supervised setting |
LM-Critic | 2021 | [LM-Critic: Language Models for Unsupervised Grammatical Error Correction] | [code] Also evaluated in a supervised setting |
| | 2023 | Unsupervised Grammatical Error Correction Rivaling Supervised Methods | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Use MEMT | 2014 | System Combination for Grammatical Error Correction | |
| | 2016 | Grammatical Error Correction: Machine Translation and Classifiers | |
| | 2019 | [Learning to combine Grammatical Error Corrections] | [code] |
Diversity-Driven Combination (DDC) | 2021 | [Diversity-Driven Combination for Grammatical Error Correction] | [code] |
Select a system for each error type with IP | 2021 | [System Combination for Grammatical Error Correction Based on Integer Programming] | [code] |
| | 2022 | Frustratingly Easy System Combination for Grammatical Error Correction | [code] |
GRECO | 2023 | System Combination via Quality Estimation for Grammatical Error Correction | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
A Self-Refinement Strategy for Noise Reduction | 2020 | [A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction] | |
cLang8 (Cleaned Lang-8) | 2021 | [A Simple Recipe for Multilingual Grammatical Error Correction] | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2019 | AUTOMATIC GRAMMATICAL ERROR DETECTION OF NON-NATIVE SPOKEN LEARNER ENGLISH | |
| | 2020 | Grammatical error detection in transcriptions of spoken English | |
Disfluency detection (DD) model | 2020 | Spoken Language ‘Grammatical Error Correction’ | |
| | 2022 | On Assessing and Developing Spoken ‘Grammatical Error Correction’ Systems | |
Name | Year | Paper | Note |
---|---|---|---|
GECko+ | | [GECko+: a Grammatical and Discourse Error Correction Tool] | [website] [code] An English writing assistance tool that corrects grammatical errors and re-orders sentences automatically. |
MiSS | 2021 | [MiSS: An Assistant for Multi-Style Simultaneous Translation] | [website] [demo video] |
ALLECS | 2023 | ALLECS: A Lightweight Language Error Correction System | [website] [code] |
| | 2023 | Doolittle: Benchmarks and Corpora for Academic Writing Formalization | [code] |
Name | Website |
---|---|
GramFormer | [GitHub] |
Name | Code | Note |
---|---|---|
Lang8-NAIST-extractor | [code] | Scripts for extracting error-correct pairs from the Lang-8 Corpus. |
M2Converter | [code] | Scripts for converting m2 file into source file and target file. |
EFCamDat-Preprocess | [code] |
Name | Paper | Note |
---|---|---|
NLP-progress | | [website] Performance rankings on several datasets. |
A Crash Course in Automatic Grammatical Error Correction | [paper] | [materials] A tutorial on GEC given at COLING 2020. |
Chunngai/gec-papers | | [github] Papers compiled around 2019-2020? |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2014 | [Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages] | |
English grammar checker with feedback in Japanese | 2018 | [Grammatical Error Checker for Japanese Learners of English] | Not strictly feedback comment generation research, but classified here for now |
| | 2019 | [Toward a Task of Feedback Comment Generation for Writing Learning] | |
| | 2020 | [Creating Corpora for Research in Feedback Comment Generation] | |
| | 2021 | [Shared Task on Feedback Comment Generation for Language Learners] | |
| | 2023 | Template-guided Grammatical Error Feedback Comment Generation | |
- Studies to explain the reasons for and intentions of error correction.
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
EXPECT | 2023 | Enhancing Grammatical Error Correction Systems with Explanations | [code] |
XGEC dataset | 2024 | Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction | [data] |
GEE | 2024 | GEE! Grammar Error Explanation with Large Language Models | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
TETRA | 2024 | Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Arabic Learner Corpus | 2013 | [Arabic Learner Corpus v1: A New Resource for Arabic Language Research] | [website] |
QALB | 2014 | [Large Scale Arabic Error Annotation: Guidelines and Framework] | [QALB Project Website] |
QALB 2014 Shared Task | 2014 | [The First QALB Shared Task on Automatic Text Correction for Arabic] | [website] |
QALB 2015 Shared Task | 2015 | [The Second QALB Shared Task on Automatic Text Correction for Arabic] | |
ARETA | 2021 | [Automatic Error Type Annotation for Arabic] | [code] |
| | 2023 | Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation | [code] |
| | 2023 | Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2021 | [Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation] | |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
AKCES-GEC dataset | 2019 | [Grammatical Error Correction in Low-Resource Scenarios] | [data] |
Grammar Error Correction Corpus for Czech (GECCC) | 2022 | Czech Grammar Error Correction with a Large and Diverse Corpus | [data] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2024 | Correcting Challenging Finnish Learner Texts With Claude, GPT-3.5 and GPT-4 Large Language Models | |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Greek Learner Corpus | 2018 | [Stand-off annotation in learner corpora: compiling the Greek Learner Corpus (GLC)] | |
ELERRANT | 2021 | [ELERRANT: Automatic Grammatical Error Type Classification for Greek] | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Falko-MERLIN dataset | 2018 | [Using Wikipedia Edits in Low Resource Grammatical Error Correction] | [data] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2014 | [Detection and correction of non word spelling errors in Hindi language] | |
HiWikiEd dataset | 2020 | [Generating Inflectional Errors for Grammatical Error Correction in Hindi] | [data] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Byte-level approach | 2023 | Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
Character-level RNN-based seq2seq | 2018 | [Automatic Error Correction on Japanese Functional Expressions Using Character-based Neural Machine Translation] | |
Constructing retrieval system for Japanese GEC | 2019 | [Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second Language] | |
TMU Evaluation Corpus for Japanese Learners | 2020 | [Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language] | [data: Fill this form] |
Non-Autoregressive approach | 2020 | [Non-Autoregressive Grammatical Error Correction Toward a Writing Support System] | |
| | 2022 | Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction | |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
KAGAS | 2023 | Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation | [code] [data request form] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2022 | Towards Lithuanian grammatical error correction | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2020 | [Neural Grammatical Error Correction for Romanian] | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
RULEC-GEC dataset | 2019 | [Grammar Error Correction in Morphologically Rich Languages: The Case of Russian] | [data] |
RU-Lang8 dataset | 2021 | [New Dataset and Strong Baselines for the Grammatical Error Correction of Russian] | [data] |
Additional annotations for RULEC and RU-Lang8 | 2024 | Multi-Reference Benchmarks for Russian Grammatical Error Correction | [RULEC] [RU-Lang8] |
| | 2024 | Universal Dependencies for Learner Russian | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
COWS-L2H | 2020 | [Developing NLP Tools with a New Corpus of Learner Spanish] | [data] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
| | 2024 | Evaluation of Really Good Grammatical Error Correction | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
ERRANT-TR | 2023 | Towards Automatic Grammatical Error Type Classification for Turkish | [code] |
Keywords / Overview | Year | Paper | Note |
---|---|---|---|
UA-GEC | 2023 | [UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language] | [data] |
UNLP 2023 Shared Task | 2023 | The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian | |
| | 2023 | Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction | UNLP-2023: Pravopysnyk |
| | 2023 | A Low-Resource Approach to the Grammatical Error Correction of Ukrainian | UNLP-2023: QC-NLP |
| | 2023 | RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans | UNLP-2023: WebSpellChecker |