All `fairseq` Modifications for GEC

All newly added files begin with the following header:

###############################################################################
# CUSTOM MODULE FOR GEC
###############################################################################

Of all the changes, only fairseq/data/token_labeled_language_pair_dataset.py and its import in fairseq/data/__init__.py require a package rebuild, because fairseq has no separate registry for datasets. To rebuild, run:

pip install --upgrade --editable .
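After reinstalling, a quick sanity check can confirm that the new dataset module is visible to the interpreter. The following is a minimal sketch: the module path is derived from the file path listed in this document, and `module_available` is a hypothetical helper, not part of fairseq.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. fairseq itself) cannot be imported.
        return False


# Module path inferred from fairseq/data/token_labeled_language_pair_dataset.py.
DATASET_MODULE = "fairseq.data.token_labeled_language_pair_dataset"
print(DATASET_MODULE, "importable:", module_available(DATASET_MODULE))
```

If this prints `False`, the editable install most likely did not pick up the modified package.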

Modifications

eval_lm_fp16.py: single-line edit for fp16 language model evaluation
fairseq/models/copy_augmented_transformer_el.py: copy-augmented transformer + edit label prediction model definition
fairseq/data/token_labeled_language_pair_dataset.py: custom dataset loader for "m3" (i.e. ori-cor sentence pairs along with token-level edit labels)
fairseq/data/__init__.py: include token_labeled_language_pair_dataset in the module definition (fairseq has no registry for datasets)
fairseq/criterion/gec_loss.py: weighted cross-entropy using target-side edit labels, along with an auxiliary source-side edit label prediction loss
fairseq/tasks/gec.py: define a GEC task using custom models, datasets, and losses
fairseq/sequence_copygenerator.py: a fork of fairseq/sequence_generator.py that also tracks and returns copy scores during decoding
generate_or_copy.py: generation with <unk> tokens replaced based on copy scores
fairseq/scripts/test_gec_modules.py: unit tests for newly created modules
lm_scorer.py: scoring using pre-trained neural language models
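
Two of the entries above, the edit-weighted loss in gec_loss.py and the <unk> replacement in generate_or_copy.py, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in this repository: the function names, the `edit_weight` and `aux_weight` values, the use of binary cross-entropy for the auxiliary loss, and the argmax rule for picking the copied source token are all hypothetical.

```python
import math
from typing import List, Sequence


def gec_loss(
    target_log_probs: Sequence[Sequence[float]],  # per-step log-probs over the vocabulary
    targets: Sequence[int],                       # gold target token ids
    target_edit_labels: Sequence[int],            # 1 if the target token is part of an edit
    src_edit_probs: Sequence[float],              # predicted P(edit) per source token
    src_edit_labels: Sequence[int],               # gold source-side edit labels
    edit_weight: float = 3.0,                     # hypothetical value
    aux_weight: float = 1.0,                      # hypothetical value
) -> float:
    # Weighted cross-entropy: edited target tokens count edit_weight times.
    nll = 0.0
    for log_probs, y, is_edit in zip(target_log_probs, targets, target_edit_labels):
        weight = edit_weight if is_edit else 1.0
        nll -= weight * log_probs[y]
    # Auxiliary loss (assumed form): binary cross-entropy on source edit labels.
    aux = 0.0
    for p, label in zip(src_edit_probs, src_edit_labels):
        aux -= math.log(p) if label else math.log(1.0 - p)
    return nll + aux_weight * aux


def replace_unk_with_copy(
    hypo_tokens: Sequence[str],
    src_tokens: Sequence[str],
    copy_scores: Sequence[Sequence[float]],  # copy_scores[t][s]: step t, source position s
    unk: str = "<unk>",
) -> List[str]:
    # Swap each <unk> for the raw source token with the highest copy score.
    out = []
    for t, tok in enumerate(hypo_tokens):
        if tok == unk and src_tokens:
            best = max(range(len(src_tokens)), key=lambda s: copy_scores[t][s])
            out.append(src_tokens[best])
        else:
            out.append(tok)
    return out


log_probs = [[math.log(0.5), math.log(0.5)], [math.log(0.25), math.log(0.75)]]
loss = gec_loss(log_probs, [0, 1], [0, 1], [0.9, 0.2], [1, 0], edit_weight=2.0)

src = ["I", "likes", "Zyzzyva", "."]
hypo = ["I", "like", "<unk>", "."]
scores = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.02, 0.03, 0.05, 0.90],
]
result = replace_unk_with_copy(hypo, src, scores)
print(loss, result)
```

In the second function the source tokens are assumed to be the raw words before vocabulary lookup, so that an out-of-vocabulary word can be copied into the output verbatim.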
