Skip to content

Latest commit

 

History

History
31 lines (27 loc) · 1.45 KB

README.md

File metadata and controls

31 lines (27 loc) · 1.45 KB

mlqe-pe

Multilingual Quality Estimation and Automatic Post-editing Dataset. This is an updated version of the MLQE dataset to include post-editing data, as well Ru-En data. Please refer to the MLQE repo for the NMT models that generated the data. The multilingual NMT models used to generate translations for the zero-shot language pairs can be found here: mBART50 (many-to-one for Ps-En and Km-En, and one-to-many for En-Cs and En-Ja).

Citation

If you use this data in your work, please cite:

@article{fomicheva2020mlqepe,
    title={{MLQE-PE}: A Multilingual Quality Estimation and Post-Editing Dataset}, 
    author={Marina Fomicheva and Shuo Sun and Erick Fonseca and Fr\'ed\'eric Blain and Vishrav Chaudhary and Francisco Guzm\'an and Nina Lopatina and Lucia Specia and Andr\'e F.~T.~Martins},
    year={2020},
    journal={arXiv preprint arXiv:2010.04480}
}
@article{tacl2020,
    title = {Unsupervised Quality Estimation for Neural Machine Translation},
    author = {Fomicheva, Marina and Sun, Shuo and Yankovskaya, Lisa and Blain, Frédéric and Guzmán, Francisco and Fishel, Mark and Aletras, Nikolaos and Chaudhary, Vishrav and Specia, Lucia},
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {8},
    pages = {539-555},
    year = {2020}
}