Impact of transliteration on Arabic to English Machine Translation

Bartosz Cywiński & Łukasz Staniszewski (Warsaw University of Technology)


Task

The task was to investigate the impact of transliteration on machine translation from Arabic to English. The full documentation, written in Polish, is here.

Installation:

  1. To download and preprocess all data, open notebooks/data_preprocessing.ipynb and run it; the processed data is written to data/processed/.

  2. To tokenize the processed data with SentencePiece, go to model_scripts and, using a Python environment with fairseq installed, run the bash script:

$ bash train_decode_bpe_sentencepiece.sh
  3. Binarize the data so that fairseq can use it:
$ bash preprocess_fairseq.sh
  4. Train the model:
$ bash train_fairseq.sh
  5. Generate model predictions:
$ bash generate_fairseq.sh
  6. To compute the evaluation metrics, go to notebooks/ and run metrics.ipynb.
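
The four bash scripts above have to run in a fixed order, and each one depends on the output of the previous one. The following is a minimal sketch of a wrapper that runs them in sequence and stops at the first failure. The script names are taken from the steps above; the run_pipeline function and its directory argument are hypothetical helpers that are not part of the repository, and the sketch assumes the scripts are meant to be started from inside model_scripts.

```shell
# Sketch only: run the four pipeline scripts in the order given above.
# Script names come from the README steps; run_pipeline and its
# directory argument are hypothetical and not part of the repository.
PIPELINE_STEPS="train_decode_bpe_sentencepiece.sh preprocess_fairseq.sh train_fairseq.sh generate_fairseq.sh"

run_pipeline() {
  pipeline_dir="$1"

  # Check that every script exists before starting anything expensive.
  for step in $PIPELINE_STEPS; do
    if [ ! -f "$pipeline_dir/$step" ]; then
      echo "missing script: $pipeline_dir/$step" >&2
      return 1
    fi
  done

  # Run each script from inside its directory; stop at the first failure.
  for step in $PIPELINE_STEPS; do
    echo "==> $step"
    (cd "$pipeline_dir" && bash "$step") || return 1
  done
}
```

With the fairseq environment active, calling run_pipeline model_scripts from the repository root would run the SentencePiece, binarization, training, and generation scripts back to back. The two notebooks still have to be run by hand before and after.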