Bartosz Cywiński & Łukasz Staniszewski (Warsaw University of Technology)
The task was to investigate the impact of transliteration on machine translation from Arabic to English. Full documentation, written in Polish, is here.
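The README does not say which transliteration scheme the project used. As a hedged illustration of what transliterating the Arabic source side means, here is a minimal Python sketch of the Buckwalter scheme, a one-to-one mapping from Arabic letters and diacritics to ASCII. The table and the `transliterate` function are assumptions for illustration, not the project's code.

```python
# Minimal Buckwalter transliteration sketch (Arabic script -> ASCII).
# Assumption: Buckwalter is used purely as an illustration; the project
# may have used a different scheme.

BUCKWALTER = {
    # hamza and its carriers
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",
    "\u0625": "<", "\u0626": "}",
    # letters
    "\u0627": "A", "\u0628": "b", "\u0629": "p", "\u062A": "t",
    "\u062B": "v", "\u062C": "j", "\u062D": "H", "\u062E": "x",
    "\u062F": "d", "\u0630": "*", "\u0631": "r", "\u0632": "z",
    "\u0633": "s", "\u0634": "$", "\u0635": "S", "\u0636": "D",
    "\u0637": "T", "\u0638": "Z", "\u0639": "E", "\u063A": "g",
    "\u0640": "_",  # tatweel (elongation)
    "\u0641": "f", "\u0642": "q", "\u0643": "k", "\u0644": "l",
    "\u0645": "m", "\u0646": "n", "\u0647": "h", "\u0648": "w",
    "\u0649": "Y", "\u064A": "y",
    # diacritics: tanween, short vowels, shadda, sukun
    "\u064B": "F", "\u064C": "N", "\u064D": "K", "\u064E": "a",
    "\u064F": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
    "\u0670": "`",  # superscript (dagger) alef
    "\u0671": "{",  # alef wasla
}

# str.maketrans accepts a dict keyed by single characters.
_TABLE = str.maketrans(BUCKWALTER)


def transliterate(text: str) -> str:
    """Map Arabic characters to Buckwalter ASCII; anything else passes through unchanged."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # "\u0643\u062A\u0627\u0628" is the Arabic word for "book".
    print(transliterate("\u0643\u062A\u0627\u0628"))
```

Because every Arabic character maps to a distinct ASCII symbol, the transformation is reversible, so a transliterated corpus loses no information compared with the original script; only the English target side is left untouched.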
- To download and preprocess all data, open notebooks/data_preprocessing.ipynb and run it; all processed data will be written to data/processed/
- To encode your processed data with SentencePiece, go to model_scripts and, in a Python environment with fairseq installed, run the bash script:
$ bash train_decode_bpe_sentencepiece.sh
- Next, binarize all data so that fairseq can read it by running the bash script:
$ bash preprocess_fairseq.sh
- Start training the model:
$ bash train_fairseq.sh
- Generate model predictions:
$ bash generate_fairseq.sh
- To compute the evaluation metrics, go to notebooks/ and run metrics.ipynb.
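The README does not show how the output of generate_fairseq.sh reaches metrics.ipynb. Assuming the script saves the console output of fairseq-generate to a text file, the hypotheses can be recovered from its `H-<id>\t<score>\t<text>` lines. The Python sketch below (the function names and the file name in the usage note are hypothetical) collects them in sentence-id order and undoes SentencePiece segmentation, in case the script does not already strip the pieces.

```python
import re
from typing import Dict, Iterable, List

# fairseq-generate prints hypothesis lines as "H-<id>\t<score>\t<tokens>".
_HYP_LINE = re.compile(r"^H-(\d+)\t[^\t]*\t(.*)$")


def decode_sentencepiece(pieces: str) -> str:
    """Join space-separated SentencePiece pieces back into plain text."""
    # Drop the spaces between pieces, then turn the U+2581 word-boundary
    # marker back into a real space.
    return pieces.replace(" ", "").replace("\u2581", " ").strip()


def extract_hypotheses(lines: Iterable[str]) -> List[str]:
    """Return decoded hypotheses ordered by sentence id."""
    by_id: Dict[int, str] = {}
    for line in lines:
        match = _HYP_LINE.match(line.rstrip("\n"))
        if match:
            # With several hypotheses per sentence, keep the first one,
            # i.e. the highest-ranked hypothesis.
            by_id.setdefault(int(match.group(1)), decode_sentencepiece(match.group(2)))
    # fairseq prints sentences in batch order, not input order, so sort by id.
    return [by_id[i] for i in sorted(by_id)]
```

Usage, with a hypothetical output file name: `with open("generate.out", encoding="utf-8") as f: hyps = extract_hypotheses(f)`. The resulting list is aligned with the reference sentences by index and can be passed to whichever metric implementation the notebook uses.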