Bartosz Cywiński & Łukasz Staniszewski (Warsaw University of Technology)
The task was to investigate the impact of transliteration on machine translation from Arabic to English. The full documentation, written in Polish, is here.
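For readers unfamiliar with transliteration, here is a minimal sketch of the idea: each Arabic character is replaced with a Latin-script symbol before the text reaches the translation model. The table below is a partial Buckwalter-style mapping chosen only for illustration; this README does not say which scheme the project uses, and the names `BUCKWALTER` and `transliterate` are hypothetical.

```python
# Partial Buckwalter-style table (illustrative assumption; the project's
# actual transliteration scheme is not stated in this README).
BUCKWALTER = {
    "ا": "A", "ب": "b", "ت": "t", "ث": "v", "ج": "j", "ح": "H", "خ": "x",
    "د": "d", "ذ": "*", "ر": "r", "ز": "z", "س": "s", "ش": "$", "ص": "S",
    "ض": "D", "ط": "T", "ظ": "Z", "ع": "E", "غ": "g", "ف": "f", "ق": "q",
    "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "y",
}


def transliterate(text: str) -> str:
    """Replace every mapped Arabic character; leave all other characters unchanged."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("كتاب"))  # the Arabic word for "book"
```

Characters missing from the table (spaces, digits, punctuation, diacritics) pass through untouched, so the function is safe to run over whole sentences.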
To download and preprocess the data, open notebooks/data_preprocessing.ipynb and run it; the processed data is written to data/processed/.
To tokenize the processed data with SentencePiece, go to model_scripts and, in a Python environment with fairseq installed, run the bash script:
$ bash
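As a rough idea of what a SentencePiece step usually involves, the sketch below only builds and prints the `spm_train` and `spm_encode` command lines; it does not run them. The file names, model prefix, and vocabulary size are assumptions for illustration and may differ from what the project's script does.

```python
import shlex


def spm_train_cmd(input_path: str, model_prefix: str, vocab_size: int = 8000) -> list:
    """Command that trains a SentencePiece model on a raw text file."""
    return [
        "spm_train",
        f"--input={input_path}",
        f"--model_prefix={model_prefix}",
        f"--vocab_size={vocab_size}",
    ]


def spm_encode_cmd(model_path: str) -> list:
    """Command that splits text into pieces; it reads stdin and writes stdout."""
    return ["spm_encode", f"--model={model_path}", "--output_format=piece"]


if __name__ == "__main__":
    # Hypothetical paths under data/processed/; adjust to the real file names.
    print(shlex.join(spm_train_cmd("data/processed/train.ar", "spm_ar")))
    print(shlex.join(spm_encode_cmd("spm_ar.model")),
          "< data/processed/train.ar > train.spm.ar")
```

Building the commands as lists keeps arguments correctly quoted if they are later passed to `subprocess.run`.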
- Next, binarize the data so that fairseq can read it by running the bash script:
$ bash
- Start training the model:
$ bash
- Generate model predictions:
$ bash
- To compute the evaluation metrics, go to notebooks/ and run metrics.ipynb.
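The binarize, train, and generate steps above typically map onto the fairseq command-line tools `fairseq-preprocess`, `fairseq-train`, and `fairseq-generate`. The sketch below assembles example invocations without executing them. The directory names, architecture, and hyperparameters are assumptions for illustration, not the settings used by the scripts in model_scripts.

```python
import shlex


def preprocess_cmd(src: str, tgt: str, prefix_dir: str, dest_dir: str) -> list:
    """Binarize train/valid/test splits named <prefix_dir>/<split>.<lang>."""
    return [
        "fairseq-preprocess",
        "--source-lang", src,
        "--target-lang", tgt,
        "--trainpref", f"{prefix_dir}/train",
        "--validpref", f"{prefix_dir}/valid",
        "--testpref", f"{prefix_dir}/test",
        "--destdir", dest_dir,
    ]


def train_cmd(data_bin: str, save_dir: str) -> list:
    """Train a Transformer on the binarized data (example hyperparameters)."""
    return [
        "fairseq-train", data_bin,
        "--arch", "transformer",
        "--optimizer", "adam",
        "--lr", "0.0005",
        "--max-tokens", "4096",
        "--save-dir", save_dir,
    ]


def generate_cmd(data_bin: str, checkpoint: str) -> list:
    """Translate the test split and undo the SentencePiece segmentation."""
    return [
        "fairseq-generate", data_bin,
        "--path", checkpoint,
        "--beam", "5",
        "--remove-bpe", "sentencepiece",
    ]


if __name__ == "__main__":
    # Hypothetical directories; replace them with the ones the scripts use.
    for cmd in (
        preprocess_cmd("ar", "en", "data/spm", "data-bin"),
        train_cmd("data-bin", "checkpoints"),
        generate_cmd("data-bin", "checkpoints/checkpoint_best.pt"),
    ):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` in an environment where fairseq is installed.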