Impact of transliteration on Arabic to English Machine Translation

Task

The task was to investigate the impact of transliteration on machine translation from Arabic to English. Full documentation, written in Polish, is here.
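For readers unfamiliar with transliteration, the sketch below shows the general idea: each Arabic character is replaced by a Latin-script symbol before the text reaches the translation model. This README does not say which scheme the project uses, so the mapping here is an assumption: a small subset of the Buckwalter scheme, chosen only for illustration. The names `BUCKWALTER_SUBSET` and `transliterate` are hypothetical and do not come from the repository.

```python
# Hypothetical illustration only: the README does not name the
# transliteration scheme used in the project. This is a SUBSET of the
# Buckwalter mapping (consonants and long vowels, no diacritics).
BUCKWALTER_SUBSET = {
    "ا": "A", "ب": "b", "ت": "t", "ث": "v", "ج": "j", "ح": "H",
    "خ": "x", "د": "d", "ذ": "*", "ر": "r", "ز": "z", "س": "s",
    "ش": "$", "ص": "S", "ض": "D", "ط": "T", "ظ": "Z", "ع": "E",
    "غ": "g", "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m",
    "ن": "n", "ه": "h", "و": "w", "ي": "y", "ء": "'", "ة": "p",
    "ى": "Y",
}


def transliterate(text, table=BUCKWALTER_SUBSET):
    """Replace every character found in the table; leave the rest unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # The Arabic word for "book" becomes a four-letter Latin string.
    print(transliterate("كتاب"))  # prints "ktAb"
```

Characters outside the table (spaces, digits, punctuation, diacritics) pass through unchanged, which keeps the sketch safe to run on mixed text.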

To download and preprocess the data, open notebooks/data_preprocessing.ipynb and run it; the processed data is written to data/processed/

To apply SentencePiece to the processed data, go to model_scripts and, in a Python environment with fairseq installed, run the bash script:

$ bash train_decode_bpe_sentencepiece.sh

Next, binarize the data so that fairseq can use it by running the bash script:

$ bash preprocess_fairseq.sh

Then train the model:

$ bash train_fairseq.sh

Finally, generate translations with the trained model:

$ bash generate_fairseq.sh
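The four scripts above are meant to be run in order. As a convenience, the sequence can be sketched as a small Python runner. This is only a sketch under assumptions: the README states that the first script lives in model_scripts, and the runner assumes the other three do as well. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical and are not part of the repository. Data preparation in notebooks/data_preprocessing.ipynb still has to be done first.

```python
import subprocess
from pathlib import Path

# Script names and their order are taken from this README.
PIPELINE = [
    "train_decode_bpe_sentencepiece.sh",  # SentencePiece segmentation
    "preprocess_fairseq.sh",              # binarize data for fairseq
    "train_fairseq.sh",                   # train the model
    "generate_fairseq.sh",                # generate translations
]


def build_commands(scripts=PIPELINE):
    """Return one `bash <script>` command (as an argument list) per script."""
    return [["bash", name] for name in scripts]


def run_pipeline(script_dir="model_scripts", dry_run=True):
    """Run the scripts in order, stopping at the first failure.

    Assumption: every script is located in `script_dir`.
    With dry_run=True the commands are only printed, not executed.
    """
    commands = build_commands()
    for cmd in commands:
        if dry_run:
            print("$", " ".join(cmd))
        else:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, cwd=Path(script_dir), check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` from the repository root, inside an environment with fairseq installed, would execute the steps for real; the default dry run only prints them.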
