This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Add new problem: Macedonian to English (SETimes corpus) #158

Merged
merged 3 commits into from
Jul 18, 2017


stefan-it
Contributor

@stefan-it stefan-it commented Jul 15, 2017

Hi,

this PR adds a new problem called setimes_mken_tokens_32k. It is now possible to build a neural machine translation system for Macedonian to English.

The problem uses the SETimes corpus, which consists of 207,777 parallel sentences. Of these, 205,777 sentences form the training set and 1,000 sentences form the development set.

On a test set of 1,000 sentences, a BLEU score of 52.77 (measured with Moses' multi-bleu.perl) can be achieved.
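
For reference, the split sizes above add up to the full corpus (205,777 + 1,000 + 1,000 = 207,777). A minimal sketch of such a split follows. It assumes the development and test sets are held out from the end of the corpus, which this PR does not state. `split_corpus` and the placeholder sentence pairs are hypothetical and are not part of the PR's code.

```python
def split_corpus(pairs, dev_size=1000, test_size=1000):
    """Split (source, target) pairs into train/dev/test.

    Assumption: dev and test are taken from the tail of the corpus.
    The PR does not say how the actual split was made.
    """
    held_out = dev_size + test_size
    if len(pairs) < held_out:
        raise ValueError("corpus is smaller than the held-out sets")
    train_end = len(pairs) - held_out
    train = pairs[:train_end]
    dev = pairs[train_end:train_end + dev_size]
    test = pairs[train_end + dev_size:]
    return train, dev, test


# Placeholder data standing in for the 207,777 SETimes sentence pairs.
TOTAL = 207_777
pairs = [(f"mk-{i}", f"en-{i}") for i in range(TOTAL)]

train, dev, test = split_corpus(pairs)
print(len(train), len(dev), len(test))  # 205777 1000 1000
```

Keeping the three sets disjoint matters here: a BLEU score is only meaningful if the test sentences were never seen during training.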

Contributor

@lukaszkaiser lukaszkaiser left a comment

Thanks!


2 participants