This repository has been archived by the owner on Aug 14, 2021. It is now read-only.

Finalize the input corpus pairs #9

Closed
soumendrak opened this issue Nov 22, 2019 · 3 comments
Labels: good first issue, help wanted

Comments

@soumendrak (Owner) commented Nov 22, 2019

The input English-Odia pairs need to be finalized, including how many pairs we are going to start with.

We should go ahead with 5k curated high-quality pairs.
It's fine if the pairs are sentences, phrases or words.

The pairs need to be retrieved from all the Individual files and the Combined file.

The final dataset will be added to this repository.
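
As a rough illustration of the retrieval step described above, here is a minimal sketch of merging pairs from several sources and capping the result at 5k. It assumes each source stores one pair per line as tab-separated `english<TAB>odia` text; that layout, the function names `parse_pairs` and `merge_pairs`, and the sample pairs are hypothetical and not taken from this repository. It only removes duplicates; curating for quality would still be a manual step.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def parse_pairs(lines: Iterable[str], sep: str = "\t") -> List[Pair]:
    """Parse 'english<sep>odia' lines, skipping malformed or empty entries.

    The tab-separated layout is an assumption made for this sketch.
    """
    pairs: List[Pair] = []
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) != 2:
            continue  # not exactly one English and one Odia field
        en, od = parts[0].strip(), parts[1].strip()
        if en and od:
            pairs.append((en, od))
    return pairs


def merge_pairs(*sources: Iterable[Pair], limit: int = 5000) -> List[Pair]:
    """Merge sources in order, dropping duplicate pairs, up to `limit` pairs.

    Duplicates are detected case-insensitively on the English side.
    """
    seen = set()
    merged: List[Pair] = []
    for source in sources:
        for en, od in source:
            if len(merged) >= limit:
                return merged
            key = (en.casefold(), od)
            if key in seen:
                continue
            seen.add(key)
            merged.append((en, od))
    return merged


# Hypothetical sample data standing in for the individual and combined files.
individual = parse_pairs(["Hello\tନମସ୍କାର\n", "bad line\n", "Water\tପାଣି\n"])
combined = parse_pairs(["hello\tନମସ୍କାର\n", "Water\tପାଣି\n", "Thank you\tଧନ୍ୟବାଦ\n"])
merged = merge_pairs(individual, combined)
```

Earlier sources take priority when a pair appears more than once, so passing the individual files first keeps their casing over the combined file's.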

soumendrak self-assigned this Nov 22, 2019
soumendrak added the good first issue and help wanted labels Nov 22, 2019
@soumendrak soumendrak added this to the Feb-02/02/2020 release milestone Nov 22, 2019
@soumendrak (Owner, Author)

Mozilla Pontoon English-Odia pairs have been added to the consolidated file.

The parallel pair count has increased to 11,805.

@a-parida12

Is anyone working on this? Do you think I can work on this?

@soumendrak (Owner, Author)

@a-parida12 this one is completed.
You can see this folder for details:
https://github.com/soumendrak/MTEnglish2Odia/tree/master/data/output/organised

Thanks for reaching out. You may look into the other issues or help prepare step-by-step instructions for possible projects. Thank you.
CC: @subhadarship
