Consider rebalancing datasets with clustering #844

Open
Tracked by #216
eu9ene opened this issue Sep 12, 2024 · 0 comments
Labels
quality Improving robustness and translation quality

Comments

@eu9ene
Collaborator

eu9ene commented Sep 12, 2024

See paper: Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach.

This could be helpful, for example, for monolingual data, of which we have a lot (all en-xx language pairs).

Related to #231
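
The rebalancing idea above can be sketched roughly as follows. This is a minimal illustration, assuming sentence embeddings have already been computed by some encoder. It uses plain k-means with a per-cluster cap, which is a simplification and not the hierarchical k-means procedure described in the paper. The names `kmeans`, `rebalance`, and `per_cluster` are hypothetical and not part of this repository.

```python
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct points
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: dist2(p, centroids[c]))
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def rebalance(items, embeddings, k, per_cluster, seed=0):
    """Keep at most `per_cluster` randomly chosen items from each cluster."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for item, label in zip(items, kmeans(embeddings, k, seed=seed)):
        clusters[label].append(item)
    kept = []
    for label in sorted(clusters):
        members = clusters[label]
        rng.shuffle(members)
        kept.extend(members[:per_cluster])
    return kept
```

Capping each cluster downsamples over-represented regions of the embedding space while leaving rare ones mostly intact. In practice the number of clusters and the cap would need tuning per dataset.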

@eu9ene eu9ene added the quality Improving robustness and translation quality label Sep 12, 2024