Ever struggled with a limited non-English NLP dataset for a project? 🤯 Fear not, data augmentation to the rescue ⛑️ In this week's tip, we look at backtranslation 🔀 and contextual word embedding insertions as data augmentation techniques for multilingual NLP. We'll be using the MarianMT and distilled BERT pre-trained models, available on Hugging Face.
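To give a flavor of backtranslation, here's a minimal sketch using MarianMT via the `transformers` library: translate each sentence into a pivot language and back, yielding a paraphrased copy. The `Helsinki-NLP/opus-mt-*` checkpoints and the French↔English pair are our illustrative choices, not necessarily the ones used in the notebook:

```python
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    """Translate a batch of sentences with a MarianMT checkpoint."""
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in generated]

def backtranslate(texts, src="fr", pivot="en"):
    """Round-trip src -> pivot -> src to produce paraphrased samples."""
    # Language pair is an assumption for this sketch; pick one matching your dataset.
    pivoted = translate(texts, f"Helsinki-NLP/opus-mt-{src}-{pivot}")
    return translate(pivoted, f"Helsinki-NLP/opus-mt-{pivot}-{src}")

print(backtranslate(["Le chat dort sur le canapé."]))
```

Because the round trip rarely reproduces the input word for word, each original sentence gives you one (or more, with sampling) new labeled example for free.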
The size of the training set heavily impacts a model's performance, so this notebook explores data augmentation techniques that generate additional samples for an NLP dataset. Data augmentation is already standard practice in computer vision projects 👌, but it can also be leveraged in multilingual NLP problems.
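The second technique, contextual word embedding insertion, can be sketched with a fill-mask pipeline: drop a `[MASK]` token at a random position and let a distilled BERT propose a word that fits the context. The `distilbert-base-multilingual-cased` checkpoint is our assumption here for the multilingual setting:

```python
import random
from transformers import pipeline

# Fill-mask pipeline with a distilled multilingual BERT (checkpoint is our choice)
fill_mask = pipeline("fill-mask", model="distilbert-base-multilingual-cased")

def insert_word(sentence, rng=random):
    """Insert a mask token at a random word boundary and keep the top prediction."""
    words = sentence.split()
    pos = rng.randint(0, len(words))
    masked = " ".join(words[:pos] + [fill_mask.tokenizer.mask_token] + words[pos:])
    best = fill_mask(masked)[0]  # highest-scoring candidate sentence
    return best["sequence"]

print(insert_word("Le chat dort sur le canapé."))
```

Because the model conditions on the whole sentence, the inserted word tends to be grammatical in context, which makes this gentler than random synonym insertion.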
We recommend opening the notebook in Colab for an interactive, explainable experience and optimal rendering of the visuals 👇: