A PyTorch Lightning pipeline for a test task based on the RussianSuperGLUE benchmark.
Main frameworks used:
Models used:
- ruRoberta-large
Datasets used:
- TERRa
- RUSSE
Project structure:
- bin/ - executable files
  - ruroberta_train.py - main training loop
  - ruroberta_validation.py - model evaluation on the validation dataset
  - ruroberta_test.py - model inference on the test dataset
  - ruroberta_inference.py - model inference on individual examples
- cfg/ - configs with experiment settings
  - augmentation
  - callbacks
  - datamodule
  - inference
  - logging
  - loss
  - metric
  - model
  - optimizer
  - private
  - scheduler
  - trainer
  - training
- data/ - datasets
- notebooks/ - Jupyter notebooks
- outputs/ - results of experiments
- pipeline/ - main pipeline code
  - callbacks - various callbacks
  - datamodules - LightningDataModule classes
  - datasets - PyTorch dataset classes
  - losses - custom losses
  - metrics - custom metrics
  - models - PyTorch model classes
  - schedulers - learning rate schedulers
  - wrappers - PyTorch Lightning model wrappers
- src/ - various utilities and support functions
- requirements.txt
Results:
- ruRoberta-large fine-tuned on TERRa dataset
  best model train/val/test accuracy: 0.935/0.850/0.786
- ruRoberta-large fine-tuned on RUSSE dataset
  best model train/val/test accuracy: 0.969/0.886/0.726
To do:
- implement multi-task learning
- try fine-tuning ruT5-large
- use label smoothing loss
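The label smoothing item above could be sketched roughly as follows. This is a minimal, framework-free illustration for a single example, not the loss used in this repository: the function name `label_smoothing_loss` and the default `smoothing=0.1` are assumptions made for the sketch. It uses the common formulation in which the smoothing mass is spread uniformly over all classes, which is also what the `label_smoothing` argument of `torch.nn.CrossEntropyLoss` does in recent PyTorch versions.

```python
import math


def label_smoothing_loss(logits, target, smoothing=0.1):
    """Cross-entropy of one example against a smoothed target distribution.

    logits: list of raw class scores; target: index of the true class.
    The name and the default smoothing value are illustrative assumptions.
    """
    num_classes = len(logits)

    # Numerically stable log-softmax.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_sum for x in logits]

    # Smoothed target: smoothing / K on every class,
    # plus the remaining (1 - smoothing) on the true class.
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        q = smoothing / num_classes
        if k == target:
            q += 1.0 - smoothing
        loss -= q * log_p
    return loss


plain = label_smoothing_loss([2.0, 0.0], target=0, smoothing=0.0)
smoothed = label_smoothing_loss([2.0, 0.0], target=0, smoothing=0.1)
print(f"plain cross-entropy: {plain:.4f}, smoothed: {smoothed:.4f}")
```

With `smoothing=0.0` this reduces to ordinary cross-entropy; a positive value penalizes over-confident predictions, which may help narrow the gap between train and test accuracy.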
First, install the dependencies:
# clone project
git clone https://github.com/Thurs88/russian_superglue_task
# install project
cd russian_superglue_task
pip install pre-commit
pip install -r requirements.txt
pre-commit install
Next, run any of the scripts in bin/ from the project root.
- train models
For example, this command runs training on the TERRa dataset:
>>> python bin/ruroberta_train.py --config-name='ruroberta_terra_config'
- validate models
>>> python bin/ruroberta_terra_validation.py
- test models and make submission
>>> python bin/ruroberta_terra_test.py
- inference
>>> python bin/ruroberta_inference.py --task_name=terra
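Example session (the prompt "Введите" means "Enter"; the two sentences translate as "The guardsmen approached the truck, which, as it turned out, had simply broken down." and "The guardsmen approached the broken-down truck."):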
Введите sentence1: Гвардейцы подошли к грузовику, который как оказалось, попросту сломался.
Введите sentence2: Гвардейцы подошли к сломанному грузовику.
Predicted label: entailment
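The step from model output to the printed label in the session above can be sketched as follows. This is a hedged illustration and not the code in bin/ruroberta_inference.py: `predict_label`, `score_pair`, and `dummy_scorer` are hypothetical names, the scorer is a stand-in for the fine-tuned model, and the order of the labels in `TERRA_LABELS` is an assumption that should be checked against the mapping used in training.

```python
import math

# TERRa label names; the index order here is an assumption.
TERRA_LABELS = ["entailment", "not_entailment"]


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(sentence1, sentence2, score_pair, labels=TERRA_LABELS):
    """Return (label, probability) for one sentence pair.

    score_pair is a hypothetical callable standing in for the model:
    it takes two strings and returns one logit per label.
    """
    probs = softmax(score_pair(sentence1, sentence2))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def dummy_scorer(sentence1, sentence2):
    # Stand-in for the fine-tuned model: fixed logits favouring index 0.
    return [2.0, -1.0]


label, prob = predict_label(
    "Гвардейцы подошли к грузовику, который как оказалось, попросту сломался.",
    "Гвардейцы подошли к сломанному грузовику.",
    dummy_scorer,
)
print(f"Predicted label: {label}")
```

Passing the scorer as an argument keeps the label mapping separate from the model, so the same function could be reused for other tasks by supplying a different `labels` list.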