In this repository, you will find:
- A Jupyter notebook explaining how to load ScilitBERT and its tokenizer using Hugging Face
- An example of how to test the mask-filling feature (a minimal loading sketch is also shown below)
- A dataset for fine-tuning on the Journal Finder task
- A notebook to quick-start the fine-tuning.
ScilitBERT is a BERT model for academic language representation developed by MDPI. The training data is extracted from Scilit. For more details, check the paper available at: (not available at the moment)
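As a minimal sketch, loading ScilitBERT and testing mask filling with the Hugging Face `transformers` library might look like the following. The model path is an assumption: point it at the directory produced by the init script.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Assumed location of the model files fetched by the init script;
# adjust the path to match your local checkout.
model_path = "./ScilitBERT"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForMaskedLM.from_pretrained(model_path)

# Build a fill-mask pipeline and query it with the tokenizer's own
# mask token, so the example works regardless of the exact vocabulary.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
sentence = f"The samples were analyzed by {tokenizer.mask_token} spectroscopy."
for prediction in fill_mask(sentence):
    print(prediction["token_str"], round(prediction["score"], 3))
```

The example_mlm notebook walks through the same steps interactively.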
You can run the init script in the root of the repository to:
- Get the model without the Journal Finder task dataset (you will be able to run the example_mlm notebook):

```bash
chmod +x init.sh
./init.sh --target model
```

- Get the dataset without the ScilitBERT pre-trained model (you will not be able to run any of the notebooks):

```bash
chmod +x init.sh
./init.sh --target dataset
```

- Get both the model and the dataset (you will be able to run both the example_mlm and the fine_tuning_journal_finder notebooks):

```bash
chmod +x init.sh
./init.sh --target both
```
- Get access to a Jupyter environment.
- Install a PyTorch version adapted to your CUDA version (or run on CPU, though that is not viable for fine-tuning).
- Install the dependencies in your Python environment using pip or anaconda:

```bash
pip install -r ./requirements.txt
```
If you followed the getting started steps and used the init script to download the model, you can now explore the notebook: notebooks/example_mlm.ipynb
- A fine-tuning quick-start notebook on the Journal Finder task is provided: fine tuning example
The hyper-parameters can be managed in the fine-tune function found in utils; an illustrative sketch of typical values is shown below.
The fine-tuned models are stored in the results folder (to rerun an experiment, change the output folder or delete the contents of the previous output folder).
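As an illustration of the kind of hyper-parameters involved, here is a sketch using the standard Hugging Face `TrainingArguments`. The values are assumptions for illustration only; the repository's actual settings live in the fine-tune function in utils.

```python
from transformers import TrainingArguments

# Illustrative values only; these are standard Hugging Face Trainer
# fields, not the repository's exact interface.
training_args = TrainingArguments(
    output_dir="./results",            # fine-tuned models land here
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
)
```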
A CSV describing the model's performance on the test set will be generated in the file /evaluation_results/journal_finder_output.csv: the first row reports the F1-score, and the following rows report the top-k macro-averaged accuracies for k ranging from 1 to 10.
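Assuming one metric per row as described above (this layout is inferred from the description, not from the code), the results can be inspected with pandas:

```python
import pandas as pd

# Row 0: F1-score; rows 1-10: top-k macro-averaged accuracy for k=1..10.
results = pd.read_csv("evaluation_results/journal_finder_output.csv",
                      header=None)
f1_score = results.iloc[0, 0]
top_k_accuracies = results.iloc[1:, 0]
print("F1:", f1_score)
print(top_k_accuracies)
```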
You can contribute to this work by:
- Helping to make the model ready for publication on the Hugging Face model hub.
- Finding good hyper-parameters for the fine-tuning on the Journal Finder task.