Fine-tuning a GPT-2 model on your own input text

First preprocess the data

Use the notebook data_cleanup/clean_harry_book.ipynb. The clean-up steps could be improved by manually inspecting the quality of the cleaned text, but the current output is sufficient to get started with training.
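As a rough illustration of the kind of clean-up involved, the sketch below drops lines that look like page numbers and normalizes whitespace. It is not the notebook's code: the function name `clean_text` and the specific rules are assumptions chosen for illustration, and the right rules depend on how the raw text was extracted.

```python
import re


def clean_text(raw: str) -> str:
    """Hypothetical clean-up pass; the notebook's actual steps may differ."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        # Drop lines that contain only digits (assumed to be page numbers).
        if re.fullmatch(r"\d+", line):
            continue
        # Collapse runs of spaces and tabs into a single space.
        line = re.sub(r"[ \t]+", " ", line)
        kept.append(line)
    text = "\n".join(kept)
    # Collapse three or more consecutive newlines into one blank line.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Reading a sample of the output by eye is still the best check on whether rules like these remove too much or too little.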

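Copy the cleaned training and validation text files to examples/input_data. The data paths in the training command (input_data/...) are relative to the examples directory.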
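If the cleaned text is still a single string, a split like the following could produce the two files. This is a minimal sketch, not part of the repository: `split_and_save` and the 10% validation fraction are assumptions, while the file names match those used in the training command.

```python
from pathlib import Path


def split_and_save(text: str, out_dir: str = "examples/input_data",
                   val_fraction: float = 0.1):
    """Hold out the last paragraphs for validation and write both files."""
    # Treat blank-line-separated chunks as paragraphs (an assumption).
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    n_val = max(1, int(len(paragraphs) * val_fraction))
    train, val = paragraphs[:-n_val], paragraphs[-n_val:]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    train_path = out / "train_harry.txt"
    val_path = out / "val_harry.txt"
    train_path.write_text("\n\n".join(train), encoding="utf-8")
    val_path.write_text("\n\n".join(val), encoding="utf-8")
    return train_path, val_path
```

Holding out a contiguous tail rather than random paragraphs keeps the validation text from being interleaved with passages the model has trained on.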

Start Training the GPT-2 model

python run_lm_finetuning.py \
    --output_dir=output \
    --model_type=gpt2 \
    --model_name_or_path=gpt2-medium \
    --do_train \
    --train_data_file='input_data/train_harry.txt' \
    --do_eval \
    --eval_data_file='input_data/val_harry.txt' \
    --overwrite_output_dir \
    --block_size=200 \
    --per_gpu_train_batch_size=1 \
    --save_steps=5000 \
    --num_train_epochs=2
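To make --block_size more concrete: language-model fine-tuning scripts of this kind typically tokenize the whole training file and cut the token stream into fixed-length blocks, discarding a final partial block. The sketch below mimics that with plain integer ids. It is an approximation written for illustration, not the script's own code, and `chunk_into_blocks` is a hypothetical name.

```python
def chunk_into_blocks(token_ids, block_size=200):
    """Split a token id sequence into full blocks of block_size tokens.

    A trailing remainder shorter than block_size is dropped, which is the
    assumed behaviour here; check the script's dataset class to confirm.
    """
    last_start = len(token_ids) - block_size + 1
    return [token_ids[i:i + block_size]
            for i in range(0, last_start, block_size)]


# 450 tokens with block_size=200 gives two training examples;
# the last 50 tokens are not used.
blocks = chunk_into_blocks(list(range(450)), block_size=200)
```

With --per_gpu_train_batch_size=1, each optimizer step would then see one such block, so a smaller block size lowers memory use at the cost of shorter context per example.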

Inference

cd examples

python run_generation.py --model_type gpt2 --model_name_or_path output --length 300 --prompt "Standing in the doorway, illuminated by the shivering flames in Lupin’s hand, was a cloaked figure that towered to the ceiling."
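The same generation step can also be driven from Python. The sketch below assumes a recent release of the Hugging Face transformers library installed from PyPI (not the copy vendored in this repository, whose API may differ) and that the output directory contains both the model and the tokenizer files. The helper names and the sampling values (top_k=50, top_p=0.95) are illustrative assumptions, not settings taken from run_generation.py.

```python
def sampling_kwargs(length: int = 300) -> dict:
    """Illustrative sampling settings; tune these for your own text."""
    return {
        "max_new_tokens": length,  # mirrors --length 300 in the command
        "do_sample": True,
        "top_k": 50,
        "top_p": 0.95,
    }


def generate_text(prompt: str, model_dir: str = "output",
                  length: int = 300) -> str:
    """Load the fine-tuned model from model_dir and continue the prompt."""
    # Imported here so the module can be loaded without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # GPT-2 has no pad token, so reuse the end-of-text token.
        pad_token_id=tokenizer.eos_token_id,
        **sampling_kwargs(length),
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Standing in the doorway, ...")` from the examples directory would return the prompt followed by the sampled continuation; because sampling is enabled, each call produces different text.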
