This example demonstrates how to fine-tune the GPT-2 network on the WikiText2 dataset.
A pre-trained GPT-2 network is instantiated from the library of standard models and applied to an instance of the WikiText2 dataset. A custom training loop is defined, and the training and test losses and accuracies for each epoch are shown during training.
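To make that flow concrete, here is a rough sketch of what the setup and custom training loop look like in Swift for TensorFlow. The `GPT2` type is assumed to come from swift-models' `TextModels` module, `loadWikiText2Batches(split:)` is a hypothetical placeholder for the batching pipeline in the `Datasets` module, and the learning rate and epoch count are arbitrary; treat the names and signatures here as assumptions rather than the example's actual code.

```swift
import TensorFlow
import TextModels  // swift-models module assumed to provide GPT2; names may differ across versions

// Hypothetical stand-in for swift-models' Datasets pipeline. Each batch is a
// pair of token-ID tensors: an input sequence and the same sequence shifted by one.
func loadWikiText2Batches(split: String) -> [(Tensor<Int32>, Tensor<Int32>)] {
    fatalError("placeholder: supply WikiText2 batches here")
}

// Instantiate the pre-trained GPT-2 network from the model library;
// the checkpoint is downloaded on first use.
let gpt2 = try GPT2()
var model = gpt2.model

let optimizer = Adam(for: model, learningRate: 1e-3)
let trainingBatches = loadWikiText2Batches(split: "train")
let testBatches = loadWikiText2Batches(split: "test")

for epoch in 1...10 {
    // Training pass: differentiate the loss and update the parameters.
    Context.local.learningPhase = .training
    var trainingLoss: Float = 0
    for (input, target) in trainingBatches {
        let (loss, gradients) = valueWithGradient(at: model) { model -> Tensor<Float> in
            let logits = model(input)  // shape: [batch, sequence, vocabulary]
            // Flatten to [batch * sequence, vocabulary] for the per-token loss.
            let shape = logits.shape
            let flatLogits = logits.reshaped(to: [shape[0] * shape[1], shape[2]])
            let flatLabels = target.reshaped(to: [shape[0] * shape[1]])
            return softmaxCrossEntropy(logits: flatLogits, labels: flatLabels)
        }
        optimizer.update(&model, along: gradients)
        trainingLoss += loss.scalarized()
    }

    // Evaluation pass: same loss computation with learning disabled
    // (no dropout, no parameter updates).
    Context.local.learningPhase = .inference
    var testLoss: Float = 0
    for (input, target) in testBatches {
        let logits = model(input)
        let shape = logits.shape
        let flatLogits = logits.reshaped(to: [shape[0] * shape[1], shape[2]])
        let flatLabels = target.reshaped(to: [shape[0] * shape[1]])
        testLoss += softmaxCrossEntropy(logits: flatLogits, labels: flatLabels).scalarized()
    }

    print("Epoch \(epoch): training loss \(trainingLoss / Float(trainingBatches.count)), test loss \(testLoss / Float(testBatches.count))")
}
```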
To begin, you'll need the latest version of Swift for TensorFlow installed. Make sure you've added the correct version of `swift` to your path (you can verify which toolchain is active with `swift --version`).
To train the model, run:

```sh
cd swift-models
swift run -c release GPT2-WikiText2
```
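The `-c release` flag asks Swift Package Manager for an optimized build, which matters for training speed; debug builds will run noticeably slower.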