This document provides instructions for pretraining the models of Yuan2.0.
Three models are provided; their main parameters are as follows:
Model | Layer number | Hidden size | Attention heads
---|---|---|---
2B | 24 | 2048 | 32
51B | 42 | 8192 | 64
102B | 84 | 8192 | 64
The following scripts are provided for pretraining the three models of Yuan2.0:

- 2.1B: `pretrain_yuan2.0_2.1B.sh`
- 51B: `pretrain_yuan2.0_51B.sh`
- 102B: `pretrain_yuan2.0_102B.sh`
An example script to run Yuan-2.1B pretraining is:

```shell
bash examples/pretrain_yuan2.0_2.1B.sh
```
Before running the script, the relevant arguments should be set correctly.
First, make any desired modifications, including setting the environment variables for `CHECKPOINT_PATH`, `DATA_PATH`, `TOKENIZER_MODEL_PATH`, and `TENSORBOARD_PATH`.
If the dataset path is `/path/dataset.bin`, the `DATA_PATH` can be set as follows:

```shell
#DATA_PATH='weight dataset_path'
DATA_PATH='1 /path/dataset'
```
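Putting these variables together, the top of the pretraining script might look like the sketch below; every path is a placeholder chosen for illustration, not a value shipped with Yuan2.0.

```shell
# Placeholder paths -- replace with your own locations.
CHECKPOINT_PATH=/path/to/checkpoints           # where checkpoints are saved and loaded
TOKENIZER_MODEL_PATH=/path/to/tokenizer.model  # tokenizer model file
TENSORBOARD_PATH=/path/to/tensorboard          # TensorBoard log directory

# 'weight dataset_path' pairs; the weight sets the sampling ratio when
# several preprocessed datasets are blended.
DATA_PATH='1 /path/dataset'
```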
The documentation for dataset preprocessing can be found here.
A simple and efficient three-dimensional model-parallel approach can be controlled with the `--tensor-model-parallel-size` and `--pipeline-model-parallel-size` flags. If the `--pipeline-model-parallel-method` flag is set to `block`, the number of transformer layers for each pipeline stage should be specified with the `--pipeline-model-parallel-blocks` flag.
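As a sketch, the parallelism flags could be combined as follows; the parallel sizes and the comma-separated per-stage layer counts are illustrative assumptions rather than values taken from the released scripts.

```shell
# Example only: split 24 transformer layers unevenly over 4 pipeline stages,
# with tensor parallelism of 2 inside each stage. The value format of
# --pipeline-model-parallel-blocks shown here is an assumption.
PARALLEL_ARGS="--tensor-model-parallel-size 2 \
               --pipeline-model-parallel-size 4 \
               --pipeline-model-parallel-method block \
               --pipeline-model-parallel-blocks 5,7,7,5"
```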
The Localized Filtering-based Attention (LFA) can be activated by the `--use-lf-gate` flag, and the `--lf-conv2d-num-pad` flag should be set to `1` for training and `0` for inference.
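For instance, a training run might pass the LFA flags as below, while an inference run would keep `--use-lf-gate` but switch the padding value to `0`; this is only a sketch built from the flags described above.

```shell
# Training: enable LFA with the padding value required for training.
LFA_ARGS="--use-lf-gate \
          --lf-conv2d-num-pad 1"

# Inference: same gate, but the padding value is set to 0.
# LFA_ARGS="--use-lf-gate --lf-conv2d-num-pad 0"
```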
The `--use-distributed-optimizer` and `--recompute-method` flags control memory usage during training.
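A minimal sketch of these memory-saving options is shown below; the `uniform` value for `--recompute-method` comes from Megatron-LM (which also accepts `block`), and whether the released Yuan2.0 scripts pair it with further recompute flags is an assumption.

```shell
# Shard optimizer state across data-parallel ranks and recompute activations
# to trade extra compute for lower memory use during training.
MEMORY_ARGS="--use-distributed-optimizer \
             --recompute-method uniform"
```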
Further command-line arguments are described in the source file `arguments.py` and in Megatron-LM.