[BUG] ValueError: optimizer got an empty parameter list #12
Comments
Since the final EE layer is located in Stage 2, the subsequent pipeline stages do not contain any EE layer, so there are no parameters to optimize in those later stages. You only need to set …
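For reference, the ValueError in the title is raised by PyTorch itself whenever an optimizer is constructed with an empty parameter list, which is exactly the situation on pipeline stages that hold no trainable EE layer. A minimal, stand-alone way to reproduce the message (assuming only a working PyTorch install; this does not go through EE-LLM's own code path):
# Any torch optimizer built from an empty parameter list raises
# "ValueError: optimizer got an empty parameter list".
python -c "import torch; torch.optim.Adam([], lr=1e-4)"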
Additionally, bear in mind that after fine-tuning with the aforementioned approach, your final output files will only contain parameters from the first two pipeline stages. You will need to manually merge the parameter folders of the last two pipeline stages from the original checkpoint path with the folders of the first two stages generated by the fine-tuning process to obtain a complete checkpoint.
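To make the merge step concrete, here is a minimal bash sketch. The per-stage folder names (Megatron-style mp_rank_00_000 through mp_rank_00_003 for TP=1, PP=4) and the iter_0040000 output directory are assumptions for illustration, not taken from this thread; check the actual layout of your checkpoints (e.g. a release/ or iter_* subdirectory) and adjust the paths before running anything like this:
# Hypothetical paths; the exact per-stage folder layout depends on your checkpoint format.
ORIG_CKPT=/data3/lk/EE-LLM/model/ee_llm_format/llama-2-7b-chat   # original checkpoint (LOAD_PATH)
TUNED_CKPT=/data3/lk/EE-LLM/model/checkpoints/iter_0040000       # fine-tuned output (example iteration dir)
for STAGE in 002 003; do
    # Stages 2 and 3 were not written by fine-tuning, so copy them from the original
    # checkpoint next to the stage-0/1 folders that fine-tuning produced.
    cp -r "${ORIG_CKPT}/mp_rank_00_${STAGE}" "${TUNED_CKPT}/"
done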
I modified the script as follows. Looking forward to your early response :)
#!/bin/bash
PROJECT_NAME=EE-TUNE
GROUP_NAME=llama-2-17B-chat-1-EXIT-pt
CURRENT_TIME=`date "+%m%d-%H%M"`
MASTER_NAME=${CURRENT_TIME}
export CUDA_DEVICE_MAX_CONNECTIONS=1
export OMP_NUM_THREADS=4
# Checkpoint configuration
LOAD_PATH=/data3/lk/EE-LLM/model/ee_llm_format/llama-2-7b-chat # your checkpoint path
TOKENIZER_PATH=/data3/lk/llm/model/Llama-2-7b-chat-hf/tokenizer.model # your tokenizer path
CHECKPOINT_PATH=/data3/lk/EE-LLM/model/checkpoints # checkpoint save path
# Data configuration
DATA_HOME=
DATASET_ARXIV=${DATA_HOME}/redpajama-arxiv/all
DATASET_BOOKS=${DATA_HOME}/redpajama-book/all
DATASET_C4=${DATA_HOME}/redpajama-c4/all
DATASET_CC=${DATA_HOME}/redpajama-cc/all
DATASET_STACKEXCHANGE=${DATA_HOME}/redpajama-pile-stackexchange/all
DATASET_CODE=${DATA_HOME}/redpajama-stack-code/all
DATASET_WIKIPEDIA=${DATA_HOME}/redpajama-wiki/all
DATASET_PILE_EUROPARL=${DATA_HOME}/the-pile-europarl/all
DATASET_PILE_FREELAW=${DATA_HOME}/the-pile-freelaw/all
DATASET_PILE_HACKERNEWS=${DATA_HOME}/the-pile-hackernews/all
DATASET_PILE_NIH=${DATA_HOME}/the-pile-nih/all
DATASET_PILE_PHILPAPER=${DATA_HOME}/the-pile-philpaper/all
DATASET_PILE_PMA=${DATA_HOME}/the-pile-pubmed-abstract/all
DATASET_PILE_PMC=${DATA_HOME}/the-pile-pubmed-central/all
DATASET_PILE_USPTO=${DATA_HOME}/the-pile-uspto/all
DATA_PATH="\
0.0362 ${DATASET_ARXIV} \
0.0657 ${DATASET_BOOKS} \
0.2264 ${DATASET_C4} \
0.4491 ${DATASET_CC} \
0.0246 ${DATASET_STACKEXCHANGE} \
0.0810 ${DATASET_CODE} \
0.0548 ${DATASET_WIKIPEDIA} \
0.0010 ${DATASET_PILE_EUROPARL} \
0.0162 ${DATASET_PILE_FREELAW} \
0.0006 ${DATASET_PILE_HACKERNEWS} \
0.0005 ${DATASET_PILE_NIH} \
0.0006 ${DATASET_PILE_PHILPAPER} \
0.0065 ${DATASET_PILE_PMA} \
0.0318 ${DATASET_PILE_PMC} \
0.0050 ${DATASET_PILE_USPTO} \
"
NLAYERS=32
HIDDEN=4096
HEADS=32
SEQ=2048
FFN_SIZE=11008
TP=1
PP=4 # pipeline model parallel size
MICRO_BATCH=4 # reduced micro batch size
GLOBAL_BATCH=16
MASTER_ADDR=127.0.0.1
MASTER_PORT=5901
WORLD_SIZE=1
RANK=0
NPROC_PER_NODE=4 # number of processes (GPUs) per node
TRAIN_ITER=40000
EVAL_INTERVAL=50000
SAVE_INTERVAL=20000
DIST_ARGS="
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
--nproc_per_node $NPROC_PER_NODE \
--nnodes $WORLD_SIZE \
--node_rank $RANK \
"
GPT_ARGS="
--tensor-model-parallel-size $TP \
--pipeline-model-parallel-size $PP \
--query-key-layer-scaling \
--num-layers $NLAYERS \
--hidden-size $HIDDEN \
--num-attention-heads $HEADS \
--seq-length $SEQ \
--max-position-embeddings $SEQ \
--micro-batch-size $MICRO_BATCH \
--global-batch-size $GLOBAL_BATCH \
--lr 0.0001 \
--train-iters $TRAIN_ITER \
--min-lr 1.0e-5 \
--lr-warmup-fraction .01 \
--adam-beta1 0.9 \
--adam-beta2 0.95 \
--adam-eps 1e-5 \
--clip-grad 1.0 \
--bf16 \
--disable-bias-linear \
--use-flash-attn \
--normalization RMSNorm \
--position-embedding-type rope \
--swiglu \
--untie-embeddings-and-output-weights \
--padded-vocab-size 32000 \
--ffn-hidden-size $FFN_SIZE \
--finetune \
--tune-exit \
--untie-exit-output-weights \
--use-exit-norm \
--use-exit-mlp \
--tune-exit-pipeline-parallel-size 2 \
--exit-layer-nums 10 \
"
DATA_ARGS="
--data-path $DATA_PATH \
--tokenizer-type Llama2Tokenizer \
--tokenizer-model $TOKENIZER_PATH \
--split 990,9,1 \
"
# OUTPUT_ARGS_BAK="
# --log-interval 10 \
# --log-timers-to-tracker \
# --save-interval $SAVE_INTERVAL \
# --eval-interval $EVAL_INTERVAL \
# --eval-iters 1 \
# --wandb-project $PROJECT_NAME \
# --wandb-group $GROUP_NAME \
# --wandb-exp-name $MASTER_NAME \
# "
OUTPUT_ARGS="
--log-interval 10 \
--log-timers-to-tracker \
--save-interval $SAVE_INTERVAL \
--eval-interval $EVAL_INTERVAL \
--eval-iters 1 \
"
CUR_DIR=$(cd $(dirname "$0") && pwd)
MEGATRON_ROOT_PATH=$(cd "$CUR_DIR/../../.." && pwd)
cd $MEGATRON_ROOT_PATH
torchrun $DIST_ARGS \
pretrain_early_exit_gpt.py \
$GPT_ARGS \
$DATA_ARGS \
$OUTPUT_ARGS \
--load $LOAD_PATH \
--save $CHECKPOINT_PATH
After investigation, this is indeed a bug, and we will address it in future updates. The bug arises when using …
Thanks a lot!!
Marking as stale. No activity in 60 days.
Describe the bug
I am using Llama-2 7B, and when I start stage 2 of EE-Tuning, the bug occurs.
To Reproduce
Here is the llama2_7B_1_exit_mlp_pt.sh script I modified (shown in full above).