
official code leads to OOM! #2086

Closed
Xuekai-Zhu opened this issue Jul 26, 2023 · 3 comments

@Xuekai-Zhu

With transformers==4.31.0 and accelerate==0.22.0.dev0 (latest main), on 2 x A6000 48G GPUs, fine-tuning LLaMA 7B, the warnings are:

FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
FSDP Warning: When using FSDP, it is efficient and recommended to call prepare for the model before creating the optimizer.
FSDP Warning: When using FSDP, several parameter groups will be conflated into a single one due to nested module wrapping and parameter flattening.
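
For the FutureWarning, the message points at `--fsdp_config`; a minimal sketch of what that might look like, written as a small Python snippet that creates a hypothetical fsdp_config.json (the exact key name varies across transformers versions, so treat it as an assumption to verify):

# Minimal sketch: move the FSDP wrap class into a JSON file passed via --fsdp_config,
# replacing the deprecated --fsdp_transformer_layer_cls_to_wrap flag.
# The key name ("fsdp_transformer_layer_cls_to_wrap" vs. "transformer_layer_cls_to_wrap")
# depends on the installed transformers version, so verify it before using.
import json

fsdp_config = {"fsdp_transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"]}

with open("fsdp_config.json", "w") as f:
    json.dump(fsdp_config, f, indent=2)

The command below would then keep `--fsdp "full_shard auto_wrap"` and pass `--fsdp_config fsdp_config.json` in place of the deprecated flag.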

And I use the default command:

torchrun --nproc_per_node=2 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path /root/zhuxuekai/data_pruning/pre-trained-model/huggyllama/llama-7b  \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir output_vicuna \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --disable_tqdm False

I installed FastChat from source.

I also tried your solution, but it did not work either.

It looks like you have run into this problem as well; please help me fix it!

@Xuekai-Zhu
Author

Xuekai-Zhu commented Jul 26, 2023

I used this hack to revise the code in trainer.py; the FSDP warnings disappeared, but I still hit OOM.

This warning remains:

FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead

These warnings disappeared:

FSDP Warning: When using FSDP, it is efficient and recommended to call prepare for the model before creating the optimizer.
FSDP Warning: When using FSDP, several parameter groups will be conflated into a single one due to nested module wrapping and parameter flattening.
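
For context, those warnings refer to building the optimizer only after the model has been wrapped by FSDP. A rough sketch of that ordering using accelerate directly, as an illustration of what the warning asks for rather than the exact trainer.py change:

# Rough sketch: prepare (wrap) the model first, then build the optimizer from the
# wrapped parameters, so the optimizer sees the flattened FSDP parameters.
# Illustration only; the actual fix reorders the equivalent steps inside trainer.py.
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

# Wrap the model before any optimizer references its parameters.
model = accelerator.prepare(model)

# The optimizer is created afterwards and then prepared as well.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
optimizer = accelerator.prepare(optimizer)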

@lw3259111

lw3259111 commented Jul 27, 2023

flash-attn v1 works, but flash-attn v2 OOMs (same code).
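
A quick way to confirm which flash-attn major version an environment actually picks up (small sketch; the "flash-attn" distribution name is an assumption):

# Small sketch: report the installed flash-attn version, since v1 and v2 behave
# differently here. The "flash-attn" distribution name is an assumption.
from importlib.metadata import version

flash_attn_version = version("flash-attn")
major = int(flash_attn_version.split(".")[0])
print(f"flash-attn {flash_attn_version} (major version {major}) is installed")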

@merrymercy
Member

Try to use transformers==4.28.1?
