Update training_args.py to fix runtime error #25900

sahel-sh · 2023-09-01T03:53:58Z

Fix:

File "..../transformers/training_args.py", line 1544, in __post_init__
    for k, v in self.fsdp_config.items():
RuntimeError: dictionary keys changed during iteration

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


ArthurZucker · 2023-09-01T11:53:43Z

Hey! Thanks for opening a PR 🤗 Could you also share a reproducer? It would help determine whether this is an intended use case or not!

sahel-sh · 2023-09-02T00:29:06Z

From the FastChat repo:

torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path lmsys/vicuna-7b-v1.5 \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir output_vicuna \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" --fsdp_config $HOME/fsdp_config.json \
    --tf32 True \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --lazy_preprocess True

Where fsdp_config.json is:

{
    "fsdp_auto_wrap_policy": "FULL_SHARD",
    "fsdp_transformer_layer_cls_to_wrap": "LlamaDecoderLayer"
}

Any command using this config file, or any other config file whose JSON keys start with "fsdp_", will reproduce the issue. The code updates dictionary entries while iterating over the same dictionary.
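To make the failure mode concrete, here is a minimal standalone sketch. It is not the actual transformers source: the helper names `strip_prefix_unsafe` and `strip_prefix_safe` are hypothetical, and it assumes the loop renames keys by stripping the "fsdp_" prefix. It only shows why renaming keys inside a loop over `.items()` raises a RuntimeError in CPython, and how iterating over a snapshot of the keys avoids it.

```python
PREFIX = "fsdp_"  # the key prefix mentioned in the comment above


def strip_prefix_unsafe(config):
    """Hypothetical reconstruction of the failing pattern.

    Keys are inserted and deleted while the live dict is being iterated.
    """
    for key, value in config.items():
        if key.startswith(PREFIX):
            config[key[len(PREFIX):]] = value
            del config[key]
    return config


def strip_prefix_safe(config):
    """Iterate over a snapshot of the keys, so mutating the dict is fine."""
    for key in list(config.keys()):
        if key.startswith(PREFIX):
            config[key[len(PREFIX):]] = config.pop(key)
    return config


fsdp_config = {
    "fsdp_auto_wrap_policy": "FULL_SHARD",
    "fsdp_transformer_layer_cls_to_wrap": "LlamaDecoderLayer",
}

# Run the unsafe variant on a copy and record the error instead of crashing.
try:
    strip_prefix_unsafe(dict(fsdp_config))
    unsafe_error = None
except RuntimeError as exc:
    unsafe_error = exc

# The safe variant renames every prefixed key without raising.
cleaned = strip_prefix_safe(dict(fsdp_config))
print(type(unsafe_error).__name__, cleaned)
```

Both calls work on a copy (`dict(fsdp_config)`), so the original mapping is left untouched. Taking `list(config.keys())` up front costs one small list allocation, which is negligible for a config dictionary.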

sahel-sh · 2023-09-02T00:34:44Z

I think #25920 is a better solution though.

sahel-sh · 2023-09-02T00:35:25Z

Closing this in favor of #25920

Update training_args.py to fix runtime error

e6e9a05


sahel-sh closed this Sep 2, 2023
