
[core / modeling] Fix training bug with PEFT + GC #28031

Merged 1 commit into huggingface:main from fix-training-bug on Dec 14, 2023

Conversation

@younesbelkada (Contributor) commented Dec 14, 2023

What does this PR do?

Fixes #28023

4.36.0 introduced a bug for users who train with GC (gradient checkpointing): use_cache should be force-set to False here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1008 - yet it ends up force-set to True during the backward pass for some reason, only in the case where one uses PEFT + GC.

The fix is to force-set use_cache to False before computing past_key_value_length here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1042
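Concretely, the intent of the change is sketched below (a minimal sketch, not the verbatim diff; the logger.warning_once message is illustrative and follows the pattern already used elsewhere in modeling_llama.py):

        # Sketch: disable the cache under gradient checkpointing *before*
        # past_key_values_length is derived from past_key_values.
        if self.gradient_checkpointing and self.training:
            if use_cache:
                logger.warning_once(
                    "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`."
                )
                use_cache = False

        if use_cache:
            use_legacy_cache = not isinstance(past_key_values, Cache)
            if use_legacy_cache:
                past_key_values = DynamicCache.from_legacy_cache(past_key_values)
            past_key_values_length = past_key_values.get_usable_length(seq_length)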

cc @amyeroberts

@younesbelkada changed the title from "[core / modeling] Fix training bug with new cache refactor" to "[core / modeling] Fix training bug with PEFT + GC" on Dec 14, 2023
@amyeroberts (Collaborator) left a comment:

Thanks for fixing!

@@ -578,6 +578,13 @@ def forward(
seq_length_with_past = seq_length
past_key_values_length = 0

if self.gradient_checkpointing and self.training:
Collaborator:

I don't see why moving the logic here should make a difference?

Contributor (Author):

It is very unusual, but apparently when using GC + training, the backward pass calls the forward pass again with a non-None past_key_value. I think use_cache is set to True by default on all configs; therefore, preventing the block

        if use_cache:
            use_legacy_cache = not isinstance(past_key_values, Cache)
            if use_legacy_cache:
                past_key_values = DynamicCache.from_legacy_cache(past_key_values)
            past_key_values_length = past_key_values.get_usable_length(seq_length)

from being entered seems to be the right fix. I will add tests directly in PEFT since this is PEFT-related.
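For context, a minimal sketch of the PEFT + GC training setup that exercises this path (the checkpoint id and LoRA settings are illustrative placeholders, not taken from this PR or its tests):

    # Illustrative PEFT + gradient-checkpointing repro; model id and LoRA
    # hyperparameters are placeholders, not taken from the PR.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # any LlamaForCausalLM checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    model = get_peft_model(model, LoraConfig(r=8, target_modules=["q_proj", "v_proj"]))
    model.gradient_checkpointing_enable()
    model.enable_input_require_grads()  # so checkpointed inputs require grad
    model.train()

    inputs = tokenizer("hello world", return_tensors="pt")
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    loss.backward()  # on 4.36.0 this re-ran forward with a non-None past_key_value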

Collaborator:

It would be nice to also understand why we have to do this (why the forward re-runs), but static cache might actually do the same.
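For reference, the re-run comes from activation checkpointing itself: torch.utils.checkpoint discards intermediate activations and re-executes the wrapped forward during backward to recompute them. A standalone toy illustration (not transformers code; assumes a recent PyTorch that accepts use_reentrant):

    # Toy illustration: checkpoint() re-runs the wrapped function during backward.
    import torch
    from torch.utils.checkpoint import checkpoint

    calls = 0

    def layer(x):
        global calls
        calls += 1
        return torch.sin(x) * x

    x = torch.randn(4, requires_grad=True)
    y = checkpoint(layer, x, use_reentrant=False).sum()
    print(calls)  # 1: forward ran once
    y.backward()
    print(calls)  # 2: forward ran again while computing gradients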

@younesbelkada merged commit 73de510 into huggingface:main on Dec 14, 2023
19 checks passed
@younesbelkada deleted the fix-training-bug branch on December 14, 2023 at 11:20
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
amyeroberts pushed a commit that referenced this pull request Dec 18, 2023
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
@DailyCasual commented:

I am having this issue with Kohya Dreammaker as well. Downgrading to 4.27.1 seems to have helped. How do I go about actually fixing this issue? I have all the prerequisites installed and my models work, but I have no idea how to use Python to fix this.

@amyeroberts (Collaborator) commented:

@DailyCasual Could you open a new issue detailing the error encountered and what you've experimented with, e.g. versions and their behaviour? This helps us better track which bugs are new and which have been resolved.

Development

Successfully merging this pull request may close these issues.

PEFT+gradient checkpointing causes attention mask shape mismatch during backward pass
4 participants