
Generate: fix assisted generation with past_key_values passed as kwargs #31644

Merged: 1 commit into huggingface:main on Jun 26, 2024

Conversation

@gante (Member) commented on Jun 26, 2024:

What does this PR do?

This PR:

  • Fixes assisted generation when generate is called with the past_key_values kwarg. Unlike the other kwargs, this one should not be passed to the assistant model, since it is the cache of the main model (a minimal sketch of the idea follows this list).
  • Renames maximum_length to max_length in the newly added DynamicCache.crop function. max_length is the common variable name for a maximum length, so this standardizes the API before the function is released.
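
Below is a minimal sketch of the kwarg-filtering idea in the first bullet. The helper name build_assistant_kwargs and its surrounding structure are hypothetical, not the actual transformers internals; the point is only that past_key_values is skipped when copying the main model's kwargs over to the assistant model, because the assistant has its own decoder and therefore its own, differently shaped cache.

```python
import copy

import torch


def build_assistant_kwargs(model_kwargs: dict, device: torch.device) -> dict:
    """Hypothetical helper: copy generation kwargs for the assistant model."""
    assistant_kwargs = {}
    for key, value in model_kwargs.items():
        # The main model's cache is shape-incompatible with the assistant's
        # decoder, so it must never be handed to the assistant model.
        if key == "past_key_values":
            continue
        assistant_kwargs[key] = (
            value.detach().to(device) if isinstance(value, torch.Tensor) else copy.deepcopy(value)
        )
    return assistant_kwargs
```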

@ArthurZucker (Collaborator) left a comment:

LGTM

```python
assistant_kwargs[key] = (
    value.detach().to(device) if isinstance(value, torch.Tensor) else copy.deepcopy(value)
)

# Remove potential default DynamicCache if assistant does not support it
```
@ArthurZucker (Collaborator) commented on the lines above: Were tests failing because of this?

@gante (Member, Author) replied on Jun 26, 2024:

@ArthurZucker yes, assisted generation was failing when past_key_values was passed to generate! The assistant should not copy the cache from the main model by default, because the two models will likely have different decoders.

@amyeroberts (Collaborator) left a comment:

LGTM - thanks for fixing!

```diff
-    def crop(self, maximum_length: int):
-        """Crop the past key values up to a new `maximum_length` in terms of tokens. `maximum_length` can also be
-        negative to remove `maximum_length` tokens. This is used in assisted decoding and contrastive search."""
+    def crop(self, max_length: int):
+        """Crop the past key values up to a new `max_length` in terms of tokens. `max_length` can also be
+        negative to remove `max_length` tokens. This is used in assisted decoding and contrastive search."""
```
@amyeroberts (Collaborator) commented on the lines above:
What's the reason for changing the name?

max_length seems more consistent with the rest of the repo, but I'm wondering if there's another reason?

@gante (Member, Author) replied:

None, just consistency 🤗 It wasn't caught in the earlier PR, so I'm fixing it now before the function gets released :)
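
To make the semantics under discussion concrete, here is a minimal standalone sketch of the cropping behavior described in the docstring above; crop_cache and the explicit key/value lists are hypothetical stand-ins, not the actual DynamicCache implementation. Assuming each layer's cache tensors have the usual [batch, num_heads, seq_len, head_dim] shape, cropping slices the sequence dimension, and a negative max_length removes that many trailing tokens.

```python
import torch


def crop_cache(key_cache: list[torch.Tensor], value_cache: list[torch.Tensor], max_length: int):
    """Hypothetical stand-in for the cropping semantics of DynamicCache.crop."""
    if max_length < 0:
        # Negative values mean "drop that many tokens from the end".
        max_length = key_cache[0].shape[-2] + max_length
    # Slice the sequence (token) dimension of every layer's key/value tensors.
    key_cache = [k[..., :max_length, :] for k in key_cache]
    value_cache = [v[..., :max_length, :] for v in value_cache]
    return key_cache, value_cache


# Toy usage: a 2-layer cache holding 10 tokens, cropped down to 7.
keys = [torch.zeros(1, 4, 10, 8) for _ in range(2)]
values = [torch.zeros(1, 4, 10, 8) for _ in range(2)]
keys, values = crop_cache(keys, values, 7)
assert keys[0].shape[-2] == 7
```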

@gante merged commit a3fb96a into huggingface:main on Jun 26, 2024
21 checks passed
@gante (Member, Author) commented on Jun 26, 2024:

Thanks for the quick review 💛

@gante deleted the past_kv_assisted_gen branch on June 26, 2024 at 17:24
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
