Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🚨🚨 Generate: change order of ops in beam sample to avoid nans #26843

Merged
merged 4 commits into from
Oct 17, 2023

Conversation

gante
Copy link
Member

@gante gante commented Oct 16, 2023

What does this PR do?

Fixes #26332 -- see this comment for a full explanation.

TL;DR: in beam_sample, logits_warper are now applied BEFORE adding the beam scores. We have been postponing this change to avoid introducing output differences, but the truth is that the order of operations causes issues (e.g. 1 2)

This is technically a bug fix (beam_sample is unusable for long generations with temperature < 1.0 before this change), hence the lack of a deprecation cycle. However, it may alter some generated outputs (short generations with low temperature), hence the 🚨🚨.

Please note that other common operators, like top_p and top_k, are unaltered by this change.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Oct 16, 2023

The documentation is not available anymore as the PR was closed or merged.

@gante gante changed the title 🚨🚨 Generate: change order of ops in beam sample 🚨🚨 Generate: change order of ops in beam sample to avoid nans Oct 16, 2023
@gante gante marked this pull request as ready for review October 16, 2023 18:15
@gante gante requested a review from ArthurZucker October 16, 2023 18:15
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left small comments, LGTM

src/transformers/generation/utils.py Outdated Show resolved Hide resolved
src/transformers/generation/utils.py Show resolved Hide resolved
src/transformers/generation/tf_utils.py Show resolved Hide resolved
@gante gante merged commit 4b423e6 into huggingface:main Oct 17, 2023
3 checks passed
@gante gante deleted the beam_sample_order_of_ops branch October 17, 2023 09:32
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Beam Search Fails for Llama 70b
3 participants