Generation refactor: new interface, new classes. #25061
This commit proposes a new Generation interface, which allows for extensibility and for explicit declaration of the generation strategy, while keeping compatibility with old code and interface.
Hi @manueldeprada 👋 Thank you for the proposal, but this PR adds further parameterization to Related to this PR: we are internally discussing how to refactor Bear with us, we also want to make
Thanks for your reply, @gante! It's great to hear that a comprehensive rethink of
To provide some context, my experience is primarily derived from porting example 1 to Huggingface Transformers. This necessitated forking the entire transformers library, which is not an ideal approach. Given that the reimagining of
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey @gante any progress in the internal discussion? As a quick reminder of what this thread was about, my PR tried to address the two main obstacles that the "decoding research" community encounters in transformers:
In the meantime, more libraries have appeared that work around this limitation of What is the current state of the discussion? Would you be open to adding a Also, would you accept a PR that, without adding new parametrization, decoupled
Summary
This PR introduces an updated `generate()` interface for the Huggingface Transformers library. The update focuses on enhancing extensibility and enabling the explicit declaration of the generation strategy, while ensuring backward compatibility.
Detailed Description
Introducing the "generation_strategy" Argument
In the existing `generate()` function, a user must pass arguments such as `num_beams=5, do_sample=True` to choose a specific generation strategy (in this case, beam sampling). This approach can be somewhat confusing, especially when aiming for a specific decoding strategy. For instance, one might assume `do_sample=False` by default. However, when a user changes the model, and the new model has `do_sample=True` as the default, the intended generation method also inadvertently changes. See a previous PR for a scenario where this happened.
This PR proposes a new parameter, `generation_strategy`, within the `generate()` function. This addition allows the user to pass a string (`greedy`, `beam_search`, `beam_sample`, ...) to explicitly choose the intended generation method. Alternatively, instead of a string, the user can pass a custom GenerationStrategy object as the parameter (more on this later). If the provided parameters are not compatible with the requested strategy, an Exception is raised, alerting the user to the discrepancy. This update does not modify the default behaviour of the `generate()` function, nor does it break compatibility. To this end, I locally executed the generation tests, and they all pass with the same warnings (edit: I see they are not passing in CircleCI, I will investigate later).
Enhancing Extensibility of Generation Strategies
While the Huggingface Transformers library is well-regarded for its extensibility, particularly regarding model innovations and implementations, the generation interface has lacked this quality to some degree.
Implementing a new generation strategy, like tweaking the Beam Search code, can be challenging. The associated code resides deep inside the `GenerationMixin`, a class that users cannot subclass. Additionally, there's no option to pass a custom BeamScorer to `generate()`.
A potential workaround is subclassing the model and overriding the `generate()` method. However, this requires rewriting a substantial amount of code from `generate()`, which has a complex network of dependencies within `GenerationMixin` that is hard to interact with. Thus, enhancing the extensibility and making the generation part more "hack-friendly" was an important motivation for this PR.
Proposed Changes
With these considerations in mind, the PR proposes a new abstract class, `GenerationStrategy` (or alternatively `Decoder`, naming can be discussed), which defines a common interface for implementing any `GenerationStrategy` variant. Concrete strategies are referred to as "Decoders", such as the "BeamSearchDecoder".
All existing strategies have been refactored into their respective `GenerationStrategy` class. This approach ensures `generate()` is agnostic to the decoding strategy and that each strategy checks its parameters and the generation config independently.
Subsequently, the `generate()` function has been refactored to use the new classes. Facade methods like `beam_search()`, which merely instantiate and call the new Decoders, have been retained in `generation/utils` for backwards compatibility.
With this change, it is now possible to elegantly create a custom GenerationStrategy or subclass an existing strategy, and just pass the customized object to `generate()`. This will allow the emerging research in generation strategies to use HF (right now, you can see in the literature that fairseq is more common).
New use case examples
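A minimal, self-contained sketch of the two use cases described above: selecting a strategy explicitly by name, and passing a customized strategy object. It does not call the Transformers library. `ToyGenerationConfig`, `GreedyDecoder`, `BannedTokenGreedyDecoder`, the `validate`/`decode` method names, and the toy `generate()` function are all hypothetical stand-ins chosen for illustration, not the classes or signatures from this PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ToyGenerationConfig:
    # Hypothetical stand-in for a generation config (not the transformers class).
    num_beams: int = 1
    do_sample: bool = False


class GenerationStrategy(ABC):
    """Assumed common interface: each strategy validates its own parameters."""

    @abstractmethod
    def validate(self, config: ToyGenerationConfig) -> None: ...

    @abstractmethod
    def decode(self, step_scores: list) -> list: ...


class GreedyDecoder(GenerationStrategy):
    def validate(self, config):
        # Reject parameters that contradict the requested strategy.
        if config.num_beams != 1 or config.do_sample:
            raise ValueError("greedy requires num_beams=1 and do_sample=False")

    def decode(self, step_scores):
        # Toy model: each step is a {token: score} dict; pick the best token.
        return [max(scores, key=scores.get) for scores in step_scores]


class BannedTokenGreedyDecoder(GreedyDecoder):
    """Custom strategy: greedy decoding that never emits a banned token."""

    def __init__(self, banned):
        self.banned = set(banned)

    def decode(self, step_scores):
        filtered = [
            {tok: s for tok, s in scores.items() if tok not in self.banned}
            for scores in step_scores
        ]
        return super().decode(filtered)


# Hypothetical name-to-class registry used for string arguments.
STRATEGIES = {"greedy": GreedyDecoder}


def generate(step_scores, config, generation_strategy="greedy"):
    """Toy dispatcher: accepts a strategy name or a strategy object."""
    if isinstance(generation_strategy, str):
        if generation_strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {generation_strategy!r}")
        strategy = STRATEGIES[generation_strategy]()
    elif isinstance(generation_strategy, GenerationStrategy):
        strategy = generation_strategy
    else:
        raise TypeError("generation_strategy must be a str or GenerationStrategy")
    strategy.validate(config)  # the strategy, not generate(), checks the config
    return strategy.decode(step_scores)


scores = [{"a": 0.6, "b": 0.4}, {"a": 0.1, "b": 0.9}]
print(generate(scores, ToyGenerationConfig(), "greedy"))  # ['a', 'b']
print(generate(scores, ToyGenerationConfig(), BannedTokenGreedyDecoder({"a"})))  # ['b', 'b']
```

In this sketch, requesting `"greedy"` with `ToyGenerationConfig(do_sample=True)` raises a `ValueError` instead of silently switching to sampling, which is the kind of mismatch the explicit strategy argument is meant to surface.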
Remaining Work
The proposed code in this PR currently lacks docstrings for the new classes, as it would be more appropriate to add these after finalizing the naming conventions and other details through discussion in this thread.
Additionally, the PR introduces changes to the library LazyImport init files, and feedback on best practices for working with Lazy imports would be greatly appreciated (as I don't have any experience). New tests to validate these changes will be added once the code receives some feedback.
Looking forward to your valuable feedback to improve this PR further.
Who can review?
I see @gante @sgugger @patrickvonplaten @thomwolf very active in the git history for generation commits.