
Generation refactor: new interface, new classes. #25061

Closed

Conversation

manueldeprada
Contributor

@manueldeprada manueldeprada commented Jul 24, 2023

Summary

This PR introduces an updated generate() interface for the Huggingface Transformers library. The update focuses on enhancing extensibility and enabling the explicit declaration of the generation strategy, while ensuring backward compatibility.

Detailed Description

Introducing the "generation_strategy" Argument

In the existing generate() function, a user selects a generation strategy implicitly, by passing argument combinations such as num_beams=5, do_sample=True (which yields beam sampling). This approach can be confusing when a specific decoding strategy is intended. For instance, one might assume that do_sample=False is the default; however, if the user switches to a model whose generation config sets do_sample=True, the intended generation method also inadvertently changes. See a previous PR for a scenario where this happened.

This PR proposes a new parameter, generation_strategy, within the generate() function. This addition allows the user to pass a string (greedy, beam_search, beam_sample, ...) to explicitly choose the intended generation method. Alternatively, instead of a string, the user can pass a custom GenerationStrategy object as the parameter (more on this later). If the provided parameters are not compatible with the requested strategy, an Exception is raised, alerting the user to the discrepancy. This update does not modify the default behaviour of the generate() function, nor does it break compatibility. To verify this, I ran the generation tests locally, and they all pass with the same warnings (edit: I see they are not passing in CircleCI, I will investigate later).
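To make the compatibility check concrete, here is a minimal sketch of how an explicit strategy string could be validated against the passed arguments. The function name and the exact rules are illustrative assumptions, not the PR's actual code:

```python
# Hypothetical sketch: validate that an explicitly requested strategy does not
# contradict the sampling/beam arguments the user also passed.

def check_strategy_compatibility(strategy: str, num_beams: int = 1, do_sample: bool = False) -> None:
    """Raise ValueError if the arguments conflict with the requested strategy."""
    expected = {
        "greedy": {"num_beams": 1, "do_sample": False},
        "sample": {"num_beams": 1, "do_sample": True},
        "beam_search": {"do_sample": False},
        "beam_sample": {"do_sample": True},
    }[strategy]
    got = {"num_beams": num_beams, "do_sample": do_sample}
    for key, value in expected.items():
        if got[key] != value:
            raise ValueError(
                f"generation_strategy='{strategy}' requires {key}={value}, got {got[key]}"
            )
    if strategy in ("beam_search", "beam_sample") and num_beams < 2:
        raise ValueError(f"generation_strategy='{strategy}' requires num_beams > 1")
```

With a check like this, `generate(..., generation_strategy='greedy', do_sample=True)` would fail loudly instead of silently running a different strategy than the user asked for.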

Enhancing Extensibility of Generation Strategies

While the Huggingface Transformers library is well-regarded for its extensibility, particularly regarding model innovations and implementations, the generation interface has lacked this quality to some degree.

Implementing a new generation strategy, like tweaking the Beam Search code, can be challenging. The associated code resides deep inside the GenerationMixin, a class that users cannot subclass. Additionally, there's no option to pass a custom BeamScorer to generate().

A potential workaround is subclassing the model and overriding the generate() method. However, this requires rewriting a substantial amount of generate()'s code, since GenerationMixin contains a complex network of internal dependencies that is hard to interact with. Enhancing extensibility and making the generation code more "hack-friendly" was therefore an important motivation for this PR.

Proposed Changes

With these considerations in mind, the PR proposes a new abstract class, GenerationStrategy (or alternatively Decoder, naming can be discussed), which defines a common interface for implementing any GenerationStrategy variant. Concrete strategies are referred to as "Decoders", such as the "BeamSearchDecoder".

All existing strategies have been refactored into their respective GenerationStrategy class. This approach ensures generate() is agnostic to the decoding strategy and that each strategy checks its parameters and the generation config independently.

Subsequently, the generate() function has been refactored to use the new classes. Facade methods like beam_search(), which merely instantiate and call the new Decoders, have been retained in generation/utils for backwards compatibility.
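The facade pattern described above can be sketched as follows; the class and method names mirror those mentioned in this PR, but the bodies are illustrative stand-ins rather than the actual implementation:

```python
# Minimal sketch of the backwards-compatibility facade: the old beam_search()
# method survives on the mixin, but merely instantiates and delegates to the
# new Decoder class (bodies are placeholders, not the real decoding loop).

class BeamSearchDecoder:
    """Stand-in for the new strategy class; the real beam search loop would live here."""

    def generate(self, model, input_ids, **kwargs):
        return f"beam_search({input_ids})"


class GenerationMixin:
    def beam_search(self, input_ids, **kwargs):
        # Retained facade: old callers keep working unchanged.
        return BeamSearchDecoder().generate(self, input_ids, **kwargs)
```

Old code that calls `model.beam_search(...)` keeps working, while new code can construct and customise the Decoder directly.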

With this change, it is now possible to elegantly create a custom GenerationStrategy, or subclass an existing strategy, and simply pass the customised object to generate(). This would allow emerging research on generation strategies to build on HF (right now, the literature shows that fairseq is more commonly used for this purpose).

New use case examples

```python
# selecting a strategy with a string
outputs = model.generate(input_ids=input_ids, generation_strategy="greedy")

# using a custom strategy object
decoder = CustomBeamSearch()
outputs = model.generate(input_ids=input_ids, generation_strategy=decoder)
```
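For the custom-strategy case, a sketch of what the common interface and a user subclass might look like. The names GenerationStrategy and CustomBeamSearch follow this PR's description, but the method signatures and the dict-based config are assumptions for illustration:

```python
from abc import ABC, abstractmethod

class GenerationStrategy(ABC):
    """Common interface that every concrete Decoder implements (illustrative)."""

    @abstractmethod
    def validate_config(self, generation_config: dict) -> None:
        """Each strategy checks its own parameters independently."""

    @abstractmethod
    def generate(self, model, input_ids, generation_config: dict):
        """Run the decoding loop and return the generated ids."""


class CustomBeamSearch(GenerationStrategy):
    def validate_config(self, generation_config: dict) -> None:
        if generation_config.get("num_beams", 1) < 2:
            raise ValueError("beam search requires num_beams > 1")

    def generate(self, model, input_ids, generation_config: dict):
        self.validate_config(generation_config)
        return input_ids  # placeholder for a real beam search loop
```

A researcher tweaking beam search would override only the methods they care about, instead of rewriting generate() wholesale.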

Remaining Work

The proposed code in this PR currently lacks docstrings for the new classes, as it would be more appropriate to add these after finalizing the naming conventions and other details through discussion in this thread.

Additionally, the PR changes the library's LazyImport init files, and feedback on best practices for working with lazy imports would be greatly appreciated (I don't have experience with them). New tests to validate these changes will be added once the code receives some feedback.

Looking forward to your valuable feedback to improve this PR further.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

I see @gante @sgugger @patrickvonplaten @thomwolf very active in the git history for generation commits.

This commit proposes a new Generation interface, which allows for extensibility and for explicit declaration of the generation strategy, while keeping compatibility with the old code and interface.
@gante
Member

gante commented Jul 25, 2023

Hi @manueldeprada 👋

Thank you for the proposal, but this PR adds further parameterization to generate(), and we want to go in the opposite direction whenever possible -- .generate() already has too many possible arguments.

Related to this PR: we are internally discussing how to refactor .generate() -- isolating the generation strategies like you did here is indeed one of the candidates. However, until we settle on a complete plan, we will not be considering refactors like this one :)

Bear with us, we also want to make .generate() more extensible!

@manueldeprada
Contributor Author

Thanks for your reply, @gante!

It's great to hear that a comprehensive rethink of generate() is in progress. In this PR, my aim was to be minimally invasive while preserving backward compatibility. However, I am eager to see a thorough refactor, even if it entails introducing breaking changes. As someone who has worked extensively with generation strategies, I have a few observations that might be useful for the internal discussions:

  • The current argument system is highly complex and can lead to confusion when selecting a generation strategy. The situation is further complicated by models setting their own generation_config, since defaults can inadvertently change between models. My suggestion is to streamline generate() to two primary arguments: generation_method and generation_config. Default behaviors could be as follows:
    • If no method or configuration is passed, the function reverts to the model's default.
    • If a method is specified, a valid configuration should also be provided. Users could create a new valid generation_config, or if they want to override certain parameters from the model's defaults, they can retrieve the model's default configuration (perhaps using model.generation_config), modify the desired parameters, and then pass it to generate().
    • Besides strings, the generation_method could accept custom objects, similar to what I proposed in this PR.
  • I think beam_search() and similar methods from generation.utils should be deprecated and replaced with extensible classes, as demonstrated in this PR.
  • The feature needs consistent and clear naming. Terms such as generators, decoders, generation adapters, or generation strategies could be used. It's crucial to establish a convention and stick to it.
  • Isolating the strategies could foster an ecosystem where novel strategies and implementations could be shared via Spaces, similar to the current practice with metrics.
  • The isolation of strategies could also make the refactoring of individual strategies much simpler. My work with beam_search() indicates that it could also benefit from a rethink.
    • For instance, completely separating beam_search from group_beam_search could simplify and streamline the code, making it easier to extend. This could also open up possibilities for full vectorization of beam_search along the batch dimension (removing the for loops inside BeamSearchScorer).
    • There are also important details hidden in obscure corners of the implementation. For example, the early_stopping-related behaviour, and how it interacts with length_penalty, is not clearly documented.
  • I'm curious about the progress of the discussions on the future of generate. Do you have an estimated timeline for these changes? Given the growing body of literature on decoding strategies (example 1, example 2, example 3)—mostly developed on Meta's Fairseq—I believe easy generation extensibility in Huggingface would attract these advancements to the platform.
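The two-argument interface suggested in the first bullet above can be made concrete with a stub model. Everything here is a hypothetical sketch of the proposed API, not existing Transformers code:

```python
from dataclasses import dataclass, replace

# Hypothetical sketch of the proposed generate(generation_method, generation_config)
# interface, using a stub model so the call pattern is concrete.

@dataclass(frozen=True)
class GenerationConfig:
    num_beams: int = 1
    do_sample: bool = False

class StubModel:
    generation_config = GenerationConfig()  # the model's default

    def generate(self, input_ids, generation_method=None, generation_config=None):
        if generation_method is None and generation_config is None:
            # No method or config passed: fall back to the model's default.
            generation_config = self.generation_config
        elif generation_method is not None and generation_config is None:
            # An explicit method must come with an explicit, valid config.
            raise ValueError("an explicit generation_method requires a generation_config")
        return generation_method, generation_config

model = StubModel()

# Retrieve the model's defaults, override one parameter, and pass the config back.
config = replace(model.generation_config, num_beams=4)
method, cfg = model.generate("input", generation_method="beam_search", generation_config=config)
```

The same `generation_method` slot could accept a custom strategy object instead of a string, as proposed in the PR.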

To provide some context, my experience is primarily derived from porting example 1 to Huggingface Transformers. This necessitated forking the entire transformers library, which is not an ideal approach. Given that the reimagining of generate() will likely take some time, I plan to publish my changes as a separate companion package in the interim. Please keep us updated on the discussion, and let me know if I can assist further with the refactor! 🤗

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Sep 1, 2023
@manueldeprada
Contributor Author

manueldeprada commented Nov 8, 2023

Hey @gante any progress in the internal discussion?

As a quick reminder of what this thread was about, my PR tried to address the two main obstacles that the "decoding research" community encounters in transformers:

  1. Being able to tell generate() explicitly what generation strategy to use (instead of relying on setting the arguments).
  2. Being able to extend a generation strategy and pass it to generate().

In the meantime, more libraries have appeared that work around this limitation of generate(). Look for example at ZurichNLP/mbr by @jvamvas.

What is the current state of the discussion? Would you be open to adding a generation_strategy param to either generate() or GenerationConfig?

Also, would you accept a PR that decoupled generate() from the generation strategies without adding any new parametrization? A new _get_generation_strategy() would return a GenerationStrategy subclass, and beam_search(), sample(), and similar methods would be encapsulated in their own classes (together with the extra logic, such as creating the BeamScorer).
