Adding optimizer_cls_and_kwargs to Trainer.__init__ #34358

Merged · 10 commits merged into huggingface:main · Oct 29, 2024

Conversation

apoorvkh (Contributor)

What does this PR do?

Currently, Trainer must be subclassed (as follows) to supply a custom torch.optim.Optimizer: the existing optimizers argument requires an already-constructed optimizer, which in turn assumes the model has already been initialized on the correct devices (device placement is usually handled by Trainer itself).

from typing import Any

import torch
from transformers import Trainer
from transformers import TrainingArguments as HfTrainingArguments


class CustomOptimizerTrainer(Trainer):
    # Override Trainer's get_optimizer_cls_and_kwargs hook to supply a custom optimizer class and kwargs.
    @staticmethod
    def get_optimizer_cls_and_kwargs(args: HfTrainingArguments, model=None) -> tuple[type[torch.optim.Optimizer], dict[str, Any]]:
        optimizer = torch.optim.AdamW
        optimizer_kwargs = {
            "lr": 4e-3,
            "betas": (0.9, 0.999),
            "weight_decay": 0.05,
        }
        return optimizer, optimizer_kwargs
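
For completeness, a hypothetical usage sketch of this workaround (the "gpt2" checkpoint and output_dir are placeholders for illustration, not part of the PR):

from transformers import AutoModelForCausalLM

# Relies on CustomOptimizerTrainer and HfTrainingArguments defined above.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
trainer = CustomOptimizerTrainer(
    model=model,
    args=HfTrainingArguments(output_dir="out"),  # placeholder arguments
    # train_dataset=...,  # omitted here
)
# trainer.train() would then construct AdamW with the kwargs returned above.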

This PR adds an optimizer_cls_and_kwargs argument to Trainer. It lets a user pass a Type[torch.optim.Optimizer] and a Dict[str, Any] of constructor kwargs when initializing the Trainer, rather than having to subclass it (as above).
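
As a minimal sketch of the new interface (assuming the argument takes an (optimizer class, kwargs) tuple, as the name suggests; the model and training arguments below are placeholders):

import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
args = TrainingArguments(output_dir="out")  # placeholder arguments

trainer = Trainer(
    model=model,
    args=args,
    # Trainer instantiates the optimizer itself, after it has prepared the
    # model (device placement, sharding, etc.).
    optimizer_cls_and_kwargs=(
        torch.optim.AdamW,
        {"lr": 4e-3, "betas": (0.9, 0.999), "weight_decay": 0.05},
    ),
)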

I am making this PR after #31875 (cc: @amyeroberts @muellerzr)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@muellerzr (Contributor) left a comment

Nice! Seems like a great fix :)

@muellerzr (Contributor)

If you run `pip install -e .[quality]; make fixup`, it should fix the style failures.

@SunMarc (Member) left a comment

Thanks for the PR! LGTM! I'm curious why the existing optimizers arg is not enough for your use case?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@apoorvkh (Contributor, Author)

apoorvkh commented Oct 24, 2024

Great, thanks both!

@muellerzr I had already run make fixup, but it results in the following errors. I think these are unrelated to this PR?

make repo-consistency
python utils/check_modular_conversion.py

Differences found between the generated code and src/transformers/models/glm/modeling_glm.py:

--- src/transformers/models/glm/modeling_glm.py_generated
+++ src/transformers/models/glm/modeling_glm.py
@@ -38,7 +38,6 @@
 )
 from ...modeling_utils import PreTrainedModel
 from ...utils import (
-    add_code_sample_docstrings,
     add_start_docstrings,
     add_start_docstrings_to_model_forward,
     is_flash_attn_2_available,
@@ -1223,11 +1222,6 @@
         self.model.embed_tokens = value
 
     @add_start_docstrings_to_model_forward(GLM_INPUTS_DOCSTRING)
-    @add_code_sample_docstrings(
-        checkpoint=_CHECKPOINT_FOR_DOC,
-        output_type=TokenClassifierOutput,
-        config_class=_CONFIG_FOR_DOC,
-    )
     def forward(
         self,
         input_ids: Optional[torch.LongTensor] = None,
make quality
Traceback (most recent call last):
  File "transformers/src/transformers/utils/import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "transformers/.pixi/envs/default/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "transformers/src/transformers/convert_graph_to_onnx.py", line 23, in <module>
    from transformers.pipelines import Pipeline, pipeline
  File "transformers/src/transformers/pipelines/__init__.py", line 50, in <module>
    from .automatic_speech_recognition import AutomaticSpeechRecognitionPipeline
  File "transformers/src/transformers/pipelines/automatic_speech_recognition.py", line 23, in <module>
    from .audio_utils import ffmpeg_read
  File "transformers/src/transformers/pipelines/audio_utils.py", line 54, in <module>
    ffmpeg_additional_args: Optional[list[str]] = None,
TypeError: 'type' object is not subscriptable

make style does seem to work (it results in no changes).

edit: resolved by #34360

@apoorvkh (Contributor, Author)

apoorvkh commented Oct 24, 2024

@SunMarc I am writing a codebase that uses Trainer for training with arbitrary models / hyperparameters, especially using methods like FSDP / DeepSpeed. (Happy to post a link shortly :) )

I believe the optimizers argument presupposes that the model is already copied/sharded onto the right devices. But if we rely on Trainer to do the copying/sharding, the model parameters aren't available yet at the point where the optimizers argument has to be defined. I think that necessitates passing an optimizer class and kwargs into Trainer and letting it construct the optimizer itself. This PR simply tries to expose those options more conveniently :)
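
To make the contrast concrete, here is a rough sketch of the pre-existing optimizers pattern (placeholder model and arguments, not from the PR): the optimizer must be constructed from model.parameters() up front, before Trainer has performed its own device placement or FSDP/DeepSpeed sharding.

import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
args = TrainingArguments(output_dir="out")  # placeholder arguments

# The parameters must already exist here; any sharding or device placement
# that Trainer does later happens after this optimizer was created.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-3)
trainer = Trainer(model=model, args=args, optimizers=(optimizer, None))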

@apoorvkh (Contributor, Author)

Most checks pass now! A few tests in tests_torch_and_flax fail, but this does not seem specific to this PR.

@SunMarc (Member)

SunMarc commented Oct 24, 2024

Makes sense! Thanks for explaining! Maybe you can add a note explaining that in the description of the arg. As for the CI, the failures are indeed unrelated to this PR. We will fix them shortly!

@apoorvkh (Contributor, Author)

For sure! Done.

@SunMarc (Member)

SunMarc commented Oct 28, 2024

Sorry for the wait! Once the CI is fixed on our side, we will merge the PR, @apoorvkh.

@apoorvkh (Contributor, Author)

Sounds great, thanks!

@ArthurZucker (Collaborator) left a comment

Thanks, would you like to add a bit of documentation for discoverability? With an example use case maybe? 🤗

@apoorvkh (Contributor, Author)

Sure, how do those changes to trainer.md look?

@ArthurZucker (Collaborator) left a comment

LGTM thanks

@ArthurZucker merged commit e9ad460 into huggingface:main on Oct 29, 2024
17 of 22 checks passed
@apoorvkh deleted the trainer-init-optimizer-cls branch on October 29, 2024 at 16:06
2015aroras pushed a commit to 2015aroras/transformers that referenced this pull request Nov 15, 2024
…34358)

* Adding `optimizer_cls_and_kwargs` to `Trainer.__init__`

* formatting

* make fix-copies docstring

* added more docs for optimizer_cls_and_kwargs

* add docs for Trainer(optimizer_cls_and_kwargs)

* reverting anchor names
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024