Adding optimizer_cls_and_kwargs to Trainer.__init__ #34358

Merged · 10 commits merged into huggingface:main · Oct 29, 2024

Conversation

apoorvkh (Contributor)

What does this PR do?

Currently, Trainer must be subclassed (as follows) to supply a custom torch.optim.Optimizer: the existing optimizers argument requires an already-constructed optimizer, which in turn assumes the model has already been initialized on the correct devices (device placement is usually handled by Trainer itself).

from typing import Any

import torch
from transformers import Trainer
from transformers import TrainingArguments as HfTrainingArguments


class CustomOptimizerTrainer(Trainer):
    # Override Trainer's get_optimizer_cls_and_kwargs hook to supply a custom optimizer class and kwargs.
    @staticmethod
    def get_optimizer_cls_and_kwargs(args: HfTrainingArguments, model=None) -> tuple[type[torch.optim.Optimizer], dict[str, Any]]:
        optimizer = torch.optim.AdamW
        optimizer_kwargs = {
            "lr": 4e-3,
            "betas": (0.9, 0.999),
            "weight_decay": 0.05,
        }
        return optimizer, optimizer_kwargs
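
For completeness, a hypothetical usage sketch of this workaround (the "gpt2" checkpoint and output_dir are placeholders for illustration, not part of the PR):

from transformers import AutoModelForCausalLM

# Relies on CustomOptimizerTrainer and HfTrainingArguments defined above.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
trainer = CustomOptimizerTrainer(
    model=model,
    args=HfTrainingArguments(output_dir="out"),  # placeholder arguments
    # train_dataset=...,  # omitted here
)
# trainer.train() would then construct AdamW with the kwargs returned above.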

This PR adds an optimizer_cls_and_kwargs argument to Trainer. It lets a user pass a Type[torch.optim.Optimizer] and a Dict[str, Any] of constructor kwargs when initializing the Trainer, rather than having to subclass it (as above).
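
As a minimal sketch of the new interface (assuming the argument takes an (optimizer class, kwargs) tuple, as the name suggests; the model and training arguments below are placeholders):

import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
args = TrainingArguments(output_dir="out")  # placeholder arguments

trainer = Trainer(
    model=model,
    args=args,
    # Trainer instantiates the optimizer itself, after it has prepared the
    # model (device placement, sharding, etc.).
    optimizer_cls_and_kwargs=(
        torch.optim.AdamW,
        {"lr": 4e-3, "betas": (0.9, 0.999), "weight_decay": 0.05},
    ),
)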

I am making this PR after #31875 (cc: @amyeroberts @muellerzr)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@muellerzr (Contributor) left a comment

Nice! Seems like a great fix :)

@muellerzr (Contributor)

If you run `pip install -e .[quality]; make fixup`, it should fix the style failures.

@SunMarc (Member) left a comment

Thanks for the PR! LGTM! I'm curious why the existing optimizers arg is not enough for your use case?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@apoorvkh (Contributor, Author)

apoorvkh commented Oct 24, 2024

Great, thanks both!

@muellerzr I had already run make fixup, but it results in the following errors. I think these are unrelated to this PR?

make repo-consistency
python utils/check_modular_conversion.py

Differences found between the generated code and src/transformers/models/glm/modeling_glm.py:

--- src/transformers/models/glm/modeling_glm.py_generated
+++ src/transformers/models/glm/modeling_glm.py
@@ -38,7 +38,6 @@
 )
 from ...modeling_utils import PreTrainedModel
 from ...utils import (
-    add_code_sample_docstrings,
     add_start_docstrings,
     add_start_docstrings_to_model_forward,
     is_flash_attn_2_available,
@@ -1223,11 +1222,6 @@
         self.model.embed_tokens = value
 
     @add_start_docstrings_to_model_forward(GLM_INPUTS_DOCSTRING)
-    @add_code_sample_docstrings(
-        checkpoint=_CHECKPOINT_FOR_DOC,
-        output_type=TokenClassifierOutput,
-        config_class=_CONFIG_FOR_DOC,
-    )
     def forward(
         self,
         input_ids: Optional[torch.LongTensor] = None,
make quality
Traceback (most recent call last):
  File "transformers/src/transformers/utils/import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "transformers/.pixi/envs/default/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "transformers/src/transformers/convert_graph_to_onnx.py", line 23, in <module>
    from transformers.pipelines import Pipeline, pipeline
  File "transformers/src/transformers/pipelines/__init__.py", line 50, in <module>
    from .automatic_speech_recognition import AutomaticSpeechRecognitionPipeline
  File "transformers/src/transformers/pipelines/automatic_speech_recognition.py", line 23, in <module>
    from .audio_utils import ffmpeg_read
  File "transformers/src/transformers/pipelines/audio_utils.py", line 54, in <module>
    ffmpeg_additional_args: Optional[list[str]] = None,
TypeError: 'type' object is not subscriptable

make style does seem to work (it results in no changes).

edit: resolved by #34360

@apoorvkh (Contributor, Author)

apoorvkh commented Oct 24, 2024

@SunMarc I am writing a codebase that uses Trainer for training with arbitrary models / hyperparameters, especially using methods like FSDP / DeepSpeed. (Happy to post a link shortly :) )

I believe the optimizers argument presupposes that the model is already copied/sharded onto the right devices. But if we rely on Trainer to do the copying/sharding, the model parameters aren't available yet at the point where the optimizers argument has to be defined. I think that necessitates passing an optimizer class and kwargs into Trainer and letting it construct the optimizer itself. This PR simply tries to expose those options more conveniently :)
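
To make the contrast concrete, here is a rough sketch of the pre-existing optimizers pattern (placeholder model and arguments, not from the PR): the optimizer must be constructed from model.parameters() up front, before Trainer has performed its own device placement or FSDP/DeepSpeed sharding.

import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
args = TrainingArguments(output_dir="out")  # placeholder arguments

# The parameters must already exist here; any sharding or device placement
# that Trainer does later happens after this optimizer was created.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-3)
trainer = Trainer(model=model, args=args, optimizers=(optimizer, None))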

@apoorvkh (Contributor, Author)

Most checks pass now! A few tests in tests_torch_and_flax fail, but this does not seem specific to this PR.

@SunMarc (Member)

SunMarc commented Oct 24, 2024

Makes sense! Thanks for explaining! Maybe you can add a note explaining that in the description of the arg. As for the CI, the failures are indeed unrelated to this PR. We will fix them shortly!

@apoorvkh (Contributor, Author)

For sure! Done.

@SunMarc (Member)

SunMarc commented Oct 28, 2024

Sorry for the wait! Once the CI is fixed on our side, we will merge the PR, @apoorvkh.

@apoorvkh (Contributor, Author)

Sounds great, thanks!

@ArthurZucker (Collaborator) left a comment

Thanks, would you like to add a bit of documentation for discoverability? With an example use case maybe? 🤗

@apoorvkh (Contributor, Author)

Sure, how do those changes to trainer.md look?

@ArthurZucker (Collaborator) left a comment

LGTM thanks

@ArthurZucker merged commit e9ad460 into huggingface:main on Oct 29, 2024
17 of 22 checks passed
@apoorvkh deleted the trainer-init-optimizer-cls branch on October 29, 2024 at 16:06
2015aroras pushed a commit to 2015aroras/transformers that referenced this pull request Nov 15, 2024
…34358)

* Adding `optimizer_cls_and_kwargs` to `Trainer.__init__`

* formatting

* make fix-copies docstring

* added more docs for optimizer_cls_and_kwargs

* add docs for Trainer(optimizer_cls_and_kwargs)

* reverting anchor names
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024