Select the DeepSpeedCPUOptimizer based on the original optimizer class. #3255

eljandoubi · 2024-11-24T22:17:24Z

What does this PR do?

For DeepSpeed optimizer offloading, select the appropriate optimizer based on the original optimizer class.

Who can review?

DeepSpeed: @muellerzr
Core parts of the library: @muellerzr @BenjaminBossan @SunMarc

BenjaminBossan · 2024-11-25T11:15:30Z

Thanks for this update. Just wondering aloud: If the optimizer is neither Adam, nor Adagrad, nor Lion, what would be the right thing to do? As is, Adam is used, is that correct?

eljandoubi · 2024-11-25T11:59:34Z

@BenjaminBossan By default, I have kept Adam, as it was before, for every other torch optimizer.

muellerzr

Thanks for doing this! Overall I'm a fan; however, let's abstract this out to a util func to keep the Accelerator a bit more readable.

muellerzr · 2024-11-25T13:48:13Z

src/accelerate/accelerator.py

+                        # For DeepSpeedCPUAdagrad
+                        if compare_versions("deepspeed", ">=", "0.5.5"):
+                            # Check if the optimizer is PyTorch's Adagrad.
+                            is_ada = isinstance(optimizer, torch.optim.Adagrad)
+                            # If not, and bitsandbytes is available,
+                            # check if the optimizer is the 32-bit bitsandbytes Adagrad.
+                            if is_bnb_available() and not is_ada:
+                                import bitsandbytes.optim as bnb_opt
+
+                                is_ada = (
+                                    isinstance(optimizer, (bnb_opt.Adagrad, bnb_opt.Adagrad32bit))
+                                    and optimizer.optim_bits == 32
+                                )
+                            if is_ada:
+                                from deepspeed.ops.adagrad import DeepSpeedCPUAdagrad
+
+                                optimizer_class = DeepSpeedCPUAdagrad
+
+                        # For DeepSpeedCPULion
+                        if is_bnb_available(min_version="0.38.0") and compare_versions("deepspeed", ">=", "0.11.0"):
+                            from bitsandbytes.optim import Lion, Lion32bit
+
+                            if isinstance(optimizer, (Lion, Lion32bit)) and optimizer.optim_bits == 32:
+                                from deepspeed.ops.lion import DeepSpeedCPULion
+
+                                optimizer_class = DeepSpeedCPULion
+
+                        optimizer = optimizer_class(optimizer.param_groups, **defaults)


This seems a bit bulky for the Accelerator, can we abstract this out to a deepspeed util of map_pytorch_optim_to_deepspeed?
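For illustration, the refactor requested here could look roughly like the sketch below. It is not the code that landed in the PR: it reuses the suggested name `map_pytorch_optim_to_deepspeed`, but it returns the name of the DeepSpeed CPU optimizer class as a string and matches on class names rather than `isinstance`, so that the example runs without torch, bitsandbytes, or deepspeed installed. The version checks from the diff are omitted, and the stand-in optimizer classes at the bottom are hypothetical.

```python
# Sketch only: maps an optimizer instance to the *name* of a DeepSpeed CPU
# optimizer class. A real util would import and return the class itself and
# would keep the deepspeed/bitsandbytes version checks shown in the diff.


def map_pytorch_optim_to_deepspeed(optimizer, default="DeepSpeedCPUAdam"):
    """Return the DeepSpeed CPU optimizer name matching `optimizer`."""
    # Names of the optimizer's class and of all its base classes.
    class_names = {cls.__name__ for cls in type(optimizer).__mro__}
    # bitsandbytes optimizers expose `optim_bits`; plain torch ones do not,
    # so a missing attribute is treated as 32-bit.
    is_32bit = getattr(optimizer, "optim_bits", 32) == 32

    if class_names & {"Adagrad", "Adagrad32bit"} and is_32bit:
        return "DeepSpeedCPUAdagrad"
    if class_names & {"Lion", "Lion32bit"} and is_32bit:
        return "DeepSpeedCPULion"
    # Everything else keeps the previous behaviour and falls back to Adam.
    return default


# Hypothetical stand-ins for torch.optim.Adagrad, bitsandbytes Lion, and SGD.
class Adagrad:
    pass


class Lion:
    def __init__(self, optim_bits=32):
        self.optim_bits = optim_bits


class SGD:
    pass


if __name__ == "__main__":
    for opt in (Adagrad(), Lion(), Lion(optim_bits=8), SGD()):
        print(type(opt).__name__, "->", map_pytorch_optim_to_deepspeed(opt))
```

With a helper along these lines, the Accelerator would only need a single call to pick `optimizer_class` before constructing it from `optimizer.param_groups`.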

HuggingFaceDocBuilderDev · 2024-11-25T13:50:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

eljandoubi · 2024-11-26T18:20:22Z

@muellerzr What do you think of it like this?

Select the DeepSpeedCPUOptimizer based on the original optimizer class.

8bb6992

muellerzr reviewed Nov 25, 2024

eljandoubi and others added 3 commits November 26, 2024 00:10

Merge branch 'huggingface:main' into add_more_cpu_optimizers

92a2e1f

abstract out optimizer selection to a deepspeed util

1cdbb40

add deepspeed cpu Adam & AdamW

53e5400

eljandoubi mentioned this pull request Nov 26, 2024

create _preprare_fsdp to pre- prepare fsdp model training #3213

Open
