
Add strategy argument to Trainer #8597

Merged · 34 commits into Lightning-AI:master · Oct 13, 2021

Conversation

@kaushikb11 (Contributor) commented Jul 28, 2021

What does this PR do?

Supports #6090
Related Issue #9053

The strategy argument supports passing training type aliases (ddp, ddp_spawn), TrainingTypeRegistry plugin names ("ddp_spawn_find_unused_parameters_false"), and custom plugin objects (DDPPlugin()).

At the moment, there is a single accelerator flag serving both Accelerators and Training Type plugins. We wish to have them decoupled!

trainer = Trainer(accelerator=GPUAccelerator(...))
trainer = Trainer(accelerator='ddp_spawn')
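
With this PR, the training type moves to the dedicated strategy argument. A minimal sketch of the decoupled usage, using only the values named in the description above:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

trainer = Trainer(strategy="ddp")                                     # training type alias
trainer = Trainer(strategy="ddp_spawn_find_unused_parameters_false")  # TrainingTypeRegistry name
trainer = Trainer(strategy=DDPPlugin(find_unused_parameters=False))   # custom plugin object
```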

Alternate flags to set Training Types

  • accelerator
    • type: Optional[Union[str, Accelerator]] = None
    • Supports training types and Accelerator Objects
  • distributed_backend
    • type: Optional[str] = None
    • Deprecated, should use accelerator instead
  • plugins
    • type: Optional[Union[List[Union[Plugin, ClusterEnvironment, str]], Plugin, ClusterEnvironment, str]] = None
    • Supports custom Lightning plugins & cluster environments

What's the difference between passing training type to accelerator, distributed_backend, or plugins?

  • accelerator and distributed_backend only support DistributedType, whereas plugins support Custom Training Types.
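
For example, a custom training type plugin (the subclass name below is illustrative) can only be passed through plugins:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# Hypothetical custom training type; the name is illustrative
class MyCustomDDPPlugin(DDPPlugin):
    pass

trainer = Trainer(plugins=[MyCustomDDPPlugin()])
# `accelerator` / `distributed_backend` would not accept this object: they only
# take known DistributedType strings (or, for `accelerator`, Accelerator objects)
```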

Exceptions (conflicting combinations that raise an error; see the sketch after this list):

  • Trainer(distributed_backend="ddp_cpu", strategy="ddp_spawn")
  • Trainer(accelerator="ddp", strategy="ddp_spawn")
  • Trainer(plugins="ddp_find_unused_parameters_false", strategy="ddp_spawn")
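
Each combination above specifies the training type twice, so the connector raises a MisconfigurationException (the exact message for the distributed_backend case is quoted in the review discussion below). A quick sketch:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.utilities.exceptions import MisconfigurationException

try:
    Trainer(accelerator="ddp", strategy="ddp_spawn")
except MisconfigurationException as err:
    print(err)  # hints to use `Trainer(strategy=...)` alone
```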

Deprecations (deprecated in v1.5 & will be removed in v1.7, per the deprecation message quoted below; example after this list):

  • Passing training type to accelerator flag
  • Passing training type to plugins flag
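
In practice, based on the rank_zero_deprecation message quoted later in this thread:

```python
from pytorch_lightning import Trainer

# Deprecated: passing a training type to `accelerator` emits a deprecation warning
trainer = Trainer(accelerator="ddp_spawn")

# Preferred replacement
trainer = Trainer(strategy="ddp_spawn")
```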

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@codecov (bot) commented Jul 28, 2021

Codecov Report

Merging #8597 (dfecb4f) into master (28fc8d2) will decrease coverage by 4%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #8597    +/-   ##
=======================================
- Coverage      93%     89%    -4%     
=======================================
  Files         178     178            
  Lines       15668   15695    +27     
=======================================
- Hits        14526   13943   -583     
- Misses       1142    1752   +610     

@ananthsub (Contributor) commented:

Since training type plugins are themselves in beta, I have a naming question: training type isn't only for training, but also for other stages like evaluation and prediction. People could be confused about why the plugin name references training if it also applies during these other situations. With that in mind, is there another name we should formalize this under? I fully acknowledge renaming the existing training type plugins would be super annoying, but it'll be much harder to change once this is on the Trainer constructor.

@kaushikb11 kaushikb11 self-assigned this Jul 30, 2021
@justusschock (Member) commented:

@ananthsub I fully agree. This dates back to when we introduced it. Back then there was mainly training and validation (which was considered to be only a part of training). What would you call it, though? Some kind of strategy_plugin? But a strategy for what? Precision is also some kind of strategy. TBH, I initially came up with the name because I couldn't find anything better and needed something to prototype this... and somehow we kept it :D

@kaushikb11 kaushikb11 added this to the v1.5 milestone Aug 3, 2021
@kaushikb11 kaushikb11 added the feature (Is an improvement or enhancement) label Aug 3, 2021
@kaushikb11 kaushikb11 marked this pull request as ready for review August 3, 2021 05:59
@kaushikb11 kaushikb11 requested a review from rohitgr7 as a code owner October 11, 2021 17:06
@mergify mergify bot removed the has conflicts label Oct 11, 2021
@kaushikb11 kaushikb11 changed the title Add accelerator_strategy argument to Trainer Add strategy argument to Trainer Oct 11, 2021
@mergify mergify bot removed the has conflicts label Oct 12, 2021
@tchaton (Contributor) left a comment:

LGTM!

@mergify mergify bot added the ready PRs ready to be merged label Oct 12, 2021
@SeanNaren (Contributor) left a comment:

Great work! Once merged, IMO we should do two things:

  1. update docs
  2. send a message in our community Slack notifying people of this change, and the motivations behind it!

@rohitgr7 (Contributor) left a comment:

LG!

@kaushikb11 kaushikb11 enabled auto-merge (squash) October 13, 2021 11:13
@mergify mergify bot removed the has conflicts label Oct 13, 2021
@kaushikb11 kaushikb11 merged commit 05b15e6 into Lightning-AI:master Oct 13, 2021
raise MisconfigurationException(
    f"You have passed `Trainer(strategy={self.strategy})` but have"
    f" also passed `Trainer(distributed_backend={distributed_backend})`."
    f"HINT: Use just `Trainer(strategy={self.strategy})` instead."
)
A Contributor left a comment:

Missing whitespace here (the last two f-string fragments concatenate with no space before "HINT").
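
The one-character fix the reviewer is pointing at would look like this:

```python
raise MisconfigurationException(
    f"You have passed `Trainer(strategy={self.strategy})` but have"
    f" also passed `Trainer(distributed_backend={distributed_backend})`."
    f" HINT: Use just `Trainer(strategy={self.strategy})` instead."
)
```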

Comment on lines +308 to +311:

rank_zero_deprecation(
    f"Passing {accelerator} `strategy` to the `accelerator` flag in Trainer has been deprecated"
    f" in v1.5 and will be removed in v1.7. Use `Trainer(strategy={accelerator})` instead."
)
A Contributor left a comment:

I thought we weren't going to deprecate the previous accelerator and instead just print a warning.

A Contributor replied:

I think the accelerator flag is still there... it's just that passing one of the strategies to it is deprecated.

A Contributor replied:

What I understood from our offline discussion was that support for gpus=N and accelerator="ddp" would not be deprecated and removed, as it's widely used, but a warning would be printed suggesting adoption of the new flags.

@kaushikb11 (Contributor, Author) replied:

Flags gpus, tpu_cores, etc. will still be supported, but passing training strategies to accelerator will be deprecated. This decision will also help with the internal cleanup of AcceleratorConnector.
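
A sketch of the intended migration (hardware flags stay; the strategy string moves):

```python
from pytorch_lightning import Trainer

# Still supported: hardware flags such as `gpus`; the strategy string in
# `accelerator` keeps working but emits a deprecation warning
trainer = Trainer(gpus=2, accelerator="ddp")

# New equivalent
trainer = Trainer(gpus=2, strategy="ddp")
```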

rohitgr7 added a commit to Tshimanga/pytorch-lightning that referenced this pull request Oct 18, 2021
Co-authored-by: Rohit Gupta <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Labels
design (Includes a design discussion) · feature (Is an improvement or enhancement) · ready (PRs ready to be merged)
8 participants