[Main Issue] Accelerator and Plugin refactor #10416

Closed · 2 of 4 tasks
four4fish opened this issue Nov 8, 2021 · 6 comments · Fixed by #10570 or #10596
Labels: accelerator · design · plugin · refactor


four4fish commented Nov 8, 2021

Proposed refactoring or deprecation

Motivation

The Accelerator is not a stable API yet; we can improve the Accelerator-related logic and move towards a stable Accelerator API for 1.6.

Pitch

Steps

  1. Collective refactor: Consolidate collective functions (#7534)
  2. Move the Precision Plugin into the TTP: Precision Plugins should be part of Training Type Plugins (#7324)
  3. Move the Accelerator into the Strategy: [Accelerator refactor] Move Accelerator into Strategy (#10648)
  4. Simplify the spawning logic: Simplify multiprocessing logic in DDPSpawn plugins (#10059)
  5. [RFC] Simplify the Accelerator Connector logic and flags (can be done in parallel with the above): Rewrite Accelerator_connector and follow up tasks (#11449)
  6. [RFC] Revisit the inheritance of the TTP: Flatten the Strategy inheritance (#11863)

More details in the Accelerator Refactor Proposal [updating]; a rough sketch of the target composition follows.
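
To make the target shape of steps 2 and 3 concrete, here is a minimal sketch of the intended ownership, using simplified, hypothetical class names rather than the actual Lightning implementation:

```python
# Minimal sketch of the intended ownership after steps 2 and 3.
# Hypothetical, simplified names; not the actual Lightning classes.

class Accelerator:
    """Keeps only hardware concerns: device setup and teardown."""

    def setup_device(self, device):
        ...


class PrecisionPlugin:
    """Owns the backward/optimizer-step logic for a precision backend."""

    def backward(self, loss):
        loss.backward()


class Strategy:
    """Composes the accelerator and the precision plugin, inverting the
    old arrangement where the Accelerator owned the training type plugin."""

    def __init__(self, accelerator, precision_plugin):
        self.accelerator = accelerator
        self.precision_plugin = precision_plugin

    def backward(self, loss):
        # Precision-specific work is delegated to the owned plugin.
        self.precision_plugin.backward(loss)
```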

FAQ

  1. Will this involve a lot of breaking changes?
    Steps 1–4 involve few user-facing API changes (unless we find other existing bugs during the refactor). The only breaking changes will be for custom plugins; see the sketch after this FAQ.
    Steps 5 and 6 are still at the RFC stage and may include breaking changes that impact user-facing APIs.

  2. How does this impact lightningLite?
    It should be helpful for LightningLite too; some function refactoring/simplification may be possible there as well. (@awaelchli, any suggestions on this part?)
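
To illustrate the kind of custom-plugin breakage FAQ 1 refers to, here is a hedged before/after sketch with simplified, hypothetical names (assuming step 2 lands as proposed; not the actual Lightning API):

```python
# Hypothetical before/after for a custom plugin once step 2 lands.
# Simplified names and signatures; not the actual Lightning API.

class OldAccelerator:
    """Pre-refactor shape: the accelerator still owns backward()."""

    def backward(self, loss):
        loss.backward()


class PrecisionPlugin:
    """Post-refactor shape: backward() lives on the precision plugin."""

    def backward(self, loss):
        loss.backward()


# Before: users customized precision behavior on the accelerator.
class MyAccelerator(OldAccelerator):
    def backward(self, loss):
        super().backward(loss * 0.5)  # illustrative scaling only


# After: the same override moves to a custom precision plugin, which is
# why only custom plugins see a breaking change.
class MyPrecisionPlugin(PrecisionPlugin):
    def backward(self, loss):
        super().backward(loss * 0.5)  # illustrative scaling only
```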

Follow-up TODOs:



cc @justusschock @awaelchli @akihironitta @tchaton @Borda @kaushikb11 @ananthsub

four4fish added the refactor, design, and distributed labels on Nov 8, 2021
awaelchli commented

Great summary! Regarding "How does this impact lightningLite?", as best as I can see right now:

  • Steps 1 and 2 do not impact Lite. If changes are required in the Trainer, they will be mirrored directly in Lite. Neither the Trainer nor Lite makes strong assumptions about the internal composition of plugins.
  • Step 3: not sure.
  • Step 4: will impact the Trainer more than Lite. Lite is sort of already setting an example here of how it could/should look in the Trainer.
  • Steps 5 and 6: should not impact Lite at all (the accelerator connector is shared).

tchaton commented Nov 9, 2021

Yes, definitely really excited about this. I am quite eager to finally see the accelerators marked as a stable API, targeting v1.6.

awaelchli commented

@ananthsub

#10570 (review)

It'd be very good to list which APIs this lets us simplify or remove from the Accelerator/Strategy/precision plugin. Those could be listed as immediate follow-ups to this PR.

Here are some that I found; some are also mentioned in the doc:

After 2)

Methods:

  • Accelerator.optimizer_step (move)
  • Accelerator.backward (move)
  • Accelerator.*_dispatch (move + reduce)
  • Accelerator.*_step (move)
  • some miscellaneous ones

Properties:

  • Accelerator.amp_backend (move)
  • Accelerator.precision (move)
  • Accelerator.scaler (move)

After 3)

  • TrainingTypePlugin.on_gpu/on_tpu
  • TrainingTypePlugin.model_to_device

These will either make use of the accelerator or be moved to it.
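
For illustration, here is a sketch of where the members listed above could end up, again with simplified, hypothetical names and signatures rather than the real Lightning API:

```python
# Hypothetical mapping of the moves listed above. Simplified names and
# signatures; not the real Lightning API.

class PrecisionPlugin:
    # Were: Accelerator.precision / Accelerator.amp_backend / Accelerator.scaler
    precision = 32
    amp_backend = None
    scaler = None

    # Was: Accelerator.backward
    def backward(self, loss):
        loss.backward()

    # Was: Accelerator.optimizer_step
    def optimizer_step(self, optimizer):
        optimizer.step()


class Accelerator:
    """After step 3, the strategy owns one of these for device concerns."""

    root_device = "cpu"  # placeholder device for the sketch


class Strategy:
    def __init__(self, accelerator, precision_plugin):
        self.accelerator = accelerator
        self.precision_plugin = precision_plugin

    # Was: TrainingTypePlugin.model_to_device; it can now delegate the
    # device choice to the owned accelerator.
    def model_to_device(self, model):
        model.to(self.accelerator.root_device)
```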

four4fish linked a pull request on Nov 20, 2021 that will close this issue
awaelchli commented

Now that the precision plugin has moved, I will take a look at this TODO here:
https://github.com/PyTorchLightning/pytorch-lightning/blob/af0bb96f0ff645102680a7adc99dc131cfeb9c0b/pytorch_lightning/lite/lite.py#L433-L439

awaelchli commented

Here is another follow-up we need to do, IMO: #10686. This will also unblock #10657.

carmocca commented

Closing this issue in favor of the smaller linked issues for the pending tasks.
