
Deprecate trainer.num_processe/trainer.num_gpus and remove incorrect tests #11624

Closed
four4fish opened this issue Jan 26, 2022 · 3 comments
four4fish commented Jan 26, 2022

Proposed refactor

Trainer.num_processes and Trainer.num_gpus are not used in the code base and only exist in tests. I propose deprecating/removing these properties and removing the incorrect/unnecessary tests.

Motivation

  1. Simplify code and reduce confusion. Strategy.num_processes != Trainer.num_processes, and Trainer.num_processes is only called in tests (confusion raised in Lazy initialize Strategy.parallel_devices #11572).
  2. Simplify the accelerator_connector rewrite (Rewrite Accelerator_connector and follow up tasks #11449). The current accelerator_connector has a lot of logic related to num_processes that is unnecessary and confusing; removing it first will simplify the refactor.

Pitch

Steps:

  1. Deprecate trainer.num_processes
    https://github.com/PyTorchLightning/pytorch-lightning/blob/fe34bf2a653ebd50e6a3a00be829e3611f820c3c/pytorch_lightning/trainer/trainer.py#L1969-L1971

  2. Remove trainer.num_processes tests
    https://github.com/PyTorchLightning/pytorch-lightning/blob/fe34bf2a653ebd50e6a3a00be829e3611f820c3c/tests/trainer/test_trainer.py#L2093-L2111

  3. Do not carry self.num_processes logic over to the accelerator_connector rewrite.
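The deprecation step could look roughly like the sketch below (a hypothetical minimal Trainer using the stdlib `warnings` module rather than Lightning's internal deprecation helpers; names and messages are illustrative only):

```python
import warnings


class Trainer:
    """Hypothetical minimal Trainer, used only to illustrate the deprecation pattern."""

    def __init__(self, devices: int = 1) -> None:
        self.devices = devices

    @property
    def num_processes(self) -> int:
        # Warn on access but keep the old behavior for one release cycle,
        # then drop the property entirely in a later release.
        warnings.warn(
            "`Trainer.num_processes` is deprecated and will be removed in a "
            "future release. Use `Trainer.devices` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.devices
```

Accessing `Trainer(devices=4).num_processes` would still return 4 but emit a `DeprecationWarning`, giving users a migration window before removal.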

Note: this won't impact Strategy.num_processes, as it is unrelated. Strategy.num_processes is calculated based on parallel_devices, which is neither equal nor related to trainer.num_processes.

The same applies to trainer.num_gpus.
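To illustrate why the two process counts are unrelated, here is a minimal sketch (a hypothetical class, not the real Strategy API) of a strategy-side count derived purely from its device list:

```python
class Strategy:
    """Hypothetical sketch: the process count is derived from the device list."""

    def __init__(self, parallel_devices=None):
        # Device identifiers, e.g. ["cuda:0", "cuda:1"]; the real Lightning
        # class holds torch.device objects here.
        self.parallel_devices = list(parallel_devices or [])

    @property
    def num_processes(self):
        # One process per parallel device; no Trainer-level flag is consulted.
        return len(self.parallel_devices)
```

In this model, deprecating a Trainer-level property cannot change the strategy's count, since it only ever reads its own `parallel_devices`.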

Additional context



cc @justusschock @awaelchli @akihironitta @rohitgr7 @kaushikb11 @Borda @ananthsub @ninginthecloud @jjenniferdai

@four4fish four4fish self-assigned this Jan 26, 2022
@four4fish four4fish changed the title Deprecate trainer.num_processes and remove incorrect tests Deprecate trainer.num_processe/trainer.num_gpus and remove incorrect tests Jan 26, 2022
@awaelchli awaelchli added this to the 1.6 milestone Jan 29, 2022
@awaelchli
Contributor

This is reasonable imo, considering the generic Trainer(devices=x) argument was recently introduced; num_processes does not follow that terminology anyway. When deprecating num_processes, num_gpus, etc., we should introduce num_devices imo.
Other than that, I think we can go ahead with this change.

@carmocca
Contributor

Is this finished?

@rohitgr7
Contributor

Looks like the deprecation is done already. Closing this.

Repository owner moved this from Todo to Done in Frameworks Planning Aug 16, 2022