
Revisit _move_optimizer_state function for all Strategies #10820

Closed
four4fish opened this issue Nov 29, 2021 · 2 comments · Fixed by #10849
Labels: distributed, optimizer, refactor

Comments


four4fish commented Nov 29, 2021

Proposed refactor

From the comments in #10596: the device argument isn't used anymore.
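
For context, moving an optimizer's state between devices amounts to walking the per-parameter state dicts and transferring every tensor. Below is a minimal sketch of such a helper; the name, signature, and state layout are illustrative assumptions, not the actual Lightning implementation:

```python
import torch
from torch.optim import Optimizer


def _move_optimizer_state(optimizer: Optimizer, device: torch.device) -> None:
    # Illustrative sketch: the per-parameter state is a dict of dicts whose
    # values include tensors (e.g. Adam's exp_avg / exp_avg_sq buffers).
    # Moving the state means transferring each of those tensors to `device`.
    for param_state in optimizer.state.values():
        for key, value in param_state.items():
            if isinstance(value, torch.Tensor):
                param_state[key] = value.to(device)
```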

Motivation

Pitch

Additional context



cc @justusschock @awaelchli @akihironitta

four4fish (Contributor, Author) commented:

@tchaton it seems the device is still in use in GPUAccelerator's teardown function. Instead of moving the optimizer state to root_device, it moves the optimizer state to CPU to avoid leaking GPU memory.
https://github.com/PyTorchLightning/pytorch-lightning/blame/master/pytorch_lightning/accelerators/gpu.py#L78-L80
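
A rough sketch of that teardown behaviour, reusing the illustrative `_move_optimizer_state` helper from the sketch above (the class shape and the `trainer.optimizers` access are assumptions, not the exact code at the linked lines):

```python
import torch


class GPUAccelerator:
    def teardown(self, trainer) -> None:
        # On teardown, move the optimizer state to CPU rather than to the
        # root_device, so GPU memory held by the state buffers is released.
        for optimizer in trainer.optimizers:
            _move_optimizer_state(optimizer, torch.device("cpu"))
```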


tchaton commented Nov 30, 2021

Hey @four4fish, sounds good to me. I believe the code can be shared across strategies, though. We need to revisit the TPU one.
