
TPU-related docs are somewhat confusing #14330

Closed
dinhanhx opened this issue Aug 21, 2022 · 6 comments
Labels
accelerator: tpu (Tensor Processing Unit), docs (Documentation related), question (Further information is requested), won't fix (This will not be worked on)

Comments

dinhanhx commented Aug 21, 2022

📚 Documentation

In the Trainer class API,
[screenshot: Trainer API docs, accelerator parameter]
it says not to use the accelerator parameter and that it will be removed in 1.7.0 (the current stable version is 1.7.1), BUT tpu_cores says to use accelerator.
[screenshot: Trainer API docs, tpu_cores parameter]
In Accelerator: TPU Training (also the stable version), all the examples use accelerator='tpu'.
[screenshot: TPU training docs, examples using accelerator='tpu']
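For reference, the two styles look roughly like this (a minimal sketch based on my reading of the 1.7 docs; the value 8 is only illustrative):

```python
import pytorch_lightning as pl

# Style used throughout the TPU training docs: accelerator + devices.
trainer = pl.Trainer(accelerator="tpu", devices=8)

# Older style referenced under the tpu_cores parameter.
trainer = pl.Trainer(tpu_cores=8)
```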

I don't really know which one to follow. And when I head to What is a Strategy?,
[screenshot: 'What is a Strategy?' docs example]
I don't really understand why we need strategy="ddp_spawn" here when, without it, the model is still trained on 8 cores.
Moreover, the "Learn more" link for TPU points to examples that don't use any TPU strategy.
[screenshot: linked TPU examples without an explicit strategy]
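Concretely, these two calls both appear to train on 8 TPU cores (a sketch of what the two doc pages show; I haven't dug into the internals):

```python
import pytorch_lightning as pl

# From the strategy docs: the spawn strategy is set explicitly.
trainer = pl.Trainer(accelerator="tpu", devices=8, strategy="ddp_spawn")

# From the TPU examples: no strategy flag, yet training still uses 8 cores.
trainer = pl.Trainer(accelerator="tpu", devices=8)
```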

This example should use precision='bf16'.
[screenshot: TPU docs example using precision=16]
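In other words, I'd expect something like this instead (a sketch assuming the 1.7 Trainer API; the devices value is only illustrative):

```python
import pytorch_lightning as pl

# bfloat16 is the native reduced precision on TPUs, so the example
# would make more sense with precision="bf16" than precision=16.
trainer = pl.Trainer(accelerator="tpu", devices=8, precision="bf16")
```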



cc @Borda @carmocca @JackCaoG @steventk-g @Liyang90 @rohitgr7 @kaushikb11

@dinhanhx added the needs triage label Aug 21, 2022
rohitgr7 (Contributor) commented

it says not to use the accelerator parameter and that it will be removed in 1.7.0 (the current stable version is 1.7.1), BUT tpu_cores says to use accelerator.

It says that passing strategy values ('ddp', 'dp', etc.) to accelerator is deprecated, not the complete parameter.

I don't really understand why we need strategy="ddp_spawn" here when without it, the model is still trained on 8 cores?

Yes, but if you don't set the strategy flag explicitly, it depends on devices: with devices=8 it uses ddp_spawn, otherwise it trains on one TPU core.
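Roughly (a sketch of the selection behavior I'm describing; exact defaults can vary between releases):

```python
import pytorch_lightning as pl

# No explicit strategy: with 8 devices a spawn-based strategy is picked automatically.
trainer = pl.Trainer(accelerator="tpu", devices=8)

# With a single device, training runs on one TPU core without spawning processes.
trainer = pl.Trainer(accelerator="tpu", devices=1)
```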

This example should use precision='bf16'

Yes, it will throw a warning and use bf16. But it's a little weird; maybe we should deprecate 16 for TPU and remove it later?
it was added here by @carmocca
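For illustration (a sketch of the current fallback behavior described above, not the deprecation proposal):

```python
import pytorch_lightning as pl

# On TPU, precision=16 currently emits a warning and falls back to bfloat16.
trainer = pl.Trainer(accelerator="tpu", devices=8, precision=16)

# Passing bf16 explicitly avoids the warning.
trainer = pl.Trainer(accelerator="tpu", devices=8, precision="bf16")
```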

@rohitgr7 added the question and accelerator: tpu labels and removed the needs triage label Aug 23, 2022
@carmocca added the docs label Aug 23, 2022
carmocca (Contributor) commented

maybe we should deprecate 16 for TPU and remove it later?

I had the same question in #10020 (comment) (the comment linked as "fixed" by #10026). But I did not end up deprecating it, to "ease transition between accelerator environments" (https://github.com/Lightning-AI/lightning/pull/10026/files#diff-2d97e0bfd5a239885ea0702336cbf3f6dbdaee6a5e255f5ca4da15cb5e260ea9R613).

I somewhat lean towards deprecation in case it ever becomes supported. The docs should be updated regardless of this decision.

rohitgr7 (Contributor) commented

But I did not end up deprecating it to "ease transition between accelerator environments"

you mean someone changing from CPU or GPU to TPU?

carmocca (Contributor) commented

Yes

dinhanhx (Author) commented Aug 23, 2022

Re @rohitgr7

It says that passing strategy values ('ddp', 'dp', etc.) to accelerator is deprecated, not the complete parameter.

This was on me. I misread the sentence; it means passing strategy options.

Thanks for the rest

stale bot commented Apr 15, 2023

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!

stale bot added the won't fix label Apr 15, 2023