No TPU devices were found in a TPU pod env. #6692
Labels: accelerator: tpu, bug, help wanted, priority: 0
🐛 Bug
To Reproduce
Add tpu_cores=8 to the BoringModel.
Command to run:
The exception occurs immediately after pytorch_lightning is installed. It repeats because it is raised on every instance of the pod, so I copy only one of them here.
Expected behavior
The TPU is definitely available.
Additional context
I tried a simple workaround: setting `_TPU_AVAILABLE = True` in https://github.com/PyTorchLightning/pytorch-lightning/blob/0e45220263f4e2045dfe7f68e3e0eaac0b2033d5/pytorch_lightning/utilities/__init__.py#L52. It works: no more exceptions, and the model trains perfectly! I think the TPU detection logic is wrong in a pod environment, or outdated with respect to the current XLA (note that detection works with a single TPU device). The official XLA code uses `xmp.spawn` to spawn the processes that then acquire the TPU devices.

Besides, I think most places that check `_TPU_AVAILABLE` at the top level (to guard the XLA imports) could check `_XLA_AVAILABLE` instead.
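The two suggestions above (a cheap "is the XLA package installed?" flag for import guards, and a separate TPU check that touches the device in a spawned process) can be sketched in plain Python. This is a minimal sketch, not Lightning's actual implementation: `package_available`, `probe_in_subprocess`, `_xla_device_probe` and `tpu_available` are hypothetical names, the probe assumes `torch_xla.core.xla_model.xla_device()` raises when no device is usable, and the helper assumes a POSIX system where the `fork` start method exists. The device is initialized only in a short-lived child, so a failing or hanging XLA initialization cannot disturb the parent, and the timeout turns a hang into "not available".

```python
import importlib.util
import multiprocessing as mp
from multiprocessing.connection import Connection
from typing import Callable


def package_available(name: str) -> bool:
    """True if top-level package `name` is importable, checked without importing it."""
    return importlib.util.find_spec(name) is not None


def _child(probe: Callable[[], bool], conn: Connection) -> None:
    # Runs in the child process: report the probe result; any error counts as "no device".
    try:
        conn.send(bool(probe()))
    except Exception:
        conn.send(False)
    finally:
        conn.close()


def probe_in_subprocess(probe: Callable[[], bool], timeout: float = 30.0) -> bool:
    """Run `probe` in a forked child and return its result, or False on error/timeout."""
    ctx = mp.get_context("fork")  # assumption: POSIX; "fork" is unavailable on Windows
    reader, writer = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(probe, writer))
    proc.start()
    writer.close()  # parent keeps only the read end, so a dead child yields EOF
    result = False
    try:
        if reader.poll(timeout):
            result = reader.recv()
    except EOFError:  # child died before reporting anything
        result = False
    finally:
        reader.close()
        if proc.is_alive():  # still hanging after the timeout
            proc.terminate()
        proc.join()
    return result


def _xla_device_probe() -> bool:
    # Hypothetical probe; torch_xla is imported only inside the child process.
    import torch_xla.core.xla_model as xm
    return xm.xla_device() is not None


def tpu_available() -> bool:
    # Skip the subprocess entirely when torch_xla is not even installed.
    return package_available("torch_xla") and probe_in_subprocess(_xla_device_probe)
```

With flags split this way, a top-level guard such as `if _XLA_AVAILABLE: import torch_xla.core.xla_model as xm` depends only on the package being installed, while the more expensive device probe is run only where a TPU is actually required.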