Merge branch 'master' into 1.5-release

tchaton authored Nov 2, 2021
2 parents 76cb77e + f6ed0bd · commit f2b81f0
Showing 17 changed files with 220 additions and 40 deletions.
2 changes: 0 additions & 2 deletions .azure-pipelines/gpu-tests.yml
@@ -108,8 +108,6 @@ jobs:
bash pl_examples/run_examples.sh --trainer.gpus=1
bash pl_examples/run_examples.sh --trainer.gpus=2 --trainer.strategy=ddp
bash pl_examples/run_examples.sh --trainer.gpus=2 --trainer.strategy=ddp --trainer.precision=16
bash pl_examples/run_examples.sh --trainer.gpus=2 --trainer.strategy=dp
bash pl_examples/run_examples.sh --trainer.gpus=2 --trainer.strategy=dp --trainer.precision=16
env:
PL_USE_MOCKED_MNIST: "1"
displayName: 'Testing: examples'
3 changes: 3 additions & 0 deletions .gitignore
@@ -24,6 +24,9 @@ __pycache__/
*.py[cod]
*$py.class
timit_data/
grid_generated*
grid_ori*



# C extensions
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -83,6 +83,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
* Added Rich progress bar ([#8929](https://github.com/PyTorchLightning/pytorch-lightning/pull/8929), [#9559](https://github.com/PyTorchLightning/pytorch-lightning/pull/9559))
* Added Support for iterable datasets ([#9734](https://github.com/PyTorchLightning/pytorch-lightning/pull/9734))
* Added `RichModelSummary` callback ([#9546](https://github.com/PyTorchLightning/pytorch-lightning/pull/9546))
* Added `configure_columns` method to `RichProgressBar` ([#10288](https://github.com/PyTorchLightning/pytorch-lightning/pull/10288))
* Added `leave` argument to `RichProgressBar` ([#10301](https://github.com/PyTorchLightning/pytorch-lightning/pull/10301))
- Added input validation logic for precision ([#9080](https://github.com/PyTorchLightning/pytorch-lightning/pull/9080))
- Added support for CPU AMP autocast ([#9084](https://github.com/PyTorchLightning/pytorch-lightning/pull/9084))
- Added `on_exception` callback hook ([#9183](https://github.com/PyTorchLightning/pytorch-lightning/pull/9183))
@@ -128,7 +130,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Added support for `devices="auto"` ([#10264](https://github.com/PyTorchLightning/pytorch-lightning/pull/10264))
- Added a `filename` argument in `ModelCheckpoint.format_checkpoint_name` ([#9818](https://github.com/PyTorchLightning/pytorch-lightning/pull/9818))
- Added support for empty `gpus` list to run on CPU ([#10246](https://github.com/PyTorchLightning/pytorch-lightning/pull/10246))
- Added `configure_columns` method to `RichProgressBar` ([#10288](https://github.com/PyTorchLightning/pytorch-lightning/pull/10288))
- Added a warning if multiple batch sizes are found from ambiguous batch ([#10247](https://github.com/PyTorchLightning/pytorch-lightning/pull/10247))


@@ -178,9 +179,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Changed default value of the `max_steps` Trainer argument from `None` to -1 ([#9460](https://github.com/PyTorchLightning/pytorch-lightning/pull/9460))
- LightningModule now raises an error when calling `log(on_step=False, on_epoch=False)` ([#10227](https://github.com/PyTorchLightning/pytorch-lightning/pull/10227))
- Quantization aware training observers are now disabled by default during validating/testing/predicting stages ([#8540](https://github.com/PyTorchLightning/pytorch-lightning/pull/8540))
- Raised `MisconfigurationException` when the total length of `dataloader` across ranks is zero, and added a warning for when the total length is non-zero but the length on the local rank is zero ([#9827](https://github.com/PyTorchLightning/pytorch-lightning/pull/9827))
- Changed the model size calculation using `ByteCounter` ([#10123](https://github.com/PyTorchLightning/pytorch-lightning/pull/10123))
- Enabled `on_load_checkpoint` for `LightningDataModule` for all `trainer_fn` ([#10238](https://github.com/PyTorchLightning/pytorch-lightning/pull/10238))
- Allowed separate config files for parameters with class type when LightningCLI is in `subclass_mode=False` ([#10286](https://github.com/PyTorchLightning/pytorch-lightning/pull/10286))
- Allow separate config files for parameters with class type when LightningCLI is in subclass_mode=False ([#10286](https://github.com/PyTorchLightning/pytorch-lightning/pull/10286))


### Deprecated
16 changes: 9 additions & 7 deletions docs/source/starter/lightning_lite.rst
@@ -3,15 +3,14 @@ LightningLite - Stepping Stone to Lightning
###########################################


:class:`~pytorch_lightning.lite.LightningLite` enables pure PyTorch users to scale their existing code
on any kind of device while retaining full control over their own loops and optimization logic.

.. image:: https://pl-public-data.s3.amazonaws.com/docs/static/images/lite/lightning_lite.gif
:alt: Animation showing how to convert a standard training loop to a Lightning loop
:width: 600px
:alt: Animation showing how to convert your PyTorch code to LightningLite.
:width: 500
:align: center

|
:class:`~pytorch_lightning.lite.LightningLite` enables pure PyTorch users to scale their existing code
on any kind of device while retaining full control over their own loops and optimization logic.

:class:`~pytorch_lightning.lite.LightningLite` is the right tool for you if you match one of the two following descriptions:

@@ -246,6 +245,9 @@ from its hundreds of features.

You can see our :class:`~pytorch_lightning.lite.LightningLite` as a
future :class:`~pytorch_lightning.core.lightning.LightningModule` and slowly refactor your code into its API.
Below, the :meth:`~pytorch_lightning.core.lightning.LightningModule.training_step`, :meth:`~pytorch_lightning.core.lightning.LightningModule.forward`,
:meth:`~pytorch_lightning.core.lightning.LightningModule.configure_optimizers`, and :meth:`~pytorch_lightning.core.lightning.LightningModule.train_dataloader`
methods are implemented.


.. code-block:: python
@@ -300,7 +302,7 @@ future :class:`~pytorch_lightning.core.lightning.LightningModule` and slowly ref
Finally, change the :meth:`~pytorch_lightning.lite.LightningLite.run` into a
:meth:`~pytorch_lightning.core.lightning.LightningModule.__init__` and drop the inner code for setting up the components.
:meth:`~pytorch_lightning.core.lightning.LightningModule.__init__` and drop the fit method.

.. code-block:: python
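To make the refactor this doc change describes concrete, here is a minimal, self-contained sketch of the LightningLite pattern; the model, data, and hyperparameters are illustrative placeholders, not taken from this commit.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning.lite import LightningLite


class Lite(LightningLite):
    def run(self, epochs: int = 1):
        # Any existing PyTorch model/optimizer/dataloader can be dropped in here.
        model = torch.nn.Linear(32, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        dataloader = DataLoader(dataset, batch_size=8)

        # Lite wraps the objects and moves them to the selected device(s).
        model, optimizer = self.setup(model, optimizer)
        dataloader = self.setup_dataloaders(dataloader)

        model.train()
        for _ in range(epochs):
            for x, y in dataloader:
                optimizer.zero_grad()
                loss = torch.nn.functional.cross_entropy(model(x), y)
                self.backward(loss)  # replaces loss.backward()
                optimizer.step()


if __name__ == "__main__":
    Lite(accelerator="cpu", devices=1).run()
```

The loop body is unchanged from plain PyTorch; only `setup`, `setup_dataloaders`, and `backward` are Lite-specific, which is what makes the later step-by-step refactor into a `LightningModule` possible.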
10 changes: 9 additions & 1 deletion pl_examples/README.md
@@ -25,7 +25,7 @@ In this folder, we have 2 simple examples:

- [Image Classifier](./basic_examples/backbone_image_classifier.py) (trains arbitrary datasets with arbitrary backbones).
- [Image Classifier + DALI](./basic_examples/mnist_examples/image_classifier_4_dali.py) (defines the model inside the `LightningModule`).
- [Autoencoder](./basic_examples/autoencoder.py) (shows how the `LightningModule` can be used as a system)
- [Autoencoder](./basic_examples/autoencoder.py)

______________________________________________________________________

@@ -37,6 +37,14 @@ for advanced use cases.

______________________________________________________________________

## Basic Examples

In this folder, we have 1 simple example:

- [Image Classifier + DALI](./integration_examples/dali_image_classifier.py) (defines the model inside the `LightningModule`).

______________________________________________________________________

## Loop examples

Contains implementations leveraging [loop customization](https://pytorch-lightning.readthedocs.io/en/latest/extensions/loops.html) to enhance the Trainer with new optimization routines.
61 changes: 53 additions & 8 deletions pl_examples/basic_examples/README.md
@@ -14,7 +14,7 @@ Trains a simple CNN over MNIST using vanilla PyTorch.

```bash
# CPU
python image_classifier_1_pytorch.py
python mnist_examples/image_classifier_1_pytorch.py
```

______________________________________________________________________
@@ -25,7 +25,7 @@ This script shows you how to scale the previous script to enable GPU and multi-G

```bash
# CPU / multiple GPUs if available
python image_classifier_2_lite.py
python mnist_examples/image_classifier_2_lite.py
```

______________________________________________________________________
@@ -36,7 +36,7 @@ This script shows you how to prepare your conversion from [LightningLite](https:

```bash
# CPU / multiple GPUs if available
python image_classifier_3_lite_to_lightning_module.py
python mnist_examples/image_classifier_3_lite_to_lightning_module.py
```

______________________________________________________________________
@@ -47,10 +47,10 @@ This script shows you the result of the conversion to the `LightningModule` and

```bash
# CPU
python image_classifier_4_lightning_module.py
python mnist_examples/image_classifier_4_lightning_module.py

# GPUs (any number)
python image_classifier_4_lightning_module.py --trainer.gpus 2
python mnist_examples/image_classifier_4_lightning_module.py --trainer.gpus 2
```

______________________________________________________________________
@@ -61,11 +61,56 @@ This script shows you how to extract the data related components into a `Lightni

```bash
# CPU
python image_classifier_5_lightning_datamodule.py
python mnist_examples/image_classifier_5_lightning_datamodule.py

# GPUs (any number)
python image_classifier_5_lightning_datamodule.py --trainer.gpus 2
python mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.gpus 2

# Distributed Data Parallel (DDP)
python image_classifier_5_lightning_datamodule.py --trainer.gpus 2 --trainer.strategy 'ddp'
python mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.gpus 2 --trainer.strategy 'ddp'
```
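Since this section is about extracting the data pipeline into a `LightningDataModule`, a minimal sketch of that structure may help; the dataset, split sizes, and paths below are placeholders, not lifted from the example script.

```python
from pytorch_lightning import LightningDataModule
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST


class MNISTDataModule(LightningDataModule):
    def __init__(self, data_dir: str = "./data", batch_size: int = 32):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size

    def prepare_data(self):
        # Download once, on a single process per node.
        MNIST(self.data_dir, train=True, download=True)

    def setup(self, stage=None):
        full = MNIST(self.data_dir, train=True, transform=transforms.ToTensor())
        self.train_set, self.val_set = random_split(full, [55000, 5000])

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size)
```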

______________________________________________________________________

#### AutoEncoder

This script shows you how to implement a CNN auto-encoder.

```bash
# CPU
python autoencoder.py

# GPUs (any number)
python autoencoder.py --trainer.gpus 2

# Distributed Data Parallel (DDP)
python autoencoder.py --trainer.gpus 2 --trainer.strategy 'ddp'
```

______________________________________________________________________

#### Backbone Image Classifier

This script shows you how to implement a `LightningModule` as a system.
A system is a `LightningModule` that wraps a single `torch.nn.Module`, which makes exporting to production simpler (a stripped-down sketch follows the commands below).

```bash
# CPU
python backbone_image_classifier.py

# GPUs (any number)
python backbone_image_classifier.py --trainer.gpus 2

# Distributed Data Parallel (DDP)
python backbone_image_classifier.py --trainer.gpus 2 --trainer.strategy 'ddp'
```
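The sketch below illustrates the "system" idea without running the full example: the `LightningModule` owns the training logic while a plain `torch.nn.Module` backbone owns the weights. Layer sizes and names are illustrative, not taken from the example script.

```python
import torch
from pytorch_lightning import LightningModule


class Backbone(torch.nn.Module):
    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, hidden_dim)
        self.l2 = torch.nn.Linear(hidden_dim, 10)

    def forward(self, x):
        x = torch.relu(self.l1(x.flatten(1)))
        return self.l2(x)


class LitClassifier(LightningModule):
    """A 'system': training logic lives here, the weights live in the backbone."""

    def __init__(self, backbone: torch.nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):
        # At export/inference time only the backbone's forward is needed.
        return self.backbone(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```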

______________________________________________________________________

#### PyTorch Profiler

This script shows you how to activate the [PyTorch Profiler](https://github.com/pytorch/kineto) with Lightning.

```bash
python profiler_example.py
```
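As a rough companion to the script above, the same profiler can also be enabled programmatically through the Trainer's `profiler` argument; the model and datamodule in the commented line are placeholders from your own code.

```python
from pytorch_lightning import Trainer

# "pytorch" selects the built-in PyTorchProfiler integration.
trainer = Trainer(max_epochs=1, profiler="pytorch")
# trainer.fit(model, datamodule=datamodule)  # supply your own LightningModule / DataModule
```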
@@ -18,6 +18,7 @@
import torch
import torchvision.transforms as T
from torch.nn import functional as F
from torchmetrics import Accuracy

from pl_examples import cli_lightning_logo
from pl_examples.basic_examples.mnist_datamodule import MNIST
@@ -31,6 +32,7 @@ def __init__(self, model=None, lr=1.0, gamma=0.7, batch_size=32):
super().__init__()
self.save_hyperparameters()
self.model = model or Net()
self.test_acc = Accuracy()

def forward(self, x):
return self.model(x)
@@ -45,6 +47,7 @@ def test_step(self, batch, batch_idx):
x, y = batch
logits = self.forward(x)
loss = F.nll_loss(logits, y.long())
self.log("test_acc", self.test_acc(logits, y))
return loss

def configure_optimizers(self):
@@ -18,6 +18,7 @@
import torch
import torchvision.transforms as T
from torch.nn import functional as F
from torchmetrics import Accuracy

from pl_examples import cli_lightning_logo
from pl_examples.basic_examples.mnist_datamodule import MNIST
@@ -31,6 +32,7 @@ def __init__(self, model, lr=1.0, gamma=0.7, batch_size=32):
super().__init__()
self.save_hyperparameters()
self.model = model or Net()
self.test_acc = Accuracy()

def forward(self, x):
return self.model(x)
@@ -45,6 +47,7 @@ def test_step(self, batch, batch_idx):
x, y = batch
logits = self.forward(x)
loss = F.nll_loss(logits, y.long())
self.log("test_acc", self.test_acc(logits, y))
return loss

def configure_optimizers(self):
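Both example-file hunks above make the same change: instantiate a `torchmetrics.Accuracy` in `__init__` and log it from `test_step`. A compact, self-contained version of that pattern follows; the backbone and shapes are placeholders, not the example files' actual code.

```python
import torch
from torch.nn import functional as F
from torchmetrics import Accuracy
from pytorch_lightning import LightningModule


class LitClassifier(LightningModule):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(28 * 28, 10)  # placeholder backbone
        self.test_acc = Accuracy()  # created once, as in the diff

    def forward(self, x):
        return F.log_softmax(self.model(x.flatten(1)), dim=1)

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y.long())
        # Accuracy takes (preds, target); logging it mirrors the added line above.
        self.log("test_acc", self.test_acc(logits, y))
        return loss

    def configure_optimizers(self):
        return torch.optim.Adadelta(self.parameters(), lr=1.0)
```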
13 changes: 11 additions & 2 deletions pytorch_lightning/callbacks/progress/rich_progress.py
@@ -37,7 +37,7 @@ def render(self, task: "Task") -> ProgressBar:
total=max(0, task.total),
completed=max(0, task.completed),
width=None if self.bar_width is None else max(1, self.bar_width),
pulse=not task.started or math.isfinite(task.remaining),
pulse=not task.started or not math.isfinite(task.remaining),
animation_time=task.get_time(),
style=self.style,
complete_style=self.complete_style,
@@ -195,6 +195,7 @@ class RichProgressBar(ProgressBarBase):
Args:
refresh_rate_per_second: the number of updates per second. If refresh_rate is 0, progress bar is disabled.
leave: Leaves the finished progress bar in the terminal at the end of the epoch. Default: False
theme: Contains styles used to stylize the progress bar.
Raises:
@@ -205,6 +206,7 @@ class RichProgressBar(ProgressBarBase):
def __init__(
self,
refresh_rate_per_second: int = 10,
leave: bool = False,
theme: RichProgressBarTheme = RichProgressBarTheme(),
) -> None:
if not _RICH_AVAILABLE:
@@ -213,6 +215,7 @@ def __init__(
)
super().__init__()
self._refresh_rate_per_second: int = refresh_rate_per_second
self._leave: bool = leave
self._enabled: bool = True
self.progress: Optional[Progress] = None
self.val_sanity_progress_bar_id: Optional[int] = None
@@ -323,9 +326,15 @@ def on_train_epoch_start(self, trainer, pl_module):
total_batches = total_train_batches + total_val_batches

train_description = self._get_train_description(trainer.current_epoch)
if self.main_progress_bar_id is not None and self._leave:
self._stop_progress()
self._init_progress(trainer, pl_module)
if self.main_progress_bar_id is None:
self.main_progress_bar_id = self._add_task(total_batches, train_description)
self.progress.reset(self.main_progress_bar_id, total=total_batches, description=train_description)
else:
self.progress.reset(
self.main_progress_bar_id, total=total_batches, description=train_description, visible=True
)

def on_validation_epoch_start(self, trainer, pl_module):
super().on_validation_epoch_start(trainer, pl_module)
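A short usage sketch for the `leave` argument added above; the `rich` package must be installed, and the Trainer arguments are placeholders.

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import RichProgressBar

# leave=True keeps each epoch's finished bar in the terminal instead of
# resetting it for the next epoch, per the logic added in this commit.
bar = RichProgressBar(refresh_rate_per_second=10, leave=True)
trainer = Trainer(max_epochs=3, callbacks=[bar])
```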
4 changes: 4 additions & 0 deletions pytorch_lightning/core/hooks.py
@@ -314,9 +314,13 @@ def __init__(self) -> None:
prepare_data_per_node:
If True, each LOCAL_RANK=0 will call prepare data.
Otherwise only NODE_RANK=0, LOCAL_RANK=0 will prepare data.
allow_zero_length_dataloader_with_multiple_devices:
If True, a dataloader with zero length within the local rank is allowed.
Default value is False.
"""
super().__init__()
self.prepare_data_per_node: bool = True
self.allow_zero_length_dataloader_with_multiple_devices: bool = False

def prepare_data(self) -> None:
"""Use this to download and prepare data.
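A sketch of opting into the new flag from user code, assuming (as the docstring above implies) that it is read from the module or datamodule that owns the dataloaders; the dataset is a placeholder.

```python
from pytorch_lightning import LightningDataModule
from torch.utils.data import DataLoader


class ShardedDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        # Allow this rank's dataloader to be empty as long as the total
        # length across all ranks is non-zero (see the docstring above).
        self.allow_zero_length_dataloader_with_multiple_devices = True

    def train_dataloader(self):
        return DataLoader(range(8))  # placeholder per-rank dataset
```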
18 changes: 14 additions & 4 deletions pytorch_lightning/trainer/data_loading.py
@@ -37,7 +37,7 @@
CaptureMapDataset,
FastForwardSampler,
)
from pytorch_lightning.utilities.data import has_iterable_dataset, has_len
from pytorch_lightning.utilities.data import has_iterable_dataset, has_len_all_ranks
from pytorch_lightning.utilities.enums import DistributedType
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.imports import _fault_tolerant_training
@@ -346,7 +346,12 @@ def reset_train_dataloader(self, model: Optional["pl.LightningModule"] = None) -
# wrap the sequence of train loaders to a CombinedLoader object for computing the num_training_batches
self.train_dataloader = CombinedLoader(self.train_dataloader, self._data_connector.multiple_trainloader_mode)

self.num_training_batches = len(self.train_dataloader) if has_len(self.train_dataloader) else float("inf")
module = model or self.lightning_module or self.datamodule
self.num_training_batches = (
len(self.train_dataloader)
if has_len_all_ranks(self.train_dataloader, self.training_type_plugin, module)
else float("inf")
)

if isinstance(self.limit_train_batches, int) or self.limit_train_batches == 0.0:
self.num_training_batches = min(self.num_training_batches, int(self.limit_train_batches))
@@ -371,7 +376,7 @@ def reset_train_dataloader(self, model: Optional["pl.LightningModule"] = None) -
"If you want to disable validation set `limit_val_batches` to 0.0 instead."
)
else:
if not has_len(self.train_dataloader):
if not has_len_all_ranks(self.train_dataloader, self.training_type_plugin, module):
if self.val_check_interval == 1.0:
self.val_check_batch = float("inf")
else:
@@ -452,9 +457,14 @@ def _reset_eval_dataloader(

# determine number of batches
# datasets could be none, 1 or 2+
module = model or self.lightning_module or self.datamodule
if len(dataloaders) != 0:
for i, dataloader in enumerate(dataloaders):
num_batches = len(dataloader) if has_len(dataloader) else float("inf")
num_batches = (
len(dataloader)
if has_len_all_ranks(dataloader, self.training_type_plugin, module)
else float("inf")
)
self._worker_check(dataloader, f"{mode.dataloader_prefix}_dataloader {i}")

# percent or num_steps
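The trainer logic above follows one rule: only call `len()` on a dataloader when every rank can report a length, otherwise treat it as infinite. The helper below is a simplified, single-process stand-in for `has_len_all_ranks`, not the real implementation, and is included only to illustrate the pattern.

```python
from torch.utils.data import DataLoader, IterableDataset


def has_len(dataloader: DataLoader) -> bool:
    """Simplified single-process stand-in for ``has_len_all_ranks``."""
    if isinstance(dataloader.dataset, IterableDataset):
        return False  # iterable-style datasets have no reliable __len__
    try:
        len(dataloader)
        return True
    except TypeError:
        return False


def num_batches(dataloader: DataLoader) -> float:
    # Mirrors the trainer code: sized loaders get a finite count,
    # everything else is treated as infinite.
    return len(dataloader) if has_len(dataloader) else float("inf")
```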
(diffs for the remaining changed files are not shown)
