Merge branch 'master' into 1.5-release

tchaton authored Nov 2, 2021
2 parents 76cb77e + f6ed0bd · commit f2b81f0
Showing 17 changed files with 220 additions and 40 deletions.
2 changes: 0 additions & 2 deletions .azure-pipelines/gpu-tests.yml
@@ -108,8 +108,6 @@ jobs:
bash pl_examples/run_examples.sh --trainer.gpus=1
bash pl_examples/run_examples.sh --trainer.gpus=2 --trainer.strategy=ddp
bash pl_examples/run_examples.sh --trainer.gpus=2 --trainer.strategy=ddp --trainer.precision=16
bash pl_examples/run_examples.sh --trainer.gpus=2 --trainer.strategy=dp
bash pl_examples/run_examples.sh --trainer.gpus=2 --trainer.strategy=dp --trainer.precision=16
env:
PL_USE_MOCKED_MNIST: "1"
displayName: 'Testing: examples'
3 changes: 3 additions & 0 deletions .gitignore
@@ -24,6 +24,9 @@ __pycache__/
*.py[cod]
*$py.class
timit_data/
grid_generated*
grid_ori*



# C extensions
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -83,6 +83,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
* Added Rich progress bar ([#8929](https://github.com/PyTorchLightning/pytorch-lightning/pull/8929), [#9559](https://github.com/PyTorchLightning/pytorch-lightning/pull/9559))
* Added Support for iterable datasets ([#9734](https://github.com/PyTorchLightning/pytorch-lightning/pull/9734))
* Added `RichModelSummary` callback ([#9546](https://github.com/PyTorchLightning/pytorch-lightning/pull/9546))
* Added `configure_columns` method to `RichProgressBar` ([#10288](https://github.com/PyTorchLightning/pytorch-lightning/pull/10288))
* Added `leave` argument to `RichProgressBar` ([#10301](https://github.com/PyTorchLightning/pytorch-lightning/pull/10301))
- Added input validation logic for precision ([#9080](https://github.com/PyTorchLightning/pytorch-lightning/pull/9080))
- Added support for CPU AMP autocast ([#9084](https://github.com/PyTorchLightning/pytorch-lightning/pull/9084))
- Added `on_exception` callback hook ([#9183](https://github.com/PyTorchLightning/pytorch-lightning/pull/9183))
@@ -128,7 +130,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Added support for `devices="auto"` ([#10264](https://github.com/PyTorchLightning/pytorch-lightning/pull/10264))
- Added a `filename` argument in `ModelCheckpoint.format_checkpoint_name` ([#9818](https://github.com/PyTorchLightning/pytorch-lightning/pull/9818))
- Added support for empty `gpus` list to run on CPU ([#10246](https://github.com/PyTorchLightning/pytorch-lightning/pull/10246))
- Added `configure_columns` method to `RichProgressBar` ([#10288](https://github.com/PyTorchLightning/pytorch-lightning/pull/10288))
- Added a warning if multiple batch sizes are found from ambiguous batch ([#10247](https://github.com/PyTorchLightning/pytorch-lightning/pull/10247))


@@ -178,9 +179,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Changed default value of the `max_steps` Trainer argument from `None` to -1 ([#9460](https://github.com/PyTorchLightning/pytorch-lightning/pull/9460))
- LightningModule now raises an error when calling `log(on_step=False, on_epoch=False)` ([#10227](https://github.com/PyTorchLightning/pytorch-lightning/pull/10227))
- Quantization aware training observers are now disabled by default during validating/testing/predicting stages ([#8540](https://github.com/PyTorchLightning/pytorch-lightning/pull/8540))
- Raised `MisconfigurationException` when the total length of `dataloader` across ranks is zero, and added a warning for when the total length is non-zero but the length on the local rank is zero ([#9827](https://github.com/PyTorchLightning/pytorch-lightning/pull/9827))
- Changed the model size calculation using `ByteCounter` ([#10123](https://github.com/PyTorchLightning/pytorch-lightning/pull/10123))
- Enabled `on_load_checkpoint` for `LightningDataModule` for all `trainer_fn` ([#10238](https://github.com/PyTorchLightning/pytorch-lightning/pull/10238))
- Allowed separate config files for parameters with class type when LightningCLI is in `subclass_mode=False` ([#10286](https://github.com/PyTorchLightning/pytorch-lightning/pull/10286))
- Allow separate config files for parameters with class type when LightningCLI is in subclass_mode=False ([#10286](https://github.com/PyTorchLightning/pytorch-lightning/pull/10286))


### Deprecated
16 changes: 9 additions & 7 deletions docs/source/starter/lightning_lite.rst
@@ -3,15 +3,14 @@ LightningLite - Stepping Stone to Lightning
###########################################


:class:`~pytorch_lightning.lite.LightningLite` enables pure PyTorch users to scale their existing code
on any kind of device while retaining full control over their own loops and optimization logic.

.. image:: https://pl-public-data.s3.amazonaws.com/docs/static/images/lite/lightning_lite.gif
:alt: Animation showing how to convert a standard training loop to a Lightning loop
:width: 600px
:alt: Animation showing how to convert your PyTorch code to LightningLite.
:width: 500
:align: center

|
:class:`~pytorch_lightning.lite.LightningLite` enables pure PyTorch users to scale their existing code
on any kind of device while retaining full control over their own loops and optimization logic.

:class:`~pytorch_lightning.lite.LightningLite` is the right tool for you if you match one of the two following descriptions:

@@ -246,6 +245,9 @@ from its hundreds of features.

You can see our :class:`~pytorch_lightning.lite.LightningLite` as a
future :class:`~pytorch_lightning.core.lightning.LightningModule` and slowly refactor your code into its API.
Below, the :meth:`~pytorch_lightning.core.lightning.LightningModule.training_step`, :meth:`~pytorch_lightning.core.lightning.LightningModule.forward`,
:meth:`~pytorch_lightning.core.lightning.LightningModule.configure_optimizers`, and :meth:`~pytorch_lightning.core.lightning.LightningModule.train_dataloader`
methods are implemented.


.. code-block:: python
@@ -300,7 +302,7 @@ future :class:`~pytorch_lightning.core.lightning.LightningModule` and slowly ref
Finally, change the :meth:`~pytorch_lightning.lite.LightningLite.run` into a
:meth:`~pytorch_lightning.core.lightning.LightningModule.__init__` and drop the inner code for setting up the components.
:meth:`~pytorch_lightning.core.lightning.LightningModule.__init__` and drop the fit method.

.. code-block:: python
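To make the refactor this doc change describes concrete, here is a minimal, self-contained sketch of the LightningLite pattern; the model, data, and hyperparameters are illustrative placeholders, not taken from this commit.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning.lite import LightningLite


class Lite(LightningLite):
    def run(self, epochs: int = 1):
        # Any existing PyTorch model/optimizer/dataloader can be dropped in here.
        model = torch.nn.Linear(32, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        dataloader = DataLoader(dataset, batch_size=8)

        # Lite wraps the objects and moves them to the selected device(s).
        model, optimizer = self.setup(model, optimizer)
        dataloader = self.setup_dataloaders(dataloader)

        model.train()
        for _ in range(epochs):
            for x, y in dataloader:
                optimizer.zero_grad()
                loss = torch.nn.functional.cross_entropy(model(x), y)
                self.backward(loss)  # replaces loss.backward()
                optimizer.step()


if __name__ == "__main__":
    Lite(accelerator="cpu", devices=1).run()
```

The loop body is unchanged from plain PyTorch; only `setup`, `setup_dataloaders`, and `backward` are Lite-specific, which is what makes the later step-by-step refactor into a `LightningModule` possible.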
10 changes: 9 additions & 1 deletion pl_examples/README.md
@@ -25,7 +25,7 @@ In this folder, we have 2 simple examples:

- [Image Classifier](./basic_examples/backbone_image_classifier.py) (trains arbitrary datasets with arbitrary backbones).
- [Image Classifier + DALI](./basic_examples/mnist_examples/image_classifier_4_dali.py) (defines the model inside the `LightningModule`).
- [Autoencoder](./basic_examples/autoencoder.py) (shows how the `LightningModule` can be used as a system)
- [Autoencoder](./basic_examples/autoencoder.py)

______________________________________________________________________

@@ -37,6 +37,14 @@ for advanced use cases.

______________________________________________________________________

## Basic Examples

In this folder, we have 1 simple example:

- [Image Classifier + DALI](./integration_examples/dali_image_classifier.py) (defines the model inside the `LightningModule`).

______________________________________________________________________

## Loop examples

Contains implementations leveraging [loop customization](https://pytorch-lightning.readthedocs.io/en/latest/extensions/loops.html) to enhance the Trainer with new optimization routines.
61 changes: 53 additions & 8 deletions pl_examples/basic_examples/README.md
@@ -14,7 +14,7 @@ Trains a simple CNN over MNIST using vanilla PyTorch.

```bash
# CPU
python image_classifier_1_pytorch.py
python mnist_examples/image_classifier_1_pytorch.py
```

______________________________________________________________________
@@ -25,7 +25,7 @@ This script shows you how to scale the previous script to enable GPU and multi-G

```bash
# CPU / multiple GPUs if available
python image_classifier_2_lite.py
python mnist_examples/image_classifier_2_lite.py
```

______________________________________________________________________
@@ -36,7 +36,7 @@ This script shows you how to prepare your conversion from [LightningLite](https:

```bash
# CPU / multiple GPUs if available
python image_classifier_3_lite_to_lightning_module.py
python mnist_examples/image_classifier_3_lite_to_lightning_module.py
```

______________________________________________________________________
@@ -47,10 +47,10 @@ This script shows you the result of the conversion to the `LightningModule` and

```bash
# CPU
python image_classifier_4_lightning_module.py
python mnist_examples/image_classifier_4_lightning_module.py

# GPUs (any number)
python image_classifier_4_lightning_module.py --trainer.gpus 2
python mnist_examples/image_classifier_4_lightning_module.py --trainer.gpus 2
```

______________________________________________________________________
@@ -61,11 +61,56 @@ This script shows you how to extract the data related components into a `Lightni

```bash
# CPU
python image_classifier_5_lightning_datamodule.py
python mnist_examples/image_classifier_5_lightning_datamodule.py

# GPUs (any number)
python image_classifier_5_lightning_datamodule.py --trainer.gpus 2
python mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.gpus 2

# Distributed Data Parallel (DDP)
python image_classifier_5_lightning_datamodule.py --trainer.gpus 2 --trainer.strategy 'ddp'
python mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.gpus 2 --trainer.strategy 'ddp'
```
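Since this section is about extracting the data pipeline into a `LightningDataModule`, a minimal sketch of that structure may help; the dataset, split sizes, and paths below are placeholders, not lifted from the example script.

```python
from pytorch_lightning import LightningDataModule
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST


class MNISTDataModule(LightningDataModule):
    def __init__(self, data_dir: str = "./data", batch_size: int = 32):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size

    def prepare_data(self):
        # Download once, on a single process per node.
        MNIST(self.data_dir, train=True, download=True)

    def setup(self, stage=None):
        full = MNIST(self.data_dir, train=True, transform=transforms.ToTensor())
        self.train_set, self.val_set = random_split(full, [55000, 5000])

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size)
```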

______________________________________________________________________

#### AutoEncoder

This script shows you how to implement a CNN auto-encoder.

```bash
# CPU
python autoencoder.py

# GPUs (any number)
python autoencoder.py --trainer.gpus 2

# Distributed Data Parallel (DDP)
python autoencoder.py --trainer.gpus 2 --trainer.strategy 'ddp'
```

______________________________________________________________________

#### Backbone Image Classifier

This script shows you how to implement a `LightningModule` as a system.
A system is a `LightningModule` that wraps a single `torch.nn.Module`, which makes exporting to production simpler (a stripped-down sketch follows the commands below).

```bash
# CPU
python backbone_image_classifier.py

# GPUs (any number)
python backbone_image_classifier.py --trainer.gpus 2

# Distributed Data Parallel (DDP)
python backbone_image_classifier.py --trainer.gpus 2 --trainer.strategy 'ddp'
```
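The sketch below illustrates the "system" idea without running the full example: the `LightningModule` owns the training logic while a plain `torch.nn.Module` backbone owns the weights. Layer sizes and names are illustrative, not taken from the example script.

```python
import torch
from pytorch_lightning import LightningModule


class Backbone(torch.nn.Module):
    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, hidden_dim)
        self.l2 = torch.nn.Linear(hidden_dim, 10)

    def forward(self, x):
        x = torch.relu(self.l1(x.flatten(1)))
        return self.l2(x)


class LitClassifier(LightningModule):
    """A 'system': training logic lives here, the weights live in the backbone."""

    def __init__(self, backbone: torch.nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):
        # At export/inference time only the backbone's forward is needed.
        return self.backbone(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```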

______________________________________________________________________

#### PyTorch Profiler

This script shows you how to activate the [PyTorch Profiler](https://github.com/pytorch/kineto) with Lightning.

```bash
python profiler_example.py
```
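As a rough companion to the script above, the same profiler can also be enabled programmatically through the Trainer's `profiler` argument; the model and datamodule in the commented line are placeholders from your own code.

```python
from pytorch_lightning import Trainer

# "pytorch" selects the built-in PyTorchProfiler integration.
trainer = Trainer(max_epochs=1, profiler="pytorch")
# trainer.fit(model, datamodule=datamodule)  # supply your own LightningModule / DataModule
```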
@@ -18,6 +18,7 @@
import torch
import torchvision.transforms as T
from torch.nn import functional as F
from torchmetrics import Accuracy

from pl_examples import cli_lightning_logo
from pl_examples.basic_examples.mnist_datamodule import MNIST
@@ -31,6 +32,7 @@ def __init__(self, model=None, lr=1.0, gamma=0.7, batch_size=32):
super().__init__()
self.save_hyperparameters()
self.model = model or Net()
self.test_acc = Accuracy()

def forward(self, x):
return self.model(x)
@@ -45,6 +47,7 @@ def test_step(self, batch, batch_idx):
x, y = batch
logits = self.forward(x)
loss = F.nll_loss(logits, y.long())
self.log("test_acc", self.test_acc(logits, y))
return loss

def configure_optimizers(self):
@@ -18,6 +18,7 @@
import torch
import torchvision.transforms as T
from torch.nn import functional as F
from torchmetrics import Accuracy

from pl_examples import cli_lightning_logo
from pl_examples.basic_examples.mnist_datamodule import MNIST
@@ -31,6 +32,7 @@ def __init__(self, model, lr=1.0, gamma=0.7, batch_size=32):
super().__init__()
self.save_hyperparameters()
self.model = model or Net()
self.test_acc = Accuracy()

def forward(self, x):
return self.model(x)
@@ -45,6 +47,7 @@ def test_step(self, batch, batch_idx):
x, y = batch
logits = self.forward(x)
loss = F.nll_loss(logits, y.long())
self.log("test_acc", self.test_acc(logits, y))
return loss

def configure_optimizers(self):
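Both example-file hunks above make the same change: instantiate a `torchmetrics.Accuracy` in `__init__` and log it from `test_step`. A compact, self-contained version of that pattern follows; the backbone and shapes are placeholders, not the example files' actual code.

```python
import torch
from torch.nn import functional as F
from torchmetrics import Accuracy
from pytorch_lightning import LightningModule


class LitClassifier(LightningModule):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(28 * 28, 10)  # placeholder backbone
        self.test_acc = Accuracy()  # created once, as in the diff

    def forward(self, x):
        return F.log_softmax(self.model(x.flatten(1)), dim=1)

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y.long())
        # Accuracy takes (preds, target); logging it mirrors the added line above.
        self.log("test_acc", self.test_acc(logits, y))
        return loss

    def configure_optimizers(self):
        return torch.optim.Adadelta(self.parameters(), lr=1.0)
```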
13 changes: 11 additions & 2 deletions pytorch_lightning/callbacks/progress/rich_progress.py
@@ -37,7 +37,7 @@ def render(self, task: "Task") -> ProgressBar:
total=max(0, task.total),
completed=max(0, task.completed),
width=None if self.bar_width is None else max(1, self.bar_width),
pulse=not task.started or math.isfinite(task.remaining),
pulse=not task.started or not math.isfinite(task.remaining),
animation_time=task.get_time(),
style=self.style,
complete_style=self.complete_style,
@@ -195,6 +195,7 @@ class RichProgressBar(ProgressBarBase):
Args:
refresh_rate_per_second: the number of updates per second. If refresh_rate is 0, progress bar is disabled.
leave: Leaves the finished progress bar in the terminal at the end of the epoch. Default: False
theme: Contains styles used to stylize the progress bar.
Raises:
@@ -205,6 +206,7 @@ class RichProgressBar(ProgressBarBase):
def __init__(
self,
refresh_rate_per_second: int = 10,
leave: bool = False,
theme: RichProgressBarTheme = RichProgressBarTheme(),
) -> None:
if not _RICH_AVAILABLE:
@@ -213,6 +215,7 @@ def __init__(
)
super().__init__()
self._refresh_rate_per_second: int = refresh_rate_per_second
self._leave: bool = leave
self._enabled: bool = True
self.progress: Optional[Progress] = None
self.val_sanity_progress_bar_id: Optional[int] = None
@@ -323,9 +326,15 @@ def on_train_epoch_start(self, trainer, pl_module):
total_batches = total_train_batches + total_val_batches

train_description = self._get_train_description(trainer.current_epoch)
if self.main_progress_bar_id is not None and self._leave:
self._stop_progress()
self._init_progress(trainer, pl_module)
if self.main_progress_bar_id is None:
self.main_progress_bar_id = self._add_task(total_batches, train_description)
self.progress.reset(self.main_progress_bar_id, total=total_batches, description=train_description)
else:
self.progress.reset(
self.main_progress_bar_id, total=total_batches, description=train_description, visible=True
)

def on_validation_epoch_start(self, trainer, pl_module):
super().on_validation_epoch_start(trainer, pl_module)
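A short usage sketch for the `leave` argument added above; the `rich` package must be installed, and the Trainer arguments are placeholders.

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import RichProgressBar

# leave=True keeps each epoch's finished bar in the terminal instead of
# resetting it for the next epoch, per the logic added in this commit.
bar = RichProgressBar(refresh_rate_per_second=10, leave=True)
trainer = Trainer(max_epochs=3, callbacks=[bar])
```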
4 changes: 4 additions & 0 deletions pytorch_lightning/core/hooks.py
@@ -314,9 +314,13 @@ def __init__(self) -> None:
prepare_data_per_node:
If True, each LOCAL_RANK=0 will call prepare data.
Otherwise only NODE_RANK=0, LOCAL_RANK=0 will prepare data.
allow_zero_length_dataloader_with_multiple_devices:
If True, a dataloader with zero length within the local rank is allowed.
Default value is False.
"""
super().__init__()
self.prepare_data_per_node: bool = True
self.allow_zero_length_dataloader_with_multiple_devices: bool = False

def prepare_data(self) -> None:
"""Use this to download and prepare data.
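A sketch of opting into the new flag from user code, assuming (as the docstring above implies) that it is read from the module or datamodule that owns the dataloaders; the dataset is a placeholder.

```python
from pytorch_lightning import LightningDataModule
from torch.utils.data import DataLoader


class ShardedDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        # Allow this rank's dataloader to be empty as long as the total
        # length across all ranks is non-zero (see the docstring above).
        self.allow_zero_length_dataloader_with_multiple_devices = True

    def train_dataloader(self):
        return DataLoader(range(8))  # placeholder per-rank dataset
```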
18 changes: 14 additions & 4 deletions pytorch_lightning/trainer/data_loading.py
@@ -37,7 +37,7 @@
CaptureMapDataset,
FastForwardSampler,
)
from pytorch_lightning.utilities.data import has_iterable_dataset, has_len
from pytorch_lightning.utilities.data import has_iterable_dataset, has_len_all_ranks
from pytorch_lightning.utilities.enums import DistributedType
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.imports import _fault_tolerant_training
@@ -346,7 +346,12 @@ def reset_train_dataloader(self, model: Optional["pl.LightningModule"] = None) -
# wrap the sequence of train loaders to a CombinedLoader object for computing the num_training_batches
self.train_dataloader = CombinedLoader(self.train_dataloader, self._data_connector.multiple_trainloader_mode)

self.num_training_batches = len(self.train_dataloader) if has_len(self.train_dataloader) else float("inf")
module = model or self.lightning_module or self.datamodule
self.num_training_batches = (
len(self.train_dataloader)
if has_len_all_ranks(self.train_dataloader, self.training_type_plugin, module)
else float("inf")
)

if isinstance(self.limit_train_batches, int) or self.limit_train_batches == 0.0:
self.num_training_batches = min(self.num_training_batches, int(self.limit_train_batches))
@@ -371,7 +376,7 @@ def reset_train_dataloader(self, model: Optional["pl.LightningModule"] = None) -
"If you want to disable validation set `limit_val_batches` to 0.0 instead."
)
else:
if not has_len(self.train_dataloader):
if not has_len_all_ranks(self.train_dataloader, self.training_type_plugin, module):
if self.val_check_interval == 1.0:
self.val_check_batch = float("inf")
else:
@@ -452,9 +457,14 @@ def _reset_eval_dataloader(

# determine number of batches
# datasets could be none, 1 or 2+
module = model or self.lightning_module or self.datamodule
if len(dataloaders) != 0:
for i, dataloader in enumerate(dataloaders):
num_batches = len(dataloader) if has_len(dataloader) else float("inf")
num_batches = (
len(dataloader)
if has_len_all_ranks(dataloader, self.training_type_plugin, module)
else float("inf")
)
self._worker_check(dataloader, f"{mode.dataloader_prefix}_dataloader {i}")

# percent or num_steps
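The trainer logic above follows one rule: only call `len()` on a dataloader when every rank can report a length, otherwise treat it as infinite. The helper below is a simplified, single-process stand-in for `has_len_all_ranks`, not the real implementation, and is included only to illustrate the pattern.

```python
from torch.utils.data import DataLoader, IterableDataset


def has_len(dataloader: DataLoader) -> bool:
    """Simplified single-process stand-in for ``has_len_all_ranks``."""
    if isinstance(dataloader.dataset, IterableDataset):
        return False  # iterable-style datasets have no reliable __len__
    try:
        len(dataloader)
        return True
    except TypeError:
        return False


def num_batches(dataloader: DataLoader) -> float:
    # Mirrors the trainer code: sized loaders get a finite count,
    # everything else is treated as infinite.
    return len(dataloader) if has_len(dataloader) else float("inf")
```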
(diffs for the remaining changed files are not shown)
