LightningDataModule: GPU data augmentation support #10469

Closed
adamjstewart opened this issue Nov 10, 2021 · 4 comments
Labels
feature Is an improvement or enhancement

Comments

@adamjstewart
Contributor

🚀 Feature

Data augmentation libraries like Kornia support computation directly on the GPU, greatly increasing the rate at which batches of images can be processed. I would like to be able to perform these kinds of GPU transforms in a LightningDataModule.
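For context, here is a minimal sketch of the kind of transform I mean (the specific augmentations are just examples):

```python
import torch
import kornia.augmentation as K

# Kornia augmentations are nn.Modules, so they can be moved to the GPU
# and applied to entire batches of images at once
gpu_transforms = torch.nn.Sequential(
    K.RandomHorizontalFlip(p=0.5),
    K.RandomAffine(degrees=15, p=0.5),
).to("cuda")

batch = torch.rand(32, 3, 256, 256, device="cuda")  # batch already on the GPU
augmented = gpu_transforms(batch)
```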

Motivation

In TorchGeo, we use PyTorch Lightning to organize reproducible benchmarks for geospatial datasets. Currently, we have a LightningDataModule for each dataset and a much smaller number of LightningModules, one for each task (semantic segmentation, classification, regression, etc.). However, the LightningDataModule doesn't seem to know anything about the GPU, and the LightningModule doesn't seem to know anything about the LightningDataModule. Because of this, if we want to perform dataset-specific augmentations on the GPU, we're forced to create a separate LightningModule for each dataset, which increases code duplication and defeats the whole purpose of PyTorch Lightning.

Pitch

The purpose of a LightningDataModule is to handle all dataset-specific loading and augmentation so that a generic LightningModule can handle the actual training and evaluation. However, in order to take advantage of GPU-accelerated libraries like Kornia, we're currently forced to move this logic to a LightningModule. As datasets continue to grow in size, direct support for GPU-accelerated transforms in LightningDataModules will only increase in importance.

Alternatives

So far the only alternative we've found is to create a different LightningModule for each dataset and include the data augmentation there. If there's a better alternative to this we would love to know about it!

@calebrob6 @isaaccorley

@tchaton
Contributor

tchaton commented Nov 11, 2021

Hey @adamjstewart,

Awesome work on TorchGeo!

You can actually enable GPU transforms from your DataModule quite easily.

Check this DataModule in Lightning Flash: https://github.com/PyTorchLightning/lightning-flash/blob/master/flash/core/data/new_data_module.py#L240.

The DataModule implements an on_after_batch_transfer hook, which is attached to the model during training and applied right after the batch has been transferred to the device.
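For instance, a minimal sketch of such a DataModule (the augmentations and names here are placeholders, not the Flash implementation):

```python
import kornia.augmentation as K
from pytorch_lightning import LightningDataModule

class AugmentedDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        # Batched augmentations that run on whatever device the batch is on
        self.aug = K.AugmentationSequential(
            K.RandomHorizontalFlip(p=0.5),
            K.RandomAffine(degrees=15, p=0.5),
            data_keys=["input"],
        )

    def on_after_batch_transfer(self, batch, dataloader_idx):
        # Called after the batch has been moved to the device, so the
        # augmentations execute on the GPU when one is used
        x, y = batch
        return self.aug(x), y
```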

In Flash, each dataset owns its transforms, and we extract the on_after_batch_transfer_fn directly from the InputTransform.

Have a look at our tutorial: https://github.com/PyTorchLightning/lightning-flash/blob/master/flash_examples/flash_components/custom_data_loading.py

Best,
T.C

@adamjstewart
Contributor Author

Thanks @tchaton, I knew there had to be some way to do it! We'll try this out and let you know if we hit any snags.

@adamjstewart
Contributor Author

@tchaton the docs for on_after_batch_transfer mention using self.trainer.training, but I can't see where that attribute is defined. Is it a boolean? Also, one of the links you shared earlier seems to be dead now.

@tchaton
Contributor

tchaton commented Dec 20, 2021

Hey @adamjstewart,

Yes, it is a boolean. It is defined on the Trainer here: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L2028.
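So, assuming the DataModule sketch above (with the augmentations stored in self.aug), gating on it might look like:

```python
def on_after_batch_transfer(self, batch, dataloader_idx):
    x, y = batch
    # self.trainer is set once the DataModule is attached to a Trainer;
    # trainer.training is True only while the training loop is running
    if self.trainer is not None and self.trainer.training:
        x = self.aug(x)
    return x, y
```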

By the way, we entirely refactored the Lightning Flash Data API, and I believe you might want to have a look at it to better organize your own library.

Best,
T.C
