Data augmentation is an effective way to improve the performance of deep networks by exposing the models to more data variety. Many of the methods currently available were developed for high-level vision tasks (e.g., classification) and fewer are dedicated to low-level vision tasks (e.g., image restoration), especially under a GAN training formulation. Methods that discard or manipulate pixels or features too aggressively can hamper image restoration, where spatial relationships are very important. This repository provides three types of augmentations that can be used for different purposes, according to the task at hand:
- Per-image augmentations are single-image augmentations that can be used to change (transform) the image distribution: adding noise or blur, changing the downscale interpolation method, etc. These augmentations are implemented in augmennt, '/codes/dataops/imresize.py' and '/codes/dataops/augmentations.py'.
- Batch augmentations (cutmix, mixup, blend, rgb, cutout, cutblur) are a series of augmentations applied to entire mini-batches of images, some of which combine random regions from images in the batch with other images in the same batch. This mixture of augmentations (MoA) can help regularize the models to learn locality: where and when transformations are applied. The batch augmentations are implemented in '/codes/dataops/batchaug.py'. Note that Cutblur has a special consideration and requires the use of Pixel-Unshuffle.
- Differential augmentations. Unlike the previous two augmentation types, which alter the image statistics and the data the Generator network observes (potentially introducing distortions and color shifts), DiffAugment augmentations are applied to the Discriminator of a GAN formulation. These augmentations are applied to both real and fake (generated) images and can be backpropagated to the generator, promoting Discriminators that do not memorize the exact training data and must instead learn to extract better details from the images. In effect, this makes it possible to train GAN models with a fraction of the images of the typical case (even down to 10%), with limited impact on the results. The differential augmentations are implemented in '/codes/dataops/diffaug.py'.
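The DiffAugment idea can be summarized in a minimal NumPy sketch: the same augmentation policy is applied to both the real and the generated batch before they reach the Discriminator. The function names and policy here are illustrative; the actual implementation in '/codes/dataops/diffaug.py' uses differentiable PyTorch operations so gradients can flow back to the generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_brightness(x):
    # shift all pixels of each image by a random amount in [-0.5, 0.5)
    return x + rng.uniform(-0.5, 0.5, size=(x.shape[0], 1, 1, 1))

def rand_translation(x, ratio=0.125):
    # roll each image by a random spatial offset; a cheap stand-in
    # for the padded random translation used in DiffAugment
    n, _, h, w = x.shape
    out = np.empty_like(x)
    for i in range(n):
        dy = int(rng.integers(-int(h * ratio), int(h * ratio) + 1))
        dx = int(rng.integers(-int(w * ratio), int(w * ratio) + 1))
        out[i] = np.roll(x[i], (dy, dx), axis=(1, 2))
    return out

POLICY = [rand_brightness, rand_translation]

def diff_augment(x):
    for fn in POLICY:
        x = fn(x)
    return x

# both the real and the generated batch pass through the augmentations
real = rng.random((4, 3, 16, 16))
fake = rng.random((4, 3, 16, 16))
d_in_real, d_in_fake = diff_augment(real), diff_augment(fake)
```

Because only the Discriminator's inputs are transformed, the Generator's output distribution is unaffected by the augmentations themselves.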
In many cases, the original trained models made available with research papers fail to produce good results on images from the wild (i.e., the internet). The reason is that these networks are typically trained under specific conditions that can be evaluated and compared with previous research.
They generally assume that the relationship between high-resolution (HR, or high-quality ground-truth: HQ, GT) images and their low-resolution (LR, or low-quality: LQ) pairs is that of an ideal bicubic kernel. The networks learn to revert this specific transformation, but most natural images have not been downscaled with an antialiased bicubic kernel, so the models fail to produce adequate results during inference on new images after they have been trained. This is referred to as the non-blind SR (or image restoration) formulation.
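The kernel mismatch can be illustrated with a toy NumPy example (a box average stands in for the antialiased bicubic kernel here): the same HR image produces different LR images depending on the downscaling kernel, so a model trained only on one degradation will not match images produced by another.

```python
import numpy as np

rng = np.random.default_rng(0)
hr = rng.random((8, 8))  # toy single-channel HR image
scale = 2

# "ideal" antialiased downscale (box average stands in for bicubic)
lr_aa = hr.reshape(4, scale, 4, scale).mean(axis=(1, 3))

# naive nearest-neighbour subsampling: no antialiasing at all
lr_nn = hr[::scale, ::scale]

# the two LR images differ pixel-wise, even though both are 4x4
# downscales of the exact same HR image
print(np.abs(lr_aa - lr_nn).mean())
```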
This is demonstrated by custom models from the model database that have been trained with this repository and are able to produce better results than most official models for their particular use case.
One of the first additions to this repository (2019) was to include additional downscaling methods (based on OpenCV: nearest, bilinear, bicubic, lanczos, in addition to the Matlab-like imresize antialiased bicubic) that are applied on the fly, randomized if multiple of them are selected in the training options. Besides the downscaling method, an augmentation pipeline that applies random noise (two instances), compression and blur was also added to better approximate real-world degradations in images. Due to the increased number and complexity of the image augmentations, the pipeline was outsourced to the augmennt repository, while the number of interpolation methods available was substantially increased and now includes antialiased versions of: blackman5, blackman4, blackman3, blackman2, sinc5, sinc4, sinc3, sinc2, gaussian, hamming, hanning, catrom, bell, hermite, mitchell, cubic (bicubic), lanczos5, lanczos4, lanczos3, lanczos2, box and linear (bilinear). Lastly, a new algorithm that solves typical downscaling issues present in most frameworks used to train machine learning models (OpenCV, PIL, TensorFlow, PyTorch, etc.) was also adopted for the antialiased downscaling.
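The on-the-fly randomization can be sketched as follows: each configured interpolation method maps to a callable, and one is chosen at random per image. The two stand-in implementations below (nearest and box) are simplified single-channel versions used only for illustration; the real pool covers all the methods listed above.

```python
import random
import numpy as np

def nearest(img, scale):
    # naive subsampling, no antialiasing
    return img[::scale, ::scale]

def box(img, scale):
    # area-average downscale, a simple antialiased method
    h, w = img.shape
    return img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

# stand-ins for the real interpolation pool ('cubic', 'lanczos3', ...)
DOWNSCALERS = {"nearest": nearest, "box": box}

def random_downscale(img, scale, enabled=("nearest", "box"), rng=random):
    # pick one of the enabled methods per image, as the dataloader would
    name = rng.choice(list(enabled))
    return name, DOWNSCALERS[name](img, scale)

name, lr = random_downscale(np.ones((8, 8)), 2)
```

Randomizing the method per sample exposes the network to several downscaling kernels instead of a single fixed one.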
Later, blind SR research papers like KSMR (2019) and Real-SR (2020) introduced the idea of using realistic kernels estimated from images (for example, with KernelGAN) and using this pool of extracted kernels to downscale HR images and generate the LR pairs. While the process of extracting the kernels has to be done offline (follow the instructions in DLIP), with this repository it is also possible to apply the kernels to realistically downscale HR images and generate the LR pairs on the fly, as part of the randomized methods. In addition, Real-SR also used a previous idea of adding realistic noise extracted from image patches, which can be injected into the downscaled images that lose their natural noise properties in the downscaling process (follow the instructions in DLIP). These extracted natural noise patches can also be added to images on the fly. Using DLIP, these (and other) methods can also be applied offline before training, if preferred.
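Conceptually, applying an estimated kernel means convolving the HR image with it and then subsampling, and noise injection means pasting a zero-mean extracted patch onto the LR result. The NumPy sketch below is a simplified single-channel illustration of both steps; the uniform kernel and tiny noise patch stand in for a KernelGAN estimate and a real extracted patch.

```python
import numpy as np

def downscale_with_kernel(hr, kernel, scale):
    # convolve HR with an estimated kernel ("valid" mode), then subsample
    kh, kw = kernel.shape
    h, w = hr.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (hr[y:y + kh, x:x + kw] * kernel).sum()
    return out[::scale, ::scale]

def inject_noise_patch(lr, patch, rng):
    # add a zero-mean extracted noise patch at a random location
    ph, pw = patch.shape
    y = rng.integers(0, max(lr.shape[0] - ph, 0) + 1)
    x = rng.integers(0, max(lr.shape[1] - pw, 0) + 1)
    lr = lr.copy()
    lr[y:y + ph, x:x + pw] += patch - patch.mean()
    return lr

rng = np.random.default_rng(0)
kernel = np.ones((3, 3)) / 9.0   # stands in for a KernelGAN estimate
hr = rng.random((12, 12))
lr = downscale_with_kernel(hr, kernel, 2)
lr = inject_noise_patch(lr, rng.random((2, 2)), rng)
```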
More recently (2021), in addition to the randomized downscaling, noise and blur pipeline, BSRGAN introduced the idea of using anisotropic Gaussian blur kernels, "down and up" image scaling and a RAW camera sensor noise model (from a reverse-forward camera image signal processing (ISP) pipeline model) to add more diversity to the degradation pipeline. These degradations are shuffled to better model real-world cases where degradations can happen in different orders and, based on statistical reasoning, the shuffling greatly expands the degradation space. Real-ESRGAN (2021) repeats many of BSRGAN's additions (while omitting some of them) and adds a sinc filter to reduce ringing artifacts in the models. Instead of shuffling the degradations, it takes a more brute-force approach where the degradation pipeline is repeated twice, for the same purpose of expanding the degradation space.
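The difference between the two strategies can be sketched with placeholder degradation functions (the real operations are blur, noise, scaling and compression; the names and list-based "image" here are only illustrative):

```python
import random

# placeholder degradations: each appends its name to a trace list
def blur(x):    return x + ["blur"]
def noise(x):   return x + ["noise"]
def down_up(x): return x + ["down-up scaling"]
def jpeg(x):    return x + ["jpeg"]

DEGRADATIONS = [blur, noise, down_up, jpeg]

def bsrgan_style(img, rng=random):
    # BSRGAN: shuffle the degradation order per sample
    ops = DEGRADATIONS[:]
    rng.shuffle(ops)
    for op in ops:
        img = op(img)
    return img

def real_esrgan_style(img):
    # Real-ESRGAN: keep the order fixed, but apply the pipeline twice
    for op in DEGRADATIONS + DEGRADATIONS:
        img = op(img)
    return img

print(bsrgan_style([]))            # the same four steps, in a random order
print(len(real_esrgan_style([])))  # 8: every step applied twice
```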
Both Real-SR and BSRGAN use the ESRGAN network with no change other than the images used to train the network, while Real-ESRGAN only adds a UNet discriminator with spectral norm besides the images, demonstrating there is still plenty of room for improvement based solely on the datasets used. Note that the strategies used in all three cases can be applied either individually or simultaneously in this repository, through the proper configuration via the augmentations presets.
Unlike other repositories that are only degradation-modeling-based (only the HR target image is required and the corresponding LR pair is generated on the fly, with the configured degradation and scaling options), this repository also allows using an image-pairs-based strategy (where LR and HR are provided with the correct degradations and scale), and the code will deal with images differently depending on their size. LR images can be provided partially and will be used when available; otherwise, the code will generate them on the fly automatically. Conversely, if all LR images are provided, the aug_downscale variable can be used to define a probability with which LR images will be randomly generated on the fly, ignoring the provided LR. More details in the resizing options.
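The dataset logic just described can be sketched as follows. This is a hypothetical simplification (string placeholders instead of images, and the actual dataloader code differs), but it captures how provided LR images and the aug_downscale probability interact:

```python
import random

def get_pair(hr, lr=None, aug_downscale=0.0, rng=random):
    """Return an (HR, LR) training pair: a provided LR is used when
    available, otherwise (or with probability aug_downscale) the LR
    is generated on the fly from the HR image."""
    if lr is None or rng.random() < aug_downscale:
        lr = f"on-the-fly LR from {hr}"
    return hr, lr

# LR provided and aug_downscale=0: the stored pair is used as-is
print(get_pair("img_001", "img_001_lr"))
# no LR on disk: always generated on the fly
print(get_pair("img_002"))
# aug_downscale=1.0: the provided LR is always ignored
print(get_pair("img_003", "img_003_lr", aug_downscale=1.0))
```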
This uniquely allows expanding the degradation space with options like using images with other types of degradations that cannot easily be added on the fly, as well as preparing LR images with networks trained to generate low-quality images, like DSGAN and DeFlow.
Note that while the above examples mention only super-resolution and scaling, the strategies can also be used when no image scaling is required and the models are trained only for image restoration (denoising/deblurring). This is also referred to as 1x scale. Pixel-Unshuffle can also be used to train, for example, 1x and 2x models using existing 4x architectures.
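Pixel-Unshuffle rearranges spatial resolution into channels, which is why a 4x architecture can accept a 1x or 2x task: the input is shuffled down until its spatial size matches what the network expects. A minimal NumPy sketch of the operation (PyTorch provides it as torch.nn.PixelUnshuffle):

```python
import numpy as np

def pixel_unshuffle(x, r):
    # rearrange a (C, H, W) image into (C*r*r, H//r, W//r):
    # spatial resolution drops by r, channels grow by r*r, nothing is lost
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * r * r, h // r, w // r)

# a 1x restoration task: shuffle the 3-channel input down by 4 so it
# matches the input resolution a 4x SR architecture expects
img = np.arange(3 * 8 * 8, dtype=float).reshape(3, 8, 8)
out = pixel_unshuffle(img, 4)
print(out.shape)  # (48, 2, 2)
```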
For more details on alternative strategies, research survey papers like those found here and here summarize the defining characteristics of other papers and how they relate to each other.