From dfe2343e0adfe5f421459fda399b3e45fca24e81 Mon Sep 17 00:00:00 2001
From: Emmanuel Benazera
Date: Tue, 13 Jun 2023 17:13:50 +0200
Subject: [PATCH] feat(doc): datasets

---
 docs/source/dataloaders.rst | 135 +++++++++++++++++------
 docs/source/datasets.rst | 211 +++++++++++++++++++-----------------
 2 files changed, 208 insertions(+), 138 deletions(-)

diff --git a/docs/source/dataloaders.rst b/docs/source/dataloaders.rst
index a42dd1028..3cb326ec8 100644
--- a/docs/source/dataloaders.rst
+++ b/docs/source/dataloaders.rst
@@ -2,43 +2,106 @@
 Dataloaders
 #############

-To choose a dataloader please use the flag ``--dataset_mode
-dataloader_name``.
-
-*******************
- Unaligned dataset
-*******************
-
-Name : ``unaligned`` You need to create two directories to host images
-from domain A ``/path/to/data/trainA`` and from domain B
-``/path/to/data/trainB``. Then you can train the model with the dataset
-flag ``--dataroot /path/to/data``. Optionally, you can create hold-out
-test datasets at ``/path/to/data/testA`` and ``/path/to/data/testB`` to
-test your model on unseen images.
-
-*******************************
- Unaligned and labeled dataset
-*******************************
-
-Name : ``unaligned_labeled`` You need to create two directories to host
-images from domain A ``/path/to/data/trainA`` and from domain B
-``/path/to/data/trainB``. In ``trainA``, you have to separate your data
-into directories, each directory belongs to a class. Then you can train
-the model with the dataset flag ``--dataroot /path/to/data``.
-Optionally, you can create hold-out test datasets at
-``/path/to/data/testA`` and ``/path/to/data/testB`` to test your model
-on unseen images.
+Every dataset type requires a dedicated dataloader, so that classes,
+masks or bounding boxes are processed accordingly. 
+There are two types of datasets and thus dataloaders:
+
+- **Unaligned datasets**: images from domain A and B do not come as pairs
+- **Aligned datasets**: images from domain A and B are paired
+
+Next, there's a special type of dataloader for online pre-processing
+and augmentation:
+
+- **Online**: dataloaders that automatically crop and zoom in / zoom out
+  around various elements of images, most often according to masks and
+  bounding boxes.
+
+Online dataloaders are one of the great features of JoliGEN: they
+avoid lengthy dataset pre-processing, e.g. resizing images or
+generating crops of interest in advance.
+
+There's a special type of dataloader for sequential data:
+
+- **Temporal**: dataloaders that load sequences of images, used for
+  temporal discriminators in GANs and temporal conditioning in DDPMs.
+
+Finally, there's a special type of self-supervised dataloader, used by
+DDPMs only:
+
+- **Self-supervised**: dataloaders that modify the input data in order
+  to generate self-supervised tasks, e.g. by removing portions of an
+  image for training an inpainting DDPM.
+
+To choose a dataloader, use the flag ``--dataset_mode dataloader_name``.
+
+*********************
+ List of dataloaders
+*********************
+
+- ``unaligned``: basic unaligned, e.g. 
horse2zebra dataset
+- ``unaligned_labeled_cls``: unaligned with classes
+- ``unaligned_labeled_mask``: unaligned with masks
+- ``unaligned_labeled_mask_online``: unaligned with masks, with online
+  cropping around masks
+- ``unaligned_labeled_mask_cls_online``: unaligned with masks and
+  classes, with online cropping around masks
+
+- ``self_supervised_labeled_cls``: with class labels
+- ``self_supervised_labeled_mask``: with mask labels
+- ``self_supervised_labeled_mask_online``: with mask labels and online
+  cropping around masks
+- ``self_supervised_labeled_mask_cls_online``: with class and mask
+  labels, and online cropping around masks
+
+- ``temporal``: basic temporal (sequential) loader
+- ``self_supervised_temporal``: self-supervised version of the temporal
+  loader, for DDPMs
+
 ********************************************
- Unaligned and labeled (with masks) dataset
+ Online Dataloaders and Options
 ********************************************

-Name : ``unaligned_labeled_mask`` For each domain A and B, you have to
-create a file :code:paths.txt` which each line gives paths to the image
-and to the mask, separated by space, e.g. ``path/to/image
-path/to/mask``. You need two create two directories to host
-``paths.txt`` from each domain A ``/path/to/data/trainA`` and from
-domain B ``/path/to/data/trainB``. Then you can train the model with the
-dataset flag ``--dataroot /path/to/data``. Optionally, you can create
-hold-out test datasets at ``/path/to/data/testA`` and
-``/path/to/data/testB`` to test your model on unseen images. 
+Online dataloaders are useful when:
+
+- Images are too large to be processed fully
+- The dataset is labeled and model training should concentrate on labeled areas
+- The dataset is small and benefits from the augmentation of random cropping
+
+The online dataloader applies the following steps:
+
+- Load the input image according to ``--data_online_creation_load_size_{A,B}``
+- Pick a bounding box at random
+- Build a mask from the bounding box
+- Crop around the bounding box to the fixed size
+  ``--data_online_creation_crop_size_{A,B}``
+- Randomly pick and apply a positive or negative offset to the crop
+  size according to ``--data_online_creation_crop_delta_{A,B}``. This
+  allows random variations around the fixed crop size.
+- Randomly pick and apply a positive or negative offset to the mask
+  according to ``--data_online_creation_mask_delta_{A,B}``. This step
+  allows an object in domain A to roughly match the size of an object
+  in domain B. E.g. turning cars into buses requires an offset on
+  masks from the car domain so that the mask can fit a bus.
+
+********************************************
+ Temporal Dataloaders and Options
+********************************************
+
+Temporal dataloaders read sequences of images. This is useful in two
+cases:
+
+- Temporal smoothing with GANs using a temporal discriminator
+- Frame conditioning with DDPMs
+
+The temporal dataloader applies the following steps:
+
+- Use ``--data_temporal_num_common_char`` to sort the frames in the
+  dataset. The number of common characters is the length of the
+  filename prefix that is ignored for sorting. E.g. frames named
+  ``image_xxxx`` would use 6 as the number of common characters, so
+  sorting is based on ``xxxx`` only.
+- Select ``--data_temporal_number_frames`` frames, spaced
+  ``--data_temporal_frame_step`` frames apart. This is useful to
+  control the temporal smoothing or conditioning independently from
+  the video's true frame rate. 
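
The frame sorting and selection steps above can be sketched in Python.
``select_frames`` is a hypothetical helper for illustration, not
JoliGEN's actual implementation; its parameters mirror the flags:

```python
def select_frames(filenames, num_common_char, number_frames, frame_step):
    """Sketch of temporal frame selection.

    num_common_char: length of the shared filename prefix ignored when
    sorting (e.g. 6 for "image_xxxx", so sorting uses "xxxx" only).
    """
    # Sort on the suffix only, ignoring the common prefix.
    ordered = sorted(filenames, key=lambda f: f[num_common_char:])
    # Keep number_frames frames, frame_step apart, from the first frame.
    return ordered[: number_frames * frame_step : frame_step]


frames = ["image_0003", "image_0001", "image_0005", "image_0002", "image_0004"]
print(select_frames(frames, num_common_char=6, number_frames=2, frame_step=2))
# ['image_0001', 'image_0003']
```

With ``frame_step=2``, every other frame is kept, so the effective
sampling rate is half the video's true frame rate.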
diff --git a/docs/source/datasets.rst b/docs/source/datasets.rst
index a25ccba35..6df573e4d 100644
--- a/docs/source/datasets.rst
+++ b/docs/source/datasets.rst
@@ -4,122 +4,94 @@
 Dataset formats
 #################

+JoliGEN supports datasets with and without labels. Labeled datasets
+are useful because they allow for more fine-grained control of
+generated images.
+
+Broadly speaking, labels help constrain the search space of all
+possible combinations of generated pixels. For this reason labels are
+sometimes referred to as semantic constraints.
+
+For instance:
+
+- **class labels** ensure a match between input and output image,
+  e.g. turn Mario into Sonic while keeping the action (jump, kneel,
+  run, ...)
+
+- **mask labels** make it possible to preserve or modify only the
+  relevant areas, e.g. generate a car while preserving everything
+  around it
+
+  JoliGEN can derive rectangular masks from **bounding boxes**, and
+  more precise masks automatically with SAM.
+
 .. _datasets-unlabeled:

 *******************
  Unlabeled Datasets
 *******************

-To train a model on your own datasets, you need to create a data folder
-with two subdirectories ``trainA`` and ``trainB`` that contain images
-from domain A and B. You can test your model on your training set by
-setting ``--phase train`` in ``test.py``. You can also create
-subdirectories ``testA``` and ``testB`` if you have test data.
-
-.. _datasets-labels:
-
-**********************
- Datasets with labels
-**********************
-
-Create ``trainA`` and ``trainB`` directories as described for CycleGAN
-datasets. In ``trainA``, you have to separate your data into
-directories, each directory belongs to a class.
-
-.. _datasets-masks:
-
-*********************
- Datasets with masks
-*********************
-
-You can use a dataset made of images and their mask labels (it can be
-segmentation or attention masks). To do so, you have to generate masks
-which are pixel labels for the images. 
If you have n different classes,
-pixel values in the mask have to be between 0 and n-1. You can specify
-the number of classes with the flag ``--semantic_nclasses n``.
-
-.. _datasets-example:
-
-******************
- Example Datasets
-******************
-
-.. _datasets-example-im2im-without-semantics:
-
-Image to image without semantics
-================================
+An unlabeled dataset comes as a data folder with two subdirectories
+``trainA`` and ``trainB`` that contain images from domain A and B
+respectively.
+Subdirectories ``testA`` and ``testB`` can be added for test data.

 Example: horse to zebra from two sets of images

 Dataset: https://joligen.com/datasets/horse2zebra.zip

-.. code::
+.. code-block:: bash

-   horse2zebra/
-   horse2zebra/trainA # horse images
-   horse2zebra/trainB # zebra images
-   horse2zebra/testA
-   horse2zebra/testB
+   horse2zebra/
+   horse2zebra/trainA # horse images
+   horse2zebra/trainB # zebra images
+   horse2zebra/testA
+   horse2zebra/testB

-.. _datasets-example-im2im-with-class-semantics:
-
-Image to image with class semantics
-===================================
-
-Example: font number conversion
+.. _datasets-labels:

-Dataset: https://joligen.com/datasets/mnist2USPS.zip
+***************************
+ Datasets with class labels
+***************************

-.. code::
+A class label is a label that holds for the full image.

-   mnist2USPS/
-   mnist2USPS/trainA
-   mnist2USPS/trainA/0 # images of number 0
-   mnist2USPS/trainA/1 # images of number 1
-   mnist2USPS/trainA/2 # images of number 2
-   ...
-   mnist2USPS/trainB
-   mnist2USPS/trainB/0 # images of target number 0
-   mnist2USPS/trainB/1 # images of target number 1
-   mnist2USPS/trainB/2 # images of target number 2
+A dataset with class labels has ``trainA`` and ``trainB`` directories.
+In ``trainA``, each class comes as a separate directory that holds the
+images for that class.

-.. 
_datasets-example-im2im-with-mask-semantics:
+Example: font number conversion
+Dataset: https://joligen.com/datasets/mnist2USPS.zip

-Image to image with mask semantics
-==================================
+.. code-block:: bash
+
+   mnist2USPS/
+   mnist2USPS/trainA
+   mnist2USPS/trainA/0 # images of number 0
+   mnist2USPS/trainA/1 # images of number 1
+   mnist2USPS/trainA/2 # images of number 2
+   ...
+   mnist2USPS/trainB
+   mnist2USPS/trainB/0 # images of target number 0
+   mnist2USPS/trainB/1 # images of target number 1
+   mnist2USPS/trainB/2 # images of target number 2

-Example: Add glasses to a face without modifying the rest of the face
+.. _datasets-bbox:

-Dataset:
-https://joligen.com/datasets/noglasses2glasses_ffhq_mini.zip
+*****************************
+ Datasets with bounding boxes
+*****************************

-Full dataset:
-https://joligen.com/datasets/noglasses2glasses_ffhq.zip
+Bounding boxes give the location of elements, one box per line, in the
+following format

 .. code::

-   noglasses2glasses_ffhq_mini
-   noglasses2glasses_ffhq_mini/trainA
-   noglasses2glasses_ffhq_mini/trainA/img
-   noglasses2glasses_ffhq_mini/trainA/img/0000.png # source image, e.g. face without glasses
-   ...
-   noglasses2glasses_ffhq_mini/trainA/bbox
-   noglasses2glasses_ffhq_mini/trainA/bbox/0000.png # source mask, e.g. mask around eyes
-   ...
-   noglasses2glasses_ffhq_mini/trainA/paths.txt # list of associated source / mask images
-   noglasses2glasses_ffhq_mini/trainB
-   noglasses2glasses_ffhq_mini/trainB/img
-   noglasses2glasses_ffhq_mini/trainB/img/0000.png # target image, e.g. face with glasses
-   ...
-   noglasses2glasses_ffhq_mini/trainB/bbox
-   noglasses2glasses_ffhq_mini/trainB/bbox/0000.png # target mask, e.g. mask around glasses
-   ...
-   noglasses2glasses_ffhq_mini/trainB/paths.txt # list of associated target / mask images
+   cls xmin ymin xmax ymax

-.. _datasets-example-im2im-with-bbox-semantics:
+where ``cls`` is an integer for the class, starting from 1. 
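
As a sketch, one such line can be parsed as follows.
``parse_bbox_line`` is a hypothetical helper for illustration, not
part of JoliGEN:

```python
def parse_bbox_line(line):
    # One box per line: integer class index (starting at 1),
    # then pixel coordinates xmin ymin xmax ymax.
    cls, xmin, ymin, xmax, ymax = (int(v) for v in line.split())
    if cls < 1 or xmin >= xmax or ymin >= ymax:
        raise ValueError(f"invalid box line: {line!r}")
    return {"cls": cls, "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax}


print(parse_bbox_line("2 10 20 110 220"))
# {'cls': 2, 'xmin': 10, 'ymin': 20, 'xmax': 110, 'ymax': 220}
```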
-
-Image to image with bounding box semantics
-==========================================
+A dataset with bounding boxes comes as a data folder with two
+subdirectories ``trainA`` and ``trainB``, each containing two
+subdirectories ``imgs`` and ``bbox``. ``imgs`` holds the image files,
+and ``bbox`` holds one .txt file per image that lists the boxes for
+that image.

 Example: Super Mario to Sonic while preserving the position and action,
 e.g. crouch, jump, still, ...
@@ -173,10 +145,53 @@ in this order:

 where ``cls`` is the class, in this dataset ``2`` means ``running``.

-.. _datasets-example-im2im-with-bbox-class-semantics:

-Image to image with multiple semantics: bounding box and class
-==============================================================
+
+.. _datasets-masks:
+
+*********************
+ Datasets with masks
+*********************
+
+A dataset with mask labels contains the subdirectories ``trainA`` and
+``trainB``, each with two subdirectories ``img`` and ``bbox``: ``img``
+holds the image files, ``bbox`` holds the mask files, and a
+``paths.txt`` file lists the associated image / mask pairs.
+A mask file is a single channel (B&W) image with labels as pixel
+values. For n different classes, pixel values in the mask have to be
+between 0 and n-1. The number of classes needs to be specified at
+training time with ``--f_s_semantic_nclasses n``.
+
+Example: Add glasses to a face without modifying the rest of the face
+
+Dataset:
+https://joligen.com/datasets/noglasses2glasses_ffhq_mini.zip
+
+Full dataset:
+https://joligen.com/datasets/noglasses2glasses_ffhq.zip
+
+.. code::
+
+   noglasses2glasses_ffhq_mini
+   noglasses2glasses_ffhq_mini/trainA
+   noglasses2glasses_ffhq_mini/trainA/img
+   noglasses2glasses_ffhq_mini/trainA/img/0000.png # source image, e.g. face without glasses
+   ...
+   noglasses2glasses_ffhq_mini/trainA/bbox
+   noglasses2glasses_ffhq_mini/trainA/bbox/0000.png # source mask, e.g. mask around eyes
+   ... 
+ noglasses2glasses_ffhq_mini/trainA/paths.txt # list of associated source / mask images + noglasses2glasses_ffhq_mini/trainB + noglasses2glasses_ffhq_mini/trainB/img + noglasses2glasses_ffhq_mini/trainB/img/0000.png # target image, e.g. face with glasses + ... + noglasses2glasses_ffhq_mini/trainB/bbox + noglasses2glasses_ffhq_mini/trainB/bbox/0000.png # target mask, e.g. mask around glasses + ... + noglasses2glasses_ffhq_mini/trainB/paths.txt # list of associated target / mask images + + +************************************************* + Datasets with bounding box and image-level class +************************************************* Example: Image seasonal modification while preserving objects with mask (cars, pedestrians, ...) and overall image weather (snow, rain, clear, @@ -208,11 +223,3 @@ https://joligen.com/datasets/daytime2dawn_dusk_lite.zip in this order: ``source image path``, ``image class``, ``image mask``, where ``image class`` in this dataset represents the weather class. - -.. _datasets-example-im2im-with-other-semantics: - -Other semantics -=============== - -Other semantics are possible, i.e. an algorithm that runs on both source -and target.
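
The pixel-value convention for mask labels above can be checked with a
small sketch, assuming masks load as single-channel numpy arrays;
``check_mask`` is a hypothetical helper, not part of JoliGEN:

```python
import numpy as np


def check_mask(mask, nclasses):
    # Mask pixels are class labels: for n classes, values must lie in
    # [0, n-1], stored as a single-channel (2D) image.
    return bool(mask.ndim == 2 and mask.min() >= 0 and mask.max() <= nclasses - 1)


mask = np.array([[0, 1], [2, 1]], dtype=np.uint8)
print(check_mask(mask, nclasses=3))  # True
print(check_mask(mask, nclasses=2))  # False: pixel value 2 is out of range
```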