feat(doc): datasets
beniz committed Jun 21, 2023
1 parent 5fdaa28 commit dfe2343
Showing 2 changed files with 208 additions and 138 deletions.
135 changes: 99 additions & 36 deletions docs/source/dataloaders.rst
Dataloaders
#############

Every dataset type requires a dedicated dataloader, so that classes,
masks or bounding boxes are processed accordingly.

There are two types of datasets and thus dataloaders:

- **Unaligned datasets**: images from domain A and B do not come as pairs
- **Aligned datasets**: images from domain A and B are paired

Next, there's a special type of dataloaders for online pre-processing
and augmentation:

- **Online**: dataloaders that automatically crop and zoom in / zoom out
around various elements of images, most often according to masks and
bounding boxes.

Online dataloaders are one of the great features of JoliGEN: they
avoid lengthy pre-processing of the datasets, e.g. resizing images,
generating crops of interest, etc.

There's a special type of dataloaders for sequential data:

- **Temporal**: dataloaders that load sequences of images, used for
temporal discriminators in GANs, and temporal conditioning in DDPMs.

Finally, there's a special type of self-supervised dataloader, used by
DDPMs only:

- **Self-supervised**: dataloaders that modify the input data in order
  to generate self-supervised tasks, e.g. by removing portions of an
  image for training an inpainting DDPM.

To choose a dataloader please use the flag ``--dataset_mode dataloader_name``.

********************
List of dataloaders
********************

- unaligned: basic unaligned, e.g. horse2zebra dataset
- unaligned_labeled_cls: unaligned with classes
- unaligned_labeled_mask: unaligned with masks
- unaligned_labeled_mask_online: unaligned with masks with online
  cropping around masks
- unaligned_labeled_mask_cls_online: unaligned with masks and classes
  with online cropping around masks

- self_supervised_labeled_cls: with class labels
- self_supervised_labeled_mask: with mask labels
- self_supervised_labeled_mask_online: with mask labels and online
  cropping around masks
- self_supervised_labeled_mask_cls_online: with class and mask labels,
  and online cropping around masks

- temporal: basic temporal (sequential) loader
- self_supervised_temporal: self-supervised version of the temporal
loader, for DDPMs

********************************************
Online Dataloaders and Options
********************************************

Online dataloaders are useful when:

- Images are too large to be processed fully
- The dataset is labeled and model training should concentrate on the
  labeled areas
- The dataset is small and can benefit from augmentation through random
  cropping

The online dataloader applies the following steps:

- Loads the input image according to ``--data_online_creation_load_size_{A,B}``
- Picks a bounding box at random
- Builds a mask from the bounding box
- Crops around the bounding box according to the fixed size
  ``--data_online_creation_crop_size_{A,B}``
- Randomly picks and applies a positive or negative offset to the crop
  size according to ``--data_online_creation_crop_delta_{A,B}``. This
  allows random variations around the fixed size of the crop.
- Randomly picks and applies a positive or negative offset to the mask
  according to ``--data_online_creation_mask_delta_{A,B}``. This step
  allows an object in domain A to roughly match the size of an object
  in domain B, e.g. turning cars into buses requires an offset on masks
  from the car domain so that the mask can fit a bus.
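
The cropping steps above can be sketched as follows. This is a minimal
illustration with hypothetical helper code, not JoliGEN's actual
implementation: it assumes a single bounding box and ignores the
zoom in / zoom out logic.

.. code-block:: python

   import random

   def online_crop(img_w, img_h, bbox, crop_size, crop_delta, mask_delta):
       """Pick a crop around a bounding box, jittering the crop size and
       offsetting the mask. bbox is (xmin, ymin, xmax, ymax); returns
       (crop_box, mask_box)."""
       # Jitter the crop size by a random offset in [-crop_delta, crop_delta]
       size = crop_size + random.randint(-crop_delta, crop_delta)

       # Grow (or shrink) the mask so a domain-A object can match domain-B sizes
       xmin, ymin, xmax, ymax = bbox
       mask_box = (max(0, xmin - mask_delta), max(0, ymin - mask_delta),
                   min(img_w, xmax + mask_delta), min(img_h, ymax + mask_delta))

       # Center the crop on the (possibly enlarged) mask, clamped to the image
       cx = (mask_box[0] + mask_box[2]) // 2
       cy = (mask_box[1] + mask_box[3]) // 2
       x0 = min(max(0, cx - size // 2), max(0, img_w - size))
       y0 = min(max(0, cy - size // 2), max(0, img_h - size))
       return (x0, y0, x0 + size, y0 + size), mask_box

The crop delta keeps the network from overfitting to a single crop
scale, while the mask delta compensates for systematic size differences
between the two domains.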

********************************************
Temporal Dataloaders and Options
********************************************

Temporal dataloaders read sequences of images. This is useful in two
cases:

- Temporal smoothing with GANs using a temporal discriminator
- Frame conditioning with DDPMs

The temporal dataloader applies the following steps:

- Uses ``--data_temporal_num_common_char`` to sort the frames in the
  dataset. The number of common characters is the length of the
  filename prefix that should be ignored when sorting, e.g. for files
  named ``image_xxxx``, a value of 6 sorts on ``xxxx`` only.
- Selects ``--data_temporal_number_frames`` frames, spaced
  ``--data_temporal_frame_step`` frames apart. This is useful to
  control the temporal smoothing or conditioning independently from
  the video's true frames per second.
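
The selection logic can be sketched as follows; this is a hypothetical
helper (not JoliGEN code), assuming the frames are available as a list
of filenames:

.. code-block:: python

   import random

   def select_frames(filenames, num_common_char, number_frames, frame_step):
       """Sort frames by the suffix after a common prefix, then pick a
       randomly placed, evenly spaced subsequence of `number_frames`."""
       # Ignore the first `num_common_char` characters when sorting
       frames = sorted(filenames, key=lambda f: f[num_common_char:])
       # Total span covered by the selected frames
       span = (number_frames - 1) * frame_step + 1
       start = random.randrange(len(frames) - span + 1)
       return frames[start:start + span:frame_step]

With ``frame_step > 1`` the loader effectively subsamples the video,
so the temporal window seen by the model is decoupled from the video's
frame rate.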
211 changes: 109 additions & 102 deletions docs/source/datasets.rst
Dataset formats
#################

JoliGEN supports datasets with and without labels. Labeled datasets
are useful because they allow for more fine-grained control of
generated images.

Broadly speaking, labels help constrain the search space of all
possible combinations of generated pixels. For this reason labels are
sometimes referred to as semantic constraints.

For instance:

- **class labels** ensure matching between input and output
  images, e.g. turn Mario into Sonic while keeping the action (jump,
  kneel, run, ..)

- **mask labels** allow conserving or modifying only the appropriate
  areas, e.g. generate a car and conserve everything around it

JoliGEN can derive rectangular masks from **bounding boxes**, and more
precise masks automatically with SAM.

.. _datasets-unlabeled:

*******************
Unlabeled Datasets
*******************




An unlabeled dataset comes as a data folder with two subdirectories
``trainA`` and ``trainB`` that contain images from domain A and B
respectively. Subdirectories ``testA`` and ``testB`` can be added for
test data.

Example: horse to zebra from two sets of images

Dataset: https://joligen.com/datasets/horse2zebra.zip

.. code-block:: bash

   horse2zebra/
   horse2zebra/trainA  # horse images
   horse2zebra/trainB  # zebra images
   horse2zebra/testA
   horse2zebra/testB


.. _datasets-labels:

***************************
Datasets with class labels
***************************

A class label is a label that holds for the full image.

A dataset with class labels has ``trainA`` and ``trainB`` directories.
In ``trainA``, every class comes as a separate directory that holds the
images for this class.

Example: font number conversion

Dataset: https://joligen.com/datasets/mnist2USPS.zip

.. code-block:: bash

   mnist2USPS/
   mnist2USPS/trainA
   mnist2USPS/trainA/0  # images of number 0
   mnist2USPS/trainA/1  # images of number 1
   mnist2USPS/trainA/2  # images of number 2
   ...
   mnist2USPS/trainB
   mnist2USPS/trainB/0  # images of target number 0
   mnist2USPS/trainB/1  # images of target number 1
   mnist2USPS/trainB/2  # images of target number 2

.. _datasets-bbox:

*****************************
Datasets with bounding boxes
*****************************

Bounding boxes are element locations given in the format

.. code-block:: bash

   cls xmin ymin xmax ymax

where ``cls`` is an integer for the class, starting from 1.

A dataset with bounding boxes comes as a data folder with two
subdirectories ``trainA`` and ``trainB``, each containing two
subdirectories ``imgs`` and ``bbox``. The image files are stored in
``imgs``, and ``bbox`` contains one ``.txt`` file per image that lists
the bounding boxes for that image.

Example: Super Mario to Sonic while preserving the position and action,
e.g. crouch, jump, still, ...
in this order:
where ``cls`` is the class, in this dataset ``2`` means ``running``.
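
A bbox ``.txt`` file as described above can be read with a few lines of
Python; this is a hypothetical helper for illustration, assuming one
``cls xmin ymin xmax ymax`` line per box:

.. code-block:: python

   def read_bboxes(path):
       """Parse a bbox file: one `cls xmin ymin xmax ymax` line per box."""
       boxes = []
       with open(path) as f:
           for line in f:
               if not line.strip():
                   continue  # skip blank lines
               cls, xmin, ymin, xmax, ymax = map(int, line.split())
               boxes.append({"cls": cls, "box": (xmin, ymin, xmax, ymax)})
       return boxes
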

.. _datasets-masks:

*********************
Datasets with masks
*********************

A dataset with mask labels contains the subdirectories ``trainA`` and
``trainB``, each with two subdirectories: one for the image files and
one for the mask files (``img`` and ``bbox`` in the example below),
plus a ``paths.txt`` file listing the associated image / mask pairs.
A mask file is a single-channel (grayscale) image with labels as pixel
values. For n different classes, pixel values in the mask have to be
between 0 and n-1. The number of classes needs to be specified at
training time with ``--f_s_semantic_nclasses n``.

Example: Add glasses to a face without modifying the rest of the face

Dataset:
https://joligen.com/datasets/noglasses2glasses_ffhq_mini.zip

Full dataset:
https://joligen.com/datasets/noglasses2glasses_ffhq.zip

.. code-block:: bash

   noglasses2glasses_ffhq_mini
   noglasses2glasses_ffhq_mini/trainA
   noglasses2glasses_ffhq_mini/trainA/img
   noglasses2glasses_ffhq_mini/trainA/img/0000.png  # source image, e.g. face without glasses
   ...
   noglasses2glasses_ffhq_mini/trainA/bbox
   noglasses2glasses_ffhq_mini/trainA/bbox/0000.png  # source mask, e.g. mask around eyes
   ...
   noglasses2glasses_ffhq_mini/trainA/paths.txt  # list of associated source / mask images
   noglasses2glasses_ffhq_mini/trainB
   noglasses2glasses_ffhq_mini/trainB/img
   noglasses2glasses_ffhq_mini/trainB/img/0000.png  # target image, e.g. face with glasses
   ...
   noglasses2glasses_ffhq_mini/trainB/bbox
   noglasses2glasses_ffhq_mini/trainB/bbox/0000.png  # target mask, e.g. mask around glasses
   ...
   noglasses2glasses_ffhq_mini/trainB/paths.txt  # list of associated target / mask images

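
The mask constraints above (single channel, pixel values in [0, n-1])
can be checked before training; a minimal sketch operating on an array,
not part of JoliGEN:

.. code-block:: python

   import numpy as np

   def check_mask(mask, nclasses):
       """Verify a mask array: single channel, values in [0, nclasses-1]."""
       mask = np.asarray(mask)
       if mask.ndim != 2:
           raise ValueError(f"expected a single-channel mask, got shape {mask.shape}")
       if mask.min() < 0 or mask.max() > nclasses - 1:
           raise ValueError(f"pixel values must lie in [0, {nclasses - 1}]")
       return True
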
*************************************************
Datasets with bounding box and image-level class
*************************************************

Example: Image seasonal modification while preserving objects with mask
(cars, pedestrians, ...) and overall image weather (snow, rain, clear,
https://joligen.com/datasets/daytime2dawn_dusk_lite.zip
in this order: ``source image path``, ``image class``, ``image mask``,
where ``image class`` in this dataset represents the weather class.

