feat(doc): datasets
beniz committed Jun 21, 2023
1 parent 5fdaa28 commit dfe2343
Showing 2 changed files with 208 additions and 138 deletions.
135 changes: 99 additions & 36 deletions docs/source/dataloaders.rst
Dataloaders
#############

Every dataset type requires a dedicated dataloader, so that classes,
masks or bounding boxes are processed accordingly.

There are two types of datasets and thus dataloaders:

- **Unaligned datasets**: images from domain A and B do not come as pairs
- **Aligned datasets**: images from domain A and B are paired

Next, there's a special type of dataloaders for online pre-processing
and augmentation:

- **Online**: dataloaders that automatically crop and zoom in / zoom out
around various elements of images, most often according to masks and
bounding boxes.

Online dataloaders are one of the great features of JoliGEN: they
avoid lengthy pre-processing of the datasets, e.g. resizing images,
generating crops of interest, etc.

There's a special type of dataloaders for sequential data:

- **Temporal**: dataloaders that load sequences of images, used for
temporal discriminators in GANs, and temporal conditioning in DDPMs.

Finally, there's a special type of self-supervised dataloader, used by
DDPMs only:

- **Self-supervised**: dataloaders that modify the input data in order
  to generate self-supervised tasks, e.g. by removing portions of an
  image for training an inpainting DDPM.

To choose a dataloader please use the flag ``--dataset_mode dataloader_name``.

********************
List of dataloaders
********************

- unaligned: basic unaligned, e.g. horse2zebra dataset
- unaligned_labeled_cls: unaligned with classes
- unaligned_labeled_mask: unaligned with masks
- unaligned_labeled_mask_online: unaligned with masks with online
  cropping around masks
- unaligned_labeled_mask_cls_online: unaligned with masks and classes
  with online cropping around masks

- self_supervised_labeled_cls: with class labels
- self_supervised_labeled_mask: with mask labels
- self_supervised_labeled_mask_online: with mask labels and online
  cropping around masks
- self_supervised_labeled_mask_cls_online: with class and mask labels,
  and online cropping around masks

- temporal: basic temporal (sequential) loader
- self_supervised_temporal: self-supervised version of the temporal
loader, for DDPMs

********************************************
Online Dataloaders and Options
********************************************

Online dataloaders are useful when:

- Images are too large to be processed fully
- The dataset is labeled and model training should concentrate on the
  labeled areas
- The dataset is small and can benefit from augmentation through random
  cropping

The online dataloader applies the following steps:

- Loads the input image according to ``--data_online_creation_load_size_{A,B}``
- Picks a bounding box at random
- Builds a mask from the bounding box
- Crops around the bounding box according to the fixed size
  ``--data_online_creation_crop_size_{A,B}``
- Randomly picks and applies a positive or negative offset to the crop
  size according to ``--data_online_creation_crop_delta_{A,B}``. This
  allows random variations around the fixed size of the crop.
- Randomly picks and applies a positive or negative offset to the mask
  according to ``--data_online_creation_mask_delta_{A,B}``. This step
  allows an object in domain A to roughly match the size of an object
  in domain B, e.g. turning cars into buses requires an offset on masks
  from the car domain so that the mask can fit a bus.
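
The cropping steps above can be sketched as follows. This is a minimal
illustration with hypothetical helper code, not JoliGEN's actual
implementation: it assumes a single bounding box and ignores the
zoom in / zoom out logic.

.. code-block:: python

   import random

   def online_crop(img_w, img_h, bbox, crop_size, crop_delta, mask_delta):
       """Pick a crop around a bounding box, jittering the crop size and
       offsetting the mask. bbox is (xmin, ymin, xmax, ymax); returns
       (crop_box, mask_box)."""
       # Jitter the crop size by a random offset in [-crop_delta, crop_delta]
       size = crop_size + random.randint(-crop_delta, crop_delta)

       # Grow (or shrink) the mask so a domain-A object can match domain-B sizes
       xmin, ymin, xmax, ymax = bbox
       mask_box = (max(0, xmin - mask_delta), max(0, ymin - mask_delta),
                   min(img_w, xmax + mask_delta), min(img_h, ymax + mask_delta))

       # Center the crop on the (possibly enlarged) mask, clamped to the image
       cx = (mask_box[0] + mask_box[2]) // 2
       cy = (mask_box[1] + mask_box[3]) // 2
       x0 = min(max(0, cx - size // 2), max(0, img_w - size))
       y0 = min(max(0, cy - size // 2), max(0, img_h - size))
       return (x0, y0, x0 + size, y0 + size), mask_box

The crop delta keeps the network from overfitting to a single crop
scale, while the mask delta compensates for systematic size differences
between the two domains.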

********************************************
Temporal Dataloaders and Options
********************************************

Temporal dataloaders read sequences of images. This is useful in two
cases:

- Temporal smoothing with GANs using a temporal discriminator
- Frame conditioning with DDPMs

The temporal dataloader applies the following steps:

- Uses ``--data_temporal_num_common_char`` to sort the frames in the
  dataset. The number of common characters is the length of the
  filename prefix that should be ignored when sorting, e.g. for files
  named ``image_xxxx``, a value of 6 sorts on ``xxxx`` only.
- Selects ``--data_temporal_number_frames`` frames, spaced
  ``--data_temporal_frame_step`` frames apart. This is useful to
  control the temporal smoothing or conditioning independently from
  the video's true frames per second.
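
The selection logic can be sketched as follows; this is a hypothetical
helper (not JoliGEN code), assuming the frames are available as a list
of filenames:

.. code-block:: python

   import random

   def select_frames(filenames, num_common_char, number_frames, frame_step):
       """Sort frames by the suffix after a common prefix, then pick a
       randomly placed, evenly spaced subsequence of `number_frames`."""
       # Ignore the first `num_common_char` characters when sorting
       frames = sorted(filenames, key=lambda f: f[num_common_char:])
       # Total span covered by the selected frames
       span = (number_frames - 1) * frame_step + 1
       start = random.randrange(len(frames) - span + 1)
       return frames[start:start + span:frame_step]

With ``frame_step > 1`` the loader effectively subsamples the video,
so the temporal window seen by the model is decoupled from the video's
frame rate.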
211 changes: 109 additions & 102 deletions docs/source/datasets.rst
Dataset formats
#################

JoliGEN supports datasets with and without labels. Labeled datasets
are useful because they allow for more fine-grained control of
generated images.

Broadly speaking, labels help constrain the search space of all
possible combinations of generated pixels. For this reason labels are
sometimes referred to as semantic constraints.

For instance:

- **class labels** ensure matching between input and output
  images, e.g. turn Mario into Sonic while keeping the action (jump,
  kneel, run, ..)

- **mask labels** allow conserving or modifying only the appropriate
  areas, e.g. generate a car and conserve everything around it

JoliGEN can derive rectangular masks from **bounding boxes**, and more
precise masks automatically with SAM.

.. _datasets-unlabeled:

*******************
Unlabeled Datasets
*******************




An unlabeled dataset comes as a data folder with two subdirectories
``trainA`` and ``trainB`` that contain images from domain A and B
respectively. Subdirectories ``testA`` and ``testB`` can be added for
test data.

Example: horse to zebra from two sets of images

Dataset: https://joligen.com/datasets/horse2zebra.zip

.. code-block:: bash

   horse2zebra/
   horse2zebra/trainA  # horse images
   horse2zebra/trainB  # zebra images
   horse2zebra/testA
   horse2zebra/testB


.. _datasets-labels:

***************************
Datasets with class labels
***************************

A class label is a label that holds for the full image.

A dataset with class labels has ``trainA`` and ``trainB`` directories.
In ``trainA``, every class comes as a separate directory that holds the
images for this class.

Example: font number conversion

Dataset: https://joligen.com/datasets/mnist2USPS.zip

.. code-block:: bash

   mnist2USPS/
   mnist2USPS/trainA
   mnist2USPS/trainA/0  # images of number 0
   mnist2USPS/trainA/1  # images of number 1
   mnist2USPS/trainA/2  # images of number 2
   ...
   mnist2USPS/trainB
   mnist2USPS/trainB/0  # images of target number 0
   mnist2USPS/trainB/1  # images of target number 1
   mnist2USPS/trainB/2  # images of target number 2

.. _datasets-bbox:

*****************************
Datasets with bounding boxes
*****************************

Bounding boxes are element locations given in the format

.. code-block:: bash

   cls xmin ymin xmax ymax

where ``cls`` is an integer for the class, starting from 1.

A dataset with bounding boxes comes as a data folder with two
subdirectories ``trainA`` and ``trainB``, each containing two
subdirectories ``imgs`` and ``bbox``. The image files are stored in
``imgs``, and ``bbox`` contains one ``.txt`` file per image that lists
the bounding boxes for that image.

Example: Super Mario to Sonic while preserving the position and action,
e.g. crouch, jump, still, ...
in this order:
where ``cls`` is the class, in this dataset ``2`` means ``running``.
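
A bbox ``.txt`` file as described above can be read with a few lines of
Python; this is a hypothetical helper for illustration, assuming one
``cls xmin ymin xmax ymax`` line per box:

.. code-block:: python

   def read_bboxes(path):
       """Parse a bbox file: one `cls xmin ymin xmax ymax` line per box."""
       boxes = []
       with open(path) as f:
           for line in f:
               if not line.strip():
                   continue  # skip blank lines
               cls, xmin, ymin, xmax, ymax = map(int, line.split())
               boxes.append({"cls": cls, "box": (xmin, ymin, xmax, ymax)})
       return boxes
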

.. _datasets-masks:

*********************
Datasets with masks
*********************

A dataset with mask labels contains the subdirectories ``trainA`` and
``trainB``, each with two subdirectories: one for the image files and
one for the mask files (``img`` and ``bbox`` in the example below),
plus a ``paths.txt`` file listing the associated image / mask pairs.
A mask file is a single-channel (grayscale) image with labels as pixel
values. For n different classes, pixel values in the mask have to be
between 0 and n-1. The number of classes needs to be specified at
training time with ``--f_s_semantic_nclasses n``.

Example: Add glasses to a face without modifying the rest of the face

Dataset:
https://joligen.com/datasets/noglasses2glasses_ffhq_mini.zip

Full dataset:
https://joligen.com/datasets/noglasses2glasses_ffhq.zip

.. code-block:: bash

   noglasses2glasses_ffhq_mini
   noglasses2glasses_ffhq_mini/trainA
   noglasses2glasses_ffhq_mini/trainA/img
   noglasses2glasses_ffhq_mini/trainA/img/0000.png  # source image, e.g. face without glasses
   ...
   noglasses2glasses_ffhq_mini/trainA/bbox
   noglasses2glasses_ffhq_mini/trainA/bbox/0000.png  # source mask, e.g. mask around eyes
   ...
   noglasses2glasses_ffhq_mini/trainA/paths.txt  # list of associated source / mask images
   noglasses2glasses_ffhq_mini/trainB
   noglasses2glasses_ffhq_mini/trainB/img
   noglasses2glasses_ffhq_mini/trainB/img/0000.png  # target image, e.g. face with glasses
   ...
   noglasses2glasses_ffhq_mini/trainB/bbox
   noglasses2glasses_ffhq_mini/trainB/bbox/0000.png  # target mask, e.g. mask around glasses
   ...
   noglasses2glasses_ffhq_mini/trainB/paths.txt  # list of associated target / mask images

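
The mask constraints above (single channel, pixel values in [0, n-1])
can be checked before training; a minimal sketch operating on an array,
not part of JoliGEN:

.. code-block:: python

   import numpy as np

   def check_mask(mask, nclasses):
       """Verify a mask array: single channel, values in [0, nclasses-1]."""
       mask = np.asarray(mask)
       if mask.ndim != 2:
           raise ValueError(f"expected a single-channel mask, got shape {mask.shape}")
       if mask.min() < 0 or mask.max() > nclasses - 1:
           raise ValueError(f"pixel values must lie in [0, {nclasses - 1}]")
       return True
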
*************************************************
Datasets with bounding box and image-level class
*************************************************

Example: Image seasonal modification while preserving objects with mask
(cars, pedestrians, ...) and overall image weather (snow, rain, clear,
https://joligen.com/datasets/daytime2dawn_dusk_lite.zip
in this order: ``source image path``, ``image class``, ``image mask``,
where ``image class`` in this dataset represents the weather class.

