This repository has been archived by the owner on Aug 9, 2024. It is now read-only.

Keras-Preprocessing Redesign #10

Merged (11 commits) on Sep 23, 2019

Conversation

fchollet (Contributor)

Keras-Preprocessing Redesign

Comment period open until September 13th, 2019.

  • Status: Proposed
  • Author(s): Francois Chollet ([email protected]), Frederic Branchaud-Charron ([email protected])
  • Updated: 2019-08-23

Context

tf.data.Dataset is the main API for data loading and preprocessing in TensorFlow. It has two advantages:

  • It supports GPU prefetching
  • It supports distribution via the Distribution Strategies API

Meanwhile, keras.preprocessing is a major API for data loading and preprocessing in Keras. It is based
on NumPy and SciPy, and it produces instances of the keras.utils.Sequence class, which are finite-length,
resettable Python generators that yield batches of data.
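
For reference, a minimal Sequence subclass looks roughly like the sketch below (illustrative only; the array-backed class and its names are made up here, not part of keras.preprocessing):

```python
import math
from tensorflow import keras

class ArrayBatchSequence(keras.utils.Sequence):
    """Illustrative Sequence: yields (x, y) batches from two in-memory arrays."""

    def __init__(self, x, y, batch_size=32):
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch (finite length).
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Return batch number `idx`; Keras iterates (and can shuffle) over indices.
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[batch], self.y[batch]
```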

Some features of keras.preprocessing are highly useful and don't have straightforward equivalents in tf.data
(in particular image data augmentation and dynamic time series iteration).

Ideally, the utilities in keras.preprocessing should be made compatible with tf.data.
This presents the opportunity to improve on the existing API. In particular we don't have good support
for image segmentation use cases today.
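
As a rough illustration of what "compatible with tf.data" could mean in practice (an assumption-laden sketch, not part of this proposal; `sequence_to_dataset` is a made-up helper and `output_signature` requires a recent TF version):

```python
import tensorflow as tf

def sequence_to_dataset(seq, x_spec, y_spec):
    """Wrap a keras.utils.Sequence as a tf.data.Dataset to get prefetching etc."""
    def gen():
        for i in range(len(seq)):
            yield seq[i]

    return tf.data.Dataset.from_generator(
        gen, output_signature=(x_spec, y_spec)
    ).prefetch(tf.data.AUTOTUNE)

# Example usage:
# ds = sequence_to_dataset(
#     my_sequence,
#     tf.TensorSpec([None, 256, 256, 3], tf.float32),
#     tf.TensorSpec([None], tf.int64))
```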

Some features are also being supplanted by preprocessing layers, in particular text processing.
As a result, we may want to move the current API to an API similar to Layers.

Goals

  • Unify "keras.preprocessing" and the recently-introduced Preprocessing Layers API.
  • Make all features of keras.preprocessing compatible with tf.data.
  • As a by-product, add required ops to TensorFlow (tf.image).

@lobrien commented Aug 23, 2019

If I have a target polygon in the source image, how would I get the corresponding area in transformed coordinates after a sequence of Random{transform} calls? Does the ImagePipeline object maintain / expose the accumulated transform? If I were training object detection and started with source_image and source_bounding_box, wouldn't I need the transform (or to recreate it) to get bounding_box_after_random_augmentation?

```python
    RandomZoom([0., 0.2]),
    CenterCrop(height, width),
])
```


Does this allow injecting custom augmentation? For example, suppose I want to apply Gaussian Blur or channel wise contrast, can I inject that into it directly?


Yes, you just need to create your own layer.


And a single layer can have multiple transformations inside the call, which could be defined by other libraries like imgaug or albumentations, right?

fchollet (Contributor, Author)

Yes, that's right. Custom layers offer you great flexibility to implement your own transformations. But note that all transformations should be defined using TF ops if you want performance. Otherwise you'd have to step out of graph execution (it's still technically doable, though).
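
For instance, a channel-wise contrast jitter written with plain TF ops might look like the sketch below (illustrative names and parameterization only; this is not a layer from this proposal):

```python
import tensorflow as tf
from tensorflow import keras

class RandomChannelContrast(keras.layers.Layer):
    """Hypothetical custom augmentation layer built from TF ops only."""

    def __init__(self, max_delta=0.2, seed=None, **kwargs):
        super().__init__(**kwargs)
        self.max_delta = max_delta
        self.seed = seed

    def call(self, images, training=None):
        if not training:
            return images
        # One contrast factor per channel; assumes batched channels_last input
        # with a static channel count.
        channels = images.shape[-1]
        factors = tf.random.uniform(
            [1, 1, 1, channels],
            1.0 - self.max_delta, 1.0 + self.max_delta, seed=self.seed)
        # Rescale each channel around its per-image mean to preserve brightness.
        means = tf.reduce_mean(images, axis=[1, 2], keepdims=True)
        return (images - means) * factors + means
```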

```python
])

input_pipeline = ImagePipeline([
    augmenter,
```

We'll need to take a lot of care with the random augmentation layers to make sure the syncing works correctly. They would need to be deterministic & 're-settable' in some fashion that's built into the API & that the ImagePipeline/from_directory APIs would take advantage of.

@tomerk commented Aug 25, 2019

And to make matters trickier now that I think about it, this would also need to work well w/ parallel processing in the dataset (& possibly w/ distribution strategies active). It may be worth taking a page out of Jax's book as inspiration? https://github.com/google/jax/blob/master/design_notes/prng.md

Or using some of the mechanisms from Peng's RFC for random numbers in tf 2.0:
https://github.com/tensorflow/community/pull/38/files?short_path=b84a5ce#diff-b84a5ce018def5de3e1396b9962feff1

fchollet (Contributor, Author)

Right, we should add a public API to control the seeding behavior, besides the seed argument in the constructor.

Would you have a specific API in mind? (in terms of methods and their signature)

@tomerk commented Aug 25, 2019

Hmm, I think the safest thing would be to make random augmentation layers only use stateless random ops and support taking a seed argument directly in the call methods. The constructor-time seed argument would then be renamed initial_seed. If no seed argument is provided at call-time, the seed generated from initial_seed would be used.

If a call-time seed is provided, it would be combined with the layer's initial_seed, to make sure different layer objects act differently when passed the same seed (while a layer object shared across different models would act the same way in both when passed the same seed).

So, the implementation & API for RandomFlip would look something like:

```python
class RandomFlip(PreprocessingLayer):

    def __init__(self, horizontal=False, vertical=False, initial_seed=None):
        super().__init__()
        self.horizontal = horizontal
        self.vertical = vertical
        self.initial_seed = initial_seed
        # random_value() stands in for drawing a random default seed.
        self._initial_seed_or_random = initial_seed or random_value()
        self._current_seed = self._initial_seed_or_random

    def call(self, inputs, training=None, seed=None):
        if seed is None:
            # No call-time seed: advance the layer's internal seed.
            seed = self._current_seed
            self._current_seed += 1
        else:
            # Call-time seed: combine with the layer's own seed so that
            # different layer objects behave differently for the same seed.
            seed = seed + self._initial_seed_or_random

        if training:
            if self.horizontal:
                inputs = tf.image.random_flip_left_right(inputs, seed=seed)
            if self.vertical:
                inputs = tf.image.random_flip_up_down(inputs, seed=seed)
        return inputs
```

We'll run into similar challenges as with the training argument of layers & models where we have to feed it through nested models & layers to avoid bugs. We can use a similar mechanism in __call__ to solve the problem.

This sort of deterministic randomness could be generally useful for random models & layers in Keras beyond just for random augmentations & preprocessing layers (e.g. for dropout layers)


The from_directory & input methods would continue to take an optional seed argument. They could either provide the dataset tuple index as a seed in the 'call' method of ImagePipeline, or use some sort of 'tf.function & distribution strategy-friendly' version of tf.random.experimental.Generator to generate the seeds for the layers. We can check with Peng, who has been working on it, to see what the status there is.

The random layers should also use tf.random.experimental.Generator or something similar to maintain their internal seeds rather than using raw python in the form of self.seed = self.seed + 1.
That way they will work correctly w/ tf.function & distribution strategies.

A note on retracing tf.function: If we want to wrap the call method in a tf.function like we do for saved_models, we will need to take care to represent the seeds passed into call as scalar tensors rather than just python objects. Otherwise the function will get retraced for each seed.
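
Putting those two points together, a hedged sketch of the stateless / generator-based approach (independent of the final layer API; tf.random.experimental.Generator is spelled tf.random.Generator in current TF):

```python
import tensorflow as tf

# Variable-backed generator: its state advances correctly inside tf.function
# and under distribution strategies, unlike a raw Python counter.
gen = tf.random.Generator.from_seed(42)

@tf.function
def stateless_random_flip(images, seed):
    # `seed` arrives as a shape-[2] integer tensor, as stateless ops expect,
    # so new seed values do not retrace the function.
    flip = tf.random.stateless_uniform([], seed=seed) > 0.5
    return tf.cond(flip,
                   lambda: tf.image.flip_left_right(images),
                   lambda: images)

images = tf.zeros([8, 32, 32, 3])
seed = gen.make_seeds(1)[:, 0]  # draw a fresh per-call seed tensor
flipped = stateless_random_flip(images, seed)
```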

fchollet (Contributor, Author)

+1 to these suggestions, especially having seed as a call argument. Note that we don't need to change the name of the constructor argument; seed is fine.


So the workflow for an image-segmentation pipeline would be:

Create two identical ImagePipelines, with the same seed, and run them on both inputs? I guess the alternatives would be:

  • Make these work on nests of images
  • Or have a .reapply method?

So if someone wanted to use this for object detection, they could build a BBox equivalent of each layer and write a converter that makes a new Sequential pipeline with all the BBox layers substituted in, using the same seed. Would that be a reasonable approach to using this for object detection?

@Dref360 commented Aug 26, 2019

@lobrien You would have a matching set of Layers for BBoxes and they would get the same seed.
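
In terms of the underlying mechanism (a sketch with plain stateless TF ops, not the proposed layer API): feeding the same seed into the transforms applied to the image and to its mask or boxes is what keeps them aligned.

```python
import tensorflow as tf

def random_flip_pair(image, mask, seed):
    # `seed` is a shape-[2] integer tensor; because the flip decision is drawn
    # from a stateless op with that seed, image and mask always flip together.
    flip = tf.random.stateless_uniform([], seed=seed) > 0.5
    image = tf.cond(flip, lambda: tf.image.flip_left_right(image), lambda: image)
    mask = tf.cond(flip, lambda: tf.image.flip_left_right(mask), lambda: mask)
    return image, mask

image = tf.zeros([128, 128, 3])
mask = tf.zeros([128, 128, 1])
image_aug, mask_aug = random_flip_pair(image, mask, seed=tf.constant([3, 7]))
```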


#### Constructor

`ImagePipeline` inherits from `keras.models.Sequential` and takes a list of layers as inputs. In the future it will inherit from `PreprocessingStage`.

Why not build out the basic preproc stage first? Inheriting from Sequential directly seems dangerous, as we will get a bunch of attrs/methods that people will accidentally start to depend on.

fchollet (Contributor, Author)

Indeed. We could simply inherit from PreprocessingLayer only and manually add Sequential-like features instead of inheriting from Sequential.



`ImagePipeline` is a preprocessing layer that encapsulates a series of image transformations. Since some of these transformations may be trained (featurewise normalization), it exposes the method `adapt`, like all other preprocessing layers.

It's not a preproc layer, right? A preprocessing stage? Do layers implement adapt?


Had the same question; I think @fchollet is referring to a preprocessing stage here.

Layers do not implement adapt; this was a concept introduced in the preprocessing-layers design and is similar to fit. adapt is the API used to train preprocessing layers. The name fit was not reused since the data used for training differs between this API and fit. Also, I think adapt was originally called update in the preprocessing-layers design.

fchollet (Contributor, Author)

Yes, it is a preprocessing stage (so by extension it is a preprocessing layer, since preprocessing stage will subclass PreprocessingLayer).

I describe it as a preprocessing layer specifically because it is likely that PreprocessingStage will not yet exist when we ship the initial version of this API, hence ImagePipeline would subclass PreprocessingLayer in its first iteration.

The method name adapt was the consensus result of the preprocessing layer API design review meeting (not great, but we have to settle on something).
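
To make the adapt semantics concrete, here is a toy, simplified stand-in (my own sketch, not the actual PreprocessingLayer implementation): adapt computes statistics from example data once, and call then applies them.

```python
import tensorflow as tf
from tensorflow import keras

class ToyFeaturewiseNormalization(keras.layers.Layer):
    """Toy illustration of adapt(): learn statistics once, apply them in call()."""

    def adapt(self, data):
        # "Training" here is just computing dataset statistics, which is why the
        # preprocessing-layer design uses adapt rather than reusing fit.
        data = tf.convert_to_tensor(data, dtype=tf.float32)
        self.mean = tf.reduce_mean(data, axis=0)
        self.std = tf.math.reduce_std(data, axis=0)

    def call(self, inputs):
        return (inputs - self.mean) / (self.std + 1e-7)

norm = ToyFeaturewiseNormalization()
norm.adapt(tf.random.normal([1000, 16]))     # compute statistics from sample data
normalized = norm(tf.random.normal([8, 16]))
```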

#### Methods

```python
def from_directory(
```

To clarify: these are now methods of an instantiated ImagePipeline, not standalone functions?

fchollet (Contributor, Author)

Yes, these are instance methods, not standalone functions.
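
So usage would look roughly like the pseudocode below (written against the proposed API in this doc; the directory path is a placeholder and nothing here is shipped):

```python
# Pseudocode against the proposed API: ImagePipeline, its layers, and the exact
# from_directory signature are proposals, not an existing library.
pipeline = ImagePipeline([
    RandomFlip(horizontal=True),
    RandomZoom([0., 0.2]),
    CenterCrop(height, width),
])
dataset = pipeline.from_directory('path/to/images')  # instance method
model.fit(dataset, epochs=10)
```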


- We are dropping support for ZCA whitening as it is no longer popular in the computer vision community.
- We don't have immediate support for random translations along only one axis.
- We only plan on implementing support for `data_format='channels_last'`. As such this argument does not appear in the API.

Why? Does this match the expectations of accelerator users?

fchollet (Contributor, Author)

My understanding is that NVIDIA is moving towards native support for channels_last, removing the need to convert to channels_first for performance. I omitted the argument for the sake of simplicity, but we can always add support if the need arises.

```python
RandomContrast(amplitude=0., seed=None)
RandomSaturation(amplitude=0., seed=None)
RandomWidth(amplitude=0., seed=None)   # Expand / shrink width while distorting aspect ratio
RandomHeight(amplitude=0., seed=None)  # Expand / shrink height while distorting aspect ratio
```

nit: amplitude is a weird parameter on some of these; e.g., what is the amplitude of a rotation, or of a width resize? As these are separate layers whose params will diverge over time, does it make sense to use the "right" word for each rather than biasing towards the same word?

fchollet (Contributor, Author)

API consistency is important to reduce cognitive load and minimize surprises / looking up the docs. Is there a better universal word we could use in this context?

Note: this is also the reason why we use the keyword "kernel" throughout Keras even in places where it doesn't exactly apply.


This is nice and easy to use, but I am a little concerned that there's no apparent way to specify these in absolute units: pixels, radians, etc.

Exposing some way to skip the relative units would make it easier to build pipelines specified in absolute units without spending time questioning things like:

  • What are the relative units for each, and how are they interpreted?
  • How do multi-scale training, or images with a different input range, affect the pipeline?

> If larger than 1, it is rounded to one for the lower boundary (but not the higher boundary).

For random zoom this comes out a little strange. If I want a uniform scale in [1/2, 2] I can set amplitude=[1/2, 1]. But, IIUC, the random part here is linear so 1/3 of images are shrunk, and 2/3 are expanded. A log-scale option for random zoom would be nice to have.
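
For what it's worth, the log-scale behavior amounts to sampling the zoom factor uniformly in log space; a small sketch of that idea (independent of whatever the amplitude argument ends up being called):

```python
import tensorflow as tf

def sample_log_uniform_zoom(minval=0.5, maxval=2.0, seed=None):
    # Uniform in log space: zooming in by 2x is exactly as likely as zooming
    # out by 2x, which a linear uniform over [0.5, 2.0] does not give.
    log_factor = tf.random.uniform(
        [], tf.math.log(minval), tf.math.log(maxval), seed=seed)
    return tf.exp(log_factor)
```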


```python
Resizing(height, width) # Resize while distorting aspect ratio
CenterCrop(height, width) # Resize without distorting aspect ratio
```

Is there value in a RandomCrop? Or just a Crop, with center vs. random parameterized? IIRC, random cropping is part of some ImageNet pipelines.

fchollet (Contributor, Author)

I think adding RandomCrop is a good idea. It is technically equivalent to a combination of RandomTranslation and CenterCrop, but it is a useful shortcut.
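
For reference, the building blocks already exist in tf.image; a hedged sketch of what such a layer might wrap (my own illustration, not the proposed implementation):

```python
import tensorflow as tf
from tensorflow import keras

class RandomCropSketch(keras.layers.Layer):
    """Illustrative only: random crop while training, center crop otherwise."""

    def __init__(self, height, width, seed=None, **kwargs):
        super().__init__(**kwargs)
        self.height, self.width, self.seed = height, width, seed

    def call(self, images, training=None):
        if training:
            # Assumes batched channels_last input with a static channel count.
            size = [tf.shape(images)[0], self.height, self.width, images.shape[-1]]
            return tf.image.random_crop(images, size=size, seed=self.seed)
        return tf.image.resize_with_crop_or_pad(images, self.height, self.width)
```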

@fchollet (Contributor, Author)

The only outstanding question at this time is what we should call what is currently named "amplitude".

Since there are no major issues left to resolve with this design, we will skip the formal design review and use some time during the Keras SIG public meeting this Friday to finalize the design.

Please make sure to raise any issue/question on this doc before then.
