Dense Prediction API Design, Including Segmentation and Fully Convolutional Networks #6538
This issue is to develop an API design for dense prediction tasks such as segmentation, which includes Fully Convolutional Networks (FCN), and is based on the discussion at #5228 (comment). The goal is to ensure Keras incorporates best practices by default for this sort of problem. Community input, volunteers, and implementations will be very welcome. #6655 is where preprocessing layers can be discussed.

Motivating Tasks and Datasets

Reference Materials

Feature Requests

These are ideas rather than a finalized proposal, so input is welcome!

Existing Keras Utilities with compatible license

Questions

Comments
Depends; if you were to implement it as a subclass, which methods would be reused and which would have to be overridden?
Sure
Reading images from disk with…
Really interested in helping you! Maybe we should have a dedicated Slack channel so we could all discuss. I had a Mean IoU implemented somewhere; I'll try to find it! SSD Keras has some data augmentation for boxes. We could probably use it.
@Dref360 the semantic_segmentation Slack channel would work. Bounding box design input would be great because I'm not currently using them.
I would say that predicting a bounding box is a significantly different task from segmentation; in particular, you may need a complicated loss function to handle many boxes. I'm also not sure if best practices are well established enough for this. For upscaling operations, popular choices include:
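For instance, transposed convolution and fixed upsampling followed by convolution are two widely used options; a minimal Keras sketch, with all layer sizes being arbitrary assumptions:

```python
from keras.layers import Input, Conv2D, Conv2DTranspose, UpSampling2D
from keras.models import Model

inp = Input(shape=(32, 32, 256))
# Learned upsampling: a strided transposed convolution doubles the spatial size.
x = Conv2DTranspose(128, (3, 3), strides=(2, 2), padding='same', activation='relu')(inp)
# Fixed nearest-neighbor upsampling, followed by a convolution to refine it.
x = UpSampling2D(size=(2, 2))(x)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
model = Model(inp, x)
```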
Also, pix2pix is a popular variant using adversarial training that would be nice to have as an example; there are several Keras implementations out there. For FCNs I've found base Keras to be pretty usable, but one sticking point is that it's not easy to replace a fixed-size model or Input layer with one that has None for all the spatial dimensions, which is all you really need for an FCN that allows multiple input scales. I think the best way to do this now is to create a new instance of the same model, except for the Input layer, and use get_weights + set_weights. It would be nice if there were a convenient way to simply resize the model's input spatial dimensions and have the change propagate to all layers, raising an error if it's not possible, e.g. if there's a Dense layer.
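A minimal sketch of that get_weights + set_weights workaround; the helper name is hypothetical, and a plain linear stack of shape-agnostic layers is assumed (a Dense layer or a branching topology would break this simple loop):

```python
from keras.layers import Input
from keras.models import Model

def make_fcn_resizable(model, channels=3):
    # Rebuild the same layer stack on an Input whose spatial dims are None,
    # then copy the trained weights over to the new model.
    new_input = Input(shape=(None, None, channels))
    x = new_input
    for layer in model.layers[1:]:  # skip the original InputLayer
        x = layer.__class__.from_config(layer.get_config())(x)
    new_model = Model(new_input, x)
    new_model.set_weights(model.get_weights())
    return new_model
```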
I'd be interested in contributing as well! However, keep in mind that there are a few subtasks within the segmentation problem, and that makes the task harder. For example, all that semantic segmentation networks, such as FCN, SegNet, ENet, ICNet, etc., do is pixel classification. They cannot detect objects and therefore can't differentiate between distinct instances of the same class in an image. Other works, such as DeepMask/SharpMask/FastMask, output mask proposals for each object they detect, but they do not do classification. This means that in theory they can detect objects that belong to classes they have not seen before. Finally, instance segmentation does both (e.g. Instance-FCN, FCIS, Mask R-CNN): it can tell where a person ends and another begins, and it also outputs a class label for each instance it detects. Detection is an inherent part of the pipeline for two of the three subtasks, so if we plan to cover all three cases, I don't think we can get away with not discussing it.
@PavlosMelissinos good points. Training on varied tasks like instance recognition and mask proposals should also be considered; what are the best practices for that type of data? How is it typically formatted? Masks are also sometimes useful for segmentation, such as the Pascal VOC "ambiguous regions". @Dref360 I thought about the bounding box issue some more and I agree with @allanzelener that the tools will be significantly different for bounding boxes. Unless there is a compelling reason I've missed to keep it here, I think bounding box algorithms should be considered out of scope for this issue and handled as a separate GitHub issue.
For segmentation training it will be important to support loading data from a directory, and to support the most common dataset formats, which to my knowledge are the Pascal VOC format and the COCO json format. This post goes into loading from a directory in a reasonable way, including support for Pascal VOC.
Here is how SegDataGenerator works in Keras-FCN:

```python
seg_aug_generator = SegDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    channelwise_center=False,
    rotation_range=0.,
    width_shift_range=0.,
    height_shift_range=0.,
    shear_range=0.,
    zoom_range=0.,
    zoom_maintain_shape=True,
    channel_shift_range=0.,
    fill_mode='constant',
    cval=0.,
    label_cval=255,
    crop_mode='none',
    crop_size=(0, 0),
    pad_size=None,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=None,
    data_format='default')

generator = seg_aug_generator.flow_from_directory(
    file_path, data_dir, data_suffix,
    label_dir, label_suffix, classes,
    ignore_label=255,
    target_size=None, color_mode='rgb',
    class_mode='sparse',
    batch_size=32, shuffle=True, seed=None,
    save_to_dir=None, save_prefix='', save_format='jpeg',
    loss_shape=None)

model.fit_generator(generator=generator, ...)
```
```python
# Some internal details for the directory iterator:
'''
Users need to ensure that all files exist.
Label images should be png images where pixel values represent the class number.

    find images -name *.jpg > images.txt
    find labels -name *.png > labels.txt

For a file named 2011_002920.jpg, each row should contain 2011_002920.

file_path: location of train.txt or val.txt in PASCAL VOC2012 format,
    listing image file path components without extension
data_dir: location of image files referred to by file in file_path
label_dir: location of label files
data_suffix: image file extension, such as `.jpg` or `.png`
label_suffix: label file suffix, such as `.png` or `.npy`
loss_shape: shape to use when applying loss function to the label data
'''
```

I think much of this functionality can be added directly to `ImageDataGenerator`.
@ahundt as a longtime Keras user who is now finding my way through multiclass semantic segmentation with sample weighting plus data augmentation via ImageDataGenerator: what I can tell you is that #2971 discusses some problems with the ordered lists that are needed. My impression is these hardships are due more to the design not being intended for my use case, but given that it seems to be quite a popular one, it would be beneficial to expose/redesign the API appropriately.
@mptorr if you're using ImageDataGenerator, I believe the mismatches are due to each object generating random numbers separately, and the workaround is to provide the same random seed to each so they access indices in the same order. SegDataGenerator resolves this by accepting the image and label dirs in a single object.
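For reference, a minimal sketch of that same-seed workaround with two ImageDataGenerator objects; directory names and augmentation settings are placeholders, and a compiled `model` is assumed:

```python
from keras.preprocessing.image import ImageDataGenerator

data_gen_args = dict(rotation_range=10., zoom_range=0.2, horizontal_flip=True)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# The identical seed makes both generators shuffle and transform in lockstep.
seed = 1
image_generator = image_datagen.flow_from_directory(
    'data/images', class_mode=None, seed=seed)
mask_generator = mask_datagen.flow_from_directory(
    'data/masks', class_mode=None, seed=seed)

train_generator = zip(image_generator, mask_generator)
model.fit_generator(train_generator, steps_per_epoch=2000, epochs=50)
```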
@ahundt thanks for the suggestion—in fact I am using a fixed and identical seed for both generators, but still get the error. Anyway, I don't want to hijack this thread with my travails... at some point I hope to figure this out. I was going to try your SegDataGenerator, but wanted to ask two things about it, as they may pertain to your request for features/suggestions: [1] it appears it currently does not support pixelwise weighting to compensate for class imbalance. This would be an important feature for me, as most of my segmentation tasks have disproportionately over/under-represented classes; currently I balance classes using Keras' sample weighting. [2] I'm a bit confused about how SegDataGenerator loads images; the comment in the class could perhaps be reworded (or given examples) for the most important arguments. I'll be glad to give it a spin, especially if there's an option for sample weighting. Glad to continue this conversation elsewhere if more appropriate than on this thread.
I'm sorry for taking this long to comment, but I just found the time to do so and I think there's too much stuff to discuss here. Should we maybe split the issue into multiple threads? I recognize the following parts of the pipeline as separate entities regarding standardization and support for different implementations:

Preprocessing

Imho, this is the stinkiest part of the pipeline and usually goes like this in most projects:

Semantic segmentation
Mask proposal networks / instance segmentation

I believe semantic segmentation and anything that deals with bounding boxes should be considered separate tasks and be built upon gradually. Semantic segmentation is relatively simple, so maybe let's consider that first, while acknowledging that it only covers a part of the wider task. Object detection networks are not yet standardized in Keras, so we should probably take it one step at a time.

Resizing

I'm taking the initiative to start with one term that is ambiguous, resizing. I'm not sure what the proper terminology is for some of this stuff, so please bear with me. Resizing can be achieved through stretching (with pixel interpolation), padding (e.g. with zeros), or cropping. Padding gives the worst results, as it somewhat skews the statistics of the image and wastes network capacity at the same time. Cropping, on the other hand, may remove too much context from the image, which is also undesirable. Furthermore, on prediction, using crops means that only part of the image area is seen by the network in each pass, so multiple passes over the image are required to cover the whole area. In general it seems obvious to me that some kind of stretching is necessary. However, it is problematic when used alone in the case of multi-label, one-hot targets (the most popular option in segmentation datasets, e.g. MS-COCO). An easy solution would be to convert each one-hot vector to a class index vector, then to a PIL.Image (or equivalent), do the resize there, and then convert back to one-hot and feed that into the network (a sketch follows this comment). This however forces the selection of a single label for each pixel. Is this an important issue, or should we safely assume that it's due to labeling error (annotations are not exact)?

Multiscale training

This is also an important feature, since CNNs are not completely scale invariant. YOLOv2, for instance, changes the shape of the input every few batches in order to learn to detect objects at various scales. In Keras this is not exactly easy. I think TensorFlow only allows one dimension of the input to be unspecified (None), so this might not be Keras' fault. I have no idea whether it works with Theano as a backend.

Data generation

As far as data loading goes, I suggest that some variant of the MSCOCO class I have created for the enet-keras repository be used. It definitely needs quite some cleaning up and unit tests, of course, as it's a bit clumsy right now, but I believe the set of operations is valid. Any kind of feedback is obviously welcome. The logic of the class could be standardized (I have added a dummy Dataset class which I will populate as soon as I'm a little more confident about the layout) and easily extended to allow custom datasets and/or loading from disk.
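A minimal sketch of that suggested round trip, assuming channels-last one-hot labels; the helper name is hypothetical:

```python
import numpy as np
from PIL import Image

def resize_one_hot_labels(one_hot, size):
    # one_hot: (H, W, num_classes) array; size: (width, height), as PIL expects.
    # Collapse to class indices, resize with NEAREST so interpolation cannot
    # invent new class values, then expand back to one-hot.
    idx = one_hot.argmax(axis=-1).astype(np.uint8)
    idx = np.asarray(Image.fromarray(idx).resize(size, Image.NEAREST))
    return np.eye(one_hot.shape[-1], dtype=one_hot.dtype)[idx]
```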
@allanzelener @ahundt I think there's a messy workaround for that using Permute and a TimeDistributed wrapper but it's not exactly a solution.
@ahundt Can you explain what you mean here by "supplementary data" and what the use case is? I don't quite get it. For the zipping part, maybe you're looking for this? EDIT: It's not a big deal though; why not just write a function that calls next on both generators and yields the pairs as tuples?
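For instance, a minimal sketch of such a pairing function (the name is hypothetical):

```python
def pair_generator(image_gen, label_gen):
    # Yield (image_batch, label_batch) tuples from two synchronized generators.
    while True:
        yield next(image_gen), next(label_gen)
```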
Options for input data to the generator:

I'm leaning towards option 1 because it would maximize compatibility with the existing API. @PavlosMelissinos thanks for the feedback, replies below.
My use case is a vector that represents how a robot arm in the scene will move, plus an image of that robot. So the input data is an image and a vector, while the label is a 2D image containing scores for how successful the motions will be if they are made relative to each x,y coordinate in the image. Another example would be input text and an image, e.g. "the person on the right" and an image of two people side by side. The labeled data would have the same dimensions as the original image, with the right person's pixels labeled 1 and all other pixels labeled 0.
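A minimal sketch of how such a mixed image-plus-vector input with a dense 2D output could be wired up in the functional API; all shapes and layer sizes are arbitrary assumptions:

```python
from keras.layers import Input, Conv2D, Dense, Reshape, concatenate
from keras.models import Model

img_in = Input(shape=(64, 64, 3))   # scene image, fixed size for simplicity
vec_in = Input(shape=(8,))          # hypothetical motion-command vector

x = Conv2D(16, (3, 3), padding='same', activation='relu')(img_in)
v = Dense(64 * 64, activation='relu')(vec_in)  # broadcast the vector onto the image grid
v = Reshape((64, 64, 1))(v)
x = concatenate([x, v])
score_map = Conv2D(1, (1, 1), activation='sigmoid')(x)  # per-pixel success score

model = Model(inputs=[img_in, vec_in], outputs=score_map)
```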
Sounds like a reasonable possibility. How would performing or not performing zoom/translation be specified for each input?
Padding definitely requires extra memory and processing power, but are the results really that bad? I think it might depend on the network design. ResNet specifies zero padding and is particularly effective, for example.
We should support each of these modes because each makes sense for a variety of reasonable applications.
How about the…
Just a thought: if you want to handle every case on earth, this could get out of hand. Anyway, the PyTorch way of doing data augmentation sounds pretty cool, with transforms.Compose.
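For reference, a minimal sketch of what a compose-style, label-aware augmentation pipeline could look like; every name here is hypothetical rather than an existing Keras or PyTorch API:

```python
import numpy as np

def compose(*transforms):
    # Chain transforms; each one takes and returns an (image, label) pair,
    # so geometric operations stay synchronized between data and labels.
    def apply(image, label):
        for transform in transforms:
            image, label = transform(image, label)
        return image, label
    return apply

def random_horizontal_flip(image, label):
    # Flip image and label together half the time.
    if np.random.rand() < 0.5:
        return image[:, ::-1], label[:, ::-1]
    return image, label

augment = compose(random_horizontal_flip)
# usage: image, label = augment(image, label)
```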
@Dref360 @ahundt
I was referring to padding in the context of preprocessing (where it takes up a sizable portion of the input image); it's my mistake for not making that clear. Zero padding within a CNN is not that bad (it still skews the statistics, but it's not so big a deal, and we don't really have a viable alternative).
Say the user has an image/label pair that is originally 486px in height and 220px in width; the shape of the input tensor is (None, 256, 256, 3) for the image and (None, 65536, 81) for the label. How does SegDataGenerator deal with the conversion? One-hot labels are tricky to resize in this case because numpy arrays do not properly support the operation (scipy has ndimage.zoom though, which might be worth a shot), label 'bleeding' among non-spatial dimensions should not be allowed, and NEAREST interpolation mode returns very weird and pixelated ground truth masks. I think I'm in favor of using some presets (e.g. instance segmentation needs each sample to be a pair: a crop within a ground truth bounding box and the binary mask of that object) and leaving the rest up to the user.
I'd hardly suggest every case on earth, haha. It is very reasonable to let a user select from both sets.
Keras already supports those cases listed above for simple label prediction.
That led me to an interesting idea, rather than the sequential model style of PyTorch's transforms.Compose. That said, selling a major API change is much more difficult than a minor extension of ImageDataGenerator.
@PavlosMelissinos Could you elaborate on this?
This is one of the key changes I'm hoping we can make: 2D labels would be directly supported, in other words the label would have the same dimensions as the input data. SegDataGenerator's image/label transform code:

```python
x = apply_transform(x, transform_matrix, img_channel_index,
                    fill_mode=self.fill_mode, cval=self.cval)
y = apply_transform(y, transform_matrix, img_channel_index,
                    fill_mode='constant', cval=self.label_cval)
```

Remember that labels cannot and should not be interpolated! The average of labels 1 and 3 is not label 2. :-) You have to pick 1 or 3, so while it isn't as smooth, you've got to use an algorithm like nearest neighbor.
From my experience, the problem in the arbitrary input shape scenario in a Fully Convolutional Network (no Dense layers) is at the end of the network, when you need to Flatten the output and compare it to the targets. I'm not confident that hack would work (it was actually suggested by a colleague as a temporary workaround), so I'll reproduce it tomorrow at work and get back to you.
That's not a problem, after all reshaping is trivial.
That's the actual problem (it's noticeably less smooth with nearest neighbor). Maybe there is a better solution? EDIT: In semantic segmentation there is a direct association between an rgb pixel and the ground truth label pixel at the same position. If the annotation is done at a specific size and the image is then resized, there is information distortion because pixels are moved and some unseen values may appear (especially in the case of bilinear, bicubic, or Lanczos antialiasing). I guess what I'm saying is that the pixel values of the resized target labels should depend on the values of the pixels in the rgb image, and more specifically on the way the value of each pixel in the resized rgb image was produced from the original. Does that make sense?
I can think of two sensible ways to handle this: resize each label's binary mask separately and recombine, or extract the polygon of each connected component, rescale it, and redraw the mask at the new size. The first approach is O(unique labels in the image) and the second is O(connected components).
@allanzelener Both nice ideas, especially the second one!
@allanzelener That's what I do: I rescale the polygons and then use OpenCV to draw the rescaled polygons. Works great and fast.
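A minimal sketch of that polygon rescale-and-redraw approach, assuming integer class masks and using OpenCV; the helper name is hypothetical, and the `[-2]` index picks the contour list across OpenCV versions:

```python
import cv2
import numpy as np

def resize_mask_via_polygons(mask, out_shape):
    # mask: 2D array of integer class labels; out_shape: (height, width).
    sy = out_shape[0] / float(mask.shape[0])
    sx = out_shape[1] / float(mask.shape[1])
    out = np.zeros(out_shape, dtype=mask.dtype)
    for label in np.unique(mask):
        if label == 0:
            continue  # assume 0 is the background class
        binary = (mask == label).astype(np.uint8)
        contours = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)[-2]
        # Contour points are (x, y), so scale by (sx, sy) and redraw.
        scaled = [np.round(c.reshape(-1, 2) * [sx, sy]).astype(np.int32)
                  for c in contours]
        cv2.fillPoly(out, scaled, int(label))
    return out
```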
Here is my idea for a generator that loads samples from disk:

```python
def generate_samples_from_disk(sample_sets, callbacks=load_image, batch_size=1, data_dirs=None):
    """Generate numpy arrays from files on disk in groups, such as single images or pairs of images.

    # Arguments
        sample_sets: A list of lists, each containing the data's filenames,
            such as [['img1.jpg', 'img2.jpg'], ['label1.png', 'label2.png']].
            Also supports a list of txt files, each containing the list of
            filenames in each set, such as ['images.txt', 'labels.txt'].
            If None, all images in the folders specified in data_dirs are
            loaded in lexicographic order.
        callbacks: A callback that loads data from the specified file path
            into a numpy array, `load_image` by default. Either a single
            callback should be specified, or one callback must be provided
            for each sample set, in which case the list must be the same
            length as sample_sets.
        data_dirs: Directory or list of directories to load.
            Default None means each entry in sample_sets contains the full
            path to each file. Specifying a directory means filenames in
            sample_sets can be found in that directory. Specifying a list of
            directories means each sample set is in its own separate
            directory, and the list must be the same length as sample_sets.
        batch_size: Number of samples in a batch.

    # Returns
        Yields batch_size data points from each list provided.
    """
```

To do that, I believe the Python unpack mechanism would be the thing to use, but otherwise the implementation shouldn't be too complicated. It should also be set up so it can work with PASCAL VOC easily and cleanly. Example usage with the layout as downloaded by #6665:

```python
# pascal voc + berkeley semantic contours annotations
train_file_path = os.path.expanduser('~/.keras/datasets/VOC2012/combined_imageset_train.txt') #Data/VOClarge/VOC2012/ImageSets/Segmentation
val_file_path = os.path.expanduser('~/.keras/datasets/VOC2012/combined_imageset_val.txt')
data_dir = os.path.expanduser('~/.keras/datasets/VOC2012/VOCdevkit/VOC2012/JPEGImages')
label_dir = os.path.expanduser('~/.keras/datasets/VOC2012/combined_annotations')
def open_png(path):
path = path + '.png'
# ... open and return 1 channel uint8 numpy array ...
def open_jpg(path):
path = path + '.jpg'
# ... open and return 3 channel uint8 numpy array ...
seg_gen = generate_samples_from_disk([train_file_path, train_file_path],
callbacks=[open_jpg, open_png],
data_dirs=[data_dir, label_dir])
# now apply augmentation, then fit
```

Any thoughts or details that are missing, perhaps how it would work with multiple input and label files per sample?
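A minimal sketch of what the core of such a generator might look like, assuming sample_sets has already been expanded into lists of filenames (txt parsing and the None case omitted):

```python
import os

def generate_samples_from_disk(sample_sets, callbacks, batch_size=1, data_dirs=None):
    # Normalize a single callback / single directory into per-set lists.
    if not isinstance(callbacks, (list, tuple)):
        callbacks = [callbacks] * len(sample_sets)
    if data_dirs is None:
        data_dirs = [''] * len(sample_sets)
    elif not isinstance(data_dirs, (list, tuple)):
        data_dirs = [data_dirs] * len(sample_sets)
    while True:  # loop forever, as fit_generator expects
        for i in range(0, len(sample_sets[0]), batch_size):
            yield tuple(
                [load(os.path.join(d, name)) for name in names[i:i + batch_size]]
                for names, load, d in zip(sample_sets, callbacks, data_dirs))
```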
#6538 (comment) @allanzelener sounds like a nice approach; could you suggest an API design, or do you have any reference code?
@Dref360 Do you have a link, or is that private? I'm guessing OpenCV won't be permitted as a new dependency; there is a lot of baggage and dramatic version differences across OSes, and I haven't seen an API of theirs that's clean the way Keras is.
Okay, it looks like that can be dealt with. However, some indicator, member variable, or parameter may need to be carried along so the difference between one_hot data and dense segmentation labels can be accounted for. Additional investigation is needed on that front.
What about adding a parameter to all the relevant layers and other APIs? This could disambiguate the purpose of each data segment, and it could work in a manner analogous to the existing data_format parameter.
Here is iteration 3.0 of this idea. I think this generalizes better to other, non-segmentation problems. This is in addition to the extended segmentation data generator/augmentation, not instead of it. Comments are welcome!

data_spec list parameter for layers

What do you think of a data_spec list parameter?

Example of data ambiguity

2D classes with dense prediction vs depth in a 3D CNN with single-class prediction.
Hi, I'm experimenting with Keras implementations of YOLO and SSD (https://github.com/lhk/object_detection). For augmentation, the papers on object detection use variations of crops and color changes. I've implemented a basic prototype for automatic augmentation of images with bounding boxes: https://github.com/lhk/bbox_augmentations/blob/master/showcase.ipynb This integrates nicely with Keras; I've actually used parts of your image preprocessing pipeline. I would very much like to work on this. Could you point me in the right direction to get started? For example, I could try to recreate the current infrastructure of generators for the new annotated data type. Would that be useful?
It would be an awesome addition. Algorithm-wise, zooms and shifts are straightforward, but the way you do rotations is wrong in principle. For example, if you rotate a circle around its center, the bounding box doesn't change, while with your approach it does. Although for small rotations, and if bounding boxes weren't all that tight to begin with, it wouldn't matter much. In this case, bounding boxes should be jittered anyway, and the sampling can take the original+rotated boxes into account?
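To illustrate the point, a minimal sketch of the corner-rotation approach being criticized (the function name is hypothetical): re-fitting an axis-aligned box around rotated corners can only grow the box, so the box of a rotated circle wrongly expands.

```python
import numpy as np

def rotate_aabb(box, angle_deg, center):
    # box: (x_min, y_min, x_max, y_max). Rotate the four corners about
    # `center`, then fit a new axis-aligned box around them. For any object
    # that is not itself a rotated rectangle this overestimates the extent.
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=float)
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    rotated = (corners - center).dot(rot.T) + center
    return tuple(rotated.min(axis=0)) + tuple(rotated.max(axis=0))
```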
@lhk I'd suggest starting with the SegDataGenerator class in Keras-FCN, and creating a pull request for the official keras-contrib repository that trains on Pascal VOC, a dataset already in keras-contrib. If you want to go that route, you should also be aware of this PR, which has some first steps (but also bugs in the example at the time of writing): keras-team/keras-contrib#152
To add some other resources:
Thanks! I've been slowly integrating some functionality into github.com/keras-team/keras-contrib as well; there are several open pull requests.
@ahundt, I am interested in helping you with reinforcement learning and OpenAI Gym. Please let me know how I should proceed.
@Luffy1996 this issue is about image segmentation rather than RL, so I'll message you separately.
I'll close this issue for now since this thread hasn't had any updates for quite a while. Please open another one if necessary.