Add SSD in models #440

lemairecarl · 2018-03-06T15:41:32Z

I've been working with PyTorch for several months, and with SSD for a few months. I'd like to add SSD to torchvision's "model zoo".

I will combine the good parts of https://github.com/amdegroot/ssd.pytorch and https://github.com/kuangliu/torchcv. Both implementation have some problems and refactoring will be needed to come to the level of refinement expected from torchvision.

I will begin in the coming weeks if there's no opposition.
@fmassa

fmassa · 2018-03-06T16:01:45Z

Hi,

Thanks!
Yes, having training / evaluation code is definitely necessary for object detection, as the models are not enough.

I'm still figuring out the right balance between having things in torchvision and in external repos. I think everything that is quite generic and reusable should come here.
So if the training code doesn't live here, then I'm unsure if the models should be here as well or in the repo where the training code lives. For imagenet classification, all the models that we have in torchvision (with only a few exceptions) were trained using examples/imagenet.

One thing that needs to be improved is support for other data types than images (like bounding boxes). We've addressed that with the functional interface in some way, but we are still missing a good story on how to tie things together.

What do you think? If you find time to start working on SSD, it might be good to list here your proposed action points so we can discuss.

lemairecarl · 2018-03-06T16:16:56Z

I would add a COCO/VOC example that would contain the training procedure, just like for classification nets. I think it's a good practice. The code will be more modular than the imagenet example.
Could you tell me a bit more how you added support for bounding boxes? And how far would you like to go with bounding box support? Do you mean add utils like Jaccard overlap and that kind of stuff? Or maybe you're thinking about transforms? We can start by having the bounding box utils with the example code, and then decide which parts we want to integrate to pytorch or torchvision.

TODO

Add COCO/VOC example (in pytorch/examples). Things will be modular, e.g. a VisdomHelper module. I already have good training/evaluation scripts
Make sure that the results are satisfying
Move the model to torchvision/models
Move the dataset code to torchvision/datasets
Move the data augmentation code to torchvision/transforms
Upload weights trained on COCO/VOC

fmassa · 2018-03-06T16:36:04Z

The thing I noticed while writing an initial version of Faster R-CNN in 2016 was that it requires a lot of code. This made me unsure if the examples repo was the good place to put it.

About bounding box, I was mostly thinking about basic transforms, but other functionality (like intersection over union) might be worth considering.

lemairecarl · 2018-03-06T16:49:24Z

There is clearly code that is not part of the model but that is needed to use it. With SSD there is the box encoding/decoding procedure which is specific to this model, and that is quite heavy. There is the prior box management, the non-max suppression in the decoding... Do you think we could have a ssd folder in torchvision/models instead of only one script?

Something like:

ssd/
    ssd.py              model, layers
    box_coder.py        bounding box encoding/decoding using prior boxes
    multibox_loss.py
    utils.py            some parts may later be integrated in the library

I think that's reasonable.

amdegroot · 2018-03-07T20:56:18Z

Hi guys,
Just read #425. I think it would be awesome if segmentation and detection could be included in torchvision.

In ssd.pytorch I've tried to avoid modifying the current torchvision datasets (COCO, Pascal) and transformations as much as possible, but I think there would still need to be some slight modifications made to the current torchvision code to support detection, which I would be more than happy to help with, if we end up deciding on this route.

That being said, I agree Francisco that most detection implementations out there are a little heavier and would potentially require a lot more to support (e.g. yolo, faster-rcnn), so I think it makes sense to consider whether or not a torchvision "detection module" could be made extensible to more than just SSD before we jump into it.

vfdev-5 · 2018-03-14T15:47:29Z

@lemairecarl what are your thoughts on how to

Move the data augmentation code to torchvision/transforms

Do you want to merge the current transforms.py with ssd.pytorch augmentations or create a separate a file with similar classes like Compose, RandomCrop etc ?

lemairecarl · 2018-03-14T19:01:41Z

I think we'll see in the process if we need to break things into multiple files. I'll rely on the pull request comments. I want to start working on this next week.

fmassa · 2018-03-14T19:10:48Z

Yes. I'm still thinking on how to integrate the transforms in a way that it fits nicely with torchvision, while not requiring much boilerplate and being generic.

vfdev-5 · 2018-03-15T13:36:36Z

We can inspire from tensorpack.

For example, there is a proxy dataset that applies transformations on a dataset according to the indices:

class XYTransformedDataset(Dataset):

    def __init__(self, dataset, transformations, img_index=(0, 1), coords_index=(2, )):
        self.ds = dataset
        self.transformations = transformations
        self.img_index = img_index
        self.coords_index = coords_index

    def __getitem__(self, index):

        dp = self.ds[index]  # For example, dp = (im, mask, polygons, labels)
        output_dp = list(dp)
        # HERE WE NEED TO PASS input image AS PARAMETER
        params = self.transformations.get_params(dp[0])

        # Transform images:
        for idx in self.img_index:
            output_dp[idx] = self.transformations(dp[idx], params)

        # Transform coords:
        for idx in self.coords_index:
            output_dp[idx] = self.transformations.transform_coords(dp[idx], params)
        return output_dp

A base class for transformations:

class BaseRandomTransformation:

    def get_params(self):
        return None

    def __call__(self, img, params=None):
        raise NotImplementedError()

    def transform_coords(self, coords, params):
        raise NotImplementedError()

such that all other classes from torchvision derive from it. For example, Compose

class Compose(BaseRandomTransformation):

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img, params=None):
        for t in self.transforms:
            img = t(img, params=params)
        return img

    def transform_coords(self, coords, params):        
        for t, p in zip(self.transforms, params):
            coords = t.transform_coords(coords, params=p)
        return coords

    def get_params(self, img):
        return [t.get_params(img) for t in self.transforms]

and another example:

class RandomCrop(BaseRandomTransformation):

    def __init__(self, size, padding=0):
        # Same as in torchvision.RandomCrop
        pass

    def get_params(self, img):
        return _get_params(img, self.size):

    @staticmethod
    def _get_params(img, output_size):
        # Same as in torchvision.RandomCrop
        w, h = img.size
        th, tw = output_size
        if w == tw and h == th:
            return 0, 0, h, w

        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)
        return i, j, th, tw

    def __call__(self, img, params=None):
        if self.padding > 0:
            img = F.pad(img, self.padding)

        if params is None:
            params = self._get_params(img, self.size)
        
        i, j, h, w = params

        return F.crop(img, i, j, h, w)

    def transform_coords(self, coords, params):    
        i, j, h, w = params
        return F.crop_coords(coords, i, j, h, w)

Here, the problem is that some transformation parameters can not be generated without the input image.

vfdev-5 · 2018-04-24T14:14:22Z

fmassa commented on 14 Mar
Yes. I'm still thinking on how to integrate the transforms in a way that it fits nicely with torchvision, while not requiring much boilerplate and being generic.

@fmassa any updates on this ?

fmassa · 2018-04-24T14:31:02Z

@vfdev-5 Yes, I have some proof-of-concept implementations.

I'm holding on on making a PR yet because I want to see how well they fit the object detection framework I'm writing (I have Fast R-CNN, Faster R-CNN, and FPN working for training and evaluation, I'm now implementing Mask R-CNN).

If I'm happy with how they mix with the framework, I'll be pushing them as is to torchvision.

vfdev-5 · 2018-04-24T14:35:12Z

@fmassa that's super!
For impatient people like me, can we take a look on your work to make an idea, please, if it is opensourced somewhere ?

fmassa · 2018-04-24T14:37:12Z

It's not yet open-source, but it will be open-sourced. Stay tuned!

devforfu · 2018-09-19T06:40:22Z

@vfdev-5 One can find an SSD implementation in the fast.ai course lectures also. However, it is a bit hidden under the hood of author's wrappers.

@fmassa Is your library something similar to the Detectron?

fmassa · 2018-09-21T11:58:23Z

@devforfu yes, it is going to be similar to Detectron.

varunagrawal · 2018-10-15T17:35:29Z

@fmassa I've built a generic bounding box library for both 2D and 3D bounding boxes. I need to get some legal stuff taken care of before I can release it, but I believe it has everything you would need for object detection in general (including IoU computation).

fmassa · 2018-10-16T11:36:14Z

@varunagrawal we will be releasing in one week a library for object detection which will contain bounding box abstractions, and once it gets a bit more mature we might move it to torchvision.

mattans · 2018-12-04T15:51:46Z

Hi @fmassa, was it released?

fmassa · 2018-12-06T12:48:00Z

@mattans yes, check it out in https://github.com/facebookresearch/maskrcnn-benchmark/

lemairecarl · 2018-12-06T21:11:25Z

I'm closing this for now, since I have moved on to other projects. I might come back to it later.

tczhangzhi · 2019-09-15T15:14:04Z

I look forward to this project going on, as I always love torchvision's elegant implementation.
But for a NewBee, I think torch.hub works well:

import torch 
precision = 'fp32'
ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd', model_math=precision)

Summary: Pull Request resolved: pytorch/kineto#440 Fixing windows build for `torchvision` and `kineto`. Differential Revision: D31295975 fbshipit-source-id: a2049218f46beb46bbaeb0a3b39d7633e695a799

Summary: Pull Request resolved: pytorch/kineto#440 Fixing windows build for `torchvision`. In `csrc/vision.cpp`, since `PyMODINIT_FUNC` depends on `Python.h` I added the same condition for `PyMODINIT_FUNC` as the one for `import <PyTorch.h>`. Differential Revision: D31488734 fbshipit-source-id: 0ca13c7d8de81f27eb63d3f7e54f8777128312c7

fmassa added the enhancement label Mar 6, 2018

lemairecarl mentioned this issue Mar 6, 2018

[RFC] Model for segmentation/detection problem #425

Closed

fmassa mentioned this issue Aug 3, 2018

[enhancement] computer vision and bounding box detection #565

Closed

lemairecarl closed this as completed Dec 6, 2018

datumbox reopened this Apr 22, 2021

datumbox mentioned this issue Apr 22, 2021

Add SSD architecture with VGG16 backbone #3403

Merged

datumbox closed this as completed in #3403 Apr 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SSD in models #440

Add SSD in models #440

lemairecarl commented Mar 6, 2018

fmassa commented Mar 6, 2018

lemairecarl commented Mar 6, 2018 •

edited by z-a-f

Loading

fmassa commented Mar 6, 2018

lemairecarl commented Mar 6, 2018

amdegroot commented Mar 7, 2018

vfdev-5 commented Mar 14, 2018

lemairecarl commented Mar 14, 2018

fmassa commented Mar 14, 2018

vfdev-5 commented Mar 15, 2018 •

edited

Loading

vfdev-5 commented Apr 24, 2018 •

edited

Loading

fmassa commented Apr 24, 2018

vfdev-5 commented Apr 24, 2018 •

edited

Loading

fmassa commented Apr 24, 2018

devforfu commented Sep 19, 2018 •

edited

Loading

fmassa commented Sep 21, 2018

varunagrawal commented Oct 15, 2018

fmassa commented Oct 16, 2018 •

edited

Loading

mattans commented Dec 4, 2018

fmassa commented Dec 6, 2018

lemairecarl commented Dec 6, 2018

tczhangzhi commented Sep 15, 2019

Add SSD in models #440

Add SSD in models #440

Comments

lemairecarl commented Mar 6, 2018

fmassa commented Mar 6, 2018

lemairecarl commented Mar 6, 2018 • edited by z-a-f Loading

TODO

fmassa commented Mar 6, 2018

lemairecarl commented Mar 6, 2018

amdegroot commented Mar 7, 2018

vfdev-5 commented Mar 14, 2018

lemairecarl commented Mar 14, 2018

fmassa commented Mar 14, 2018

vfdev-5 commented Mar 15, 2018 • edited Loading

vfdev-5 commented Apr 24, 2018 • edited Loading

fmassa commented Apr 24, 2018

vfdev-5 commented Apr 24, 2018 • edited Loading

fmassa commented Apr 24, 2018

devforfu commented Sep 19, 2018 • edited Loading

fmassa commented Sep 21, 2018

varunagrawal commented Oct 15, 2018

fmassa commented Oct 16, 2018 • edited Loading

mattans commented Dec 4, 2018

fmassa commented Dec 6, 2018

lemairecarl commented Dec 6, 2018

tczhangzhi commented Sep 15, 2019

lemairecarl commented Mar 6, 2018 •

edited by z-a-f

Loading

vfdev-5 commented Mar 15, 2018 •

edited

Loading

vfdev-5 commented Apr 24, 2018 •

edited

Loading

vfdev-5 commented Apr 24, 2018 •

edited

Loading

devforfu commented Sep 19, 2018 •

edited

Loading

fmassa commented Oct 16, 2018 •

edited

Loading