To extend torchvision for video #855

JuanFMontesinos · 2019-04-16T07:48:18Z

Motivation

I've realized that, the way torchvision is coded, a random transformation cannot store its parameters so that the same transformation can be applied several times. Video requires the same transformation to be applied to every frame of a sequence.

Proposed changes

I propose restructuring the code, with minor changes, so that:

A base transformation class (template) is created, providing get_params and reset_params methods:

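class BaseTransformation(object):
    def get_params(self, *args):
        # Compute the random (and image-dependent) parameters and store them as attributes.
        pass

    def reset_params(self):
        # Set every stored parameter back to None so the next call draws new ones.
        pass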

get_params computes the needed parameters (if any), while reset_params acts as both the parameter initializer and the resetter.
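A minimal sketch of a transform that follows this contract. RandomScale is a hypothetical transform (not part of torchvision) that multiplies every pixel of a frame, modeled here as a flat list of numbers, by a random factor. The factor is drawn lazily on the first call and reused until reset_params() is called. The base class raises NotImplementedError instead of using pass, so a subclass that forgets an override fails loudly; that is a design choice of this sketch.

```python
import random


class BaseTransformation(object):
    """Template: subclasses keep their random parameters as attributes."""

    def get_params(self, *args):
        raise NotImplementedError

    def reset_params(self):
        raise NotImplementedError


class RandomScale(BaseTransformation):
    """Hypothetical example: scale all pixel values by one random factor."""

    def __init__(self, low=0.5, high=1.5):
        self.low = low
        self.high = high
        self.reset_params()  # parameters start out unset

    def get_params(self):
        # Draw the factor once and keep it on the instance.
        self.factor = random.uniform(self.low, self.high)

    def reset_params(self):
        self.factor = None

    def __call__(self, frame):
        if self.factor is None:  # first frame of a sequence
            self.get_params()
        return [v * self.factor for v in frame]


t = RandomScale()
a = t([1.0, 2.0])
b = t([1.0, 2.0])  # same factor as `a`: the parameter was kept
t.reset_params()   # the next call draws a new factor
```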

The Compose class is modified to handle lists/tuples of frames so that, once the list is exhausted, the parameters are reset:

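class Compose(object):
    """Composes several transforms together.

    A list/tuple input is treated as a sequence of frames: every frame gets
    the same random parameters, which are reset afterwards.

    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.

    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),
        >>> ])
    """

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, inpt):
        if isinstance(inpt, (list, tuple)):
            output = self.apply_sequence(inpt)
        else:
            output = self.apply_img(inpt)
        # Reset after single images too; otherwise the random parameters drawn
        # for the first image would be reused for every later image.
        self.reset_params()
        return output

    def apply_img(self, img):
        for t in self.transforms:
            img = t(img)
        return img

    def apply_sequence(self, seq):
        # Parameters are drawn on the first frame and reused for the rest.
        return list(map(self.apply_img, seq))

    def reset_params(self):
        for t in self.transforms:
            # Deterministic transforms (e.g. ToTensor) have nothing to reset.
            if hasattr(t, 'reset_params'):
                t.reset_params()

    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        for t in self.transforms:
            format_string += '\n'
            format_string += '    {0}'.format(t)
        format_string += '\n)'
        return format_string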
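A self-contained sketch of how a sequence-aware Compose behaves, runnable without PIL or torchvision. Assumptions: frames are modeled as strings of pixel labels (so a single frame is never mistaken for a list of frames), and the flip is a hypothetical toy version of the stateful RandomHorizontalFlip proposed in Example 1. This Compose resets after single images as well and skips transforms without reset_params.

```python
import random


def hflip(frame):
    """Toy horizontal flip: a frame is a string of pixel labels."""
    return frame[::-1]


class RandomHorizontalFlip(object):
    """Hypothetical toy version of the stateful flip (no PIL needed)."""

    def __init__(self, p=0.5):
        self.p = p
        self.reset_params()

    def get_params(self):
        self.flag = random.random() < self.p

    def reset_params(self):
        self.flag = None

    def __call__(self, frame):
        if self.flag is None:  # first frame: draw the decision
            self.get_params()
        return hflip(frame) if self.flag else frame


class Compose(object):
    """Sequence-aware Compose: one set of random parameters per clip."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, inpt):
        if isinstance(inpt, (list, tuple)):
            output = [self.apply_img(frame) for frame in inpt]
        else:
            output = self.apply_img(inpt)
        self.reset_params()  # the next clip (or image) gets fresh parameters
        return output

    def apply_img(self, img):
        for t in self.transforms:
            img = t(img)
        return img

    def reset_params(self):
        for t in self.transforms:
            if hasattr(t, "reset_params"):  # deterministic transforms have none
                t.reset_params()


random.seed(0)
pipeline = Compose([RandomHorizontalFlip(p=0.5)])
clip = ["abc", "abc", "abc"]
outputs = [pipeline(clip) for _ in range(20)]
```

Within each clip all frames share one flip decision, while different clips draw independently.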

Random parameters and image-dependent parameters are stored as object attributes. Since some parameters require image properties (such as the image size) to be computed, they are initialized as None and computed/stored when the first frame is processed:
Example 1:

import random

from torchvision.transforms import functional as F


class RandomHorizontalFlip(BaseTransformation):
    """Horizontally flip the given PIL Image randomly with a given probability.

    Args:
        p (float): probability of the image being flipped. Default value is 0.5
    """

    def __init__(self, p=0.5):
        self.p = p
        self.reset_params()  # self.flag must exist before the first call

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be flipped.

        Returns:
            PIL Image: Randomly flipped image.
        """
        # This was originally `if random.random() < self.p:`, which draws a new
        # decision on every call, so the same flip could not be applied to another frame.
        if self.flag is None:
            self.get_params()
        if self.flag:
            return F.hflip(img)
        return img

    def __repr__(self):
        return self.__class__.__name__ + '(p={})'.format(self.p)

    def get_params(self):
        self.flag = random.random() < self.p

    def reset_params(self):
        self.flag = None
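For comparison, a common workaround that leaves the original, stateless flip untouched is to re-seed the random generator with the same seed before each frame, so every frame replays the same random draws. This sketch assumes the transform draws from Python's random module, as the original `random.random() < self.p` line does; StatelessRandomHorizontalFlip and apply_same_to_clip are hypothetical names, and frames are toy strings.

```python
import random


class StatelessRandomHorizontalFlip(object):
    """Original-style flip: a fresh random decision on every call."""

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, frame):
        if random.random() < self.p:
            return frame[::-1]  # toy frame: a string of pixel labels
        return frame


def apply_same_to_clip(transform, clip):
    """Apply `transform` to every frame with identical random draws."""
    seed = random.randrange(2 ** 32)  # one seed per clip
    state = random.getstate()         # remember the global RNG state
    outputs = []
    for frame in clip:
        random.seed(seed)             # every frame replays the same stream
        outputs.append(transform(frame))
    random.setstate(state)            # leave the global RNG as we found it
    return outputs


random.seed(1)
flip = StatelessRandomHorizontalFlip(p=0.5)
results = [apply_same_to_clip(flip, ["abc"] * 4) for _ in range(20)]
```

The trade-off: re-seeding needs no changes to the transforms, but it only works if every transform draws from the one generator being re-seeded, which is what storing parameters on the object avoids.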

Example 2:

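import math
import random
import warnings

from PIL import Image
from torchvision.transforms import functional as F
# Private helper defined in torchvision.transforms.transforms at the time.
from torchvision.transforms.transforms import _pil_interpolation_to_str


class RandomResizedCrop(BaseTransformation):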
    """Crop the given PIL Image to random size and aspect ratio.

    A crop of random size (default: of 0.08 to 1.0) of the original size and a random
    aspect ratio (default: of 3/4 to 4/3) of the original aspect ratio is made. This crop
    is finally resized to given size.
    This is popularly used to train the Inception networks.

    Args:
        size: expected output size of each edge
        scale: range of size of the origin size cropped
        ratio: range of aspect ratio of the origin aspect ratio cropped
        interpolation: Default: PIL.Image.BILINEAR
    """

    def __init__(self, size, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.), interpolation=Image.BILINEAR):
        if isinstance(size, tuple):
            self.size = size
        else:
            self.size = (size, size)
        if (scale[0] > scale[1]) or (ratio[0] > ratio[1]):
            warnings.warn("range should be of kind (min, max)")

        self.interpolation = interpolation
        self.scale = scale
        self.ratio = ratio
        self.reset_params()

    def get_params(self, img, scale, ratio):
        """Get parameters for ``crop`` for a random sized crop.

        The parameters are stored on the instance so they can be reused for
        every frame of a sequence until ``reset_params`` is called.

        Args:
            img (PIL Image): Image to be cropped.
            scale (tuple): range of size of the origin size cropped
            ratio (tuple): range of aspect ratio of the origin aspect ratio cropped

        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for a random
                sized crop.
        """
        area = img.size[0] * img.size[1]

        for attempt in range(10):
            target_area = random.uniform(*scale) * area
            log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
            aspect_ratio = math.exp(random.uniform(*log_ratio))

            w = int(round(math.sqrt(target_area * aspect_ratio)))
            h = int(round(math.sqrt(target_area / aspect_ratio)))

            if w <= img.size[0] and h <= img.size[1]:
                i = random.randint(0, img.size[1] - h)
                j = random.randint(0, img.size[0] - w)
                break
        else:
            # Fallback to central crop
            in_ratio = img.size[0] / img.size[1]
            if in_ratio < min(ratio):
                w = img.size[0]
                h = int(round(w / min(ratio)))
            elif in_ratio > max(ratio):
                h = img.size[1]
                w = int(round(h * max(ratio)))
            else:  # whole image
                w = img.size[0]
                h = img.size[1]
            i = (img.size[1] - h) // 2
            j = (img.size[0] - w) // 2

        # Store the parameters in both branches so later frames reuse them.
        self.i, self.j, self.h, self.w = i, j, h, w
        return i, j, h, w

    def reset_params(self):
        self.i = None
        self.j = None
        self.h = None
        self.w = None

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be cropped and resized.

        Returns:
            PIL Image: Randomly cropped and resized image.
        """
        if self.i is None:
            # All four parameters are unset together (see reset_params).
            assert self.i == self.h == self.j == self.w
            self.get_params(img, self.scale, self.ratio)

        return F.resized_crop(img, self.i, self.j, self.h,
                              self.w, self.size, self.interpolation)

    def __repr__(self):
        interpolate_str = _pil_interpolation_to_str[self.interpolation]
        format_string = self.__class__.__name__ + '(size={0}'.format(self.size)
        format_string += ', scale={0}'.format(tuple(round(s, 4) for s in self.scale))
        format_string += ', ratio={0}'.format(tuple(round(r, 4) for r in self.ratio))
        format_string += ', interpolation={0})'.format(interpolate_str)
        return format_string
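The same idea without any stored state: sample the crop box (i, j, h, w) once per clip and crop every frame with it. This is a stdlib-only sketch under stated assumptions: get_crop_params mirrors the sampling logic of get_params above but takes the frame width and height directly, frames are lists of rows, the final resize is omitted, and get_crop_params, crop and random_crop_clip are hypothetical helpers. The `0 <` guard against zero-size crops on tiny frames is an addition of this sketch.

```python
import math
import random


def get_crop_params(width, height, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.)):
    """Sample (i, j, h, w) for a width x height frame, like get_params above."""
    area = width * height
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(10):
        target_area = random.uniform(*scale) * area
        aspect_ratio = math.exp(random.uniform(*log_ratio))
        w = int(round(math.sqrt(target_area * aspect_ratio)))
        h = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < w <= width and 0 < h <= height:  # guard against empty crops
            i = random.randint(0, height - h)
            j = random.randint(0, width - w)
            return i, j, h, w
    # Fallback to a central crop
    in_ratio = width / height
    if in_ratio < min(ratio):
        w = width
        h = int(round(w / min(ratio)))
    elif in_ratio > max(ratio):
        h = height
        w = int(round(h * max(ratio)))
    else:  # whole frame
        w, h = width, height
    return (height - h) // 2, (width - w) // 2, h, w


def crop(frame, i, j, h, w):
    """Crop a frame stored as a list of rows."""
    return [row[j:j + w] for row in frame[i:i + h]]


def random_crop_clip(clip):
    """Draw one crop box from the first frame's size and apply it to all frames."""
    height, width = len(clip[0]), len(clip[0][0])
    i, j, h, w = get_crop_params(width, height)
    return [crop(frame, i, j, h, w) for frame in clip]


random.seed(0)
frame = [[r * 10 + c for c in range(10)] for r in range(8)]  # 8 rows x 10 columns
clip = [frame, frame, frame]
out = random_crop_clip(clip)
```

This is also the pattern torchvision's functional API allows: sample parameters once, then call the functional op with the same parameters on each frame.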


fmassa · 2019-04-16T08:42:36Z

Thanks for opening the issue!

Adding support for video data is in the plans, and will be integrated in the next major release of torchvision. This also involves the transforms.

JuanFMontesinos · 2019-04-16T08:49:17Z

Hi!
Could you tell me which branch contains those changes? I'm pretty sure it will be better than what I'm suggesting.

fmassa · 2019-04-16T08:57:49Z

It's currently in a private branch. I'm working on some other things now, and I'll get back to video once I've finished those tasks, hopefully by the end of the week.

seelikat · 2019-04-26T16:23:48Z

@fmassa Is there any news on this yet? Using these transforms on video clips would be very useful for us right now.

JuanFMontesinos · 2019-04-26T16:33:48Z

@kateiyas You can install a pip package called flerken (under development), which contains a framework for PyTorch as well as torchvision transforms adapted for video.
You can import these submodules and use them as a drop-in replacement for the torchvision transforms.
If you pass a list of frames instead of a single frame, it applies the same transformation to all of them and then resets with new random parameters:
from flerken.dataloaders import transforms as T
from flerken.dataloaders.transforms import Compose

All the torchvision transforms are available there (only the main Compose class has been rewritten).

seelikat · 2019-04-30T13:45:32Z

@JuanFMontesinos Thanks, but a few small adaptations to this package fulfilled my needs: https://github.com/hassony2/torch_videovision
Good luck with your project though!

I hope all of this will be integrated into pytorch soon.

fmassa · 2019-04-30T14:26:57Z

We are a bit late with the work on video. It won't be in the 0.3 release, but in the next one.

ekagra-ranjan · 2019-05-04T19:03:13Z

@fmassa What is the rough timeline you have in mind for releasing the major changes to torchvision.transforms?

fmassa · 2019-07-02T10:04:20Z

@ekagra-ranjan

The next release, with video support, is planned for the end of July.

The first PR, adding video reading, has already been merged in #1039.

willprice · 2019-08-14T17:50:43Z

@JuanFMontesinos you might also be interested in https://torchvideo.readthedocs.io/en/latest/

JuanFMontesinos · 2019-08-18T03:45:53Z

@willprice thanks! It looks really nice

fmassa · 2019-08-28T15:26:52Z

TorchVision 0.4 with support for video has been released, see https://github.com/pytorch/vision/releases/tag/v0.4.0

We still need to adapt the default transforms to support video data, but that might be a breaking change, so we currently keep them in the references/video_classification folder.

fmassa · 2019-09-26T15:09:52Z

An initial set of video transforms has been added in #1353.
There will still be some refactoring, so they shouldn't be used in their current state, but they will soon be ready for consumption.

bjuncek · 2020-10-15T17:27:49Z

@fmassa I believe this has been addressed by @vfdev-5's tensor transform PRs.

Do you think it's safe to close this?

Greeser · 2020-10-19T08:12:41Z

What is the current state of video transformations?

What I've understood so far is that there was a plan to add video transformations. Currently, _transforms_video.py and _functional_video.py remain private, so I assume these files won't live long and will be discarded soon.

vfdev-5 · 2020-10-19T10:39:43Z

@Greeser With the upcoming release, almost all torchvision transformations will work on tensors of shape (B, C, H, W), which should cover the scope of _transforms_video.py. See https://github.com/pytorch/vision/tree/master/examples/python and please keep in mind that these examples are subject to minor changes before the official release.

fmassa added the enhancement label Apr 16, 2019

pmeier mentioned this issue Apr 20, 2019

Refactor transforms #861

Closed

fmassa added module: transforms module: video labels Sep 26, 2019
