Improved format conversion in io.image.read_image #3021

datumbox · 2020-11-18T13:06:12Z

🚀 Feature

Replace the channels parameter of the read_image(), decode_image(), decode_png() and decode_jpeg() methods with a mode parameter that gives users better control on image conversions.

Motivation

Issue #2948 proposed specifying the number of output channels when loading images and PR #2988 introduced the change on the above methods. Though the API provides similar functionality to other libraries, it requires making implicit assumptions in relation to the mapping between # channels and output formats. For example, we assumed that channel=1 means Grayscale. Nevertheless since both Palette images and Gayscale images use 1 channel, we had to introduce logic for handling such corner-cases.

A better approach would be to give control to the users to explicitly define what type of conversions they want to make.

Pitch

Building on top of @vfdev-5's proposal, we could replace channels with a mode parameter. For example:

# old:
def read_image(path: str, channels: int = 0) -> torch.Tensor

# new:
def read_image(path: str, mode: ImageReadMode = ImageReadMode.UNCHANGED) -> torch.Tensor

The mode will be an enum with the following values:

ImageReadMode.UNCHANGED
ImageReadMode.GRAY
ImageReadMode.GRAY_ALPHA
ImageReadMode.RGB
ImageReadMode.RGB_ALPHA

The default value of mode will be ImageReadMode.UNCHANGED and it will have similar behaviour as the current channels=0. It will load the image without making any modification and to ensure BC it will additionally support Palette, CMYK and other currently supported formats.

Note: The scope of this proposal is to change this experimental API to allow for better image format support on the future. Adding support for converting images to Palette, from/to CMYK, etc is not within the scope of this proposal. Many of such conversions are not supported by LibJPEG and LibPNG and require writing custom conversion code which should be handled in a separate ticket.

The text was updated successfully, but these errors were encountered:

fmassa · 2020-11-18T13:30:13Z

The proposal sounds great to me.
One thing we should decide is what to do if the original input format doesn't support one specific configuration. For example, if the user passes RGB_ALPHA to a JPEG image, should we return a RGB_ALPHA with a all-ones (or zeros) alpha channel, or error out?

datumbox · 2020-11-18T13:46:20Z

Thanks for the feedback Francisco.

I had in mind to raise an error. This is exactly what we currently do if someone passes channels=4 for a JPEG:

vision/torchvision/csrc/cpu/image/readjpeg_cpu.cpp

Lines 108 to 123 in 8c28175

    
               switch (channels) { 
        
                 case 1: // Gray 
        
                   cinfo.out_color_space = JCS_GRAYSCALE; 
        
                   break; 
        
                 case 3: // RGB 
        
                   cinfo.out_color_space = JCS_RGB; 
        
                   break; 
        
                 /* 
        
                  * Libjpeg does not support converting from CMYK to grayscale etc. There 
        
                  * is a way to do this but it involves converting it manually to RGB: 
        
                  * https://github.com/tensorflow/tensorflow/blob/86871065265b04e0db8ca360c046421efb2bdeb4/tensorflow/core/lib/jpeg/jpeg_mem.cc#L284-L313 
        
                  * 
        
                  */ 
        
                 default: 
        
                   jpeg_destroy_decompress(&cinfo); 
        
                   TORCH_CHECK(false, "Invalid number of output channels.");

If on the future we decide that RGB_ALPHA is one of the supported conversion formats for jpeg images, we will simply add an entry on the above snippet along with the conversion code. In this case, as you hinted, the conversion code is simply adding a 4th channel with value 255 everywhere.

fmassa · 2020-11-19T10:35:21Z

I think raising an error is ok for now, but in this case we should document that this is only supported for some input format types.

In the future, we might want to make it support all formats, as this will give best user experience.

datumbox self-assigned this Nov 18, 2020

datumbox added new feature module: io labels Nov 18, 2020

datumbox mentioned this issue Nov 18, 2020

Add support of mode and remove channels #3024

Merged

fmassa closed this as completed in #3024 Nov 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved format conversion in io.image.read_image #3021

Improved format conversion in io.image.read_image #3021

datumbox commented Nov 18, 2020 •

edited

Loading

fmassa commented Nov 18, 2020

datumbox commented Nov 18, 2020

fmassa commented Nov 19, 2020

Improved format conversion in io.image.read_image #3021

Improved format conversion in io.image.read_image #3021

Comments

datumbox commented Nov 18, 2020 • edited Loading

🚀 Feature

Motivation

Pitch

fmassa commented Nov 18, 2020

datumbox commented Nov 18, 2020

fmassa commented Nov 19, 2020

datumbox commented Nov 18, 2020 •

edited

Loading