Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved format conversion in io.image.read_image #3021

Closed
datumbox opened this issue Nov 18, 2020 · 3 comments · Fixed by #3024
Closed

Improved format conversion in io.image.read_image #3021

datumbox opened this issue Nov 18, 2020 · 3 comments · Fixed by #3024

Comments

@datumbox
Copy link
Contributor

datumbox commented Nov 18, 2020

🚀 Feature

Replace the channels parameter of the read_image(), decode_image(), decode_png() and decode_jpeg() methods with a mode parameter that gives users better control on image conversions.

Motivation

Issue #2948 proposed specifying the number of output channels when loading images and PR #2988 introduced the change on the above methods. Though the API provides similar functionality to other libraries, it requires making implicit assumptions in relation to the mapping between # channels and output formats. For example, we assumed that channel=1 means Grayscale. Nevertheless since both Palette images and Gayscale images use 1 channel, we had to introduce logic for handling such corner-cases.

A better approach would be to give control to the users to explicitly define what type of conversions they want to make.

Pitch

Building on top of @vfdev-5's proposal, we could replace channels with a mode parameter. For example:

# old:
def read_image(path: str, channels: int = 0) -> torch.Tensor

# new:
def read_image(path: str, mode: ImageReadMode = ImageReadMode.UNCHANGED) -> torch.Tensor

The mode will be an enum with the following values:

ImageReadMode.UNCHANGED
ImageReadMode.GRAY
ImageReadMode.GRAY_ALPHA
ImageReadMode.RGB
ImageReadMode.RGB_ALPHA

The default value of mode will be ImageReadMode.UNCHANGED and it will have similar behaviour as the current channels=0. It will load the image without making any modification and to ensure BC it will additionally support Palette, CMYK and other currently supported formats.

Note: The scope of this proposal is to change this experimental API to allow for better image format support on the future. Adding support for converting images to Palette, from/to CMYK, etc is not within the scope of this proposal. Many of such conversions are not supported by LibJPEG and LibPNG and require writing custom conversion code which should be handled in a separate ticket.

@fmassa
Copy link
Member

fmassa commented Nov 18, 2020

The proposal sounds great to me.
One thing we should decide is what to do if the original input format doesn't support one specific configuration. For example, if the user passes RGB_ALPHA to a JPEG image, should we return a RGB_ALPHA with a all-ones (or zeros) alpha channel, or error out?

@datumbox
Copy link
Contributor Author

Thanks for the feedback Francisco.

I had in mind to raise an error. This is exactly what we currently do if someone passes channels=4 for a JPEG:

switch (channels) {
case 1: // Gray
cinfo.out_color_space = JCS_GRAYSCALE;
break;
case 3: // RGB
cinfo.out_color_space = JCS_RGB;
break;
/*
* Libjpeg does not support converting from CMYK to grayscale etc. There
* is a way to do this but it involves converting it manually to RGB:
* https://github.com/tensorflow/tensorflow/blob/86871065265b04e0db8ca360c046421efb2bdeb4/tensorflow/core/lib/jpeg/jpeg_mem.cc#L284-L313
*
*/
default:
jpeg_destroy_decompress(&cinfo);
TORCH_CHECK(false, "Invalid number of output channels.");

If on the future we decide that RGB_ALPHA is one of the supported conversion formats for jpeg images, we will simply add an entry on the above snippet along with the conversion code. In this case, as you hinted, the conversion code is simply adding a 4th channel with value 255 everywhere.

@fmassa
Copy link
Member

fmassa commented Nov 19, 2020

I think raising an error is ok for now, but in this case we should document that this is only supported for some input format types.

In the future, we might want to make it support all formats, as this will give best user experience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants