Improved format conversion in io.image.read_image #3021
Comments
The proposal sounds great to me.
Thanks for the feedback Francisco. I had in mind to raise an error. This is exactly what we currently do if someone passes vision/torchvision/csrc/cpu/image/readjpeg_cpu.cpp Lines 108 to 123 in 8c28175
If in the future we decide that RGB_ALPHA is one of the supported conversion formats for JPEG images, we will simply add an entry to the above snippet along with the conversion code. In this case, as you hinted, the conversion code simply adds a 4th channel with value 255 everywhere.
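As a rough illustration of the conversion mentioned above, here is a minimal Python sketch (not the torchvision C++ code) that appends a fully opaque alpha channel to interleaved RGB bytes. The function name `rgb_to_rgba` and the flat-bytes pixel layout are assumptions made for the example.

```python
def rgb_to_rgba(pixels: bytes) -> bytes:
    """Append an opaque alpha byte (255) after every RGB triple.

    Hypothetical helper: assumes `pixels` holds interleaved R, G, B bytes.
    """
    if len(pixels) % 3 != 0:
        raise ValueError("expected a whole number of RGB triples")
    out = bytearray()
    for i in range(0, len(pixels), 3):
        out += pixels[i:i + 3]  # copy R, G, B unchanged
        out.append(255)         # alpha = 255 everywhere
    return bytes(out)
```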
I think raising an error is OK for now, but in this case we should document that this is only supported for some input format types. In the future, we might want to make it support all formats, as this will give the best user experience.
🚀 Feature
Replace the `channels` parameter of the `read_image()`, `decode_image()`, `decode_png()` and `decode_jpeg()` methods with a `mode` parameter that gives users better control over image conversions.
Motivation
Issue #2948 proposed specifying the number of output channels when loading images, and PR #2988 introduced the change in the above methods. Though the API provides similar functionality to other libraries, it requires making implicit assumptions about the mapping between the number of channels and output formats. For example, we assumed that `channels=1` means Grayscale. Nevertheless, since both Palette images and Grayscale images use 1 channel, we had to introduce logic for handling such corner cases.
A better approach would be to give users control to explicitly define what type of conversions they want to make.
Pitch
Building on top of @vfdev-5's proposal, we could replace `channels` with a `mode` parameter. For example:
The `mode` will be an enum with the following values:
The default value of `mode` will be `ImageReadMode.UNCHANGED`, and it will behave similarly to the current `channels=0`. It will load the image without making any modification, and to ensure BC it will additionally support Palette, CMYK and other currently supported formats.
Note: The scope of this proposal is to change this experimental API to allow for better image format support in the future. Adding support for converting images to Palette, from/to CMYK, etc. is not within the scope of this proposal. Many such conversions are not supported by LibJPEG and LibPNG and require writing custom conversion code, which should be handled in a separate ticket.