torchvision.io.read_image return tensor shape is different. #3332

Closed
kairos03 opened this issue Feb 1, 2021 · 13 comments

Comments

@kairos03

kairos03 commented Feb 1, 2021

🐛 Bug

The tensor shape returned by torchvision.io.read_image differs from the [3, height, width] given in the documentation when reading a grayscale or RGBA image. It returns [1, height, width] or [4, height, width] respectively.

https://pytorch.org/docs/stable/torchvision/io.html#torchvision.io.read_image

To Reproduce

Steps to reproduce the behavior:

>>> img = torchvision.io.read_image(<grayscale image>)
>>> img.shape
(1, 123, 123)

>>> img = torchvision.io.read_image(<RGBA image>)
>>> img.shape
(4, 123, 123)

Expected behavior

>>> img = torchvision.io.read_image(<grayscale image>)
>>> img.shape
(3, 123, 123)

>>> img = torchvision.io.read_image(<RGBA image>)
>>> img.shape
(3, 123, 123)

Environment

PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.3 LTS (x86_64)
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Clang version: Could not collect
CMake version: version 3.10.2

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 440.100
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip] numpy==1.19.4
[pip] torch==1.7.1
[pip] torchaudio==0.7.0a0+a853dff
[pip] torchvision==0.8.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] numpy 1.19.4 pypi_0 pypi
[conda] pytorch 1.7.1 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.7.2 py37 pytorch
[conda] torchvision 0.8.2 py37_cu102 pytorch

@kairos03
Author

kairos03 commented Feb 1, 2021

It seems that the documentation and the development branch are out of sync. See #2988.

@datumbox
Contributor

datumbox commented Feb 1, 2021

@kairos03 The latest master of TorchVision has been updated to support reading grayscale images, transparency, etc. So what you report is not a bug but expected behaviour. See #2984, #2988 and #3024 for details on the feature.

This feature is not included in version 0.8.2; it is only available on the latest master. On version 0.8.2 you should be getting an exception:
https://github.com/pytorch/vision/blob/v0.8.2/torchvision/csrc/cpu/image/readpng_cpu.cpp#L74-L78

Which version of TorchVision are you currently using?

@kairos03
Author

kairos03 commented Feb 1, 2021

@datumbox I'm using v0.8.2

@datumbox
Contributor

datumbox commented Feb 1, 2021

Could you please confirm by checking the outputs of:

import torchvision
torchvision.__version__

@kairos03
Author

kairos03 commented Feb 1, 2021

Here:

Python 3.7.7 (default, Mar 23 2020, 22:36:06) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchvision
>>> torchvision.__version__
'0.8.2'
>>> 

@datumbox
Contributor

datumbox commented Feb 1, 2021

@kairos03 I can't reproduce what you see:

Reading a grayscale or transparent image using 0.8.2 throws an exception for me:

import torchvision

print(torchvision.__version__)  # 0.8.2

img = torchvision.io.read_image("logos/gray_pytorch.png")  # RuntimeError: Non RGB images are not supported.
print(img.shape)

I suspect that you might have installed a different torchvision version in a virtual environment, possibly the latest master or a nightly. Version 0.8.2 did not have support for non-RGB images; this was added later on the latest master.

In any case, what you report is not a bug. We just added support for additional image types. The documentation has also been updated to reflect that, and the website will be updated with the next release.

I'll close the issue but if you feel you need more support feel free to reopen it.

@datumbox datumbox closed this as completed Feb 1, 2021
@GDkids

GDkids commented May 26, 2021

Setting the argument 'mode=ImageReadMode.RGB' changes the output to [3, height, width].
The ImageReadMode class controls this directly.
More information can be found at 'https://github.com/pytorch/vision/blob/master/torchvision/io/image.py#L234-L248'
I ran into this question today and this issue was the first link I found.
I think a comment here may be useful for later readers.

@ckyleda

ckyleda commented Jun 11, 2021

What can we do if we are stuck with torchvision 0.8.2?

Is there no solution?

@fmassa
Member

fmassa commented Jun 14, 2021

@ckyleda if you can't update torchvision to the latest version, you'll have to add some extra logic in your code to handle it.

Something like

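import torchvision

img = torchvision.io.read_image("my_img.png")
if img.shape[0] == 4:
    img = img[:3]  # RGBA: drop the alpha channel
elif img.shape[0] == 1:
    img = img.repeat(3, 1, 1)  # grayscale: replicate the single channel three times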
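
The same idea can be wrapped in a small helper, sketched here under a few assumptions: the name `to_three_channels` is hypothetical, the input is a [channels, height, width] tensor as returned by builds that keep the file's own channel count, and a 2-channel input is treated as grayscale plus alpha. Images that already have 3 channels are returned untouched, so real RGB data is never replicated or altered.

```python
import torch


def to_three_channels(img: torch.Tensor) -> torch.Tensor:
    """Return a [3, H, W] tensor from a [C, H, W] image with C in {1, 2, 3, 4}.

    Hypothetical helper for illustration; not part of torchvision.
    """
    channels = img.shape[0]
    if channels == 3:
        return img                      # already RGB: leave the data as it is
    if channels == 4:
        return img[:3]                  # RGBA: drop the alpha channel
    if channels == 1:
        return img.repeat(3, 1, 1)      # grayscale: replicate the single channel
    if channels == 2:
        return img[:1].repeat(3, 1, 1)  # assumed gray + alpha: drop alpha, replicate gray
    raise ValueError(f"unsupported channel count: {channels}")


# Usage with synthetic tensors standing in for decoded images.
rgba = torch.zeros(4, 2, 5, dtype=torch.uint8)
gray = torch.ones(1, 2, 5, dtype=torch.uint8)
print(tuple(to_three_channels(rgba).shape))  # (3, 2, 5)
print(tuple(to_three_channels(gray).shape))  # (3, 2, 5)
```

Dropping alpha discards transparency rather than compositing it against a background, which may or may not be acceptable for a given dataset.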

@ckyleda

ckyleda commented Jun 15, 2021

This works for images that are grayscale; but I have RGB images where the actual channels are important and replicating the information across all channels is not desired behavior.

It blows the mind that defaulting to single-channel image reading was ever implemented in the first place. I suspect this probably means I cannot use torch for my use case.

@fmassa
Member

fmassa commented Jun 16, 2021

@ckyleda I'm sorry, I don't understand your last comment.

The default behavior for torchvision.io.read_image in torchvision 0.8.2 was to only support RGB images for PNG, returning 3 channels.

@majnas

majnas commented Sep 23, 2021

This mode loads a PNG as 3 channels:
img = torchvision.io.read_image("my_img.png", mode=torchvision.io.image.ImageReadMode.RGB)

@kuangxiaoye

> This works for images that are grayscale; but I have RGB images where the actual channels are important and replicating the information across all channels is not desired behavior.
>
> It blows the mind that defaulting to single-channel image reading was ever implemented in the first place. I suspect this probably means I cannot use torch for my use case.

it works! Thanks~
