
[WIP] Add Scriptable Transform: Grayscale #1505

Merged: 11 commits into pytorch:master on Nov 5, 2019

Conversation

PyExtreme (Contributor)

This is my initial work on adding a scriptable transform for Grayscale, referenced in #1375.

@fmassa (Member) left a comment:

Thanks for the PR!

I've left some comments, can you have a look?

Also, can you add tests comparing the behavior to what PIL does?

if not F._is_tensor_image(img):
    raise TypeError('tensor is not a torch image.')
else:
fmassa (Member):

Can you remove this else: branch? It's not necessary because we raise an error in the other branch

fmassa (Member):

Also, can you add a check that the input image has 3 channels?

    raise TypeError('tensor is not a torch image.')
else:
    hwc_img = img.transpose(1, 2).transpose(0, 2)
fmassa (Member):

Do we need to perform those transpositions?
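One way to see the reviewer's point, as a sketch: the CHW-to-HWC transposes can be avoided entirely by reshaping the weight vector to (3, 1, 1) and letting broadcasting apply it along the channel dimension. The coefficients are the ITU-R 601-2 luma weights used elsewhere in the PR.

```python
import torch

img = torch.rand(3, 8, 5)                       # C x H x W image
weights = torch.tensor([0.2989, 0.5870, 0.1140])

# With transposes (as in the PR): CHW -> HWC, multiply, sum over channels.
hwc_img = img.transpose(1, 2).transpose(0, 2)   # H x W x C
gray_a = (hwc_img * weights).sum(dim=-1)

# Without transposes: broadcast the weights along the channel dimension.
gray_b = (img * weights[:, None, None]).sum(dim=0)
```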

@@ -48,3 +48,25 @@ def crop(img, top, left, height, width):
raise TypeError('tensor is not a torch image.')

return img[..., top:top + height, left:left + width]

def to_grayscale(img, num_output_channels=3):
fmassa (Member):

I think that for now we need to be a bit more explicit and perhaps say rgb_to_grayscale, because we don't have the original colorspace information.


else:
hwc_img = img.transpose(1, 2).transpose(0, 2)
weights = torch.tensor([0.2989, 0.5870, 0.1140])
fmassa (Member):

Can you add a reference for where those values come from?
Also, can you create the tensor on the device of img, via device=img.device?
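A sketch covering both requests, with a hypothetical helper name: the coefficients are the ITU-R Recommendation 601-2 luma transform weights (L = 0.2989 R + 0.5870 G + 0.1140 B), and passing device=img.device keeps the weight tensor on the same device as the input, so a CUDA image incurs no implicit host/device transfer.

```python
import torch

def luma_weights(img):
    # ITU-R 601-2 luma transform coefficients, created on img's device.
    return torch.tensor([0.2989, 0.5870, 0.1140], device=img.device)

img = torch.rand(3, 4, 4)
w = luma_weights(img)
```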

else:
    hwc_img = img.transpose(1, 2).transpose(0, 2)
    weights = torch.tensor([0.2989, 0.5870, 0.1140])
    res = hwc_img[:, :, :3] * weights[None, :]
fmassa (Member):

It might be more efficient (especially on the GPU) to do something like

res = torch.tensordot(img, weights[:, None, None], [[0], [0]]).squeeze()

although this might be a bit less readable. Can you do some benchmarks to see if there is any difference and report back?
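A rough, CPU-only benchmark sketch of the two formulations (purely illustrative; the reviewer's concern is mainly about GPU execution, where a single fused contraction can matter more):

```python
import timeit
import torch

img = torch.rand(3, 512, 512)
weights = torch.tensor([0.2989, 0.5870, 0.1140])

def gray_broadcast():
    # Broadcast-multiply then reduce over the channel dimension.
    return (img * weights[:, None, None]).sum(dim=0)

def gray_tensordot():
    # Contract the channel dim of img with the weights in one call.
    return torch.tensordot(img, weights, dims=([0], [0]))

t_broadcast = timeit.timeit(gray_broadcast, number=50)
t_tensordot = timeit.timeit(gray_tensordot, number=50)
print(f"broadcast: {t_broadcast:.4f}s  tensordot: {t_tensordot:.4f}s")
```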

if num_output_channels == 1:
    return gray_img.int()
else:
    return torch.cat((gray_img.int(), gray_img.int(), gray_img.int())).transpose(1, 2)
fmassa (Member):

This .int() call assumes the input is in uint8 format, which is not enforced anywhere in the code. In fact, the code works fine if the images are float32 in the 0-1 range as well.
Can you use .repeat instead?
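A sketch of the suggested fix: expanding the single-channel result with `.repeat` instead of `torch.cat` plus `.int()` keeps whatever dtype the input had, so the function works for uint8 and float inputs alike.

```python
import torch

gray_img = torch.rand(8, 5)                  # H x W grayscale result

# Add a channel dim, then repeat it 3 times -> 3 x H x W, same dtype.
out = gray_img.unsqueeze(0).repeat(3, 1, 1)
```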

res = hwc_img[:, :, :3] * weights[None, :]
gray_img = res[:, :, 0] + res[:, :, 1] + res[:, :, 2]
if num_output_channels == 1:
    return gray_img.int()
fmassa (Member):

I don't think you need the .int() call here. If anything, it should be .to(dtype=torch.uint8), but I don't think we want to enforce it; we should just return whatever type was passed by the user.

@fmassa (Member) commented Oct 21, 2019

Also, lint is failing, can you fix it afterwards as well? https://travis-ci.org/pytorch/vision/jobs/599950160

@PyExtreme (Contributor, Author):

Hi @fmassa , thanks for the review. Will surely make the changes.

@PyExtreme (Contributor, Author) commented Oct 25, 2019

Hi @fmassa, I have been reading and understanding the codebase. I may be on vacation through all of next week, so I will likely be making and pushing the changes the week after.

Thank you

@PyExtreme (Contributor, Author):

Hi @fmassa , I have made some required changes.

Seems like there are conflicts; I will resolve those.

Also, will be adding a test soon.

@fmassa (Member) commented Oct 29, 2019

Hi @PyExtreme

Maybe you could use the implementation of rgb_to_grayscale from #1525 to write your implementation?

@PyExtreme (Contributor, Author) commented Oct 29, 2019

@fmassa , that's really thoughtful and intuitive.
Sure, will make the changes.

@PyExtreme force-pushed the add_scriptable_grayscale branch from b03d237 to 18553f8 (October 29, 2019 18:18)
@fmassa (Member) left a comment:

Looks like you have reverted the changes from #1525 ?

@PyExtreme (Contributor, Author) commented Oct 31, 2019

> Looks like you have reverted the changes from #1525 ?

@fmassa , I think I may have to make a few changes. I will surely inform you when I am done.
Thanks

@PyExtreme force-pushed the add_scriptable_grayscale branch from 18553f8 to fc7dfd2 (October 31, 2019 18:12)
@codecov-io commented Oct 31, 2019

Codecov Report

Merging #1505 into master will increase coverage by 0.32%.
The diff coverage is 66.66%.


@@            Coverage Diff             @@
##           master    #1505      +/-   ##
==========================================
+ Coverage   65.15%   65.48%   +0.32%     
==========================================
  Files          90       90              
  Lines        7078     7080       +2     
  Branches     1076     1077       +1     
==========================================
+ Hits         4612     4636      +24     
+ Misses       2147     2135      -12     
+ Partials      319      309      -10
Impacted Files Coverage Δ
torchvision/transforms/functional_tensor.py 58.82% <66.66%> (-3.68%) ⬇️
torchvision/transforms/functional.py 71.67% <0%> (+1.13%) ⬆️
torchvision/transforms/transforms.py 81.04% <0%> (+1.16%) ⬆️
torchvision/datasets/voc.py 21.64% <0%> (+2.06%) ⬆️
torchvision/datasets/folder.py 82.05% <0%> (+2.56%) ⬆️
torchvision/datasets/lsun.py 24.48% <0%> (+4.08%) ⬆️
torchvision/datasets/cifar.py 78.16% <0%> (+6.89%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cd17484...a30ff08.

@PyExtreme (Contributor, Author) commented Oct 31, 2019

@fmassa, it would be of great help if you could tell me the commands to compile the code locally. Presently, I am just running pytest and flake8.

Also, since the rgb_to_grayscale function will be public in this PR, I have removed the leading _ from _rgb_to_grayscale in the calls from the adjust_saturation and adjust_contrast functions.

Thank you

@PyExtreme (Contributor, Author):

Hi @fmassa, it would be great if you could please review it.

Thank you

@fmassa (Member) left a comment:

I have a few more comments.

You can compile and test it locally by doing

python setup.py install

and then

python test/test_functional_tensor.py

if img.size()[0] != 3:
    raise TypeError('Input Image does not contain 3 Channels')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
fmassa (Member):

Please keep this part as before. This is an implicit cast, which can lead to very unexpected results.
For example, if a user passes a CPU tensor but has a CUDA device available, this will return a CUDA tensor, which is definitely not expected and is the reason one of the tests is failing.

@@ -19,10 +17,8 @@ def vflip(img_tensor):

def hflip(img_tensor):
"""Horizontally flip the given the Image Tensor.

fmassa (Member):

Can you revert the whitespace changes all over the file? They are necessary for the documentation to render properly

For RGB to Grayscale conversion, ITU-R 601-2 luma transform is performed which
is L = R * 0.2989 + G * 0.5870 + B * 0.1140
"""
if img.size()[0] != 3:
fmassa (Member):

nit: Can you make it img.shape[0] != 3 instead?

grayscale_tensor = F_t.rgb_to_grayscale(img_tensor)
pil_img = np.rollaxis(np.array(transforms.ToPILImage()(img_tensor).convert("RGB")), 2, 0)
grayscale_pil_img = (0.2989 * pil_img[0] + 0.5870 * pil_img[1] + 0.1140 * pil_img[2])
self.assertTrue(torch.equal(grayscale_tensor.to(int), torch.tensor(grayscale_pil_img).to(int)))
fmassa (Member):

I think this test is not testing against PIL implementation of grayscale conversion.
Can you instead compare F.grayscale and F_t.rgb_to_grayscale?

@PyExtreme (Contributor, Author):

@fmassa, sure, I will make the changes.

@PyExtreme force-pushed the add_scriptable_grayscale branch from 68a0938 to 953e2cc (November 4, 2019 13:55)
@PyExtreme (Contributor, Author):

@fmassa , I have made the changes.

Thank you

@fmassa (Member) left a comment:

A few more minor comments, and then it should be good to go

img_tensor = torch.randint(0, 255, (3, 16, 16), dtype=torch.uint8)
grayscale_tensor = F_t.rgb_to_grayscale(img_tensor).to(int)
grayscale_pil_img = torch.tensor(np.array(F.to_grayscale(F.to_pil_image(img_tensor)))).to(int)
self.assertTrue(torch.allclose(grayscale_tensor, grayscale_pil_img, 1e-1))
fmassa (Member):

Can you replace torch.allclose with something like

max_diff = (grayscale_tensor - grayscale_pil_img).abs().max()
self.assertLess(...)

so that it gives a better idea of how far off the two implementations of grayscale are?
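A sketch of the suggested assertion shape, with synthetic data standing in for the two grayscale outputs: reporting the maximum absolute difference means a failure shows exactly how far apart the implementations are, instead of a bare allclose pass/fail.

```python
import torch

grayscale_tensor = torch.randint(0, 255, (16, 16), dtype=torch.int64)
grayscale_pil_img = grayscale_tensor.clone()
grayscale_pil_img[0, 0] += 1          # simulate a one-off rounding diff

max_diff = (grayscale_tensor - grayscale_pil_img).abs().max().item()
# Inside a unittest.TestCase this would be: self.assertLess(max_diff, 2)
```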

@PyExtreme (Contributor, Author) commented Nov 4, 2019

Hi @fmassa, the max diff will be 1, which arises perhaps due to rounding. But most of the values of the two tensors are equal.

@@ -4,10 +4,8 @@

def vflip(img_tensor):
"""Vertically flip the given the Image Tensor.

fmassa (Member):

Can you revert the removal of newlines introduced in this PR please?

@PyExtreme (Contributor, Author):

@fmassa, I had only removed 1 newline. However, to make the format of this module's docstrings match the others, I will make the required changes.


Args
img (Tensor): Image to be converted to Grayscale in the form [C, H, W].
Tensor: Grayscale image.
fmassa (Member):

Can you format the output as the other examples?

Args:
    img (Tensor): ...

Returns:
    Tensor: Grayscale image

img (Tensor): Image to be converted to Grayscale in the form [C, H, W].
Tensor: Grayscale image.

For RGB to Grayscale conversion, ITU-R 601-2 luma transform is performed which
fmassa (Member):

Can you move this before Args:, so that it's properly rendered during documentation generation?

@PyExtreme (Contributor, Author):

@fmassa, I have made the mentioned changes. Please feel free to reach out to me if I can help with anything here.

@fmassa (Member) left a comment:

Thanks!

@fmassa fmassa merged commit 8909ff4 into pytorch:master Nov 5, 2019