[WIP] Add Scriptable Transform: Grayscale #1505
Conversation
Thanks for the PR!

I've left some comments, can you have a look? Also, can you add tests comparing the behavior to what PIL does?
if not F._is_tensor_image(img):
    raise TypeError('tensor is not a torch image.')
else:
Can you remove this else: branch? It's not necessary, because we raise an error in the other branch.
Also, can you add a check that the input image has 3 channels?
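A minimal sketch of such a check, assuming the CHW tensor layout used elsewhere in this file, could be:

# sketch: validate that the input tensor has 3 channels (CHW layout assumed)
if img.shape[0] != 3:
    raise TypeError('Input image does not contain 3 channels')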
    raise TypeError('tensor is not a torch image.')
else:
    hwc_img = img.transpose(1, 2).transpose(0, 2)
Do we need to perform those transpositions?
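For reference, the weighted sum can be written without any transposition by broadcasting the weights against the channel dimension; this is a sketch of the alternative, not the PR's code:

# sketch: weighted channel sum directly on the (3, H, W) tensor;
# reshaping the weights to (3, 1, 1) lets broadcasting do the work
weights = torch.tensor([0.2989, 0.5870, 0.1140], device=img.device)
gray_img = (img * weights[:, None, None]).sum(dim=0)  # shape (H, W)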
@@ -48,3 +48,25 @@ def crop(img, top, left, height, width):
        raise TypeError('tensor is not a torch image.')

    return img[..., top:top + height, left:left + width]


def to_grayscale(img, num_output_channels = 3):
I think that we might need to be a bit more explicit for now, and maybe say rgb_to_grayscale, because we don't have the original colorspace information.
else:
    hwc_img = img.transpose(1, 2).transpose(0, 2)
    weights = torch.tensor([0.2989, 0.5870, 0.1140])
Can you add a reference for where those values come from? Also, can you create the tensor on the device of img, via device=img.device?
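Both requests combined would give something like the following sketch (the coefficients are the ITU-R 601-2 luma weights already cited in the docstring later in this PR):

# ITU-R 601-2 luma transform coefficients: L = R * 0.2989 + G * 0.5870 + B * 0.1140
# created on the input's device so CPU and CUDA tensors both work
weights = torch.tensor([0.2989, 0.5870, 0.1140], device=img.device)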
else:
    hwc_img = img.transpose(1, 2).transpose(0, 2)
    weights = torch.tensor([0.2989, 0.5870, 0.1140])
    res = hwc_img[:,:,:3] * weights[None, :]
It might be more efficient (especially on the GPU) to do something like

res = torch.tensordot(img, weights[:, None, None], [[0], [0]]).squeeze()

although this might be a bit less readable. Can you run some benchmarks to see if there is any difference and report back?
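A rough harness for the requested benchmark might look like the sketch below; the image size, iteration count, and helper names are made up for illustration, and on CUDA each timing loop would also need torch.cuda.synchronize() calls:

import time
import torch

img = torch.rand(3, 1024, 1024)  # hypothetical test image
weights = torch.tensor([0.2989, 0.5870, 0.1140])

def via_transpose(img, weights):
    # current approach: move channels last, scale, then sum the channels
    hwc = img.transpose(0, 2).transpose(0, 1)  # (H, W, C)
    res = hwc * weights
    return res[:, :, 0] + res[:, :, 1] + res[:, :, 2]

def via_tensordot(img, weights):
    # suggested approach: a single contraction over the channel dimension
    return torch.tensordot(img, weights, dims=([0], [0]))

for fn in (via_transpose, via_tensordot):
    start = time.perf_counter()
    for _ in range(100):
        fn(img, weights)
    print(fn.__name__, time.perf_counter() - start)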
if num_output_channels == 1:
    return gray_img.int()
else:
    return torch.cat((gray_img.int(), gray_img.int(), gray_img.int())).transpose(1, 2)
This .int makes the assumption that the input is in uint8 format, which is not enforced anywhere in the code. In fact, the code works fine if the images are in float32 / 0-1 range as well. Can you use .repeat instead?
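A dtype-preserving variant using .repeat could look like this sketch, assuming gray_img has shape (H, W):

# sketch: replicate the grayscale channel three times without casting,
# so uint8 and float32 inputs both keep their original dtype
if num_output_channels == 1:
    return gray_img.unsqueeze(0)                  # (1, H, W)
return gray_img.unsqueeze(0).repeat(3, 1, 1)      # (3, H, W)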
res = hwc_img[:,:,:3] * weights[None, :]
gray_img = res[:, :, 0] + res[:, :, 1] + res[:, :, 2]
if num_output_channels == 1:
    return gray_img.int()
I don't think you need the int() call here. If anything, it should be .to(dtype=torch.uint8), but I don't think we want to enforce it; we should just return whatever type was passed by the user.

Also, lint is failing, can you fix that as well? https://travis-ci.org/pytorch/vision/jobs/599950160
Hi @fmassa, thanks for the review. I will surely make the changes.
Hi @fmassa, I have been reading and understanding the codebase. I may be on vacation for all of next week, so I may be making and pushing changes the week after next. Thank you.
Hi @fmassa, I have made some of the required changes. It seems like there are conflicts; I will resolve those. I will also be adding a test soon.
Hi @PyExtreme Maybe you could use the implementation of …
@fmassa, that's really thoughtful and intuitive.
Force-pushed from b03d237 to 18553f8
Looks like you have reverted the changes from #1525?
Force-pushed from 18553f8 to fc7dfd2
Codecov Report

@@            Coverage Diff             @@
##           master    #1505      +/-   ##
==========================================
+ Coverage   65.15%   65.48%    +0.32%
==========================================
  Files          90       90
  Lines        7078     7080        +2
  Branches     1076     1077        +1
==========================================
+ Hits         4612     4636       +24
+ Misses       2147     2135       -12
+ Partials      319      309       -10

Continue to review the full report at Codecov.
@fmassa, it would be of great help if you could please tell me the commands to compile the code locally. Also, in this PR, since the … Thank you.
Hi @fmassa, it would be great if you could please review it. Thank you.
I have a few more comments.

You can compile and test it locally by doing

python setup.py install

and then

python test/test_functional_tensor.py
if img.size()[0] != 3:
    raise TypeError('Input Image does not contain 3 Channels')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Please keep this part as before. This is an implicit cast, which can lead to very unexpected results. For example, if a user passes a CPU tensor but has a CUDA device available, this will return a CUDA tensor, which is definitely not expected, and is the reason why one of the tests is failing.
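To illustrate the pitfall, here is a small hypothetical example of what goes wrong when the device is picked from the environment rather than from the input:

import torch

img = torch.rand(3, 8, 8)  # the user passes a CPU tensor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
out = img.to(device)       # on a machine with a GPU this is now a CUDA tensor
print(out.device)          # prints cuda:0 there, even though the input was on the CPU

# deriving the device from the input avoids the surprise:
weights = torch.tensor([0.2989, 0.5870, 0.1140], device=img.device)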
@@ -19,10 +17,8 @@ def vflip(img_tensor):

def hflip(img_tensor):
    """Horizontally flip the given the Image Tensor.
Can you revert the whitespace changes all over the file? Those blank lines are necessary for the documentation to render properly.
    For RGB to Grayscale conversion, ITU-R 601-2 luma transform is performed which
    is L = R * 0.2989 + G * 0.5870 + B * 0.1140
    """
    if img.size()[0] != 3:
nit: Can you make it img.shape[0] != 3 instead?
test/test_functional_tensor.py (outdated)
grayscale_tensor = F_t.rgb_to_grayscale(img_tensor)
pil_img = np.rollaxis(np.array(transforms.ToPILImage()(img_tensor).convert("RGB")), 2, 0)
grayscale_pil_img = (0.2989 * pil_img[0] + 0.5870 * pil_img[1] + 0.1140 * pil_img[2])
self.assertTrue(torch.equal(grayscale_tensor.to(int), torch.tensor(grayscale_pil_img).to(int)))
I think this test is not testing against the PIL implementation of grayscale conversion. Can you instead compare F.grayscale and F_t.rgb_to_grayscale?
@fmassa, sure, I will make the changes.
Force-pushed from 68a0938 to 953e2cc
@fmassa, I have made the changes. Thank you.
A few more minor comments, and then it should be good to go
test/test_functional_tensor.py (outdated)
img_tensor = torch.randint(0, 255, (3, 16, 16), dtype=torch.uint8)
grayscale_tensor = F_t.rgb_to_grayscale(img_tensor).to(int)
grayscale_pil_img = torch.tensor(np.array(F.to_grayscale(F.to_pil_image(img_tensor)))).to(int)
self.assertTrue(torch.allclose(grayscale_tensor, grayscale_pil_img, 1e-1))
Can you replace the torch.allclose check with something like

max_diff = (grayscale_tensor - grayscale_pil_img).abs().max()
self.assertLess(...)

so that it gives a better idea of how far off the two implementations of grayscale are?
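One way to complete that sketch, with a tolerance chosen to allow the off-by-one rounding mentioned in the reply below (the threshold is a guess, not from the PR):

# report how far apart the two implementations actually are
max_diff = (grayscale_tensor - grayscale_pil_img).abs().max().item()
self.assertLess(max_diff, 1.1)  # tolerate an off-by-one rounding difference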
Hi @fmassa, the max diff will be 1, which probably arises due to rounding. But most of the values of the two tensors are equal.
@@ -4,10 +4,8 @@

def vflip(img_tensor):
    """Vertically flip the given the Image Tensor.
Can you revert the removal of newlines introduced in this PR please?
@fmassa, I had just removed one newline. However, in order to make the format of this module's docstrings look like the other ones, I will make the required changes.
Args
    img (Tensor): Image to be converted to Grayscale in the form [C, H, W].
    Tensor: Grayscale image.
Can you format the output like the other examples?

Args:
    img (Tensor): ...
Returns:
    Tensor: Grayscale image
    img (Tensor): Image to be converted to Grayscale in the form [C, H, W].
    Tensor: Grayscale image.

For RGB to Grayscale conversion, ITU-R 601-2 luma transform is performed which
Can you move this before Args:, so that it's properly rendered during documentation generation?
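Putting the two docstring comments together, the layout the reviewer is asking for would be roughly as follows (a sketch, not the code that was merged):

def rgb_to_grayscale(img):
    """Convert the given RGB Image Tensor to Grayscale.

    For RGB to Grayscale conversion, ITU-R 601-2 luma transform is performed,
    which is L = R * 0.2989 + G * 0.5870 + B * 0.1140.

    Args:
        img (Tensor): Image to be converted to Grayscale in the form [C, H, W].

    Returns:
        Tensor: Grayscale image.
    """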
@fmassa, I have made the mentioned changes. Please feel free to reach out to me if I can help with anything here.
Thanks!
This is my initial work on adding a scriptable transform for Grayscale, referenced in #1375.