Adding RandAugment implementation #4348
Conversation
Below I highlight a few bits of the implementation that I think are noteworthy.
@SamuelGabriel Given your work in the area, I would love it if you could provide your input once you are back.
# op_name: (magnitudes, signed)
"ShearX": (torch.linspace(0.0, 0.3, num_bins), True),
"ShearY": (torch.linspace(0.0, 0.3, num_bins), True),
"TranslateX": (torch.linspace(0.0, 150.0 / 331.0 * image_size[0], num_bins), True),
@SamuelGabriel I noticed that in your implementation you restrict the maximum value of Translate to 14.4. I couldn't find a reference to that in the RandAugment paper. I was hoping you could provide some background info?
I did not actually use this setting for my RA experiments. See for example here: https://github.com/automl/trivialaugment/blob/master/confs/wresnet28x10_svhncore_b128_maxlr.1_ra_fixed.yaml
I used fixed_standard, which is the search space described in the RA paper, but slightly different from the one in this AutoAugment implementation (https://github.com/tensorflow/models/tree/fd34f711f319d8c6fe85110d9df6e1784cc5a6ca/research/autoaugment), which RA follows for its augmentation space; that is why I call it fixed. The setting you mention is for their ImageNet experiment. Since my re-implementations of AA/RA were on 32x32 images (CIFAR/SVHN), I followed the implementation above. There they set the translation to 10: https://github.com/tensorflow/models/blob/fd34f711f319d8c6fe85110d9df6e1784cc5a6ca/research/autoaugment/augmentation_transforms.py#L319 They do not use the same augmentation space across datasets...
I am actually not sure what the best strategy is to follow here. Any ideas? The same problem arises for AutoAugment, actually. Should we ask the authors, or focus only on 32x32 images, or only on ImageNet? For TA it is simpler: we use the same setting across datasets.
Not 100% sure either. Here I decided to use this approach because of their comment on Table 6 in the AA paper. This also means that if you have a 32x32 image, as in the case of CIFAR, your Translate max value would be 14.5, which is similar but not equal to yours, and hence it sparked my interest in how you derived it. Not sure if this sub-pixel difference matters here.
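For reference, a quick sanity check of the two maxima being compared (just a sketch; the 150/331 scale comes from the policy quoted above, 331 is the ImageNet crop size from the AA paper, and 32 is the CIFAR/SVHN image size):

```python
# The max Translate magnitude in the snippet above is 150/331 of the image dimension.
scale = 150.0 / 331.0

imagenet_max = scale * 331  # ~150 px on a 331x331 ImageNet crop
cifar_max = scale * 32      # ~14.5 px on a 32x32 CIFAR image

# The 32x32 reference implementation linked above instead caps translation at 10 px,
# which is the source of the small discrepancy discussed in this thread.
print(round(imagenet_max, 2), round(cifar_max, 2))  # 150.0 14.5
```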
@@ -178,9 +178,9 @@ def _get_transforms(
        else:
            raise ValueError("The provided policy {} is not recognized.".format(policy))

-    def _get_magnitudes(self, num_bins: int, image_size: List[int]) -> Dict[str, Tuple[Tensor, bool]]:
+    def _augmentation_space(self, num_bins: int, image_size: List[int]) -> Dict[str, Tuple[Tensor, bool]]:
Even though this is a private method, I decided to use the terminology of the TrivialAugment paper, as I think it better describes what we get back (the combination of permitted ops and magnitudes) for the given augmentation.
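As a rough illustration of the return value the rename is meant to describe (a hypothetical sketch, not the exact code in this PR), the method maps each permitted op name to its magnitude bins plus a flag saying whether the magnitude may be negated:

```python
from typing import Dict, List, Tuple

import torch
from torch import Tensor


def _augmentation_space(num_bins: int, image_size: List[int]) -> Dict[str, Tuple[Tensor, bool]]:
    # op_name -> (tensor of candidate magnitudes, whether the sign may be flipped)
    return {
        "ShearX": (torch.linspace(0.0, 0.3, num_bins), True),
        "ShearY": (torch.linspace(0.0, 0.3, num_bins), True),
        "TranslateX": (torch.linspace(0.0, 150.0 / 331.0 * image_size[0], num_bins), True),
        # ... remaining ops elided ...
    }
```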
image. If given a number, the value is used for all bands respectively.
"""

def __init__(self, num_ops: int = 2, magnitude: int = 9, num_magnitude_bins: int = 30,
- N=2, M=9 are the best ImageNet values for ResNet50 (see A.2.3 in the paper).
- num_magnitude_bins=30 because the majority of the experiments in the paper used this value. Weirdly, section A.2.3 mentions trying level 31 for EfficientNet B7.
The num_magnitude_bins should be 31, like for TA, as 0 is also a bin and in the paper the maximal value is 30. That they tried level 31 is definitely weird and, I guess, a typo.
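With that correction, a minimal usage sketch of the defaults under discussion (this assumes the transform is ultimately exposed as torchvision.transforms.RandAugment):

```python
import torch
from torchvision import transforms

# N=2 ops per image at magnitude M=9; 31 bins so that levels 0..30 are all representable.
augment = transforms.RandAugment(num_ops=2, magnitude=9, num_magnitude_bins=31)

img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)  # dummy CHW uint8 image
augmented = augment(img)
```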
Looks great, thanks!
I haven't checked the bin values for the augmentation space, but the rest is good with me.
I've made a couple of minor comments, none of which are merge-blocking.
if isinstance(img, Tensor):
    if isinstance(fill, (int, float)):
        fill = [float(fill)] * F.get_image_num_channels(img)
    elif fill is not None:
        fill = [float(f) for f in fill]
nit: might be worth putting this in a helper function, or maybe pushing it directly to _apply_op?
Good call, I'll add this in a helper method for now. This can move to a base class once we nail the API.
The JIT is giving me headaches if I move it into the ops. I'll add a TODO to remove the duplicate code once we have a base class.
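A minimal sketch of the kind of helper being discussed (the name _setup_fill is hypothetical; as noted above, the merged PR keeps this logic inline with a TODO):

```python
from typing import List, Optional, Union

from torch import Tensor
from torchvision.transforms import functional as F


def _setup_fill(img: Tensor, fill: Optional[Union[int, float, List[float]]]):
    # Tensor images need the fill value expressed as one float per channel;
    # PIL images can keep whatever was passed in.
    if isinstance(img, Tensor):
        if isinstance(fill, (int, float)):
            return [float(fill)] * F.get_image_num_channels(img)
        elif fill is not None:
            return [float(f) for f in fill]
    return fill
```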
@@ -239,3 +239,87 @@ def forward(self, img: Tensor) -> Tensor:

    def __repr__(self) -> str:
        return self.__class__.__name__ + '(policy={}, fill={})'.format(self.policy, self.fill)


class RandAugment(torch.nn.Module):
I wonder if it makes sense to inherit from AutoAugment and override only the _augmentation_space method?
Yes indeed. It's a direct copy-paste. The only reason I didn't make it static or inherit from AutoAugment is that I think we haven't nailed down the API of the base class yet. I was thinking of keeping only the public parts visible and making changes once we add a couple of methods.
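To make the duplicated logic concrete, here is a hypothetical, heavily stripped-down illustration of the loop both transforms share: sample num_ops ops uniformly and apply each at the fixed magnitude. The class name, the reduced op set, and the direct functional calls are illustrative only, not the PR's code:

```python
from typing import Dict, Tuple

import torch
from torch import Tensor
from torchvision.transforms import functional as F


class TinyRandAugment(torch.nn.Module):
    """Toy sketch of the RandAugment procedure: N random ops at magnitude M."""

    def __init__(self, num_ops: int = 2, magnitude: int = 9, num_magnitude_bins: int = 31):
        super().__init__()
        self.num_ops = num_ops
        self.magnitude = magnitude
        self.num_magnitude_bins = num_magnitude_bins

    def _augmentation_space(self, num_bins: int) -> Dict[str, Tuple[Tensor, bool]]:
        # Reduced op set for illustration; the real space has ~14 ops.
        return {
            "Identity": (torch.tensor(0.0), False),
            "Rotate": (torch.linspace(0.0, 30.0, num_bins), True),
            "Brightness": (torch.linspace(0.0, 0.9, num_bins), True),
        }

    def forward(self, img: Tensor) -> Tensor:
        op_meta = self._augmentation_space(self.num_magnitude_bins)
        for _ in range(self.num_ops):
            # Pick one op uniformly and look up its magnitude for level M.
            op_name = list(op_meta.keys())[int(torch.randint(len(op_meta), (1,)))]
            magnitudes, signed = op_meta[op_name]
            magnitude = float(magnitudes[self.magnitude]) if magnitudes.ndim > 0 else 0.0
            # Signed ops flip direction with probability 0.5.
            if signed and torch.rand(1) < 0.5:
                magnitude *= -1.0
            if op_name == "Rotate":
                img = F.rotate(img, magnitude)
            elif op_name == "Brightness":
                img = F.adjust_brightness(img, 1.0 + magnitude)
            # "Identity" leaves the image unchanged.
        return img
```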
To me it looks good, besides the mentioned problems with the augmentation space.
"Solarize": (torch.linspace(256.0, 0.0, num_bins), False), | ||
"AutoContrast": (torch.tensor(0.0), False), | ||
"Equalize": (torch.tensor(0.0), False), | ||
"Invert": (torch.tensor(0.0), False), |
Invert should not be here, but Identity should be. I believe you replicated the mistake I made in the TA implementation for Vision. Sorry for that. I'll fix it there, too. TA and RA use the same augmentation operations.
Ah, thanks! Indeed this was copied from you. Your implementation heavily inspired how the entire code was refactored here, so thanks a lot for the contribution.
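A sketch of how the tail of the augmentation space would look with that fix applied, i.e. Identity as the no-op entry instead of Invert (assuming the dictionary layout quoted above; num_bins is set here only to make the snippet self-contained):

```python
import torch

num_bins = 31  # assumed bin count, per the discussion above

corrected_tail = {
    "Solarize": (torch.linspace(256.0, 0.0, num_bins), False),
    "AutoContrast": (torch.tensor(0.0), False),
    "Equalize": (torch.tensor(0.0), False),
    "Identity": (torch.tensor(0.0), False),  # replaces "Invert" in the shared RA/TA space
}
```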
@SamuelGabriel Thanks a lot for your review. I'll fix the issues you highlighted in a separate PR, as this one is already merged.
Summary:
Pull Request resolved: facebookresearch/vissl#421
* Adding randaugment implementation
* Refactoring.
* Adding num_magnitude_bins.
* Adding FIXME.
Reviewed By: fmassa
Differential Revision: D30793331
fbshipit-source-id: 7a99c6d2e64931e10672ceea9e81309c62a799af
Partially implements #3817
Inspired by work started by @SamuelGabriel in #4221
cc @vfdev-5 @datumbox