Showing 6 changed files with 962 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
# GhostNetV2: Enhance Cheap Operation with Long-Range Attention | ||
|
||
Code for our NeurIPS 2022 (Spotlight) paper, [GhostNetV2: Enhance Cheap Operation with Long-Range Attention](https://openreview.net/pdf/6db544c65bbd0fa7d7349508454a433c112470e2.pdf). Light-weight convolutional neural networks (CNNs) are specially designed for applications on mobile devices with faster inference speed. The convolutional operation can only capture local information in a window region, which prevents performance from being further improved. Introducing self-attention into convolution can capture global information well, but it will largely encumber the actual speed. In this paper, we propose a hardware-friendly attention mechanism (dubbed DFC attention) and then present a new GhostNetV2 architecture for mobile applications. The proposed DFC attention is constructed based on fully-connected layers, which can not only execute fast on common hardware but also capture the dependence between long-range pixels. We further revisit the expressiveness bottleneck in the previous GhostNet and propose to enhance expanded features produced by cheap operations with DFC attention, so that a GhostNetV2 block can aggregate local and long-range information simultaneously. Extensive experiments demonstrate the superiority of GhostNetV2 over existing architectures. For example, it achieves 75.3% top-1 accuracy on ImageNet with 167M FLOPs, significantly surpassing GhostNetV1 (74.5%) with a similar computational cost. | ||
|
||
The information flow of DFC attention: | ||
|
||
<p align="center"> | ||
<img src="fig/dfc.PNG" width="800"> | ||
</p> | ||
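
The decoupled aggregation can be illustrated with a small pure-Python sketch. It is not the module from this commit, which uses `(1,5)` and `(5,1)` depthwise convolutions on a downsampled feature map. It assumes full-length weights shared across rows and columns, and the names `dfc_aggregate`, `mults_decoupled`, and `mults_full` are hypothetical. It shows that a vertical pass followed by a horizontal pass lets every output position depend on every input position, at a lower multiply count than a single fully-connected layer over all positions.

```python
# Illustrative only: decoupled (vertical, then horizontal) aggregation on an H x W map.
# All function and variable names here are hypothetical, not part of the repository.

def dfc_aggregate(z, vertical_w, horizontal_w):
    """Aggregate along columns first, then along rows.

    z:            H x W feature map as nested lists
    vertical_w:   H x H weights, shared across columns (a simplification)
    horizontal_w: W x W weights, shared across rows (a simplification)
    """
    h, w = len(z), len(z[0])
    # Vertical pass: each position mixes all positions in its column.
    vert = [[sum(vertical_w[i][k] * z[k][j] for k in range(h)) for j in range(w)]
            for i in range(h)]
    # Horizontal pass: each position mixes all positions in its row.
    return [[sum(horizontal_w[j][k] * vert[i][k] for k in range(w)) for j in range(w)]
            for i in range(h)]


def mults_decoupled(h, w):
    # H*W outputs summing H terms each, then H*W outputs summing W terms each.
    return h * w * (h + w)


def mults_full(h, w):
    # H*W outputs, each summing over all H*W inputs.
    return (h * w) ** 2


z = [[1, 2, 3], [4, 5, 6]]             # H = 2, W = 3
ones_v = [[1] * 2 for _ in range(2)]   # all-ones vertical weights
ones_h = [[1] * 3 for _ in range(3)]   # all-ones horizontal weights
out = dfc_aggregate(z, ones_v, ones_h)
print(out)                              # every entry equals the sum of all inputs
print(mults_decoupled(7, 7), mults_full(7, 7))
```

With all-ones weights every output equals the sum of the whole map, which makes the global receptive field visible. Learned weights would replace the ones in practice.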
|
||
|
||
The diagrams of blocks in GhostNetV1 and GhostNetV2: | ||
|
||
<p align="center"> | ||
<img src="fig/ghostnetv2.PNG" width="800"> | ||
</p> | ||
|
||
|
||
|
||
## Requirements | ||
|
||
- python 3 | ||
- pytorch == 1.7.1 | ||
- torchvision == 0.8.2 | ||
- timm == 0.3.2 | ||
|
||
## Usage | ||
|
||
|
||
Run `ghostnetv2/train.py` to train models. For example, you can run the following command to train GhostNetV2 on the ImageNet dataset. | ||
|
||
```shell | ||
python -m torch.distributed.launch --nproc_per_node=8 train.py path_to_imagenet/ --output /cache/models/ --model ghostnetv2 -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .973 --opt rmsproptf --opt-eps .001 -j 7 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --lr .064 --lr-noise 0.42 0.9 --width 1.0 | ||
``` | ||
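
To get a feel for the schedule these flags imply, the sketch below evaluates a plain step-decay rule, `lr = base_lr * decay_rate ** floor(epoch / decay_epochs)`, with the values from the command above. This is an assumption about how `--sched step` behaves. It ignores warmup (`--warmup-lr`), learning-rate noise (`--lr-noise`), and any scaling the training script may apply. It also assumes `-b 128` is the per-process batch size. The helper names `step_lr` and `global_batch_size` are hypothetical.

```python
import math


def step_lr(epoch, base_lr=0.064, decay_epochs=2.4, decay_rate=0.973):
    """Plain step decay: multiply by decay_rate once every decay_epochs epochs."""
    return base_lr * decay_rate ** math.floor(epoch / decay_epochs)


def global_batch_size(nproc_per_node=8, per_process_batch=128, nodes=1):
    """Total samples per optimizer step across all processes (assumed layout)."""
    return nproc_per_node * per_process_batch * nodes


for epoch in (0, 3, 100, 449):
    print(f"epoch {epoch:3d}: lr ~ {step_lr(epoch):.6f}")
print("global batch size:", global_batch_size())
```

If fewer GPUs are available, changing `--nproc_per_node` changes the global batch size, so the learning rate may need to be adjusted accordingly.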
## Results | ||
|
||
<p align="center"> | ||
<img src="fig/imagenet.PNG" width="900"> | ||
</p> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,261 @@ | ||
# 2020.11.06-Changed for building GhostNetV2 | ||
# Huawei Technologies Co., Ltd. <[email protected]> | ||
""" | ||
Creates a GhostNet Model as defined in: | ||
GhostNet: More Features from Cheap Operations By Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu. | ||
https://arxiv.org/abs/1911.11907 | ||
Modified from https://github.com/d-li14/mobilenetv3.pytorch and https://github.com/rwightman/pytorch-image-models | ||
""" | ||
import torch | ||
import torch.nn as nn | ||
import torch.nn.functional as F | ||
import math | ||
|
||
from timm.models.registry import register_model | ||
|
||
def _make_divisible(v, divisor, min_value=None): | ||
    """ | ||
    This function is taken from the original tf repo. | ||
    It ensures that all layers have a channel number that is divisible by `divisor`. | ||
    It can be seen here: | ||
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py | ||
    """ | ||
    if min_value is None: | ||
        min_value = divisor | ||
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) | ||
    # Make sure that round down does not go down by more than 10%. | ||
    if new_v < 0.9 * v: | ||
        new_v += divisor | ||
    return new_v | ||
|
||
def hard_sigmoid(x, inplace: bool = False): | ||
    if inplace: | ||
        return x.add_(3.).clamp_(0., 6.).div_(6.) | ||
    else: | ||
        return F.relu6(x + 3.) / 6. | ||
|
||
class SqueezeExcite(nn.Module): | ||
    def __init__(self, in_chs, se_ratio=0.25, reduced_base_chs=None, | ||
                 act_layer=nn.ReLU, gate_fn=hard_sigmoid, divisor=4, **_): | ||
        super(SqueezeExcite, self).__init__() | ||
        self.gate_fn = gate_fn | ||
        reduced_chs = _make_divisible((reduced_base_chs or in_chs) * se_ratio, divisor) | ||
        self.avg_pool = nn.AdaptiveAvgPool2d(1) | ||
        self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True) | ||
        self.act1 = act_layer(inplace=True) | ||
        self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True) | ||
|
||
    def forward(self, x): | ||
        x_se = self.avg_pool(x) | ||
        x_se = self.conv_reduce(x_se) | ||
        x_se = self.act1(x_se) | ||
        x_se = self.conv_expand(x_se) | ||
        x = x * self.gate_fn(x_se) | ||
        return x | ||
|
||
class ConvBnAct(nn.Module): | ||
    def __init__(self, in_chs, out_chs, kernel_size, | ||
                 stride=1, act_layer=nn.ReLU): | ||
        super(ConvBnAct, self).__init__() | ||
        self.conv = nn.Conv2d(in_chs, out_chs, kernel_size, stride, kernel_size//2, bias=False) | ||
        self.bn1 = nn.BatchNorm2d(out_chs) | ||
        self.act1 = act_layer(inplace=True) | ||
|
||
    def forward(self, x): | ||
        x = self.conv(x) | ||
        x = self.bn1(x) | ||
        x = self.act1(x) | ||
        return x | ||
|
||
class GhostModuleV2(nn.Module): | ||
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True, mode=None, args=None): | ||
        super(GhostModuleV2, self).__init__() | ||
        self.mode = mode | ||
        self.gate_fn = nn.Sigmoid() | ||
|
||
        # 'original': plain Ghost module (primary conv + cheap depthwise conv). | ||
        if self.mode in ['original']: | ||
            self.oup = oup | ||
            init_channels = math.ceil(oup / ratio) | ||
            new_channels = init_channels*(ratio-1) | ||
            self.primary_conv = nn.Sequential( | ||
                nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size//2, bias=False), | ||
                nn.BatchNorm2d(init_channels), | ||
                nn.ReLU(inplace=True) if relu else nn.Sequential(), | ||
            ) | ||
            self.cheap_operation = nn.Sequential( | ||
                nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size//2, groups=init_channels, bias=False), | ||
                nn.BatchNorm2d(new_channels), | ||
                nn.ReLU(inplace=True) if relu else nn.Sequential(), | ||
            ) | ||
        # 'attn': Ghost module plus the DFC attention branch (short_conv). | ||
        elif self.mode in ['attn']: | ||
            self.oup = oup | ||
            init_channels = math.ceil(oup / ratio) | ||
            new_channels = init_channels*(ratio-1) | ||
            self.primary_conv = nn.Sequential( | ||
                nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size//2, bias=False), | ||
                nn.BatchNorm2d(init_channels), | ||
                nn.ReLU(inplace=True) if relu else nn.Sequential(), | ||
            ) | ||
            self.cheap_operation = nn.Sequential( | ||
                nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size//2, groups=init_channels, bias=False), | ||
                nn.BatchNorm2d(new_channels), | ||
                nn.ReLU(inplace=True) if relu else nn.Sequential(), | ||
            ) | ||
            self.short_conv = nn.Sequential( | ||
                nn.Conv2d(inp, oup, kernel_size, stride, kernel_size//2, bias=False), | ||
                nn.BatchNorm2d(oup), | ||
                nn.Conv2d(oup, oup, kernel_size=(1,5), stride=1, padding=(0,2), groups=oup, bias=False), | ||
                nn.BatchNorm2d(oup), | ||
                nn.Conv2d(oup, oup, kernel_size=(5,1), stride=1, padding=(2,0), groups=oup, bias=False), | ||
                nn.BatchNorm2d(oup), | ||
            ) | ||
|
||
    def forward(self, x): | ||
        if self.mode in ['original']: | ||
            x1 = self.primary_conv(x) | ||
            x2 = self.cheap_operation(x1) | ||
            out = torch.cat([x1,x2], dim=1) | ||
            return out[:,:self.oup,:,:] | ||
        elif self.mode in ['attn']: | ||
            # Attention is computed on a 2x downsampled map, then upsampled back. | ||
            res = self.short_conv(F.avg_pool2d(x, kernel_size=2, stride=2)) | ||
            x1 = self.primary_conv(x) | ||
            x2 = self.cheap_operation(x1) | ||
            out = torch.cat([x1,x2], dim=1) | ||
            # Pass (height, width) so that non-square inputs are resized correctly. | ||
            return out[:,:self.oup,:,:]*F.interpolate(self.gate_fn(res), size=(out.shape[-2], out.shape[-1]), mode='nearest') | ||
|
||
|
||
class GhostBottleneckV2(nn.Module): | ||
|
||
    def __init__(self, in_chs, mid_chs, out_chs, dw_kernel_size=3, | ||
                 stride=1, act_layer=nn.ReLU, se_ratio=0., layer_id=None, args=None): | ||
        super(GhostBottleneckV2, self).__init__() | ||
        has_se = se_ratio is not None and se_ratio > 0. | ||
        self.stride = stride | ||
|
||
        # Point-wise expansion | ||
        if layer_id<=1: | ||
            self.ghost1 = GhostModuleV2(in_chs, mid_chs, relu=True, mode='original', args=args) | ||
        else: | ||
            self.ghost1 = GhostModuleV2(in_chs, mid_chs, relu=True, mode='attn', args=args) | ||
|
||
        # Depth-wise convolution | ||
        if self.stride > 1: | ||
            self.conv_dw = nn.Conv2d(mid_chs, mid_chs, dw_kernel_size, stride=stride, | ||
                                     padding=(dw_kernel_size-1)//2, groups=mid_chs, bias=False) | ||
            self.bn_dw = nn.BatchNorm2d(mid_chs) | ||
|
||
        # Squeeze-and-excitation | ||
        if has_se: | ||
            self.se = SqueezeExcite(mid_chs, se_ratio=se_ratio) | ||
        else: | ||
            self.se = None | ||
|
||
        self.ghost2 = GhostModuleV2(mid_chs, out_chs, relu=False, mode='original', args=args) | ||
|
||
        # shortcut | ||
        if (in_chs == out_chs and self.stride == 1): | ||
            self.shortcut = nn.Sequential() | ||
        else: | ||
            self.shortcut = nn.Sequential( | ||
                nn.Conv2d(in_chs, in_chs, dw_kernel_size, stride=stride, | ||
                          padding=(dw_kernel_size-1)//2, groups=in_chs, bias=False), | ||
                nn.BatchNorm2d(in_chs), | ||
                nn.Conv2d(in_chs, out_chs, 1, stride=1, padding=0, bias=False), | ||
                nn.BatchNorm2d(out_chs), | ||
            ) | ||
    def forward(self, x): | ||
        residual = x | ||
        x = self.ghost1(x) | ||
        if self.stride > 1: | ||
            x = self.conv_dw(x) | ||
            x = self.bn_dw(x) | ||
        if self.se is not None: | ||
            x = self.se(x) | ||
        x = self.ghost2(x) | ||
        x += self.shortcut(residual) | ||
        return x | ||
|
||
|
||
class GhostNetV2(nn.Module): | ||
    def __init__(self, cfgs, num_classes=1000, width=1.0, dropout=0.2, block=GhostBottleneckV2, args=None): | ||
        super(GhostNetV2, self).__init__() | ||
        self.cfgs = cfgs | ||
        self.dropout = dropout | ||
|
||
        # building first layer | ||
        output_channel = _make_divisible(16 * width, 4) | ||
        self.conv_stem = nn.Conv2d(3, output_channel, 3, 2, 1, bias=False) | ||
        self.bn1 = nn.BatchNorm2d(output_channel) | ||
        self.act1 = nn.ReLU(inplace=True) | ||
        input_channel = output_channel | ||
|
||
        # building inverted residual blocks | ||
        stages = [] | ||
        #block = block | ||
        layer_id=0 | ||
        for cfg in self.cfgs: | ||
            layers = [] | ||
            for k, exp_size, c, se_ratio, s in cfg: | ||
                output_channel = _make_divisible(c * width, 4) | ||
                hidden_channel = _make_divisible(exp_size * width, 4) | ||
                if block==GhostBottleneckV2: | ||
                    layers.append(block(input_channel, hidden_channel, output_channel, k, s, | ||
                                        se_ratio=se_ratio, layer_id=layer_id, args=args)) | ||
                input_channel = output_channel | ||
                layer_id+=1 | ||
            stages.append(nn.Sequential(*layers)) | ||
|
||
        output_channel = _make_divisible(exp_size * width, 4) | ||
        stages.append(nn.Sequential(ConvBnAct(input_channel, output_channel, 1))) | ||
        input_channel = output_channel | ||
|
||
        self.blocks = nn.Sequential(*stages) | ||
|
||
        # building last several layers | ||
        output_channel = 1280 | ||
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1)) | ||
        self.conv_head = nn.Conv2d(input_channel, output_channel, 1, 1, 0, bias=True) | ||
        self.act2 = nn.ReLU(inplace=True) | ||
        self.classifier = nn.Linear(output_channel, num_classes) | ||
|
||
    def forward(self, x): | ||
        x = self.conv_stem(x) | ||
        x = self.bn1(x) | ||
        x = self.act1(x) | ||
        x = self.blocks(x) | ||
        x = self.global_pool(x) | ||
        x = self.conv_head(x) | ||
        x = self.act2(x) | ||
        x = x.view(x.size(0), -1) | ||
        if self.dropout > 0.: | ||
            x = F.dropout(x, p=self.dropout, training=self.training) | ||
        x = self.classifier(x) | ||
        return x | ||
|
||
@register_model | ||
def ghostnetv2(**kwargs): | ||
    cfgs = [ | ||
        # k, t, c, SE, s | ||
        [[3, 16, 16, 0, 1]], | ||
        [[3, 48, 24, 0, 2]], | ||
        [[3, 72, 24, 0, 1]], | ||
        [[5, 72, 40, 0.25, 2]], | ||
        [[5, 120, 40, 0.25, 1]], | ||
        [[3, 240, 80, 0, 2]], | ||
        [[3, 200, 80, 0, 1], | ||
         [3, 184, 80, 0, 1], | ||
         [3, 184, 80, 0, 1], | ||
         [3, 480, 112, 0.25, 1], | ||
         [3, 672, 112, 0.25, 1] | ||
        ], | ||
        [[5, 672, 160, 0.25, 2]], | ||
        [[5, 960, 160, 0, 1], | ||
         [5, 960, 160, 0.25, 1], | ||
         [5, 960, 160, 0, 1], | ||
         [5, 960, 160, 0.25, 1] | ||
        ] | ||
    ] | ||
    return GhostNetV2(cfgs, num_classes=kwargs['num_classes'], | ||
                      width=kwargs['width'], | ||
                      dropout=kwargs['dropout'], | ||
                      args=kwargs['args']) |
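
The configuration table above can be walked through without PyTorch to see what each block receives. The sketch below is a standalone illustration and not part of the commit. It re-implements the channel rounding of `_make_divisible`, copies the `cfgs` rows, and applies the `layer_id <= 1` rule from `GhostBottleneckV2` to report which blocks use the `'attn'` mode. The helper names `make_divisible` and `describe_blocks`, the 224-pixel input size, and the spatial-size formula (ceiling division by the stride, which holds for the odd kernels and paddings used here) are assumptions made for illustration.

```python
def make_divisible(v, divisor, min_value=None):
    """Round v to a multiple of divisor, never dropping by more than 10%."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# Rows are (kernel, expansion size, output channels, SE ratio, stride),
# copied from the cfgs list in ghostnetv2().
CFGS = [
    [[3, 16, 16, 0, 1]],
    [[3, 48, 24, 0, 2]],
    [[3, 72, 24, 0, 1]],
    [[5, 72, 40, 0.25, 2]],
    [[5, 120, 40, 0.25, 1]],
    [[3, 240, 80, 0, 2]],
    [[3, 200, 80, 0, 1], [3, 184, 80, 0, 1], [3, 184, 80, 0, 1],
     [3, 480, 112, 0.25, 1], [3, 672, 112, 0.25, 1]],
    [[5, 672, 160, 0.25, 2]],
    [[5, 960, 160, 0, 1], [5, 960, 160, 0.25, 1],
     [5, 960, 160, 0, 1], [5, 960, 160, 0.25, 1]],
]


def describe_blocks(width=1.0, input_size=224):
    """Return one dict per bottleneck block with channels, mode and output size."""
    size = (input_size + 1) // 2             # stem: 3x3 conv, stride 2, padding 1
    in_chs = make_divisible(16 * width, 4)
    rows, layer_id = [], 0
    for stage in CFGS:
        for k, exp_size, c, se_ratio, s in stage:
            out_chs = make_divisible(c * width, 4)
            mid_chs = make_divisible(exp_size * width, 4)
            size = (size + s - 1) // s        # ceil(size / stride)
            rows.append({
                "layer_id": layer_id,
                "mode": "original" if layer_id <= 1 else "attn",
                "in_chs": in_chs, "mid_chs": mid_chs, "out_chs": out_chs,
                "kernel": k, "stride": s, "se": se_ratio > 0, "size": size,
            })
            in_chs = out_chs
            layer_id += 1
    return rows


rows = describe_blocks()
for r in rows:
    print(r)
```

Only the first two blocks keep the plain Ghost module. All later blocks add the DFC attention branch to their first Ghost module.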