[Feature] Add vis-cam tool #577

Merged
merged 37 commits on Dec 23, 2021

Commits
7841e27
add cam-grad tool
Ezra-Yu Nov 20, 2021
6e3e5c6
Merge branch 'open-mmlab:master' into cam
Ezra-Yu Nov 24, 2021
1636df9
refactor cam-grad tool
Ezra-Yu Nov 24, 2021
f23c1b2
Merge branch 'open-mmlab:master' into cam
Ezra-Yu Nov 29, 2021
86a3f57
add docs
Ezra-Yu Nov 29, 2021
a955b45
update docs
Ezra-Yu Nov 30, 2021
6f2fbec
Merge branch 'cam' of github.com:Ezra-Yu/mmclassification into cam
Ezra-Yu Nov 30, 2021
f2cb8b2
Update docs and support Transformer
Ezra-Yu Dec 2, 2021
b6ce116
Merge branch 'open-mmlab:master' into cam
Ezra-Yu Dec 2, 2021
44ad8eb
remove pictures and use link
Ezra-Yu Dec 2, 2021
0bb57dd
Merge branch 'cam' of github.com:Ezra-Yu/mmclassification into cam
Ezra-Yu Dec 2, 2021
e8b2368
replace example img and finish EN docs
Ezra-Yu Dec 3, 2021
f2653c1
improve docs
Ezra-Yu Dec 7, 2021
271040c
improve code
Ezra-Yu Dec 7, 2021
a1c93a1
Merge remote-tracking branch 'origin/master' into cam
Ezra-Yu Dec 7, 2021
780ea76
Fix MobileNet V3 configs
mzr1996 Dec 8, 2021
569a72e
Refactor to support more powerful feature extraction.
mzr1996 Dec 8, 2021
998fa0a
Add unit tests
mzr1996 Dec 8, 2021
c32a4a4
Fix unit test
mzr1996 Dec 8, 2021
a9293dd
merge feature extraction
Ezra-Yu Dec 13, 2021
d82276b
fix distortion of visualization examples in docs
Ezra-Yu Dec 13, 2021
cbd2b7c
fix distortion
Ezra-Yu Dec 13, 2021
28900e7
fix distortion
Ezra-Yu Dec 13, 2021
3474cb1
fix distortion
Ezra-Yu Dec 13, 2021
2e5d6ba
merge master
Ezra-Yu Dec 20, 2021
03bc739
merge master
Ezra-Yu Dec 20, 2021
7a2152f
merge and fix conflicts
Ezra-Yu Dec 20, 2021
ba1ddfd
Improve the tool
mzr1996 Dec 22, 2021
f38af5f
Support use both attribute name and index to get layer
mzr1996 Dec 22, 2021
0a6e046
add default get_target-layers
Ezra-Yu Dec 22, 2021
76afdec
Merge branch 'cam' of github.com:Ezra-Yu/mmclassification into cam
Ezra-Yu Dec 22, 2021
40490ce
add default get_target-layers
Ezra-Yu Dec 22, 2021
fd51580
update docs
Ezra-Yu Dec 22, 2021
e3cad26
update docs
Ezra-Yu Dec 22, 2021
e2874a9
add additional print info when not using target-layers
Ezra-Yu Dec 22, 2021
3445afa
Improve docs
mzr1996 Dec 23, 2021
9d5c053
Fix enumerate list.
mzr1996 Dec 23, 2021
2 changes: 1 addition & 1 deletion configs/mobilenet_v3/mobilenet-v3-large_8xb32_in1k.py
@@ -17,7 +17,7 @@
# - modify: RandomErasing use RE-M instead of RE-0

_base_ = [
-    '../_base_/models/mobilenet-v3-large_8xb32_in1k.py',
+    '../_base_/models/mobilenet_v3_large_imagenet.py',
    '../_base_/datasets/imagenet_bs32_pil_resize.py',
    '../_base_/default_runtime.py'
]
Binary file added demo/bird.JPEG
Binary file added demo/cat-dog.png
Binary file added demo/dog.jpg
192 changes: 180 additions & 12 deletions docs/en/tools/visualization.md
@@ -5,6 +5,7 @@
- [Visualization](#visualization)
- [Pipeline Visualization](#pipeline-visualization)
- [Learning Rate Schedule Visualization](#learning-rate-schedule-visualization)
- [Class Activation Map Visualization](#class-activation-map-visualization)
- [FAQs](#faqs)

<!-- TOC -->
@@ -52,27 +53,27 @@ python tools/visualizations/vis_pipeline.py \

1. Visualize all the transformed pictures of the `ImageNet` training set and display them in pop-up windows:

```shell
python ./tools/visualizations/vis_pipeline.py ./configs/resnet/resnet50_8xb32_in1k.py --show --mode pipeline
```

<div align=center><img src="../_static/image/tools/visualization/pipeline-pipeline.jpg" style=" width: auto; height: 40%; "></div>
<div align=center><img src="../_static/image/tools/visualization/pipeline-pipeline.jpg" style=" width: auto; height: 40%; "></div>

2. Visualize 10 comparison pictures in the `ImageNet` train set and save them in the `./tmp` folder:

```shell
python ./tools/visualizations/vis_pipeline.py configs/swin_transformer/swin_base_224_b16x64_300e_imagenet.py --phase train --output-dir tmp --number 10 --adaptive
```

<div align=center><img src="../_static/image/tools/visualization/pipeline-concat.jpg" style=" width: auto; height: 40%; "></div>

3. Visualize 100 original pictures in the `CIFAR100` validation set, then display and save them in the `./tmp` folder:

```shell
python ./tools/visualizations/vis_pipeline.py configs/resnet/resnet50_8xb16_cifar100.py --phase val --output-dir tmp --mode original --number 100 --show --adaptive --bgr2rgb
```

<div align=center><img src="../_static/image/tools/visualization/pipeline-original.jpg" style=" width: auto; height: 40%; "></div>

## Learning Rate Schedule Visualization

@@ -119,6 +120,173 @@ python tools/visualizations/vis_lr.py configs/repvgg/repvgg-B3g4_4xb64-autoaug-l

<div align=center><img src="../_static/image/tools/visualization/lr_schedule2.png" style=" width: auto; height: 40%; "></div>

## Class Activation Map Visualization

MMClassification provides the `tools/visualizations/vis_cam.py` tool to visualize class activation maps (CAM). Please install [pytorch-grad-cam](https://github.com/jacobgil/pytorch-grad-cam) with `pip install "grad-cam>=1.3.6"`.

The supported methods are as follows:

| Method | What it does |
|----------|--------------|
| GradCAM | Weights the 2D activations by the average gradient |
| GradCAM++ | Like GradCAM, but uses second-order gradients |
| XGradCAM | Like GradCAM, but scales the gradients by the normalized activations |
| EigenCAM | Takes the first principal component of the 2D activations (no class discrimination, but seems to give great results) |
| EigenGradCAM | Like EigenCAM, but with class discrimination: first principal component of Activations\*Grad. Looks like GradCAM, but cleaner |
| LayerCAM | Spatially weights the activations by positive gradients. Works better especially at lower layers |
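
To make the table above concrete, below is a minimal PyTorch sketch of the plain GradCAM computation (weighting the 2D activations by the average gradient). It only illustrates the idea behind the tool; `resnet50` and the chosen layer are stand-ins, not the tool's actual implementation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(pretrained=True).eval()
activations, gradients = [], []

# Capture the target layer's activations on the forward pass and the
# gradients flowing back into it on the backward pass.
target_layer = model.layer4[-1]
target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

img = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed input image
score = model(img)[0].max()         # score of the top predicted category
score.backward()

acts, grads = activations[0], gradients[0]        # both (1, C, H, W)
weights = grads.mean(dim=(2, 3), keepdim=True)    # average gradient per channel
cam = F.relu((weights * acts).sum(dim=1))         # weighted sum of activations
cam = F.interpolate(cam[None], size=img.shape[-2:], mode='bilinear')[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```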

**Command**

```bash
python tools/visualizations/vis_cam.py \
${IMG} \
${CONFIG_FILE} \
${CHECKPOINT} \
[--target-layers ${TARGET-LAYERS}] \
[--preview-model] \
[--method ${METHOD}] \
[--target-category ${TARGET-CATEGORY}] \
[--save-path ${SAVE_PATH}] \
[--vit-like] \
[--num-extra-tokens ${NUM-EXTRA-TOKENS}] \
[--aug-smooth] \
[--eigen-smooth] \
[--device ${DEVICE}] \
[--cfg-options ${CFG-OPTIONS}]
```

**Description of all arguments**

- `img`: The path of the target picture.
- `config`: The path of the model config file.
- `checkpoint`: The path of the checkpoint.
- `--target-layers`: The target layers to get activation maps; one or more network layers can be specified. If not set, the norm layer of the last block is used.
- `--preview-model`: Whether to print the names of all network layers in the model.
- `--method`: The visualization method, one of `GradCAM`, `GradCAM++`, `XGradCAM`, `EigenCAM`, `EigenGradCAM` and `LayerCAM` (case-insensitive). Defaults to `GradCAM`.
- `--target-category`: The target category. If not set, the category predicted by the given model is used.
- `--save-path`: The path to save the CAM visualization image. If not set, the CAM image will not be saved.
- `--vit-like`: Whether the network is a ViT-like network.
- `--num-extra-tokens`: The number of extra tokens in ViT-like backbones. If not set, the `num_extra_tokens` attribute of the backbone is used.
- `--aug-smooth`: Whether to use test-time augmentation (TTA) when computing the CAM.
- `--eigen-smooth`: Whether to use the first principal component to reduce noise.
- `--device`: The computing device to use. Defaults to `'cpu'`.
- `--cfg-options`: Modifications to the configuration file; refer to [Tutorial 1: Learn about Configs](https://mmclassification.readthedocs.io/en/latest/tutorials/config.html).

```{note}
The argument `--preview-model` prints the names of all network layers in the given model, which is helpful when you are unsure how to set `--target-layers`.
```
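
If you prefer to explore interactively, listing layer names boils down to iterating over `named_modules()`. A quick sketch (using `resnet50` as a stand-in; the tool's own output format may differ):

```python
from torchvision.models import resnet50

model = resnet50()
# Each printed name is an attribute path that could be passed to --target-layers.
for name, module in model.named_modules():
    print(name or '(root)', '->', type(module).__name__)
```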

**Examples (CNN)**

Here are some examples of `target-layers` in ResNet-50, which can be any module or layer:

- `'backbone.layer4'` means the output of the fourth ResLayer.
- `'backbone.layer4.2'` means the output of the third BottleNeck block in the fourth ResLayer.
- `'backbone.layer4.2.conv1'` means the output of the `conv1` layer in the above BottleNeck block.

```{note}
For `ModuleList` or `Sequential`, you can also use the index to specify which sub-module is the target layer.
For example, `backbone.layer4[-1]` is the same as `backbone.layer4.2`, since `layer4` is a `Sequential` with three sub-modules.
```
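
To see why both spellings point at the same module, here is a rough sketch of the lookup logic (an illustration only, not the tool's code): dotted parts are resolved with `getattr`, and bracketed parts index into a `Sequential` or `ModuleList`.

```python
import re
import torch.nn as nn

def get_layer(model: nn.Module, spec: str) -> nn.Module:
    """Resolve a spec such as 'layer4.2' or 'layer4[-1].conv1' to a module."""
    module = model
    # Split 'layer4[-1].conv1' into ['layer4', '[-1]', 'conv1'].
    for part in re.findall(r'\[-?\d+\]|[^.\[\]]+', spec):
        if part.startswith('['):
            module = module[int(part[1:-1])]  # index a Sequential/ModuleList
        else:
            module = getattr(module, part)    # plain attribute access
    return module

# For ResNet-50, both of these resolve to the same Bottleneck block:
# get_layer(model, 'layer4.2') is get_layer(model, 'layer4[-1]')  -> True
```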

1. Use different methods to visualize the CAM for `ResNet50`. Here the `target-category` is the category predicted by the given checkpoint, and the default `target-layers` are used.

```shell
python tools/visualizations/vis_cam.py \
demo/bird.JPEG \
configs/resnet/resnet50_8xb32_in1k.py \
https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth \
--method GradCAM
# GradCAM++, XGradCAM, EigenCAM, EigenGradCAM, LayerCAM
```

| Image | GradCAM | GradCAM++ | EigenGradCAM | LayerCAM |
|-------|----------|------------|-------------- |------------|
| <div align=center><img src='https://user-images.githubusercontent.com/18586273/144429496-628d3fb3-1f6e-41ff-aa5c-1b08c60c32a9.JPEG' height="auto" width="160" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/147065002-f1c86516-38b2-47ba-90c1-e00b49556c70.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/147065119-82581fa1-3414-4d6c-a849-804e1503c74b.jpg' height="auto" width="150"></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/147065096-75a6a2c1-6c57-4789-ad64-ebe5e38765f4.jpg' height="auto" width="150"></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/147065129-814d20fb-98be-4106-8c5e-420adcc85295.jpg' height="auto" width="150"></div> |

2. Use different values of `target-category` to get CAMs from the same picture. In the `ImageNet` dataset, category 238 is 'Greater Swiss Mountain dog' and category 281 is 'tabby, tabby cat'.

```shell
python tools/visualizations/vis_cam.py \
demo/cat-dog.png configs/resnet/resnet50_8xb32_in1k.py \
https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth \
--target-layers 'backbone.layer4.2' \
--method GradCAM \
--target-category 238
# --target-category 281
```

| Category | Image | GradCAM | XGradCAM | LayerCAM |
| --------- |-------|----------|-------------- |------------|
| Dog | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144429526-f27f4cce-89b9-4117-bfe6-55c2ca7eaba6.png' height="auto" width="165" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144433562-968a57bc-17d9-413e-810e-f91e334d648a.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144433853-319f3a8f-95f2-446d-b84f-3028daca5378.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144433937-daef5a69-fd70-428f-98a3-5e7747f4bb88.jpg' height="auto" width="150" ></div> |
| Cat | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144429526-f27f4cce-89b9-4117-bfe6-55c2ca7eaba6.png' height="auto" width="165" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144434518-867ae32a-1cb5-4dbd-b1b9-5e375e94ea48.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144434603-0a2fd9ec-c02e-4e6c-a17b-64c234808c56.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144434623-b4432cc2-c663-4b97-aed3-583d9d3743e6.jpg' height="auto" width="150" ></div> |

3. Use `--eigen-smooth` and `--aug-smooth` to improve the visual effects; see the sketch after the results table below.

```shell
python tools/visualizations/vis_cam.py \
demo/dog.jpg \
configs/mobilenet_v3/mobilenet-v3-large_8xb32_in1k.py \
https://download.openmmlab.com/mmclassification/v0/mobilenet_v3/convert/mobilenet_v3_large-3ea3c186.pth \
--target-layers 'backbone.layer16' \
--method LayerCAM \
--eigen-smooth --aug-smooth
```

| Image | LayerCAM | eigen-smooth | aug-smooth | eigen&aug |
|-------|----------|------------|-------------- |------------|
| <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557492-98ac5ce0-61f9-4da9-8ea7-396d0b6a20fa.jpg' height="auto" width="160"></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557541-a4cf7d86-7267-46f9-937c-6f657ea661b4.jpg' height="auto" width="145" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557547-2731b53e-e997-4dd2-a092-64739cc91959.jpg' height="auto" width="145" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557545-8189524a-eb92-4cce-bf6a-760cab4a8065.jpg' height="auto" width="145" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557548-c1e3f3ec-3c96-43d4-874a-3b33cd3351c5.jpg' height="auto" width="145" ></div> |
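
`--aug-smooth` averages the CAM over augmented copies of the input, while `--eigen-smooth` keeps only the first principal component of the weighted activations. A minimal NumPy sketch of that projection (assuming an activation array of shape `(C, H, W)`; the SVD sign ambiguity is ignored for brevity):

```python
import numpy as np

def eigen_smooth(activations: np.ndarray) -> np.ndarray:
    """Project (C, H, W) activations onto their first principal component."""
    c, h, w = activations.shape
    flat = activations.reshape(c, h * w).T   # one row per spatial location
    flat = flat - flat.mean(axis=0)          # center each channel
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    cam = (flat @ vt[0]).reshape(h, w)       # first principal component
    return np.maximum(cam, 0)                # keep positive responses only
```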

**Examples (Transformer)**

Here are some examples of `target-layers`:

- `'backbone.norm3'` for Swin-Transformer;
- `'backbone.layers[-1].ln1'` for ViT.

In ViT-like networks, such as ViT, T2T-ViT and Swin Transformer, the features are flattened. To draw the CAM, we need to specify the `--vit-like` argument so that the flattened features can be reshaped into square feature maps.

Besides the flattened features, some ViT-like networks also add extra tokens, like the class token in ViT and T2T-ViT, or the distillation token in DeiT. In these networks, the final classification is done on the tokens computed in the last attention block; therefore, the classification score is not affected by the other features, and the gradient of the classification score with respect to them is zero. As a result, you shouldn't use the output of the last attention block as the target layer in these networks.

To exclude these extra tokens, we need to know their number. Almost all transformer-based backbones in MMClassification have a `num_extra_tokens` attribute. If you want to use this tool with a new or third-party network that doesn't have the `num_extra_tokens` attribute, please specify the number with the `--num-extra-tokens` argument.
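
For reference, what `--vit-like` and `--num-extra-tokens` imply is roughly the following reshaping step: drop the extra tokens and fold the remaining flattened sequence back into a square feature map. A hedged sketch (function and argument names are illustrative, not the tool's API):

```python
import torch

def tokens_to_feature_map(tokens: torch.Tensor,
                          num_extra_tokens: int = 1) -> torch.Tensor:
    """Convert a (B, N + extra, C) token sequence to a (B, C, H, W) map."""
    tokens = tokens[:, num_extra_tokens:, :]  # drop class/distillation tokens
    b, n, c = tokens.shape
    h = w = int(n ** 0.5)
    assert h * w == n, 'the remaining tokens must form a square feature map'
    return tokens.permute(0, 2, 1).reshape(b, c, h, w)
```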

1. Visualize the CAM for `Swin Transformer`, using the default `target-layers`:

```shell
python tools/visualizations/vis_cam.py \
demo/bird.JPEG \
configs/swin_transformer/swin-tiny_16xb64_in1k.py \
https://download.openmmlab.com/mmclassification/v0/swin-transformer/swin_tiny_224_b16x64_300e_imagenet_20210616_090925-66df6be6.pth \
--vit-like
```

2. Visualize the CAM for `Vision Transformer (ViT)`:

```shell
python tools/visualizations/vis_cam.py \
demo/bird.JPEG \
configs/vision_transformer/vit-base-p16_ft-64xb64_in1k-384.py \
https://download.openmmlab.com/mmclassification/v0/vit/finetune/vit-base-p16_in21k-pre-3rdparty_ft-64xb64_in1k-384_20210928-98e8652b.pth \
--vit-like \
--target-layers 'backbone.layers[-1].ln1'
```

3. Visualize the CAM for `T2T-ViT`:

```shell
python tools/visualizations/vis_cam.py \
demo/bird.JPEG \
configs/t2t_vit/t2t-vit-t-14_8xb64_in1k.py \
https://download.openmmlab.com/mmclassification/v0/t2t-vit/t2t-vit-t-14_3rdparty_8xb64_in1k_20210928-b7c09b62.pth \
--vit-like \
--target-layers 'backbone.encoder[-1].ln1'
```

| Image | ResNet50 | ViT | Swin | T2T-ViT |
|-------|----------|------------|-------------- |------------|
| <div align=center><img src='https://user-images.githubusercontent.com/18586273/144429496-628d3fb3-1f6e-41ff-aa5c-1b08c60c32a9.JPEG' height="auto" width="165" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144431491-a2e19fe3-5c12-4404-b2af-a9552f5a95d9.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144436218-245a11de-6234-4852-9c08-ff5069f6a739.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144436168-01b0e565-442c-4e1e-910c-17c62cff7cd3.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144436198-51dbfbda-c48d-48cc-ae06-1a923d19b6f6.jpg' height="auto" width="150" ></div> |

## FAQs

- None