[Enhance] Improve CPE performance by reduce memory copy. #762

XiaobingSuper · 2022-04-02T01:55:15Z

Motivation

In position_encoding, the input to self.proj is always a non-contiguous tensor. This causes a performance issue when self.stride == 1: PyTorch always converts a non-contiguous convolution input to a contiguous tensor first (https://github.com/pytorch/pytorch/blob/fa09099ba35fcd42347732ca3a5f8ddaf145da1b/aten/src/ATen/native/Convolution.cpp#L1093), and the add that follows has a similar problem, because the convolution output is contiguous while the other operand is still non-contiguous.
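The non-contiguity described above can be checked directly. The following is a minimal sketch, assuming the tokens arrive as a (B, N, C) tensor that is reshaped to (B, C, H, W) with a transpose followed by a view; the shapes and variable names are illustrative assumptions and are not taken from the PR.

```python
import torch

B, H, W, C = 2, 4, 4, 8                          # illustrative sizes (assumed)
tokens = torch.randn(B, H * W, C)                # (B, N, C) token layout
feat = tokens.transpose(1, 2).view(B, C, H, W)   # (B, C, H, W) view, no copy yet

# The view shares storage with `tokens`, but its strides are not row-major.
shares_memory = feat.data_ptr() == tokens.data_ptr()
was_contiguous = feat.is_contiguous()

# .contiguous() materialises a dense copy with the same values.
dense = feat.contiguous()
copied = dense.data_ptr() != feat.data_ptr()

print(feat.stride(), was_contiguous, copied)
```

Any operator that needs a dense layout has to perform the same copy that `.contiguous()` performs here.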

Modification

This PR converts the tensor to a contiguous layout up front, before the convolution and the add, so that both operations receive a contiguous input.
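The change can be sketched as follows. This is a hedged illustration rather than the merged diff: `cpe_forward` is a hypothetical standalone function, and the depthwise 3x3 `proj` layer and tensor sizes are assumptions chosen only to make the example runnable.

```python
import torch
import torch.nn as nn


def cpe_forward(proj, x, hw_shape, stride=1):
    """Hypothetical stand-in for the position-encoding forward pass."""
    B, N, C = x.shape
    H, W = hw_shape
    # (B, N, C) -> (B, C, H, W): the transpose leaves a non-contiguous view.
    cnn_feat = x.transpose(1, 2).view(B, C, H, W)
    # Convert once, so both the convolution and the add see contiguous input.
    cnn_feat = cnn_feat.contiguous()
    if stride == 1:
        out = proj(cnn_feat) + cnn_feat  # residual add needs matching shapes
    else:
        out = proj(cnn_feat)
    # Back to the (B, N', C) token layout.
    return out.flatten(2).transpose(1, 2)


torch.manual_seed(0)
embed_dims, hw = 8, (4, 4)
# Depthwise 3x3 convolution; an assumed configuration for illustration.
proj = nn.Conv2d(embed_dims, embed_dims, kernel_size=3, stride=1,
                 padding=1, groups=embed_dims)
tokens = torch.randn(2, hw[0] * hw[1], embed_dims)
out = cpe_forward(proj, tokens, hw)
```

The result is numerically the same as running the convolution and add on the non-contiguous view; only the point at which the layout conversion happens changes.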

CLAassistant · 2022-04-02T01:55:38Z

All committers have signed the CLA.

codecov · 2022-04-02T02:38:22Z

Codecov Report

Merging #762 (23afbca) into dev (02c8f82) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #762      +/-   ##
==========================================
+ Coverage   86.95%   86.98%   +0.03%     
==========================================
  Files         127      127              
  Lines        8071     8068       -3     
  Branches     1390     1389       -1     
==========================================
  Hits         7018     7018              
+ Misses        848      845       -3     
  Partials      205      205

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `86.89% <100.00%> (+0.03%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| mmcls/models/utils/position_encoding.py | `100.00% <100.00%> (ø)` | |
| mmcls/apis/train.py | `19.17% <0.00%> (+0.75%)` | ⬆️ |

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5a67bc8...23afbca.

Ezra-Yu

LGTM.

* [Enhance] Add extra dataloader settings in configs. (open-mmlab#752)
* Use `train_dataloader`, `val_dataloader` and `test_dataloader` settings in the `data` field to specify different arguments.
* Fix bug
* Fix bug
* [Enhance] Improve CPE performance by reduce memory copy. (open-mmlab#762)
* [Feature] Support resize relative position embedding in `SwinTransformer`. (open-mmlab#749)
* [Feature]: Add resize rel pos embed
* [Refactor]: Create a separated resize_rel_pos_bias_table func
* [Refactor]: Refactor rel pos embed bias
* [Refactor]: Move interpolate into func
* Remove index buffer only when window_size changes
  Co-authored-by: mzr1996 <[email protected]>
* [Feature] Add PoolFormer backbone and checkpoints. (open-mmlab#746)
* add PoolFormer
* fix some typos in PoolFormer
* fix lint error
* modify out_indices and gap
* fix typo
* fix lint
* fix typo
* fix typo in poolforemr README
* fix lint
* Update some paths
* Refactor freeze_stages method
* Add unit tests
* Fix lint
  Co-authored-by: mzr1996 <[email protected]>
* Bump version to v0.22.1 (open-mmlab#785)
* [Docs] Refine API reference. (open-mmlab#774)
* [Docs] Refine API reference
* Add PoolFormer
* [Docs] Fix docs.
* [Enhance] Reduce the memory usage of unit tests for Swin-Transformer. (open-mmlab#759)
* [Feature] Support VAN. (open-mmlab#739)
* add van
* fix config
* add metafile
* add test
* model convert script
* fix review
* fix lint
* fix the configs and improve docs
* rm debug lines
* add VAN into api
  Co-authored-by: Yu Zhaohui <[email protected]>
* [Feature] Support DenseNet. (open-mmlab#750)
* init add densenet implementation
* Add config and converted models
* update meta
* add test for memory efficient
* Add docs
* add doc for jit
* Update checkpoint path
* Update readthedocs
  Co-authored-by: mzr1996 <[email protected]>
* [Fix] Use symbolic link in the API reference of Chinese docs.
* [Enhance] Support training on IPU and add fine-tuning configs of ViT. (open-mmlab#723)
* implement training and evaluation on IPU
* fp16 SOTA
* Tput reaches 5600
* 123
* add poptorch dataloder
* change ipu_replicas to ipu-replicas
* add noqa to config long line(website)
* remove ipu dataloder test code
* del one blank line in test_builder
* refine the dataloder initialization
* fix a typo
* refine args for dataloder
* remove an annoted line
* process one more conflict
* adjust code structure in mmcv.ipu
* adjust ipu code structure in mmcv
* IPUDataloader to IPUDataLoader
* align with mmcv
* adjust according to mmcv
* mmcv code structre fixed
  Co-authored-by: hudi <[email protected]>
* [Fix] Fix lint and mmcv version requirement for IPU.
* Bump version to v0.23.0 (open-mmlab#809)
* Refacoter Wandb hook and refine docstring

Co-authored-by: XiaobingZhang <[email protected]>
Co-authored-by: Yuan Liu <[email protected]>
Co-authored-by: Weihao Yu <[email protected]>
Co-authored-by: takuoko <[email protected]>
Co-authored-by: Yu Zhaohui <[email protected]>
Co-authored-by: Hubert <[email protected]>
Co-authored-by: Hu Di <[email protected]>
Co-authored-by: hudi <[email protected]>

…Biases Integration) (#764)

* wandb integration
* visualize using wandb tables
* wandb tables enhanced
* Refactor MMClsWandbHook (#1)
* [Enhance] Add extra dataloader settings in configs. (#752)
* Use `train_dataloader`, `val_dataloader` and `test_dataloader` settings in the `data` field to specify different arguments.
* Fix bug
* Fix bug
* [Enhance] Improve CPE performance by reduce memory copy. (#762)
* [Feature] Support resize relative position embedding in `SwinTransformer`. (#749)
* [Feature]: Add resize rel pos embed
* [Refactor]: Create a separated resize_rel_pos_bias_table func
* [Refactor]: Refactor rel pos embed bias
* [Refactor]: Move interpolate into func
* Remove index buffer only when window_size changes
  Co-authored-by: mzr1996 <[email protected]>
* [Feature] Add PoolFormer backbone and checkpoints. (#746)
* add PoolFormer
* fix some typos in PoolFormer
* fix lint error
* modify out_indices and gap
* fix typo
* fix lint
* fix typo
* fix typo in poolforemr README
* fix lint
* Update some paths
* Refactor freeze_stages method
* Add unit tests
* Fix lint
  Co-authored-by: mzr1996 <[email protected]>
* Bump version to v0.22.1 (#785)
* [Docs] Refine API reference. (#774)
* [Docs] Refine API reference
* Add PoolFormer
* [Docs] Fix docs.
* [Enhance] Reduce the memory usage of unit tests for Swin-Transformer. (#759)
* [Feature] Support VAN. (#739)
* add van
* fix config
* add metafile
* add test
* model convert script
* fix review
* fix lint
* fix the configs and improve docs
* rm debug lines
* add VAN into api
  Co-authored-by: Yu Zhaohui <[email protected]>
* [Feature] Support DenseNet. (#750)
* init add densenet implementation
* Add config and converted models
* update meta
* add test for memory efficient
* Add docs
* add doc for jit
* Update checkpoint path
* Update readthedocs
  Co-authored-by: mzr1996 <[email protected]>
* [Fix] Use symbolic link in the API reference of Chinese docs.
* [Enhance] Support training on IPU and add fine-tuning configs of ViT. (#723)
* implement training and evaluation on IPU
* fp16 SOTA
* Tput reaches 5600
* 123
* add poptorch dataloder
* change ipu_replicas to ipu-replicas
* add noqa to config long line(website)
* remove ipu dataloder test code
* del one blank line in test_builder
* refine the dataloder initialization
* fix a typo
* refine args for dataloder
* remove an annoted line
* process one more conflict
* adjust code structure in mmcv.ipu
* adjust ipu code structure in mmcv
* IPUDataloader to IPUDataLoader
* align with mmcv
* adjust according to mmcv
* mmcv code structre fixed
  Co-authored-by: hudi <[email protected]>
* [Fix] Fix lint and mmcv version requirement for IPU.
* Bump version to v0.23.0 (#809)
* Refacoter Wandb hook and refine docstring
  Co-authored-by: XiaobingZhang <[email protected]>
  Co-authored-by: Yuan Liu <[email protected]>
  Co-authored-by: Weihao Yu <[email protected]>
  Co-authored-by: takuoko <[email protected]>
  Co-authored-by: Yu Zhaohui <[email protected]>
  Co-authored-by: Hubert <[email protected]>
  Co-authored-by: Hu Di <[email protected]>
  Co-authored-by: hudi <[email protected]>
* shuffle val data
* minor updates
* minor fix

Co-authored-by: Ma Zerun <[email protected]>
Co-authored-by: XiaobingZhang <[email protected]>
Co-authored-by: Yuan Liu <[email protected]>
Co-authored-by: Weihao Yu <[email protected]>
Co-authored-by: takuoko <[email protected]>
Co-authored-by: Yu Zhaohui <[email protected]>
Co-authored-by: Hubert <[email protected]>
Co-authored-by: Hu Di <[email protected]>
Co-authored-by: hudi <[email protected]>


improve position_encoding perfromance by reduce memory copy

23afbca

XiaobingSuper changed the title ~~improve position_encoding perfromance by reduce memory copy~~ improve position_encoding performance by reduce memory copy Apr 2, 2022

mzr1996 requested a review from Ezra-Yu April 2, 2022 02:32

Ezra-Yu changed the base branch from master to dev April 2, 2022 02:39

Ezra-Yu approved these changes Apr 2, 2022

mzr1996 changed the title ~~improve position_encoding performance by reduce memory copy~~ [Enhance] Improve CPE performance by reduce memory copy. Apr 2, 2022

mzr1996 merged commit 875195e into open-mmlab:dev Apr 2, 2022

XiaobingSuper deleted the xiaobing/reduce_memory_copy branch April 4, 2022 11:29

AlenLi817 mentioned this pull request May 16, 2022

The swin backbone model trained in train.py cannot be used in test.py #839

Closed

mzr1996 pushed a commit to mzr1996/mmpretrain that referenced this pull request Nov 24, 2022

[Enhance] Improve CPE performance by reduce memory copy. (open-mmlab#762)

2473813
