[Enhance] Improve CPE performance by reduce memory copy. #762
Codecov Report
@@            Coverage Diff             @@
##              dev     #762      +/-   ##
==========================================
+ Coverage   86.95%   86.98%   +0.03%
==========================================
  Files         127      127
  Lines        8071     8068       -3
  Branches     1390     1389       -1
==========================================
  Hits         7018     7018
+ Misses        848      845       -3
  Partials      205      205
LGTM.
* [Enhance] Add extra dataloader settings in configs. (open-mmlab#752) * Use `train_dataloader`, `val_dataloader` and `test_dataloader` settings in the `data` field to specify different arguments. * Fix bug * Fix bug * [Enhance] Improve CPE performance by reduce memory copy. (open-mmlab#762) * [Feature] Support resize relative position embedding in `SwinTransformer`. (open-mmlab#749) * [Feature]: Add resize rel pos embed * [Refactor]: Create a separated resize_rel_pos_bias_table func * [Refactor]: Refactor rel pos embed bias * [Refactor]: Move interpolate into func * Remove index buffer only when window_size changes Co-authored-by: mzr1996 <[email protected]> * [Feature] Add PoolFormer backbone and checkpoints. (open-mmlab#746) * add PoolFormer * fix some typos in PoolFormer * fix lint error * modify out_indices and gap * fix typo * fix lint * fix typo * fix typo in poolforemr README * fix lint * Update some paths * Refactor freeze_stages method * Add unit tests * Fix lint Co-authored-by: mzr1996 <[email protected]> * Bump version to v0.22.1 (open-mmlab#785) * [Docs] Refine API reference. (open-mmlab#774) * [Docs] Refine API reference * Add PoolFormer * [Docs] Fix docs. * [Enhance] Reduce the memory usage of unit tests for Swin-Transformer. (open-mmlab#759) * [Feature] Support VAN. (open-mmlab#739) * add van * fix config * add metafile * add test * model convert script * fix review * fix lint * fix the configs and improve docs * rm debug lines * add VAN into api Co-authored-by: Yu Zhaohui <[email protected]> * [Feature] Support DenseNet. (open-mmlab#750) * init add densenet implementation * Add config and converted models * update meta * add test for memory efficient * Add docs * add doc for jit * Update checkpoint path * Update readthedocs Co-authored-by: mzr1996 <[email protected]> * [Fix] Use symbolic link in the API reference of Chinese docs. * [Enhance] Support training on IPU and add fine-tuning configs of ViT. 
(open-mmlab#723) * implement training and evaluation on IPU * fp16 SOTA * Tput reaches 5600 * 123 * add poptorch dataloder * change ipu_replicas to ipu-replicas * add noqa to config long line(website) * remove ipu dataloder test code * del one blank line in test_builder * refine the dataloder initialization * fix a typo * refine args for dataloder * remove an annoted line * process one more conflict * adjust code structure in mmcv.ipu * adjust ipu code structure in mmcv * IPUDataloader to IPUDataLoader * align with mmcv * adjust according to mmcv * mmcv code structre fixed Co-authored-by: hudi <[email protected]> * [Fix] Fix lint and mmcv version requirement for IPU. * Bump version to v0.23.0 (open-mmlab#809) * Refacoter Wandb hook and refine docstring Co-authored-by: XiaobingZhang <[email protected]> Co-authored-by: Yuan Liu <[email protected]> Co-authored-by: Weihao Yu <[email protected]> Co-authored-by: takuoko <[email protected]> Co-authored-by: Yu Zhaohui <[email protected]> Co-authored-by: Hubert <[email protected]> Co-authored-by: Hu Di <[email protected]> Co-authored-by: hudi <[email protected]>
Motivation
For position_encoding, the input to self.proj is always a non-contiguous tensor. This causes a performance issue when self.stride == 1, because PyTorch always converts the input of a convolution to a contiguous tensor when it is not already contiguous (https://github.com/pytorch/pytorch/blob/fa09099ba35fcd42347732ca3a5f8ddaf145da1b/aten/src/ATen/native/Convolution.cpp#L1093). The add has a similar issue: the output of the convolution is a contiguous tensor, but the other operand is non-contiguous.
Modification
This PR converts the tensor to a contiguous layout once, before the convolution and the add, so that both operations work on contiguous memory.
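A minimal sketch of the idea, not the actual diff of this PR: the class name `CPESketch`, the depthwise 3x3 configuration of `proj`, and the tensor sizes are assumptions made for illustration. It shows that reshaping `(B, N, C)` tokens into `(B, C, H, W)` produces a non-contiguous tensor, and that a single `.contiguous()` call before the convolution lets both the convolution and the add consume the same contiguous copy.

```python
import torch
import torch.nn as nn


class CPESketch(nn.Module):
    """Hypothetical stand-in for the position-encoding module discussed above."""

    def __init__(self, channels, stride=1):
        super().__init__()
        self.stride = stride
        # Depthwise 3x3 convolution; this layer configuration is an assumption.
        self.proj = nn.Conv2d(
            channels, channels, 3, stride, 1, groups=channels)

    def forward(self, x, hw_shape):
        B, N, C = x.shape
        # (B, N, C) -> (B, C, H, W): transpose + view gives a non-contiguous view.
        cnn_feat = x.transpose(1, 2).view(B, C, *hw_shape)
        # Copy once here so the convolution and the add both see contiguous memory.
        cnn_feat = cnn_feat.contiguous()
        if self.stride == 1:
            out = self.proj(cnn_feat) + cnn_feat
        else:
            out = self.proj(cnn_feat)
        # Back to the token layout (B, N', C).
        return out.flatten(2).transpose(1, 2)


# Example sizes (assumed): batch 2, a 4x4 grid of tokens, 8 channels.
x = torch.randn(2, 16, 8)
view_only = x.transpose(1, 2).view(2, 8, 4, 4)
print(view_only.is_contiguous())               # the reshaped view itself
print(view_only.contiguous().is_contiguous())  # after the explicit copy

module = CPESketch(channels=8, stride=1)
out = module(x, (4, 4))
print(out.shape)
```

Without the explicit call, the convolution would make its own internal contiguous copy and the add would still read the non-contiguous operand, which is the duplicated work described in the Motivation.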