add benchmark regression test script with tmux #849

liqikai9 · 2021-08-10T05:59:02Z

Motivation

When releasing a new version of our codebase monthly or quarterly, we would like to run benchmark regression tests on the previously released models and algorithms, with support for different priority levels.

The base feature is to read a config file containing a model list and runtime parameters, then run multiple tasks in different panes and windows controlled by tmux automatically.

The priority of the models is as follows. P0: core, P1: important, P2: less important, P3: least important. You can assign different priorities for each model and also decide the priority levels for inference and training tasks, respectively.

This script runs multiple benchmark regression tasks without the need to start many terminals manually. Because the tasks run inside tmux, it also avoids the inconvenience of network interruptions, which are quite common when running tasks on remote servers.
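The pane-per-task idea described above can be sketched as follows. This is a minimal illustration and not the code in this PR: the helper name build_tmux_commands is hypothetical, and it only builds the tmux invocations (new-session, split-window, select-layout, send-keys) without running them.

```python
def build_tmux_commands(session_name, task_commands):
    """Return the tmux invocations that start one pane per task.

    Hypothetical helper for illustration; the real script may organise
    panes and windows differently.
    """
    if not task_commands:
        return []
    # Start a detached session; it comes with one window and one pane.
    cmds = [["tmux", "new-session", "-d", "-s", session_name]]
    for index, task in enumerate(task_commands):
        if index > 0:
            # Open a new pane (it becomes the active one) and re-tile
            # so that later splits still have room.
            cmds.append(["tmux", "split-window", "-t", session_name])
            cmds.append(["tmux", "select-layout", "-t", session_name, "tiled"])
        # Type the task command into the active pane and press Enter.
        cmds.append(["tmux", "send-keys", "-t", session_name, task, "Enter"])
    return cmds
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a machine where tmux is installed. Because the session is detached, the tasks keep running if the SSH connection drops.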

Modification

We added the folder .dev_scripts containing two files: benchmark_regression_cfg_tmpl.yaml and benchmark_regression.py. Besides, in order to specify the work-dir of the inference task, we added an additional argument --work-dir to the script $mmpose/tools/test.py and modified the code accordingly.
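For illustration, an optional --work-dir argument could be wired into an argparse-based test script roughly like this. It is a sketch under assumptions, not the actual diff to tools/test.py: the positional arguments, the help texts, and the fallback to ./work_dirs/<config basename> are assumed here.

```python
import argparse
import os.path as osp


def parse_args(argv=None):
    """Parse a reduced, hypothetical subset of test-script arguments."""
    parser = argparse.ArgumentParser(description="test a model (sketch)")
    parser.add_argument("config", help="path to the test config file")
    parser.add_argument("checkpoint", help="path or URL of the checkpoint")
    parser.add_argument(
        "--work-dir", help="directory in which to save evaluation results")
    return parser.parse_args(argv)


def resolve_work_dir(args):
    """Prefer --work-dir; otherwise derive a directory from the config name.

    The fallback location is an assumption made for this sketch.
    """
    if args.work_dir is not None:
        return args.work_dir
    stem = osp.splitext(osp.basename(args.config))[0]
    return osp.join("./work_dirs", stem)
```

With this shape, the regression script can give every inference task its own output directory by passing --work-dir explicitly.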

Arguments

The script is based on $mmpose/tools/slurm_test.sh and $mmpose/tools/slurm_train.sh. It supports running test and train tasks with custom priority and runtime setting parameters, which can be specified in the config file.

To run the script, a config file containing multiple models is required, for example benchmark_regression_cfg_tmpl.yaml under the directory $mmpose/.dev_scripts. This file serves as a template showing the available fields. It has a model_list field that contains the priority levels, and under each priority level there are multiple models.

Specifically, the config file must indicate model priorities and paths to the config file and the corresponding checkpoint file. For example,

model_list:
    P0: # the priority of the models, P0: core, P1: important, P2: less important, P3: least important
        - config: configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/res50_coco_256x192.py  # path to the config file
          checkpoint: https://download.openmmlab.com/mmpose/top_down/resnet/res50_coco_256x192-ec54d7f3_20200709.pth # path or URL to the checkpoint file
          task_name: res50_coco_256x192 # the job name in slurm will be specified according to this field and the mode. If not specified, use the basename of the config file
          test: ...
          train: ...
        - ...
    P1: ...

The priority fields, such as P0 and P1, let you assign different priorities to different models. You can add more models as needed under the corresponding priority field.
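The task_name fallback mentioned in the config comments (use the basename of the config file when task_name is not given) can be sketched as below. The function name get_task_name and the "<mode>_<name>" format are assumptions for illustration; the PR only states that the slurm job name depends on this field and the mode.

```python
import os.path as osp


def get_task_name(model_cfg, mode):
    """Build a job name from one model entry and the mode ('test' or 'train').

    Hypothetical helper; the '<mode>_<name>' format is an assumption.
    """
    # Fall back to the config file's basename without the extension.
    base = model_cfg.get("task_name") or osp.splitext(
        osp.basename(model_cfg["config"]))[0]
    return f"{mode}_{base}"
```

For the entry shown above, omitting task_name would give the same base name, res50_coco_256x192, since that is also the basename of the config file.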

For a more detailed description of the arguments, please refer to the script $mmpose/.dev_scripts/benchmark_regression.py.

Usage

Here is a simple example to run the script.

cd $mmpose
python ./.dev_scripts/benchmark_regression.py [--config ${/path/to/model_list}] [--session-name ${SESSION_NAME}] [--priority ${TEST_PRIORITY} ${TRAIN_PRIORITY}]

Note that ${TEST_PRIORITY} and ${TRAIN_PRIORITY} give the largest priority number to run for test and train tasks, respectively.
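One plausible reading of the priority arguments is that a value of N selects every model listed under P0 through PN. The sketch below implements that reading. It is an assumption and should be checked against benchmark_regression.py; the function name select_models and the rule that a model is skipped when it has no section for the mode are hypothetical.

```python
def select_models(model_list, mode, max_priority):
    """Collect models whose priority number is at most max_priority.

    model_list maps 'P0', 'P1', ... to lists of model entries (dicts).
    Only entries that define a section for `mode` ('test' or 'train')
    are kept. This selection rule is an assumed interpretation.
    """
    selected = []
    # Visit priority levels in numeric order: P0, P1, P2, ...
    for level in sorted(model_list, key=lambda name: int(name[1:])):
        if int(level[1:]) > max_priority:
            continue
        selected.extend(m for m in (model_list[level] or []) if mode in m)
    return selected
```

Under this reading, calling the function once with the test priority and once with the train priority gives the two task lists that are then dispatched to tmux panes.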

Running the above script with default parameters, you will start a new tmux session with each pane running a task independently. Enjoy it!

codecov · 2021-08-10T06:19:17Z

Codecov Report

Merging #849 (5f3e95b) into master (24dbb01) will increase coverage by 0.05%.
The diff coverage is 96.92%.

❗ Current head 5f3e95b differs from pull request most recent head 704bbbf. Consider uploading reports for the commit 704bbbf to get more accurate results

@@            Coverage Diff             @@
##           master     #849      +/-   ##
==========================================
+ Coverage   83.59%   83.64%   +0.05%     
==========================================
  Files         176      178       +2     
  Lines       14145    14195      +50     
  Branches     2364     2367       +3     
==========================================
+ Hits        11824    11874      +50     
- Misses       1713     1714       +1     
+ Partials      608      607       -1

Flag	Coverage Δ
unittests	`83.57% <96.92%> (+0.05%)`	⬆️

Impacted Files	Coverage Δ
mmpose/models/heads/hmr_head.py	`100.00% <ø> (ø)`
mmpose/models/utils/smpl.py	`96.29% <96.29%> (ø)`
mmpose/models/__init__.py	`100.00% <100.00%> (ø)`
mmpose/models/builder.py	`100.00% <100.00%> (ø)`
mmpose/models/detectors/mesh.py	`90.75% <100.00%> (+0.04%)`	⬆️
mmpose/models/utils/__init__.py	`100.00% <100.00%> (ø)`
mmpose/datasets/pipelines/shared_transform.py	`88.50% <0.00%> (+0.50%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 24dbb01...704bbbf.

.dev_scripts/config_list.yaml

.dev_scripts/test_benchmark_tmux.py

modify the config and rename the filename

modify the script and rename the filename

using mmcv.load to avoid introducing the extra dependency on yaml

.dev_scripts/benchmark_regression.py

.dev_scripts/benchmark_regression_cfg.yaml

ly015 · 2021-08-20T09:03:37Z

.dev_scripts/benchmark_regression_cfg.yaml

-                eval: mAP # evaluation metric, which depends on the dataset, e.g., "mAP" for MSCOCO
-                fuse-conv-bn:
-                gpu_collect:
+    P0: # the priority of the models, P0: core, P1: important, P2: less important, P3: least important


A few suggestions about the config file:

Rename this file to "benchmark_regression_cfg_tmpl.yaml" or something similar, so that it serves as a template showing the full content that a config file could include. We will add a more compact config file with a full model list and only the necessary arguments.

Use test instead of infer as the mode name.

gpus_per_node can be set to 8 for all models and modes.

Got these suggestions.

* Fix import and deprecation issues in unit tests (#871) * fix some bugs in the unit test of smpl model. * reorganize `tests/` to solve importing issue (PEP 420) * fix deprecation warnings in unit tests Co-authored-by: ly015 <[email protected]> * add benchmark regression test script with tmux (#849) * test the simple case using tmux to run multiple benchmark regression test tasks * modify and rename the config file and script * Delete config_list.yaml * modify the config and rename the filename * Delete test_benchmark_tmux.py * modify the script and rename the filename * Update setup.cfg * using mmcv.load to avoid introducing the extra dependency on yaml * fix some typo * refactor the config file and modify the script accordingly * modify the config and script * rename the config file * Correct dataset preparation guide of WFLW (#873) * add pr template (#875) * add CITATION.cff and update setup.py (#876) * Add copyright header and pre-commit hook (#872) * Add pre-commit hook to automatically add copyright file header * update files with copyright header * Limit copyright checking in the first 2 lines of a file * Exclude configs in demo/ * set max-header-lines as 5 * rebase to master and add copyright to new files * move benchmark_regression into .dev_scripts/benchmark * Translate tasks/2d_body_keypoint.md (#842) * 2rd PR remove poseval * fix lint * revise the CN version Co-authored-by: ly015 <[email protected]> * fix some bugs in the unit test of smpl model. 
* * reorganiz `tests/` to solve importing issue (PEP 420) * add dataset info * fix lint * * fix wrongly modified parts in previous rebase * fix lint * rename datasets/_base_ as datasets/base * resolve compatibility of pose_limb_color * Add dummy dataset base classes with old names for compatibility * * Rewrite relative unittest based on dataset_info * Add bc-breaking test for functions related to dataset_info * Rename DatasetInfo.dataset_info as DatasetInfo._dataset_info * Fix dataset_info of h36m dataset * Handle breaking change pose_limb_color -> pose_link_color * add unittest for old-fashioned dataset initialization without dataset_info * resolve naming conflict in unittests Co-authored-by: zengwang430521 <[email protected]> Co-authored-by: ly015 <[email protected]>

* add dataset info (#663) * Fix import and deprecation issues in unit tests (#871) * fix some bugs in the unit test of smpl model. * reorganize `tests/` to solve importing issue (PEP 420) * fix deprecation warnings in unit tests Co-authored-by: ly015 <[email protected]> * add benchmark regression test script with tmux (#849) * test the simple case using tmux to run multiple benchmark regression test tasks * modify and rename the config file and script * Delete config_list.yaml * modify the config and rename the filename * Delete test_benchmark_tmux.py * modify the script and rename the filename * Update setup.cfg * using mmcv.load to avoid introducing the extra dependency on yaml * fix some typo * refactor the config file and modify the script accordingly * modify the config and script * rename the config file * Correct dataset preparation guide of WFLW (#873) * add pr template (#875) * add CITATION.cff and update setup.py (#876) * Add copyright header and pre-commit hook (#872) * Add pre-commit hook to automatically add copyright file header * update files with copyright header * Limit copyright checking in the first 2 lines of a file * Exclude configs in demo/ * set max-header-lines as 5 * rebase to master and add copyright to new files * move benchmark_regression into .dev_scripts/benchmark * Translate tasks/2d_body_keypoint.md (#842) * 2rd PR remove poseval * fix lint * revise the CN version Co-authored-by: ly015 <[email protected]> * fix some bugs in the unit test of smpl model. 
* * reorganiz `tests/` to solve importing issue (PEP 420) * add dataset info * fix lint * * fix wrongly modified parts in previous rebase * fix lint * rename datasets/_base_ as datasets/base * resolve compatibility of pose_limb_color * Add dummy dataset base classes with old names for compatibility * * Rewrite relative unittest based on dataset_info * Add bc-breaking test for functions related to dataset_info * Rename DatasetInfo.dataset_info as DatasetInfo._dataset_info * Fix dataset_info of h36m dataset * Handle breaking change pose_limb_color -> pose_link_color * add unittest for old-fashioned dataset initialization without dataset_info * resolve naming conflict in unittests Co-authored-by: zengwang430521 <[email protected]> Co-authored-by: ly015 <[email protected]> * fix typo * fix typo Co-authored-by: Jas <[email protected]> Co-authored-by: zengwang430521 <[email protected]>

* test the simple case using tmux to run multiple benchmark regression test tasks
* modify and rename the config file and script
* Delete config_list.yaml
* modify the config and rename the filename
* Delete test_benchmark_tmux.py
* modify the script and rename the filename
* Update setup.cfg
* using mmcv.load to avoid introducing the extra dependency on yaml
* fix some typo
* refactor the config file and modify the script accordingly
* modify the config and script
* rename the config file

…lab#849) * [Enhance] Ensure metrics is not empty when saving best ckpts * fix warn to warning * delete a unnecessary method

test the simple case using tmux to run multiple benchmark regression test tasks (56e3f21)

jin-s13 requested a review from ly015 August 10, 2021 12:06