Our ToL hierarchical GUI region detection model is built on MMDetection. We fine-tuned DINO with a customized configuration on the Android Screen Hierarchical Layout (ASHL) dataset and run inference on the Screen Point-and-Read (ScreenPR) benchmark. This guide covers how to set up the environment, plus training and inference details.
You need to prepare the MMDetection environment from our cloned source code.
- Step 1: Install MMEngine and MMCV using MIM
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
- Step 2: Install MMDetection from our source repository
cd <the root of repo tol_gui_region_detection>
pip install -v -e . -r requirements/tracking.txt
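You can verify that the editable install is picked up correctly with a quick check:
# Quick sanity check of the installed versions.
import mmengine
import mmcv
import mmdet

print("mmengine:", mmengine.__version__)
print("mmcv:", mmcv.__version__)    # should be >= 2.0.0
print("mmdet:", mmdet.__version__)  # should resolve to this repo's source tree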
- Step 3: Install extra components to sync results to wandb.ai:
pip install future tensorboard
pip install wandb
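For reference, MMDetection 3.x picks up wandb through a visualizer backend in the model config; a minimal sketch, where the project name is a placeholder rather than our actual setting:
# Sketch: enable wandb logging via the visualizer config (MMDetection 3.x).
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='WandbVisBackend',
         init_kwargs=dict(project='tol-gui-region-detection')),  # placeholder project
]
visualizer = dict(
    type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
Remember to run wandb login once on the machine before training so the runs can be synced.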
- Step 1 [Optional]: Prepare COCO-style training data using the conversion script configs/dino/convert_mobile_segement_to_multilabel_coco.py. It assumes the training data has been placed in the ../data/screendata folder. Since we already ship the generated files configs/dino/data/train/annotation_multilabel_coco.json and configs/dino/data/val/annotation_multilabel_coco.json with our source code, this step is optional unless you need a configuration different from ours.
cd configs/dino/
python convert_mobile_segement_to_multilabel_coco.py
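For reference, the conversion produces COCO-format JSON. A minimal sketch of the expected structure is below; all file names, category names, and values are illustrative placeholders, not actual dataset contents:
# Illustrative COCO-style annotation structure (all values are placeholders).
import json

coco = {
    "images": [
        {"id": 1, "file_name": "screen_0001.png", "width": 1080, "height": 1920},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,            # hypothetical region category
            "bbox": [24, 128, 312, 96],  # [x, y, width, height] in pixels
            "area": 312 * 96,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "leaf_region"},   # placeholder category names
        {"id": 2, "name": "group_region"},
    ],
}

with open("annotation_multilabel_coco.json", "w") as f:
    json.dump(coco, f)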
- Step 2: Use ./tools/dist_train_custom_multi_bbox.sh to train the model on multiple GPUs with the ResNet-50 backbone. The model configuration file is configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py. In our case, 4 A6000 GPUs were used; you can adapt dist_train_custom_multi_bbox.sh to your own machine settings.
Run the following script to train on 4 A6000 GPUs:
# distributed training
export CUDA_VISIBLE_DEVICES=0,1,2,3
./tools/dist_train_custom_multi_bbox.sh configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py 4
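Before launching a long run, you can sanity-check the effective hyperparameters by loading the config programmatically; a small sketch, assuming the config follows the standard MMDetection 3.x layout:
# Inspect key training settings before launching.
from mmengine.config import Config

cfg = Config.fromfile('configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py')
print(cfg.train_cfg)                    # loop type and max_epochs
print(cfg.train_dataloader.batch_size)  # per-GPU batch size
print(cfg.optim_wrapper.optimizer)      # optimizer settings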
On wandb.ai, the results after 90 epochs are as follows:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.941
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.962
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.947
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.702
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.897
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.943
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.959
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.961
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.961
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.814
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.916
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.963
mmengine - INFO - bbox_mAP_copypaste: 0.941 0.962 0.947 0.702 0.897 0.943
mmengine - INFO - Epoch(val) [90][11/11] coco/bbox_mAP: 0.9410 coco/bbox_mAP_50: 0.9620 coco/bbox_mAP_75: 0.9470 coco/bbox_mAP_s: 0.7020 coco/bbox_mAP_m: 0.8970 coco/bbox_mAP_l: 0.9430 data_time: 0.0137 time: 0.2778
You can use the following script to run test.py on the test data; the visualization results will be saved in the folder dino-4scale_r50_8xb2-90e_mobile_multi_bbox_imgs/.
python tools/test.py configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py ./work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth --show-dir dino-4scale_r50_8xb2-90e_mobile_multi_bbox_imgs/
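Alternatively, you can visualize single images with MMDetection's DetInferencer; a minimal sketch, where demo.png is a placeholder image path:
# Single-image inference and visualization with DetInferencer.
from mmdet.apis import DetInferencer

inferencer = DetInferencer(
    model='configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py',
    weights='./work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth')
# Predictions and rendered images are written under out_dir.
inferencer('demo.png', out_dir='dino-4scale_r50_8xb2-90e_mobile_multi_bbox_imgs/')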
- Step 3 [Optional]: Use Swin-L as the backbone and train for 12 epochs with the configuration file configs/dino/dino-5scale_swin-l_8xb2-36e_mobile_multi_bbox.py. In our comparison, its loss curve is much worse than that of the ResNet-50 backbone.
export CUDA_VISIBLE_DEVICES=0,1,2,3
python tools/train.py configs/dino/dino-5scale_swin-l_8xb2-36e_mobile_multi_bbox.py --train_batch_size 2 --val_batch_size 2 --lr 0.001 --epoch 12 # training for 16 epochs ran out of memory, so we use 12
# distributed training
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train_custom_multi_bbox.sh configs/dino/dino-5scale_swin-l_8xb2-36e_mobile_multi_bbox.py 4
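If you also run out of memory with Swin-L, the usual MMDetection remedies are a smaller per-GPU batch size and gradient checkpointing in the backbone. A hedged sketch of a derived config follows; the file name and values are assumptions, not settings we validated:
# Hypothetical low-memory child config, e.g. dino-5scale_swin-l_low_mem.py.
_base_ = ['./dino-5scale_swin-l_8xb2-36e_mobile_multi_bbox.py']

train_dataloader = dict(batch_size=1)      # merged into the base dataloader
model = dict(backbone=dict(with_cp=True))  # gradient checkpointing in Swin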
- Step 1: Data preparation
Place the ScreenPR dataset under the src folder of the Screen-Point-and-Read GitHub repository, so that its path relative to the root of this project is ../../../data/mobile_pc_web_osworld.
- Step 2: Use our trained ToL model
The pretrained ToL weights have been shared as the DINO weights trained for 90 epochs. Save them to ./work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth and use the following script to run inference. An output folder named output_dino-4scale_r50_8xb2-90e_mobile_multi_bbox_mobile_pc_web_osworld will be generated under the same parent folder ../../../data/.
export CUDA_VISIBLE_DEVICES=0
python inference_test_screendata.py --input_folder ../../../data/mobile_pc_web_osworld --model_config configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py --checkpoint ./work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth
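For orientation, a folder-level inference loop with MMDetection's Python API typically looks like the sketch below; the actual script may differ in details:
# Simplified sketch of folder-level inference; the real script may differ.
import glob, os
from mmdet.apis import init_detector, inference_detector

config = 'configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py'
ckpt = './work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth'
model = init_detector(config, ckpt, device='cuda:0')

for img in glob.glob('../../../data/mobile_pc_web_osworld/**/*.png', recursive=True):
    result = inference_detector(model, img)  # DetDataSample with pred_instances
    print(os.path.basename(img), 'boxes:', len(result.pred_instances))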
- Step 3: Use the original DINO model
Download the original DINO weights, save them to ./work_dirs/dino-4scale_r50_improved_8xb2-12e_coco/dino-4scale_r50_improved_8xb2-12e_coco_20230818_162607-6f47a913.pth, and use the following script to run inference.
export CUDA_VISIBLE_DEVICES=0
python inference_test_screendata_by_dino_original.py --input_folder ../../../data/mobile_pc_web_osworld
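If you prefer scripting the download, the checkpoint can be fetched from the MMDetection model zoo; the URL below is inferred from the model-zoo naming pattern and should be verified against the MMDetection DINO README:
# Fetch the original DINO checkpoint (URL inferred from the model-zoo pattern;
# verify it against the MMDetection DINO README before relying on it).
import os
import torch

url = ('https://download.openmmlab.com/mmdetection/v3.0/dino/'
       'dino-4scale_r50_improved_8xb2-12e_coco/'
       'dino-4scale_r50_improved_8xb2-12e_coco_20230818_162607-6f47a913.pth')
dst = './work_dirs/dino-4scale_r50_improved_8xb2-12e_coco/' + os.path.basename(url)
os.makedirs(os.path.dirname(dst), exist_ok=True)
torch.hub.download_url_to_file(url, dst)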