ToL Hierarchical GUI region detection

Our ToL Hierarchical GUI region detection model is built on mmdetection. We fine-tuned DINO with a customized configuration on the Android Screen Hierarchical Layout (ASHL) dataset and run inference on the Screen Point-and-Read (ScreenPR) benchmark. This guide covers environment setup, training, and inference.

1. Environment setup

You need to prepare the mmdetection environment based on our cloned source code.

  • Step 1: Install MMEngine and MMCV using MIM
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
  • Step 2: Install MMDetection from our source repository
cd <the root of repo tol_gui_region_detection>
pip install -v -e . -r requirements/tracking.txt
  • Step 3: Install extra components to support syncing results to wandb.ai:
pip install future tensorboard
pip install wandb
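
After installation, a quick sanity check (a throwaway sketch, not part of the repository) can confirm that the packages and CUDA are visible:

# Sanity check for the environment; run in any Python shell after the steps above.
import torch
import mmcv
import mmdet
import mmengine

print("mmdet:", mmdet.__version__)
print("mmcv:", mmcv.__version__)
print("mmengine:", mmengine.__version__)
print("CUDA available:", torch.cuda.is_available())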

2. Training the ToL model on the ASHL dataset

First, convert the ASHL annotations to multi-label COCO format:

cd configs/dino/
python convert_mobile_segement_to_multilabel_coco.py
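
If you want to verify the conversion before training, a minimal sketch using pycocotools (the annotation path below is hypothetical; use whatever path the converter actually writes):

# Inspect the converted multi-label COCO annotations.
from pycocotools.coco import COCO

ann_file = "data/ashl/annotations/train_multi_bbox.json"  # hypothetical path
coco = COCO(ann_file)
print(len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "annotations")
print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])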

Run the following script to train on 4 × A6000 GPUs:

# distributed training
export CUDA_VISIBLE_DEVICES=0,1,2,3
./tools/dist_train_custom_multi_bbox.sh configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py 4
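
To have the run logged to wandb (Step 3 of the environment setup), the visualizer in the config points at a wandb backend. The following is only a sketch of what the relevant fragment of dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py might look like; the actual config in this repository may differ, and the project name is an arbitrary example.

# Visualizer backends (MMEngine); WandbVisBackend syncs metrics to wandb.
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
    dict(type='WandbVisBackend', init_kwargs=dict(project='tol_gui_region_detection')),
]
visualizer = dict(type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')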

On wandb.ai, the results after 90 epochs are as follows:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.941
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.962
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.947
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.702
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.897
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.943
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.959
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.961
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.961
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.814
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.916
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.963
mmengine - INFO - bbox_mAP_copypaste: 0.941 0.962 0.947 0.702 0.897 0.943
mmengine - INFO - Epoch(val) [90][11/11]    coco/bbox_mAP: 0.9410  coco/bbox_mAP_50: 0.9620  coco/bbox_mAP_75: 0.9470  coco/bbox_mAP_s: 0.7020  coco/bbox_mAP_m: 0.8970  coco/bbox_mAP_l: 0.9430  data_time: 0.0137  time: 0.2778

You can run test.py on the test data with the following command; the visualization results will be saved in the folder dino-4scale_r50_8xb2-90e_mobile_multi_bbox_imgs/.

python tools/test.py configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py ./work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth --show-dir dino-4scale_r50_8xb2-90e_mobile_multi_bbox_imgs/

To train the larger DINO-5scale Swin-L configuration instead, either single-process or distributed:

export CUDA_VISIBLE_DEVICES=0,1,2,3
python tools/train.py configs/dino/dino-5scale_swin-l_8xb2-36e_mobile_multi_bbox.py --train_batch_size 2 --val_batch_size 2 --lr 0.001 --epoch 12 # training for 16 epochs ran out of memory, so 12 is used
# distributed training
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train_custom_multi_bbox.sh configs/dino/dino-5scale_swin-l_8xb2-36e_mobile_multi_bbox.py 4

3. Inference on the ScreenPR dataset

  • Step 1: Data preparation

Put the ScreenPR dataset under the src folder of the Screen-Point-and-Read GitHub repository so that it has the relative path ../../../data/mobile_pc_web_osworld from the root of this project.
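
A quick way to confirm the relative path resolves correctly before running inference (a throwaway sketch; the image extension is an assumption):

from pathlib import Path

data_root = Path("../../../data/mobile_pc_web_osworld")
assert data_root.is_dir(), f"ScreenPR data not found at {data_root.resolve()}"
print(sum(1 for _ in data_root.rglob("*.png")), "PNG screenshots found")  # extension is an assumption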

  • Step 2: Using our trained ToL model

The pretrained ToL weights are shared as DINO weights trained for 90 epochs. Save the checkpoint to ./work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth and use the following script to trigger inference (a minimal single-image sketch follows the command). An output folder named output_dino-4scale_r50_8xb2-90e_mobile_multi_bbox_mobile_pc_web_osworld will be generated under the same parent folder ../../../data/.

export CUDA_VISIBLE_DEVICES=0
python inference_test_screendata.py --input_folder ../../../data/mobile_pc_web_osworld --model_config configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py --checkpoint ./work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth
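
For a quick check on a single screenshot outside the batch script, MMDetection's Python API can be used directly. A minimal sketch, assuming the checkpoint above; the image path and score threshold are hypothetical:

from mmdet.apis import init_detector, inference_detector

config = 'configs/dino/dino-4scale_r50_8xb2-90e_mobile_multi_bbox.py'
checkpoint = './work_dirs/dino-4scale_r50_8xb2-90e_mobile_multi_bbox/epoch_90.pth'

model = init_detector(config, checkpoint, device='cuda:0')
result = inference_detector(model, '../../../data/mobile_pc_web_osworld/example.png')  # hypothetical image

# Keep only reasonably confident region predictions.
instances = result.pred_instances
keep = instances.scores > 0.3  # arbitrary threshold
print(instances.bboxes[keep].cpu().numpy())
print(instances.labels[keep].cpu().numpy())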
  • Step 3: Using the original DINO model

Download the original DINO weights, save them to ./work_dirs/dino-4scale_r50_improved_8xb2-12e_coco/dino-4scale_r50_improved_8xb2-12e_coco_20230818_162607-6f47a913.pth, and use the following script to trigger inference.

export CUDA_VISIBLE_DEVICES=0
python inference_test_screendata_by_dino_original.py --input_folder ../../../data/mobile_pc_web_osworld

These outputs can be compared with those from the ToL model trained above.
