Official implementation of HIPTrack: Visual Tracking with Historical Prompts. (CVPR 2024)
Trackers that follow the Siamese paradigm use similarity matching between template and search-region features for tracking. Many methods incorporate tracking history to better handle target appearance variations such as deformation and occlusion. However, existing methods use historical information insufficiently and incompletely, and they typically require repetitive training and introduce a large amount of computation. In this paper, we show that providing a tracker that follows the Siamese paradigm with precise and up-to-date historical information yields a significant performance improvement with completely unchanged parameters.
Based on this, we propose a historical prompt network that uses refined historical foreground masks and historical visual features of the target to provide comprehensive and precise prompts for the tracker. We build a novel tracker called HIPTrack based on the historical prompt network, which achieves considerable performance improvements without the need to retrain the entire model.
You can download the model weights and raw_result from Google Drive.
Tracker | LaSOT (AUC / Norm P / P) | LaSOT extension (AUC / Norm P / P) | TrackingNet (AUC / Norm P / P) | GOT-10k (AO / SR 0.5 / SR 0.75) |
---|---|---|---|---|
HIPTrack | 72.7 / 82.9 / 79.5 | 53.0 / 64.3 / 60.6 | 84.5 / 89.1 / 83.8 | 77.4 / 88.0 / 74.5 |
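
For readers unfamiliar with the GOT-10k columns above, the following is a minimal sketch of how average overlap (AO) and success rate (SR) can be computed from per-frame IoU values. The function names are illustrative and not part of this repository, and the official evaluation toolkits handle details (frame selection, per-sequence averaging) that this sketch ignores.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), assumed box format


def iou_xywh(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes in (x, y, w, h) form."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(ious: Sequence[float]) -> float:
    """AO: mean IoU over the evaluated frames."""
    if not ious:
        raise ValueError("ious must not be empty")
    return sum(ious) / len(ious)


def success_rate(ious: Sequence[float], threshold: float) -> float:
    """SR: fraction of frames whose IoU exceeds the threshold."""
    if not ious:
        raise ValueError("ious must not be empty")
    return sum(1 for v in ious if v > threshold) / len(ious)


if __name__ == "__main__":
    # Hypothetical per-frame overlaps, for illustration only.
    ious = [0.9, 0.8, 0.6, 0.4]
    print("AO:", average_overlap(ious))
    print("SR 0.5:", success_rate(ious, 0.5))
    print("SR 0.75:", success_rate(ious, 0.75))
```
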
Our model (backbone: ViT-B, resolution: 384x384) can run at 45 fps (frames per second) on a single NVIDIA Tesla V100 GPU.
Tracker | Trainable Parameters (M) | Parameters (M) | MACs (G) | Speed (FPS) |
---|---|---|---|---|
HIPTrack | 34.1 | 120.4 | 66.9 | 45.3 |
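
As a rough illustration of how a speed figure like the one above is typically obtained, the sketch below times a per-frame callable after a few warm-up iterations. It is not the profiling code of this repository; `measure_fps` and its `step` argument are hypothetical names, and a GPU model would additionally need device synchronization (for example `torch.cuda.synchronize()`) around the timed region.

```python
import time
from typing import Callable, Iterable


def measure_fps(step: Callable[[object], object],
                frames: Iterable[object],
                warmup: int = 5) -> float:
    """Return frames per second of `step`, excluding `warmup` initial frames."""
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")

    # Warm-up: the first calls often include one-off setup costs.
    for frame in frames[:warmup]:
        step(frame)

    start = time.perf_counter()
    for frame in frames[warmup:]:
        step(frame)
    elapsed = time.perf_counter() - start

    timed = len(frames) - warmup
    return timed / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in for a tracker's per-frame update; sleeps about 1 ms per frame.
    fps = measure_fps(lambda frame: time.sleep(0.001), range(50))
    print(f"{fps:.1f} fps")
```
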
Put the tracking datasets in ./data. It should look like this:
${PROJECT_ROOT}
  -- data
      -- lasot
          |-- airplane
          |-- basketball
          |-- bear
          ...
      -- got10k
          |-- test
          |-- train
          |-- val
      -- coco
          |-- annotations
          |-- images
      -- trackingnet
          |-- TRAIN_0
          |-- TRAIN_1
          ...
          |-- TRAIN_11
          |-- TEST
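
Before training or testing, it can help to verify that the layout above is in place. The following is a minimal sketch, not part of this repository: `EXPECTED_LAYOUT` and `missing_dataset_dirs` are illustrative names, and only the folders explicitly listed in the tree are checked (LaSOT class folders and the elided TrackingNet chunks are not).

```python
from pathlib import Path
from typing import Dict, List

# Subfolders taken from the tree above. Entries hidden behind "..." in the
# tree are intentionally not checked.
EXPECTED_LAYOUT: Dict[str, List[str]] = {
    "lasot": [],  # contains one folder per class (airplane, basketball, ...)
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "images"],
    "trackingnet": ["TRAIN_0", "TRAIN_1", "TRAIN_11", "TEST"],
}


def missing_dataset_dirs(data_dir: str) -> List[str]:
    """Return the expected dataset directories that do not exist under data_dir."""
    root = Path(data_dir)
    missing: List[str] = []
    for dataset, subdirs in EXPECTED_LAYOUT.items():
        base = root / dataset
        if not base.is_dir():
            missing.append(str(base))
            continue  # no point checking subfolders of a missing dataset
        for sub in subdirs:
            if not (base / sub).is_dir():
                missing.append(str(base / sub))
    return missing
```

Calling `missing_dataset_dirs("./data")` from the project root returns an empty list when every listed folder is present.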
Our implementation is based on PyTorch 1.10.1 with CUDA 11.3. Use the following command to install the runtime environment:
conda env create -f HIPTrack_env_cuda113.yaml
Run the following command to set the paths for this project:
python3 tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
After running this command, you can also modify the paths by editing these two files:
lib/train/admin/local.py # paths about training
lib/test/evaluation/local.py # paths about testing
- For training on datasets except GOT-10k: download the pre-trained DropTrack weights and put them under ${PROJECT_ROOT}/pretrained_models.
python3 tracking/train.py --script hiptrack --config hiptrack --save_dir ./output --mode multiple --nproc_per_node 4
- For training on GOT-10k: download the pre-trained DropTrack weights and put them under ${PROJECT_ROOT}/pretrained_models.
python3 tracking/train.py --script hiptrack --config hiptrack_got --save_dir ./output --mode multiple --nproc_per_node 4
Change the dataset path in lib/test/evaluation/local.py to your storage path.
- LaSOT or other offline-evaluated benchmarks (modify --dataset correspondingly)
python3 tracking/test.py hiptrack hiptrack --dataset lasot --threads 16 --num_gpus 4
python3 tracking/analysis_results.py # need to modify tracker configs and names
- GOT10K-test
python3 tracking/test.py hiptrack hiptrack_got --dataset got10k_test --threads 16 --num_gpus 4
python3 lib/test/utils/transform_got10k.py --tracker_name hiptrack --cfg_name hiptrack_got
- TrackingNet
python3 tracking/test.py hiptrack hiptrack --dataset trackingnet --threads 16 --num_gpus 4
python3 lib/test/utils/transform_trackingnet.py --tracker_name hiptrack --cfg_name hiptrack
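
To run the three test commands above in sequence, a small helper can assemble them from a benchmark-to-config mapping. This is only a sketch: `BENCHMARKS` and `build_test_command` are hypothetical names, the mapping simply mirrors the commands listed above, and the post-processing scripts (analysis_results.py, transform_got10k.py, transform_trackingnet.py) still have to be run separately.

```python
import shlex
from typing import Dict, List

# Dataset name -> config name, copied from the test commands above.
BENCHMARKS: Dict[str, str] = {
    "lasot": "hiptrack",
    "got10k_test": "hiptrack_got",
    "trackingnet": "hiptrack",
}


def build_test_command(dataset: str, threads: int = 16, num_gpus: int = 4) -> List[str]:
    """Build the tracking/test.py argument list for one benchmark."""
    config = BENCHMARKS[dataset]  # raises KeyError for unknown benchmarks
    return [
        "python3", "tracking/test.py", "hiptrack", config,
        "--dataset", dataset,
        "--threads", str(threads),
        "--num_gpus", str(num_gpus),
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for name in BENCHMARKS:
        print(shlex.join(build_test_command(name)))
```

Passing each returned list to `subprocess.run(cmd, check=True)` from the project root would execute the commands instead of printing them.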
bash tracking/profile_hiptrack.sh
@inproceedings{cai2024hiptrack,
title={HIPTrack: Visual Tracking with Historical Prompts},
author={Cai, Wenrui and Liu, Qingjie and Wang, Yunhong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
Thanks to the following repository, which made implementing our method much more convenient.
We also thank the following repositories for facilitating the analysis in Figure 2 of our paper.
If you have any questions, please open an issue or email me. 😀