Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection
Yiming Xie, Huaizu Jiang, Georgia Gkioxari*, Julian Straub*
ICCV 2023
Create the conda environment from environment.yml:
conda env create -f environment.yml
Download the pretrained weights and put them under PROJECT_PATH/checkpoint/.
You can also download them from the command line with gdown:
gdown --id 1FuIf1jDPX-ooOx0x-tS69ejhdn9NFuXz
Download and extract ScanNet by following the instructions provided at http://www.scan-net.org/.
You can obtain the train/val/test split information from here.
Expected directory structure of ScanNet:
PROJECT_PATH
└───data
    └───scannet
        └───scans
        │   └───scene0000_00
        │   │   └───color
        │   │       │   0.jpg
        │   │       │   1.jpg
        │   │       │   ...
        │   │   ...
        └───scans_raw
        │   └───scene0000_00
        │   │   └───scene0000_00.aggregation.json
        │   │   └───scene0000_00_vh_clean_2.labels.ply
        │   │   └───scene0000_00_vh_clean_2.0.010000.segs.json
        │   │   ...
        └───scannetv2_test.txt
        └───scannetv2_train.txt
        └───scannetv2_val.txt
        └───scannetv2-labels.combined.tsv
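Before running the data preparation or evaluation, it can help to sanity-check the layout shown above. The following is a minimal sketch, not part of the PARQ codebase: the helper names (read_split, sorted_frames, missing_paths) are hypothetical, the expected paths are copied from the tree above, and it assumes each split file lists one scene ID per line and that color frames are named by integer index (0.jpg, 1.jpg, ...).

```python
from __future__ import annotations

from pathlib import Path

# File names taken from the expected directory tree (hypothetical helper code).
SPLIT_FILES = ("scannetv2_train.txt", "scannetv2_val.txt", "scannetv2_test.txt")
LABEL_MAP = "scannetv2-labels.combined.tsv"


def read_split(scannet_root: Path, split: str) -> list[str]:
    """Return the scene IDs in scannetv2_<split>.txt, skipping blank lines."""
    path = scannet_root / f"scannetv2_{split}.txt"
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def sorted_frames(scannet_root: Path, scene_id: str) -> list[Path]:
    """List scans/<scene>/color/*.jpg in numeric frame order."""
    color_dir = scannet_root / "scans" / scene_id / "color"
    # A plain sorted() is lexicographic and would place 10.jpg before 2.jpg.
    return sorted(color_dir.glob("*.jpg"), key=lambda p: int(p.stem))


def missing_paths(scannet_root: Path, scene_id: str) -> list[Path]:
    """Return the expected files/directories that do not exist for one scene."""
    raw = scannet_root / "scans_raw" / scene_id
    expected = [
        scannet_root / "scans" / scene_id / "color",
        raw / f"{scene_id}.aggregation.json",
        raw / f"{scene_id}_vh_clean_2.labels.ply",
        raw / f"{scene_id}_vh_clean_2.0.010000.segs.json",
        scannet_root / LABEL_MAP,
        *(scannet_root / name for name in SPLIT_FILES),
    ]
    return [p for p in expected if not p.exists()]


if __name__ == "__main__":
    root = Path("data/scannet")  # assumes the script is run from PROJECT_PATH
    if root.is_dir():
        for scene in read_split(root, "val")[:3]:
            print(scene, "missing:", missing_paths(root, scene),
                  "frames:", len(sorted_frames(root, scene)))
    else:
        print(f"{root} not found; download ScanNet first")
```

The numeric sort key matters if frames are consumed in capture order: with lexicographic ordering, a multi-view sequence would silently jump from 1.jpg to 10.jpg.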
Next, download the generated oriented-box annotations and put them under PROJECT_PATH/data/scannet/.
Alternatively, you can run the data preparation script yourself.
To evaluate the pretrained checkpoint:
python eval.py --cfg ./config/eval.yaml CHECKPOINT_PATH ./checkpoint/parq_release.ckpt
To train with 8 GPUs:
python train.py --cfg ./config/train.yaml TRAINER.GPUS 8
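The trailing arguments in both commands (CHECKPOINT_PATH ./checkpoint/parq_release.ckpt, TRAINER.GPUS 8) look like KEY VALUE overrides applied on top of the YAML file passed with --cfg, in the style of yacs' merge_from_list. We have not confirmed how PARQ's config code handles them; the sketch below uses a hypothetical helper, apply_overrides, to show how dotted overrides of this kind are typically merged into a nested config, with values parsed as Python literals where possible.

```python
from __future__ import annotations

import ast
from typing import Any


def apply_overrides(cfg: dict, opts: list[str]) -> dict:
    """Apply flat KEY VALUE pairs such as ["TRAINER.GPUS", "8"] to a nested dict.

    Hypothetical helper illustrating yacs-style overrides; not PARQ's own code.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # unknown sections raise KeyError
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            value: Any = ast.literal_eval(raw)  # "8" -> 8, "[1, 2]" -> [1, 2]
        except (ValueError, SyntaxError):
            value = raw  # non-literals such as paths stay as strings
        node[leaf] = value
    return cfg


if __name__ == "__main__":
    # Hypothetical defaults; the real values live in ./config/*.yaml.
    cfg = {"TRAINER": {"GPUS": 1}, "CHECKPOINT_PATH": ""}
    apply_overrides(cfg, ["TRAINER.GPUS", "8",
                          "CHECKPOINT_PATH", "./checkpoint/parq_release.ckpt"])
    print(cfg)
```

Rejecting unknown keys, rather than silently creating them, catches typos such as TRAINER.GPU on the command line.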
We provide a demo of PARQ running on self-captured ARKit data. Please refer to DEMO.md for details about capturing and processing the data. We also provide example data captured with an iPhone XR.
If you find this code useful for your research, please use the following BibTeX entry.
@inproceedings{xie2023parq,
title={Pixel-Aligned Recurrent Queries for Multi-View {3D} Object Detection},
author={Xie, Yiming and Jiang, Huaizu and Gkioxari, Georgia and Straub, Julian},
booktitle={ICCV},
year={2023}
}
The majority of PARQ is released under the MIT License.
The LICENSE-MIT file applies to model/transformer_parq.py; the LICENSE file applies to all other files.
We thank the authors of the following projects, on which our code is based: DETR, VoteNet, RotationContinuity, and PixLoc.