Shuai Chen, Yash Bhalgat, Xinghui Li, Jiawang Bian, Kejie Li, Zirui Wang, and Victor Prisacariu (CVPR 2024)
We tested our code with CUDA 11.3+, PyTorch 1.11.0+, and Python 3.7+ using Docker.
We also provide a conda environment:
conda env create -f environment.yml
conda activate nefes
pip install git+https://github.com/princeton-vl/lietorch.git # if your lietorch doesn't work, you can set lietorch=False in poses.py
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
# install pytorch3d
cd ..
git clone https://github.com/facebookresearch/pytorch3d.git && cd pytorch3d && pip install -e .
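Before moving on, it can help to confirm that the packages above are importable. The sketch below is a hypothetical helper, not part of the repo. It assumes the import names torch, tinycudann, pytorch3d and lietorch for the packages installed above, and it only checks that each module can be found, not its version or whether it was built with CUDA support.

```python
"""Sanity-check the NeFeS environment (a sketch; import names are assumptions)."""
import importlib.util
import sys

# Assumed import names for the packages installed above:
# torch (PyTorch), tinycudann (tiny-cuda-nn PyTorch bindings), pytorch3d.
REQUIRED = ["torch", "tinycudann", "pytorch3d"]
# lietorch is treated as optional: poses.py can run with lietorch=False.
OPTIONAL = ["lietorch"]


def check_modules(names):
    """Return {module_name: True/False} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report():
    """Print a short status report and return True if all required parts are present."""
    ok = sys.version_info >= (3, 7)
    print(f"Python {sys.version.split()[0]}: {'ok' if ok else 'needs 3.7+'}")
    for name, found in check_modules(REQUIRED).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
        ok = ok and found
    for name, found in check_modules(OPTIONAL).items():
        print(f"{name} (optional): {'found' if found else 'missing'}")
    return ok


if __name__ == "__main__":
    report()
```

Run it inside the activated nefes environment. If only lietorch is reported missing, the comment next to its install command suggests setting lietorch=False in poses.py.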
This paper uses two public datasets:
- 7-Scenes
We use data preparation similar to MapNet. You can download the 7-Scenes dataset to the data/deepslam_data/7Scenes directory using the script below:
cd data
python setup_7scenes.py
- We additionally computed pose averaging statistics (pose_avg_stats.txt) and manually tuned world_setup.json in data/7Scenes to align the 7-Scenes coordinate system with NeRF's coordinate system (OpenGL). You can generate your own re-alignment to a new pose_avg_stats.txt using the --save_pose_avg_stats configuration option.
- In our setup_7scenes.py script, we also copy the 7-Scenes COLMAP poses to the deepslam_data/7Scenes/{SCENE}/ folder, courtesy of Brachmann21.
- Cambridge Landmarks
To download Cambridge Landmarks, please use this script:
cd data
python setup_cambridge.py
We also put pose_avg_stats.txt and world_setup.json in data/Cambridge/CAMBRIDGE_SCENES, as provided in the source code.
As described in the paper, we also apply semantic filtering when training NeFeS to filter out transient objects, using Cheng22. The script therefore downloads the semantic segmentation results and puts them into data/Cambridge/{CAMBRIDGE_SCENE}/train/semantic and data/Cambridge/{CAMBRIDGE_SCENE}/test/semantic.
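To make the two preparation steps above more concrete, here is a minimal NumPy sketch of (a) computing an average camera pose and re-centering all poses around it, in the spirit of the poses_avg routine from the original NeRF LLFF loader, and (b) excluding pixels of transient classes from a photometric loss using a semantic label map. Both parts are assumptions for illustration: this is not necessarily how pose_avg_stats.txt is produced or how NeFeS applies its semantic filtering, and the transient class ids are hypothetical.

```python
import numpy as np


def normalize(v):
    return v / np.linalg.norm(v)


def average_pose(poses):
    """Average of (N, 3, 4) camera-to-world poses [R | t], returned as a (3, 4) matrix.

    Mean camera center, summed viewing (z) axes and summed up (y) axes,
    re-orthogonalized into a proper rotation.
    """
    center = poses[:, :3, 3].mean(axis=0)
    z = normalize(poses[:, :3, 2].sum(axis=0))
    up = poses[:, :3, 1].sum(axis=0)
    x = normalize(np.cross(up, z))  # right axis, orthogonal to up and z
    y = np.cross(z, x)              # re-orthogonalized up axis (unit length)
    return np.stack([x, y, z, center], axis=1)


def recenter_poses(poses):
    """Express every (3, 4) pose in the frame of the average pose."""
    avg = np.eye(4)
    avg[:3] = average_pose(poses)
    bottom = np.tile([0.0, 0.0, 0.0, 1.0], (poses.shape[0], 1, 1))  # (N, 1, 4)
    poses_h = np.concatenate([poses, bottom], axis=1)              # (N, 4, 4)
    return (np.linalg.inv(avg) @ poses_h)[:, :3]


def masked_mse(pred, target, labels, transient_ids):
    """Photometric MSE that ignores pixels labelled with transient classes.

    pred, target: (H, W, C) images; labels: (H, W) integer semantic map.
    transient_ids: hypothetical class ids to exclude (e.g. people, cars).
    """
    keep = ~np.isin(labels, list(transient_ids))  # True for static pixels
    if not keep.any():
        return 0.0
    sq_err = (pred - target) ** 2
    return float(sq_err[keep].mean())  # mean over kept pixels and channels
```

A useful property of this kind of re-centering is that it is invariant to a global rigid transform of the input poses, so the aligned poses do not depend on where the original reconstruction placed its world origin.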
We provide the pretrained NeFeS and DFNet models used in our paper.
Download and decompress paper_models.zip to {REPO_PATH}/logs/paper_models
wget https://www.robots.ox.ac.uk/~shuaic/NeFeS2024/paper_models.zip
unzip paper_models.zip
mkdir logs
mv paper_models/ logs/
Due to limited resources, our pretrained models were trained on different GPUs, such as the NVIDIA RTX 3090, RTX 3080 Ti, RTX 6000, and GTX 1080 Ti. We noticed that model performance may vary slightly (for better or worse) when running inference on a different type of GPU. Therefore, all experiments in the paper are reported on the same GPUs the models were trained on. For reference, we also include the experimental results produced on our machines.
sh eval.sh
We provide the NeFeS training script in train_nefes.sh:
sh train_nefes.sh
In this script, we run a three-stage progressive training schedule, as described in the Supplementary Material of the paper.
# Stage 1: train a color-only NeRF to initialize the 3D geometry to a reasonable extent.
python run_nefes.py --config config/7Scenes/dfnet/config_stairs_stage1.txt
# Stages 2 and 3: train the feature and fusion modules to obtain the best neural feature field performance for NeFeS.
python run_nefes.py --config config/7Scenes/dfnet/config_stairs_stage2.txt
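If you want to drive the stages from Python, for example to loop over several scenes, a minimal sketch could look like the following. It assumes that train_nefes.sh simply runs the two commands above in order and that other scenes follow the same config_{scene}_stage{n}.txt naming pattern as stairs; stage_commands and train are hypothetical helpers, not part of the repo.

```python
import subprocess


def stage_commands(scene="stairs", dataset="7Scenes", apr="dfnet"):
    """Build the run_nefes.py commands for the progressive training schedule.

    Stage 1 trains a color-only NeRF; the stage-2 config covers stages 2 and 3.
    The config path pattern is assumed from the stairs example and may differ
    for other scenes or datasets.
    """
    cfg_dir = f"config/{dataset}/{apr}"
    return [
        ["python", "run_nefes.py", "--config", f"{cfg_dir}/config_{scene}_stage{s}.txt"]
        for s in (1, 2)
    ]


def train(scene="stairs", dry_run=True):
    """Print (and, unless dry_run, execute) the stage commands in order."""
    for cmd in stage_commands(scene):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise and stop if a stage fails
```

train("stairs") only prints the commands; pass dry_run=False to launch them. Using check=True means a failed stage 1 raises immediately, so stage 2 never starts from an incomplete model.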
After training NeFeS, you can test the APRs with NeFeS refinement. Note that the paper results produced on our machines are already provided above. To use your own trained model, you can use the following script.
# this script is an example of running DFNet + NeFeS50
sh test_apr_refinement.sh
By default, the script uses the paper models specified in the config file. If you have trained your own models, you can replace the defaults with them:
python test_refinement.py --config config/7Scenes/dfnet/config_stairs_DFM.txt --ft_path $YOUR_NeFeS
If your GPU runs out of memory, consider reducing the --netchunk parameter.
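The --netchunk name comes from nerf-pytorch, where it sets how many points are sent through the network in parallel; we assume NeFeS uses it the same way. The NumPy sketch below illustrates the underlying chunking idea: the network's intermediate activations only exist for one chunk at a time, so a smaller chunk lowers peak memory at the cost of more sequential forward passes. It mirrors the batchify helper in nerf-pytorch but is illustrative only, not the NeFeS implementation.

```python
import numpy as np


def batchify(fn, chunk):
    """Wrap fn so it processes inputs `chunk` rows at a time.

    Each call to fn only sees `chunk` rows, so its intermediate buffers are
    sized by the chunk rather than by the full input.
    """
    if chunk is None:
        return fn  # no chunking: process everything in one call

    def wrapped(inputs):
        outs = [fn(inputs[i:i + chunk]) for i in range(0, inputs.shape[0], chunk)]
        return np.concatenate(outs, axis=0)  # same result as fn(inputs)

    return wrapped
```

The output is identical for any chunk size; only the number of calls and the peak memory per call change.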
If you want to check whether NeFeS can refine your own APR model or pose estimator, you can add your network loader to load_APR_and_FeatureNet() in dm/direct_pose_model.py.
Note that we recommend training your APR/pose estimator in the OpenGL coordinate system (the best way is through our dataloader, as we did for PoseNet (PyTorch) and MsTransformer). This is because NeFeS is trained in the OpenGL convention; otherwise, you will have to adjust the coordinate system yourself.
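If your estimator outputs poses in the OpenCV camera convention (x right, y down, looking along +z), one common way to move them to the OpenGL convention (x right, y up, looking along -z) is to flip the camera's y and z axes. The sketch below assumes 4x4 camera-to-world matrices. It only changes the camera axes and does not apply the dataset-specific world alignment stored in pose_avg_stats.txt and world_setup.json, which you would still need to handle.

```python
import numpy as np

# Flipping the camera's y and z axes converts between the OpenCV convention
# (x right, y down, looking along +z) and the OpenGL convention
# (x right, y up, looking along -z). The flip is its own inverse.
CV_TO_GL = np.diag([1.0, -1.0, -1.0, 1.0])


def opencv_to_opengl(c2w):
    """Convert 4x4 camera-to-world pose(s) from OpenCV to OpenGL camera axes.

    Right-multiplication negates the y and z rotation columns and leaves the
    camera center (translation column) unchanged; works for (4, 4) or (N, 4, 4).
    """
    return c2w @ CV_TO_GL
```

For world-to-camera (extrinsic) matrices, the flip goes on the left instead: CV_TO_GL @ w2c.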
We thank Dr. Michael Hobley and Dr. Theo Costain for their generous discussions on this work and their kind proofreading of our paper manuscript. We also thank Changkun Liu for kindly providing assistance in ensuring conda environment consistency.
Please cite our paper and star this repo if you find our work helpful. Thanks!
@inproceedings{chen2024nefes,
author = {Chen, Shuai and Bhalgat, Yash and Li, Xinghui and Bian, Jia-Wang and Li, Kejie and Wang, Zirui and Prisacariu, Victor Adrian},
title = {Neural Refinement for Absolute Pose Regression with Feature Synthesis},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {20987-20996}
}
This code builds on previous camera relocalization pipelines, namely Direct-PoseNet and DFNet. Please consider citing:
@inproceedings{chen2022dfnet,
title={DFNet: Enhance Absolute Pose Regression with Direct Feature Matching},
author={Chen, Shuai and Li, Xinghui and Wang, Zirui and Prisacariu, Victor},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2022}
}
@inproceedings{chen2021direct,
title={Direct-PoseNet: Absolute pose regression with photometric consistency},
author={Chen, Shuai and Wang, Zirui and Prisacariu, Victor},
booktitle={2021 International Conference on 3D Vision (3DV)},
pages={1175--1185},
year={2021},
organization={IEEE}
}