# MVSFormer
Code for MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR 2023)

[arXiv paper](https://arxiv.org/abs/2208.02541)

- [x] Release training and testing code
- [x] Add dynamic point cloud fusion for T&T
- [x] Release pre-trained models

## Installation

```
git clone https://github.com/ewrfcas/MVSFormer.git
cd MVSFormer
pip install -r requirements.txt
```

We also highly recommend installing [fusibile](https://github.com/YoYo000/fusibile) for depth fusion.

```
git clone https://github.com/YoYo000/fusibile.git
cd fusibile
cmake .
make
```

**Tips:** You should revise `CUDA_NVCC_FLAGS` in `CMakeLists.txt` according to the GPU you use.
We set ```-gencode arch=compute_70,code=sm_70``` instead of ```-gencode arch=compute_60,code=sm_60``` for V100 GPUs.
For other GPU types, you can follow:
```
# 1080Ti
-gencode arch=compute_60,code=sm_60

# 2080Ti
-gencode arch=compute_75,code=sm_75

# 3090Ti
-gencode arch=compute_86,code=sm_86

# V100
-gencode arch=compute_70,code=sm_70
```
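If you prefer not to edit the file by hand, a `sed` one-liner can patch the flag in place (a minimal sketch; it assumes your checkout still ships the `compute_60` default shown above, so confirm the string in your `CMakeLists.txt` first):

```
# Swap the default compute capability for sm_86 (e.g., a 3090 Ti),
# then rebuild. Adjust the target architecture to match your GPU.
sed -i 's/arch=compute_60,code=sm_60/arch=compute_86,code=sm_86/' CMakeLists.txt
cmake . && make
```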

## Datasets

### DTU

1. Download the preprocessed poses from [DTU training data](https://drive.google.com/file/d/1eDjh-_bxKKnEuz5h-HXS7EDJn59clx6V/view),
and the depth maps from [Depths_raw](https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/cascade-stereo/CasMVSNet/dtu_data/dtu_train_hr/Depths_raw.zip).
2. You also need the original rectified images from [the official website](http://roboimagedata2.compute.dtu.dk/data/MVS/Rectified.zip).
3. The DTU testing set can be downloaded from [MVSNet](https://drive.google.com/open?id=135oKPefcPTsdtLRzoDAQtPpHuoIrpRI_).

```
dtu_training
 ├── Cameras
 ├── Depths
 ├── Depths_raw
 └── DTU_origin/Rectified (original-size rectified images downloaded from the official website)
```
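One possible way to assemble this layout from the downloads above (the zip file names are assumptions based on the links; adjust them to whatever your files are actually called):

```
# Preprocessed poses/depths from the "DTU training data" download.
unzip dtu_training.zip                           # -> dtu_training/Cameras, Depths
# High-resolution ground-truth depths.
unzip Depths_raw.zip -d dtu_training/            # -> dtu_training/Depths_raw
# Original-resolution rectified images from the official website.
mkdir -p dtu_training/DTU_origin
unzip Rectified.zip -d dtu_training/DTU_origin/  # -> dtu_training/DTU_origin/Rectified
```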

### BlendedMVS

Download the high-resolution images from [BlendedMVS](https://onedrive.live.com/?authkey=%21ADb9OciQ4zKwJ%5Fw&id=35CFA9803D6F030F%21123&cid=35CFA9803D6F030F).

```
BlendedMVS_raw
 ├── 57f8d9bbe73f6760f10e916a
 .   └── 57f8d9bbe73f6760f10e916a
 .       └── 57f8d9bbe73f6760f10e916a
 .           ├── blended_images
             ├── cams
             └── rendered_depth_maps
```

### Tank-and-Temples (T&T)
Download the [T&T](https://drive.google.com/file/d/1gAfmeoGNEFl9dL4QcAU4kF0BAyTd-r8Z/view) dataset pre-processed by [MVSNet](https://github.com/YoYo000/MVSNet/issues/14).
Note that you should use the short depth range of the cameras when running the evaluation script to produce the point clouds.
Remember to replace the cameras in the `intermediate` folder with those from [`short_range_caemeras_for_mvsnet.zip`](https://drive.google.com/file/d/1Nbsq3WEVSg9tppMjN6hYM_rzuALWnrIy/view?usp=sharing), as sketched after the directory tree below.

```
tankandtemples
 ├── advanced
 │  ├── Auditorium
 │  ├── Ballroom
 │  ├── ...
 │  └── Temple
 └── intermediate
        ├── Family
        ├── Francis
        ├── ...
        ├── Train
        └── short_range_cameras
```
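One possible way to drop in the short-range cameras (the folder layout inside the zip is an assumption, so check it after unpacking, and back up the original camera files first):

```
# Unpack the short-range cameras next to the intermediate scenes.
unzip short_range_caemeras_for_mvsnet.zip \
      -d tankandtemples/intermediate/short_range_cameras
# Copy them over a scene's original cameras, e.g. for Family:
cp tankandtemples/intermediate/short_range_cameras/Family/*.txt \
   tankandtemples/intermediate/Family/cams/
```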

## Training

### Pretrained weights

DINO-small (https://github.com/facebookresearch/dino): [Weight Link](https://dl.fbaipublicfiles.com/dino/dino_deitsmall16_pretrain/dino_deitsmall16_pretrain.pth)

Twins-small (https://github.com/Meituan-AutoML/Twins): [Weight Link](https://drive.google.com/file/d/131SVOphM_-SaBytf4kWjo3ony5hpOt4S/view?usp=sharing)

Training MVSFormer (Twins-based) on DTU with two 32GB V100 GPUs takes about 2 days.
We set the max epoch to 15 on DTU, but in our implementation the best model was typically reached around epoch 10.
You are free to adjust the max epoch, though this also affects the learning rate decay schedule.
```
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer.json \
                                         --exp_name MVSFormer \
                                         --data_path ${YOUR_DTU_PATH} \
                                         --DDP
```
To train MVSFormer-P (based on a frozen DINO):
```
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer-p.json \
                                         --exp_name MVSFormer-p \
                                         --data_path ${YOUR_DTU_PATH} \
                                         --DDP
```

The model should be fine-tuned on BlendedMVS before testing on T&T.
```
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer_blendmvs.json \
                                         --exp_name MVSFormer-blendedmvs \
                                         --data_path ${YOUR_BLENDEDMVS_PATH} \
                                         --dtu_model_path ${YOUR_DTU_MODEL_PATH} \
                                         --DDP
```

## Test

Pretrained models: [OneDrive](https://1drv.ms/u/s!Ah2VkULmkiqPryH_Tl2PUS6Is831?e=BgCuOY)

For testing on DTU:
```
CUDA_VISIBLE_DEVICES=0 python test.py --dataset dtu --batch_size 1 \
                                       --testpath ${dtu_test_path} \
                                       --testlist ./lists/dtu/test.txt \
                                       --resume ${MODEL_WEIGHT_PATH} \
                                       --outdir ${OUTPUT_DIR} \
                                       --fusibile_exe_path ./fusibile/fusibile \
                                       --interval_scale 1.06 --num_view 5 \
                                       --numdepth 192 --max_h 1152 --max_w 1536 --filter_method gipuma \
                                       --disp_threshold 0.1 --num_consistent 2 --prob_threshold 0.5,0.5,0.5,0.5 \
                                       --combine_conf \
                                       --tmps 5.0,5.0,5.0,1.0
```

For testing on T&T: T&T uses the dynamic point cloud fusion filter (```dpcd```), whose confidence is controlled by ```conf``` rather than ```prob_threshold```
(sorry for the confusing parameter names, an unfortunate legacy of this project).
Note that we recommend ```num_view=20``` here, but then you should build a new pair.txt with 20 views following MVSNet's convention, as sketched below.
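A `pair.txt` follows MVSNet's convention: the first line gives the total number of views; then, for each reference view, one line with its index and one line with the number of source views followed by `view_id score` pairs. A trimmed sketch with 20 sources per reference (the scores and `...` are illustrative, and real files contain no `#` comments):

```
N                                        # total number of views in the scene
0                                        # reference view id
20 10 2346.41 1 2036.53 9 1243.89 ...    # 20 "id score" pairs
1
20 9 2850.87 10 2583.94 2 2105.59 ...
...
```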
```
CUDA_VISIBLE_DEVICES=0 python test.py --dataset tt --batch_size 1 \
                                      --testpath ${tt_test_path}/intermediate(or advanced) \
                                      --testlist ./lists/tanksandtemples/intermediate.txt(or advanced.txt) \
                                      --resume ${MODEL_WEIGHT_PATH} \
                                      --outdir ${OUTPUT_DIR} \
                                      --interval_scale 1.0 --num_view 10 --numdepth 256 \
                                      --max_h 1088 --max_w 1920 --filter_method dpcd \
                                      --conf 0.5,0.5,0.5,0.5 \
                                      --use_short_range --combine_conf --tmps 5.0,5.0,5.0,1.0
```

## Cite

If you find our project helpful, please consider citing:

```
@article{caomvsformer,
  title={MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth},
  author={Cao, Chenjie and Ren, Xinlin and Fu, Yanwei},
  journal={Transactions of Machine Learning Research},
  year={2023}
}
```

Our code is partially based on [CDS-MVSNet](https://github.com/TruongKhang/cds-mvsnet), [DINO](https://github.com/facebookresearch/dino), and [Twins](https://github.com/Meituan-AutoML/Twins).