Hongxiao Yu1,2 · Yuqi Wang1,2 · Yuntao Chen3 · Zhaoxiang Zhang1,2,3
1School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)
2NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences (CASIA)
3Centre for Artificial Intelligence and Robotics (HKISI_CAS)
ECCV 2024
Here we compare ISO with the previous best models, NDC-Scene and MonoScene.
Method | IoU | ceiling | floor | wall | window | chair | bed | sofa | table | tvs | furniture | object | mIoU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MonoScene | 42.51 | 8.89 | 93.50 | 12.06 | 12.57 | 13.72 | 48.19 | 36.11 | 15.13 | 15.22 | 27.96 | 12.94 | 26.94 |
NDC-Scene | 44.17 | 12.02 | **93.51** | 13.11 | 13.77 | 15.83 | 49.57 | 39.87 | 17.17 | 24.57 | 31.00 | 14.96 | 29.03 |
Ours | **47.11** | **14.21** | 93.47 | **15.89** | **15.14** | **18.35** | **50.01** | **40.82** | **18.25** | **25.90** | **34.08** | **17.67** | **31.25** |
We highlight the best results in bold.
Pretrained models on NYUv2 can be downloaded here.
- Create conda environment:
$ conda create -n iso python=3.9 -y
$ conda activate iso
- This code was implemented with python 3.9, pytorch 2.0.0 and CUDA 11.7. Please install PyTorch:
$ conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
- Install the additional dependencies:
$ git clone --recursive https://github.com/hongxiaoy/ISO.git
$ cd ISO/
$ pip install -r requirements.txt
💡Note
Change line 140 of depth_anything/metric_depth/zoedepth/models/base_models/dpt_dinov2/dpt.py to:
self.pretrained = torch.hub.load('facebookresearch/dinov2', 'dinov2_{:}14'.format(encoder), pretrained=False)
Then download the Depth-Anything pre-trained model and metric depth model checkpoint files to checkpoints/.
- Install tbb:
$ conda install -c bioconda tbb=2020.2
- Finally, install ISO:
$ pip install -e ./
💡Note
If you move the ISO directory to another location, run
pip cache purge
and then run
pip install -e ./
again.
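As a quick sanity check after installation, the sketch below reports which versions of the PyTorch packages are visible in the active environment. The helper name installed_versions is hypothetical and not part of ISO; it only reads package metadata, so a value of None means the package metadata was not found in this environment.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Map each package name to its installed version, or None if not found.

    Hypothetical helper for illustration; it is not part of the ISO code base.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package metadata is not visible in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Run it inside the iso conda environment and compare the printed versions with the ones stated in the installation steps.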
- Download the NYUv2 dataset.
- Create a folder to store the preprocessed NYUv2 data at /path/to/NYU/preprocess/folder.
- Store paths in environment variables for faster access:
$ export NYU_PREPROCESS=/path/to/NYU/preprocess/folder
$ export NYU_ROOT=/path/to/NYU/depthbin
💡Note
To keep the variables across shell sessions, we recommend appending each export command to ~/.bashrc, for example:
echo "export NYU_PREPROCESS=/path/to/NYU/preprocess/folder" >> ~/.bashrc
- Preprocess the data to generate labels at a lower scale, which are used to compute the ground truth relation matrices:
$ cd ISO/
$ python iso/data/NYU/preprocess.py NYU_root=$NYU_ROOT NYU_preprocess_root=$NYU_PREPROCESS
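Before running the preprocessing script, it can help to confirm that NYU_ROOT and NYU_PREPROCESS are set and point to existing directories. The following is a minimal sketch; the function missing_dataset_dirs is a hypothetical helper, not something shipped with ISO.

```python
import os
from pathlib import Path


def missing_dataset_dirs(names, environ=None):
    """Return the variable names that are unset or do not point to a directory.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    problems = []
    for name in names:
        value = environ.get(name)
        # An empty or missing value, or a path that is not a directory, is a problem.
        if not value or not Path(value).is_dir():
            problems.append(name)
    return problems


if __name__ == "__main__":
    bad = missing_dataset_dirs(["NYU_ROOT", "NYU_PREPROCESS"])
    print("OK" if not bad else f"Check these variables: {bad}")
```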
- Download the Occ-ScanNet dataset, which includes:
  - posed_images
  - gathered_data
  - train_subscenes.txt
  - val_subscenes.txt
- Create a root folder to store the Occ-ScanNet dataset at /path/to/Occ/ScanNet/folder, move all dataset files into it, and extract any zip files.
- Store the path in an environment variable for faster access:
$ export OCC_SCANNET_ROOT=/path/to/Occ/ScanNet/folder
💡Note
To keep the variable across shell sessions, we recommend appending the export command to ~/.bashrc:
echo "export OCC_SCANNET_ROOT=/path/to/Occ/ScanNet/folder" >> ~/.bashrc
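The Occ-ScanNet layout described above can be checked with a short script. This is a sketch under the assumption that, after extraction, the four listed entries sit directly under the root folder with exactly these names; missing_entries is a hypothetical helper, not part of ISO.

```python
import os
from pathlib import Path

# Entry names taken from the dataset list above; their placement directly
# under the root folder is an assumption.
EXPECTED_ENTRIES = (
    "posed_images",
    "gathered_data",
    "train_subscenes.txt",
    "val_subscenes.txt",
)


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    root = os.environ.get("OCC_SCANNET_ROOT")
    if root is None:
        print("OCC_SCANNET_ROOT is not set")
    else:
        print("Missing entries:", missing_entries(root))
```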
Download the ISO pretrained models on NYUv2, then put them in the folder /path/to/ISO/trained_models.
huggingface-cli download --repo-type model hongxiaoy/ISO
If you have not installed huggingface-cli before, please follow the official instructions.
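The same download can also be scripted from Python with the huggingface_hub library, which makes it easy to place the files directly in the target folder. This is a sketch only: the wrapper download_iso_checkpoints and its default local_dir are assumptions for illustration, and the file layout inside the hongxiaoy/ISO repository is not described here, so check the downloaded folder before pointing the scripts at it.

```python
def download_iso_checkpoints(local_dir="trained_models"):
    """Download the hongxiaoy/ISO model repository into `local_dir`.

    Hypothetical wrapper; returns the local path reported by huggingface_hub.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="hongxiaoy/ISO",
        repo_type="model",
        local_dir=local_dir,
    )
```

Calling download_iso_checkpoints("/path/to/ISO/trained_models") requires network access and an installed huggingface_hub package.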
- Create a folder to store training logs at /path/to/NYU/logdir.
- Store it in an environment variable:
$ export NYU_LOG=/path/to/NYU/logdir
- Train ISO using 2 GPUs with a batch_size of 4 (2 items per GPU) on NYUv2:
$ cd ISO/
$ python iso/scripts/train_iso.py \
dataset=NYU \
NYU_root=$NYU_ROOT \
NYU_preprocess_root=$NYU_PREPROCESS \
logdir=$NYU_LOG \
n_gpus=2 batch_size=4
- Create a folder to store training logs at /path/to/OccScanNet/logdir.
- Store it in an environment variable:
$ export OCC_SCANNET_LOG=/path/to/OccScanNet/logdir
- Train ISO using 2 GPUs with a batch_size of 4 (2 items per GPU) on Occ-ScanNet (the dataset value should match the config file name in train_iso.py):
$ cd ISO/
$ python iso/scripts/train_iso.py \
dataset=OccScanNet \
OccScanNet_root=$OCC_SCANNET_ROOT \
logdir=$OCC_SCANNET_LOG \
n_gpus=2 batch_size=4
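The training commands above state that batch_size=4 on 2 GPUs corresponds to 2 items per GPU, which suggests that batch_size is the total across GPUs. Under that assumption, the sketch below computes the per-GPU share when changing n_gpus; per_gpu_batch_size is a hypothetical helper, not an ISO function.

```python
def per_gpu_batch_size(batch_size, n_gpus):
    """Split a total batch size evenly across GPUs.

    Assumes batch_size is the total across all GPUs, as the training
    description implies (4 in total, 2 items per GPU on 2 GPUs).
    """
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    if batch_size % n_gpus != 0:
        raise ValueError("batch_size must be divisible by n_gpus")
    return batch_size // n_gpus


if __name__ == "__main__":
    print(per_gpu_batch_size(4, 2))  # 2
```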
To evaluate ISO on NYUv2 test set, type:
$ cd ISO/
$ python iso/scripts/eval_iso.py \
dataset=NYU \
NYU_root=$NYU_ROOT \
NYU_preprocess_root=$NYU_PREPROCESS \
n_gpus=1 batch_size=1
Please create the folder /path/to/iso/output to store the ISO outputs, and store its path in an environment variable:
export ISO_OUTPUT=/path/to/iso/output
To generate the predictions on the NYUv2 test set, type:
$ cd ISO/
$ python iso/scripts/generate_output.py \
+output_path=$ISO_OUTPUT \
dataset=NYU \
NYU_root=$NYU_ROOT \
NYU_preprocess_root=$NYU_PREPROCESS \
n_gpus=1 batch_size=1
You need to create a new Anaconda environment for visualization.
conda create -n mayavi_vis python=3.7 -y
conda activate mayavi_vis
pip install omegaconf hydra-core PyQt5 mayavi
If you run into problems when installing mayavi, please refer to the mayavi installation instructions.
To visualize a prediction file, run:
$ cd ISO/
$ python iso/scripts/visualization/NYU_vis_pred.py +file=/path/to/output/file.pkl
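Before opening the viewer, it can be useful to see what a generated .pkl file contains. The sketch below assumes only that the file is a standard Python pickle, as its extension suggests; the keys it holds are not documented here, so the script simply lists whatever it finds. summarize_pickle is a hypothetical helper. Only load pickle files you generated yourself, since unpickling untrusted data can execute arbitrary code.

```python
import pickle


def summarize_pickle(path):
    """Return a mapping from top-level keys to the type names of their values.

    Hypothetical helper; if the pickled object is not a dict, the type of
    the object itself is reported under the key "<root>".
    """
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    if isinstance(data, dict):
        return {str(key): type(value).__name__ for key, value in data.items()}
    return {"<root>": type(data).__name__}
```

For example, summarize_pickle("/path/to/output/file.pkl") returns a small dictionary that can be printed to see which arrays or fields the file holds.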
This project is built on MonoScene. Please refer to MonoScene (https://github.com/astra-vision/MonoScene) for more documentation and details.
We would like to thank the creators, maintainers, and contributors of MonoScene, NDC-Scene, ZoeDepth, and Depth Anything for their invaluable work. Their dedication and open-source spirit have been instrumental in our development.
@article{yu2024monocular,
title={Monocular Occupancy Prediction for Scalable Indoor Scenes},
author={Yu, Hongxiao and Wang, Yuqi and Chen, Yuntao and Zhang, Zhaoxiang},
journal={arXiv preprint arXiv:2407.11730},
year={2024}
}