Demo code of "Distilled semantics for comprehensive scene understanding from videos", published at CVPR 2020
Fabio Tosi † - Filippo Aleotti † - Pierluigi Zama Ramirez † - Matteo Poggi - Samuele Salti - Luigi Di Stefano - Stefano Mattoccia
† joint first authorship
At the moment, we do not plan to release the training code.
Whole understanding of the surroundings is paramount to autonomous systems. Recent works have shown that deep neural networks can learn geometry (depth) and motion (optical flow) from a monocular video without any explicit supervision from ground truth annotations, particularly hard to source for these two tasks. In this paper, we take an additional step toward holistic scene understanding with monocular cameras by learning depth and motion alongside with semantics, with supervision for the latter provided by a pre-trained network distilling proxy ground truth images. We address the three tasks jointly by a) a novel training protocol based on knowledge distillation and self-supervision and b) a compact network architecture which enables efficient scene understanding on both power hungry GPUs and low-power embedded platforms. We thoroughly assess the performance of our framework and show that it yields state-of-the-art results for monocular depth estimation, optical flow and motion segmentation.
At training time, our final network is an ensemble of several sub-networks (depicted in the figure), each in charge of a specific task:
- Camera Network: network in charge of intrinsics and pose estimation
- Depth Semantic Network (DSNet): network able to infer both depth and semantics for a given scene
- Optical Flow Network (OFNet): teacher optical flow network
- Self-Distilled Optical Flow Network: student optical flow network, used at testing time
At testing time, we rely on DSNet, CameraNet and Self-Distilled OFNet depending on the task.
For this project, you need TensorFlow version 1.8 and Python 2.x or 3.x.
You can install all the requirements by running the command:
pip install -r requirements.txt
Pretrained models are available for download:
| Training | Network | Resolution | zip |
|---|---|---|---|
| KITTI | OmegaNet | 640x192 | weights |
| CS + KITTI (EIGEN) | DSNet | 1024x320 | weights |
| CS | DSNet | 1024x320 | weights |
You can run OmegaNet on a single image using the following command:
python --tgt $tgt_path [--ckpt $ckpt --tasks $tasks --dest $dest --src1 $src1 --src2 $src2]
where:
- tgt: path to target image (i.e., image at time t0). Required
- src1: path to src1 image (i.e., image at time t-1). Required only if flow is in the tasks list
- src2: path to src2 image (i.e., image at time t+1). Required only if flow is in the tasks list
- ckpt: path to checkpoint. Required
- tasks: list of tasks to perform, space separated. Default [inverse_depth, ...]
- dest: destination folder. Default results
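The dependency between the tasks list and the source frames can be checked before launching inference. The sketch below is only an illustration: `missing_inputs` is a hypothetical helper that is not part of this repository, and it assumes that flow is the task that needs src1 and src2 (other tasks with the same requirement can be passed through the hypothetical `tasks_needing_sources` argument).

```python
def missing_inputs(tasks, tgt=None, src1=None, src2=None, ckpt=None,
                   tasks_needing_sources=("flow",)):
    """Return the names of required arguments that were not provided.

    Hypothetical helper: it mirrors the rules listed in this README,
    it is not the argument parser used by the repository.
    """
    missing = []
    # tgt and ckpt are marked as required in the parameter list.
    if tgt is None:
        missing.append("tgt")
    if ckpt is None:
        missing.append("ckpt")
    # src1/src2 are only needed when a task uses the neighbouring frames.
    if any(task in tasks_needing_sources for task in tasks):
        if src1 is None:
            missing.append("src1")
        if src2 is None:
            missing.append("src2")
    return missing


if __name__ == "__main__":
    print(missing_inputs(["inverse_depth"], tgt="t0.png", ckpt="models/omeganet"))
    print(missing_inputs(["flow"], tgt="t0.png", ckpt="models/omeganet"))
```

An empty list means the call is complete; any returned name points at an argument to add to the command line.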
For instance, the following command runs OmegaNet on an example batch from the KITTI 2015 test set:
python --src1 assets/example/000018_09.png \
--tgt assets/example/000018_10.png \
--src2 assets/example/000018_11.png \
--ckpt models/omeganet
To test the network, first generate the artifacts for a specific task, then evaluate them.
You can generate the artifacts for a specific task by running the following command:
python --task $task --ckpt $ckpt \
[--cpu --load_only_baseline --filenames_file $filenames ] \
[--height $height --width $width --dest $dest]
- task: task to perform. Can be [depth, ...]. Default depth
- filenames_file: path to the .txt file listing all the images to load. Default filenames/eigen_test.txt
- ckpt: path to checkpoint. Required
- load_only_baseline: if set, load only the Baseline (CameraNet+DSNet). Otherwise, the full OmegaNet will be loaded. For instance, SD-OFNet weights are not available for a Baseline model, so they should not be loaded when testing one.
- height: height of resized image. Default 192
- width: width of resized image. Default 640
- dest: where to save artifacts. Default artifacts
- cpu: run test on CPU
You can generate depth artifacts using the following script:
export datapath="/path/to/full_kitti/"
python --task depth \
--datapath $datapath \
--filenames_file filenames/eigen_test.txt \
--ckpt models/omeganet
- datapath: path to your FULL KITTI dataset
Flow artifacts for KITTI can be produced with the following command:
export datapath="/path/to/3-frames-KITTI/"
python --task flow \
--datapath $datapath \
--filenames_file filenames/kitti_2015_test.txt \
--ckpt models/omeganet
- datapath: path to your 3-frames extended KITTI dataset
Semantic artifacts for KITTI can be produced with the following command:
export datapath="/path_to_kitti/data_semantics/training/image_2"
python --task semantic \
--datapath $datapath \
--filenames_file filenames/kitti_2015_test_semantic.txt \
--ckpt path_to_ckpts/dsnet
- datapath: path to your images of the semantic KITTI dataset
Motion mask artifacts for KITTI can be produced with the following command:
export datapath="/path/to/kitti/2015/"
python --task mask \
--ckpt path_to_ckpts/omeganet \
--datapath $datapath \
--filenames_file filenames/kitti_2015_test.txt
- datapath: path to your 3-frames extended KITTI dataset
You can evaluate the depth maps by running the command:
cd evaluators
python --datapath $datapath \
--prediction_folder $prediction_folder
- datapath: path to FULL KITTI dataset
- prediction_folder: path to folder with npy files, e.g. ../artifacts/depth/
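For readers unfamiliar with depth evaluation, the following is a minimal sketch of error measures commonly reported on KITTI (absolute relative error, squared relative error, RMSE, and the fraction of pixels with ratio below 1.25). It is not the evaluator shipped in this repository: the function name `depth_metrics`, the depth range of 1e-3 to 80 meters, and the absence of any scale alignment are assumptions made for illustration.

```python
import numpy as np


def depth_metrics(gt, pred, min_depth=1e-3, max_depth=80.0):
    """Compute common monocular depth errors on valid ground-truth pixels.

    Assumption: gt and pred are depth maps of the same shape, in meters.
    """
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    # Keep only pixels whose ground-truth depth lies inside the chosen range.
    valid = (gt > min_depth) & (gt < max_depth)
    gt = gt[valid]
    # Clip predictions to the same range to avoid divisions by zero.
    pred = np.clip(pred[valid], min_depth, max_depth)

    ratio = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": float(np.mean(np.abs(gt - pred) / gt)),
        "sq_rel": float(np.mean((gt - pred) ** 2 / gt)),
        "rmse": float(np.sqrt(np.mean((gt - pred) ** 2))),
        "a1": float(np.mean(ratio < 1.25)),
    }
```

Pixels without ground truth (value 0 in this sketch) are excluded by the range check, so sparse ground-truth maps can be passed directly.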
To test optical flow artifacts, run the command:
cd evaluators
python --datapath $datapath \
--prediction_folder $prediction_folder
- datapath: path to KITTI/2015
- prediction_folder: path to flow predictions, e.g. ../artifacts/flow/
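As an illustration of what an optical flow evaluation typically measures, the sketch below computes the mean end-point error and the outlier fraction in the style of the KITTI 2015 benchmark, where a pixel counts as an outlier when its error exceeds both 3 pixels and 5% of the ground-truth magnitude. The function name `flow_errors` and the array layout (height, width, 2) are assumptions; this is not the repository's evaluator.

```python
import numpy as np


def flow_errors(gt_flow, pred_flow, valid_mask, abs_thresh=3.0, rel_thresh=0.05):
    """Return (mean end-point error, outlier fraction) over valid pixels."""
    gt_flow = np.asarray(gt_flow, dtype=np.float64)      # shape (H, W, 2)
    pred_flow = np.asarray(pred_flow, dtype=np.float64)  # shape (H, W, 2)
    valid = np.asarray(valid_mask, dtype=bool)           # shape (H, W)

    # End-point error: Euclidean distance between predicted and true vectors.
    epe = np.linalg.norm(pred_flow - gt_flow, axis=-1)
    magnitude = np.linalg.norm(gt_flow, axis=-1)

    # A pixel is an outlier when its error exceeds both thresholds.
    outlier = (epe > abs_thresh) & (epe > rel_thresh * magnitude)
    return float(epe[valid].mean()), float(outlier[valid].mean())
```

The valid mask matters because KITTI flow ground truth is sparse; pixels without annotation should not contribute to either number.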
To test semantic artifacts, run the following command:
cd evaluators
python --datapath $datapath \
--prediction_folder $prediction_folder
- datapath: path to KITTI/2015/data_semantics
- prediction_folder: path to semantic predictions, e.g. ../artifacts/semantic/
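Semantic segmentation is usually scored with the mean intersection over union (mIoU). The sketch below shows one common way to compute it from a confusion matrix. It is an illustration rather than the repository's evaluator: the function name `mean_iou`, the ignore label 255, and the requirement that predicted labels lie in the range 0 to num_classes - 1 are assumptions.

```python
import numpy as np


def mean_iou(gt_labels, pred_labels, num_classes, ignore_label=255):
    """Return (mean IoU over classes that appear, per-class IoU array)."""
    # Cast to int64 so that the index arithmetic below cannot overflow uint8.
    gt = np.asarray(gt_labels, dtype=np.int64).ravel()
    pred = np.asarray(pred_labels, dtype=np.int64).ravel()
    keep = gt != ignore_label
    gt, pred = gt[keep], pred[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    confusion = np.bincount(num_classes * gt + pred, minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)
    intersection = np.diag(confusion).astype(np.float64)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection

    # Classes absent from both maps have an empty union and are skipped.
    iou = np.full(num_classes, np.nan)
    present = union > 0
    iou[present] = intersection[present] / union[present]
    return float(np.nanmean(iou)), iou
```

Averaging only over classes with a non-empty union avoids penalizing images in which some classes never occur.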
When motion mask artifacts are ready, you can test them on KITTI.
cd evaluators
python --datapath $datapath \
--prediction_folder $prediction_folder
- datapath: path to KITTI/2015 folder
- prediction_folder: path to predicted moving masks, e.g. ../artifacts/mask
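A motion mask is a binary map, so a quick sanity check can compare it with a reference mask through pixel accuracy and the IoU of the moving class. The sketch below assumes both masks have the same shape and that non-zero values mark moving pixels; `binary_mask_scores` is a hypothetical name and the repository's evaluator may report different measures.

```python
import numpy as np


def binary_mask_scores(gt_mask, pred_mask):
    """Return (pixel accuracy, IoU of the moving class) for two binary masks."""
    gt = np.asarray(gt_mask).astype(bool)
    pred = np.asarray(pred_mask).astype(bool)
    accuracy = float(np.mean(gt == pred))
    union = np.logical_or(gt, pred).sum()
    # With no moving pixel in either mask the IoU is undefined; report NaN.
    if union > 0:
        iou = float(np.logical_and(gt, pred).sum() / union)
    else:
        iou = float("nan")
    return accuracy, iou
```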
If you find this code useful in your research, please cite:
title={Distilled semantics for comprehensive scene understanding from videos},
author={Tosi, Fabio and Aleotti, Filippo and Ramirez, Pierluigi Zama and Poggi, Matteo and Salti, Samuele and Di Stefano, Luigi and Mattoccia, Stefano},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
Code is licensed under the Apache 2.0 License. More information in the LICENSE file.
Portions of our code are from other repositories:
- Depth evaluation is from monodepth, for "Unsupervised Monocular Depth Estimation with Left-Right Consistency, by C. Godard, O. Mac Aodha, G. Brostow, CVPR 2017".
- Flow Tools are from, licensed under MIT license.
- Rigid flow estimation is from SfMLearner, for "Unsupervised Learning of Depth and Ego-Motion from Video, by T. Zhou, M. Brown, N. Snavely, D. G. Lowe, CVPR 2017". Code is licensed under MIT License.
- SelfFlow network and utilities are from SelfFlow, for "SelFlow: Self-Supervised Learning of Optical Flow, by P. Liu, M. Lyu, I. King, J. Xu, CVPR 2019". Code is licensed under MIT License.
- The Teacher semantic network is DPC, for "Searching for Efficient Multi-Scale Architectures for Dense Image Prediction, by L. C. Chen, M. D. Collins, Y. Zhu, G. Papandreou, B. Zoph, F. Schroff, H. Adam, J. Shlens, Advances in Neural Information Processing Systems 2018". Code is licensed under Apache v2 License. We used this network to generate proxy semantic maps.
We would like to thank all these authors for making their code publicly available and, where applicable, for sharing pretrained models.