We introduce a multi-task learning setup that uses synthetic indoor scenes to learn image segmentation and depth estimation in a combined learning procedure. Using different encoder-decoder architectures and several multi-task loss functions, we learn a representation shared by both tasks, which allows them to be learned jointly rather than separately. We trained a baseline Unet-inspired architecture, which we call Unet-Hydra, as well as several DeepLab- and SOSD-inspired architectures, each tailored to our tasks. Evaluating with mIOU for semantic segmentation and RMSE for depth estimation, after an extensive hyperparameter search we achieved 0.60 mIOU and 4.96 RMSE with Unet-Hydra, 0.66 mIOU and 5.21 RMSE with the best DeepLab-inspired architecture, and 0.76 mIOU and 4.63 RMSE with the SOSD-inspired architecture. Our best model yields 0.77 mIOU and 4.28 RMSE when evaluated on the Hypersim dataset, and 0.58 mIOU and 0.59 RMSE when transferred to real-world indoor scenes from the NYU V2 dataset. The main insights of this work are that multi-task learning consistently outperforms single-task learning across different architectures, and that architectures with shared task decoders outperform single-decoder architectures. In addition, manually weighted task losses outperform learned loss weights. Finally, our work shows that multi-task models trained on synthetic data can be successfully transferred to real-world data, producing good results on the semantic segmentation task.
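The two evaluation metrics (mIOU and RMSE) and the idea of a manually weighted multi-task loss can be illustrated with a small, dependency-free Python sketch. The function names (`mean_iou`, `rmse`, `weighted_multitask_loss`), the convention of skipping classes that appear in neither prediction nor ground truth, and the default loss weights are all assumptions for illustration. The repository's own training and evaluation code likely operates on framework tensors and may use different conventions.

```python
import math
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over flattened per-pixel class labels.

    Assumption: classes absent from both prediction and ground truth are
    skipped rather than counted as IoU 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Correct pixel: counts once toward both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Wrong pixel: enlarges the union of the predicted and the true class.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between predicted and ground-truth depth values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    squared = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(squared / len(pred))


def weighted_multitask_loss(seg_loss: float, depth_loss: float,
                            w_seg: float = 1.0, w_depth: float = 1.0) -> float:
    """Combine per-task losses with fixed, manually chosen weights.

    The default weights are hypothetical; the README does not state the values used.
    """
    return w_seg * seg_loss + w_depth * depth_loss


if __name__ == "__main__":
    pred_labels = [0, 0, 1, 1, 2, 2]
    true_labels = [0, 1, 1, 1, 2, 0]
    print(round(mean_iou(pred_labels, true_labels, num_classes=3), 4))  # 0.5
    print(rmse([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))  # 1.0
    print(weighted_multitask_loss(0.8, 2.0, w_seg=1.0, w_depth=0.5))
```

Accumulating per-class pixel counts keeps the mIOU computation to a single pass over the labels. A learned alternative would replace the fixed `w_seg` and `w_depth` with trainable parameters; the abstract reports that the manual weighting performed better in these experiments.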
jessicamecht/multitask_semantic_segmentation