Faculty of Electrical Engineering and Informatics
Department of Control Engineering and Information Technology
Artifacts:
- CARLA Simulator code
- Python detector
- 3D web visualizer coupled with image montage and data
- Difference between ground truth and results
- Thesis structure (before each section, list its parts) (highlight the essential parts in bold) (figures will be mentioned)
- Subject placement, importance of topic
- Related work from companies, recent news
- Quick summary of what was done and why
- How it turned out - read more at najibghadri.com/msc-thesis
- fill images
- Motivation
- Analyze task
- More detailed introduction
- Why it is difficult
- What is a good system to develop this? Tesla says the real world; I say both
- What is needed for perception to work (problem):
- Localization, understanding the surrounding etc
- Data set problem
- The unknown problem (hint at energy-based models and unsupervised learning)
- broad - narrow problem
- Short Proposed solution
- Freedom of a simulation
- Task flow (flow diagram): simulation, extraction, imaging,
- Short summary of results: ..., detector which uses: state of the art ...
- Detector can be used like a plug-in
- Structure of Thesis
- Lastly about Energy based methods
- Each chapter
- All code and the thesis are available at https://github.com/najibghadri/msc-thesis, and the published version is on my website
Selecting the right sensors for the task is half the job. In this chapter we are going to detail the most widely used sensors for autonomous driving and compare them.
- Radar
- Lidar
- Ultrasonic
- RGB cameras
- Lidar sensors provide a sparse 3D point cloud but they are very expensive
- Other sensors: GPS, odometry, rain
- Sensors in CARLA simulation
Basics: depth from radar vs. stereo cameras; I choose cameras only.
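Since the whole depth pipeline rests on one relation, it is worth stating it up front. A minimal sketch of rectified-stereo depth, with illustrative numbers matching the 720p, 90° FOV cameras used later (the function name is mine):

```python
def stereo_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth from rectified stereo: Z = f * B / d.

    f is the focal length in pixels, B the horizontal baseline in meters,
    d the disparity in pixels. Larger disparity means a closer object.
    """
    return focal_px * baseline_m / disparity_px

# Example: a 1280x720 image with 90 deg horizontal FOV has
# f = (1280 / 2) / tan(45 deg) = 640 px. With a 0.4 m baseline,
# a 16 px disparity puts the point at 640 * 0.4 / 16 = 16 m.
```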
Convolutional Neural Networks: intro to Deep Learning and CNNs, object classification
- Object detection
- Classification
- Localization
- Bounding box detection
- Voxelization
- Key point detection
- Segmentation
- Depth estimation
- Orientation
- Tracking
- Road detection:
  - Lane detection
  - Drivable road
- Odometry
- Lidar data detection
- Algorithms to talk about:
  - Datasets: KITTI, MARS, COCO, Waymo, nuScenes
  - (AlexNet, LeNet, VGG)
  - YOLO
  - R-CNN, Fast R-CNN, Faster R-CNN
  - Mask R-CNN - Detectron2
  - Detectron
  - PointNet
  - VoxelNet
  - Segmentation networks
  - ...
It is important for a self-driving company to openly detail its technical solution, because transparency is what lets people trust the autopilot.
- Simulations
- Miles done
- Risk
- Tesla: eight cameras, 12 ultrasonic sensors, and one forward-facing radar
- Their view on simulations
- MobileEye - do some pros/cons
- To simplify the task
- Improvements will be discussed later
- Plane assumption: the objects and the road have ~0 pitch and ~0 roll (valid most of the time); see the sketch after this list
- No consistency through time (each frame is handled independently)
- Human pose does not matter
- Daylight conditions only
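The plane assumption is worth a tiny illustration: with a pinhole camera at a known height over a flat road, a single image row already fixes the ground distance. A sketch under that assumption (parameter names are mine, not from the detector):

```python
def ground_distance(v_px: float, cam_height_m: float,
                    focal_px: float, cy_px: float) -> float:
    """Distance to a road point seen at image row v.

    Assumes a pinhole camera whose optical axis is parallel to a flat
    road (~0 pitch, ~0 roll). A ground point at distance Z projects to
    row v = cy + f * h / Z, hence Z = f * h / (v - cy).
    """
    if v_px <= cy_px:
        raise ValueError("row at or above the horizon is not a road point")
    return focal_px * cam_height_m / (v_px - cy_px)

# Example: f = 640 px, camera 1.6 m above the road, a pixel 80 rows
# below the image center -> Z = 640 * 1.6 / 80 = 12.8 m.
```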
- Task flow again: simulation, extraction, imaging,
- Choosing the sensor suite
- The simulation idea: ground truth from the simulator instead of a dataset
- Drawbacks, limitations
- Pros cons
- Tools used
- Linux Ubuntu
- VS Code
- Python, Scripts, Colab
- CARLA
- Lots of issues
- Stereo imaging
- Simulation imaging: HD 720p, camera matrix, compression, noise, realism, distortion, focus, cropping, occlusion, throughput, etc.
- Two coordinate systems
- Which of the algorithms described in the Related Work chapter we chose and why
- No training
- What frameworks are available, how they differ, and why we chose the one we did
- Detectron2
- About
- Why instance segmentation
- Comparisons (see the inference sketch below)
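Part of why Detectron2 won the comparison is how little code inference takes. A minimal sketch using a COCO-pretrained Mask R-CNN from the model zoo (the threshold and model choice here are illustrative, not my tuned configuration):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# COCO-pretrained Mask R-CNN with a ResNet-50 FPN backbone.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # confidence cut-off

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("frame_front_left.png"))  # BGR image in, dict out

instances = outputs["instances"]
print(instances.pred_classes)      # COCO class ids (2 = car, 0 = person, ...)
print(instances.pred_masks.shape)  # one boolean mask per detected instance
```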
- Depth estimation
  - Camera calibration
  - Projective camera model
  - Inverse transformation: explain; translation: same matrix as the camera (explain why)
  - Stereo block matching algorithm (newer); the whole depth chain is sketched below
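A sketch of that depth chain on one rectified pair, using OpenCV's semi-global block matcher (all parameter values are illustrative, not the tuned ones):

```python
import cv2
import numpy as np

left = cv2.imread("front_left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("front_right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; numDisparities must be a multiple of 16.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed point -> px

FOCAL_PX, BASELINE_M = 640.0, 0.4  # intrinsics of the simulated 720p 90-deg rig
CX, CY = 640.0, 360.0              # principal point

depth = np.where(disparity > 0,
                 FOCAL_PX * BASELINE_M / np.maximum(disparity, 1e-6), 0.0)

# Inverse projection: pixel (u, v) with depth Z back to camera coordinates.
u, v = np.meshgrid(np.arange(depth.shape[1]), np.arange(depth.shape[0]))
x = (u - CX) * depth / FOCAL_PX
y = (v - CY) * depth / FOCAL_PX
points = np.dstack([x, y, depth])  # H x W x 3 point cloud in meters
```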
- Detector - the final solution
- Pseudocode - the algorithm (sketched below)
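In outline, one detector iteration looks like this pseudocode (the helper names are placeholders for the real implementation in the repository):

```python
def process_frame(stereo_pairs, predictor):
    """Run one detector iteration over all stereo pairs of the rig.

    stereo_pairs: {side_name: (left_img, right_img)} from the cameras.
    predictor:    an instance-segmentation model (e.g. Detectron2).
    Returns detected objects with class and 3D world position.
    """
    detections = []
    for side, (left, right) in stereo_pairs.items():
        disparity = compute_disparity(left, right)        # stereo block matching
        instances = predictor(left)["instances"]          # masks + classes
        for mask, cls in zip(instances.pred_masks, instances.pred_classes):
            d = mode_disparity(disparity, mask)           # mode beats mean here
            cam_xyz = back_project(mask, d)               # camera coordinates, m
            world_xyz = camera_to_world(cam_xyz, side)    # unify coordinate systems
            detections.append({"side": side, "class": int(cls),
                               "position": world_xyz})
    return detections
```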
- Web visualizer
- Framework
- Usage and results
- Explaining errors
- Car tilt problem
- CARLA position problem
- Z coordinate hack: explain why it's OK (a CARLA issue)
- Fine-tuning:
  - Depth: mean vs. mode (compared in the sketch below)
- FPS
- All sides: avg 0.53 FPS on a TITAN X
- If saving pictures: avg 0.29 FPS
- One side:
- Three sides:
- FPS of one side my computer vs Titan X
- Different models and their accuracy and FPS (one side)
- Mask R-CNN
- Results I am proud of
- Precision, recall, accuracy, danger
- Dangerousness
- Hardware requirements
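The mean-versus-mode question from the fine-tuning list is easy to demonstrate: when a mask spills onto the background, the mean disparity drifts off the object while the mode stays on it. A small numpy sketch (the numbers are invented):

```python
import numpy as np

def mask_depth(disparity, mask, focal_px=640.0, baseline_m=0.4):
    """Instance depth two ways: mean vs. mode of the masked disparities."""
    d = disparity[mask & (disparity > 0)]
    mean_d = d.mean()
    values, counts = np.unique(d.round().astype(int), return_counts=True)
    mode_d = values[counts.argmax()]                  # most common disparity
    return focal_px * baseline_m / mean_d, focal_px * baseline_m / mode_d

# A mask that is 80% car (disparity ~16 px, i.e. 16 m away) and 20% road
# far behind it (~4 px): the mean says ~18.8 m, the mode says 16.0 m.
disp = np.full((10, 10), 16.0)
disp[8:] = 4.0
print(mask_depth(disp, np.ones((10, 10), dtype=bool)))
```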
- Tracking
- YOLO
- Lane detection
- 3D Bounding box detection
- Keypoint detection
- Night results
- Optimal sensor suite
- Correlation
- Mono depth correction
- Fewer sensors: rectified cameras - exo stereo
- Better scene understanding: road segmentation, path regression
- The biggest improvements, in my opinion, will come from unsupervised learning and energy-based methods - for a PhD
- Energy-based methods - Yann LeCun
- Latent space for possible outcomes
- Traffic situation understanding
- Surrounding understanding
- Drivable area reconstruction from other actors
- Orientation, keypoint, wheel, etc. detection
- Voxel reconstruction of actors
- Car position, tilt, velocity detection and correction, odometric correction
- Size based depth correction
- Parallax motion based depth correction
- Traffic light understanding
- Foreign object detection - white-list based - a difficult problem! (https://link.springer.com/article/10.1186/s13640-018-0261-2)
- Dark situations: solution: night detectors, different models
- My prediction: only an open, common, ever-growing AI could qualify as a super driving AI
- Give general conclusion (what we did and why is it good)
- Evaluate results quickly
- Describe opportunities for further research/improvement
The concept is the following: I use modern CNNs such as YOLOv4, Deep SORT, or R-CNN to perform object detection and semantic segmentation, combine them with classical methods such as the Hough transform for lane detection, and perform distance estimation using stereo imaging.
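For the classical lane-detection part, the Hough-transform route looks roughly like this (a sketch; the thresholds and region of interest are illustrative):

```python
import cv2
import numpy as np

frame = cv2.imread("front_left.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)  # edge map feeding the Hough transform

# Keep only a road-shaped trapezoid in front of the car.
h, w = edges.shape
roi = np.zeros_like(edges)
pts = np.array([(0, h), (w, h), (w // 2 + 60, h // 2), (w // 2 - 60, h // 2)])
cv2.fillPoly(roi, [pts], 255)
edges &= roi

# Probabilistic Hough transform: segments long enough to be lane markings.
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                           minLineLength=40, maxLineGap=100)
if segments is not None:
    for x1, y1, x2, y2 in segments.reshape(-1, 4):
        cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)
```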
For this project I decided to test my system in a simulation. This gives me great freedom and efficiency to focus on developing the algorithms instead of worrying about the lack of good datasets, because, as we know, generating a dataset is half the battle in ML today (yet). In a simulation you can do almost anything instantly, if you put aside the rendering time: place an arbitrary number of cameras anywhere, use any car, drive on any kind of road, apply camera effects, and generate ground truth of any kind programmatically (bounding boxes, segmentation, depth, steering data, location data).
After extensive research I stumbled upon the CARLA simulator. The project is developed by the Computer Vision Center (CVC) at the Universitat Autònoma de Barcelona (UAB). The simulator has a really good Python API that lets us do everything I described above.
The following is the sensor architecture in my system: only ten 90° FOV RGB cameras, arranged as a front stereo pair, left and right 45° angled corner stereo pairs, and left and right side stereo pairs. Here you can see what it looks like, with one image for each side.
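Spawning one such stereo pair through the CARLA Python API looks roughly like this (the mounting point and baseline are illustrative, not my final calibration):

```python
import carla

client = carla.Client("localhost", 2000)
world = client.get_world()
vehicle = world.get_actors().filter("vehicle.*")[0]  # assumes an ego car exists

bp = world.get_blueprint_library().find("sensor.camera.rgb")
bp.set_attribute("image_size_x", "1280")
bp.set_attribute("image_size_y", "720")
bp.set_attribute("fov", "90")

BASELINE = 0.4  # meters between the two cameras of one pair

def spawn_cam(y_offset):
    # Mounted on the roof, 1.5 m forward of the vehicle origin.
    tf = carla.Transform(carla.Location(x=1.5, y=y_offset, z=2.4))
    return world.spawn_actor(bp, tf, attach_to=vehicle)

left = spawn_cam(-BASELINE / 2)
right = spawn_cam(+BASELINE / 2)
left.listen(lambda img: img.save_to_disk("out/front_left/%06d.png" % img.frame))
right.listen(lambda img: img.save_to_disk("out/front_right/%06d.png" % img.frame))
```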
With the combined output of the CNNs and other algorithms I will then perform "self-supervised" deep learning with a continuous energy-based method to learn the latent space of generic driving scenarios.
This is going to be the next part of my thesis, which I am still working on.
References:
- Yann LeCun: "Energy-Based Self-Supervised Learning"
- CES 2020: An Hour with Amnon - Autonomous Vehicles Powered by Mobileye
- How the Mobileye Roadbook™ Enables L2+ Solutions
- Testing The World's Smartest Autonomous Car (NOT A Tesla)
- https://deepdrive.voyage.auto/
- https://www.foretellix.com/technology/