
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Abstract
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The recognition of three-dimensional visual data is a fundamental problem in computer vision. Automatic 3D object recognition is becoming a vital part of a wide range of applications, such as robotics, human-computer interaction, activity analysis and video search. This thesis focuses on two sub-problems of 3D object recognition: \emph{classification} and \emph{pose estimation} of \emph{geometric shapes} and \emph{human actions} from 3D data, including voxels, point clouds, videos and depth image sequences.

The main contributions of this work are new practical solutions to these tasks.
This thesis first presents a performance evaluation of 3D interest point detectors used in existing 3D object recognition systems. Quantitative analysis is performed using a new evaluation metric that measures the repeatability and accuracy of an interest point simultaneously. 3D interest points are also compared and analysed qualitatively. For recognition of 3D geometric shapes, this thesis proposes a weakly-supervised constellation model that performs classification and registration simultaneously, without using pose information in training.
This thesis also introduces three random forest-based algorithms for human action classification, 3D body pose estimation and 3D hand pose estimation, respectively.
This thesis presents a novel approach to classifying human actions in real time. A semantic texton forest is proposed to extract visual codewords from videos. Actions are subsequently recognised from the codewords using a new spatiotemporal matching technique.
This thesis addresses 3D human pose estimation by combining video-based action recognition and 3D pose estimation. In the proposed approach, human action is considered as a strong cue for estimating 3D human pose from unconstrained, monocular videos. A hybrid classification-regression forest algorithm is introduced to perform human action classification and 3D pose estimation simultaneously.  
Finally, this thesis presents a semi-supervised regression forest to estimate 3D hand poses from noisy depth images. The proposed approach adopts transductive learning to handle the discrepancies between realistic and synthetic training data. A data-driven, pseudo-kinematics technique is applied to refine the 3D joint locations estimated by the regression forest.

\newpage
\section{Keywords} Object recognition, classification, pose estimation, 3D shape, human action, human pose estimation, hand pose estimation, random forest, interest point detector, performance evaluation, constellation model, semantic texton forest, regression forest, action detection.