\author{Jim 2martens}
\title{Deep Sliding Shapes: A Review}
\item object detection is a central task for neural networks
\item combination of classification and localization tasks
\item outputs are usually bounding boxes and classifications
\item 2D object detection is very mature, for example with the Single Shot MultiBox Detector\cite{Liu2016}
\item with increasing availability of depth data, depth is used more often
\item early approaches use depth as fourth channel in 2D object detection,
for example Depth RCNN\cite{Gupta2015}
\item Deep Sliding Shapes\cite{Song2016} uses 3D data for actual 3D deep
learning and uses 2D object detectors
\item encoding 3D representation and normalization
\item multi-scale 3D region proposal network
\item joint amodal object recognition network
\begin{frame}{Representation and Normalization}
\item raw 3D space divided into equally spaced 3D voxel grid
\item data encoded by Truncated Signed Distance Function
\item each voxel stores the distance from its center to the surface of the
input depth map, together with the direction towards the closest surface point
\item every scene is rotated to align with gravity direction
\item major room directions are used for proposal orientations
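\item a minimal sketch of the truncated signed distance stored per voxel $v$
(the symbols $c(v)$, $p(v)$, $s(v)$ and $\tau$ are notation assumed here for
illustration, not taken from the paper):
\[
  \mathrm{TSDF}(v) = \max\bigl(-\tau,\ \min\bigl(\tau,\ s(v)\,\| p(v) - c(v) \|\bigr)\bigr)
\]
with voxel center $c(v)$, closest surface point $p(v)$, sign
$s(v) \in \{-1, +1\}$ depending on the side of the surface the voxel lies on,
and truncation distance $\tau$; one way to also keep the direction would be
to store the truncated components of $p(v) - c(v)$ instead of only its norm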
\begin{frame}{Region Proposal Network}
\item proposes a few interesting regions for the object recognition network
\item each region proposal corresponds to one anchor box
\item two scales are used since anchor box size varies a lot (from 0.3
to 2 meters)
\item a full 3D convolutional architecture is used
\item after the region proposals are calculated, regions have to pass
several filters before they are actually proposed
\item in the end only the top 2000 regions move on (after the convolution,
dropping only the regions with a point density below 0.005 points per
cubic centimeter still leaves 107674 regions on average)
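\item the point density criterion can be written as a simple test (the
symbols $N(b)$ and $V(b)$ are assumed notation): a region $b$ containing
$N(b)$ depth points in a volume $V(b)$ is kept only if
\[
  \frac{N(b)}{V(b)} \ge 0.005\ \mathrm{points}/\mathrm{cm}^3
\]
\item hypothetical example: a box of $60 \times 60 \times 60\,\mathrm{cm}$ has
$V(b) = 216000\,\mathrm{cm}^3$ and therefore needs at least
$0.005 \cdot 216000 = 1080$ points to survive this filter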
\begin{frame}{Object Recognition Network}
\item starts with both 3D and 2D object recognition networks
\item VGGnet pretrained on ImageNet is used for extracting colour features
\item resulting feature vectors of both networks are concatenated
\item at the end two separate fully connected layers predict object label
and 3D bounding box
\item some outlier protection measures are applied
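\item the concatenation and the two output layers can be sketched as follows
(all symbols, the softmax and the single affine layer per output are
simplifying assumptions for illustration, not details from the paper): with
3D feature vector $f_{3D}$ and 2D feature vector $f_{2D}$
\[
  f = [\, f_{3D};\ f_{2D} \,], \qquad
  \hat{y} = \mathrm{softmax}(W_c f + b_c), \qquad
  \hat{t} = W_b f + b_b
\]
where $\hat{y}$ is the predicted label distribution and $\hat{t}$ holds the
predicted 3D bounding box parameters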
\section{Experimental Results}
\item evaluated on NYUv2\cite{Silberman2012} and SUN RGB-D\cite{Song2015}
\item 3D IoU threshold of 0.25 used for average recall of proposal
generation and average precision of detection
\item ground truth bounding boxes obtained from SUN RGB-D
\item single-scale RPN, multi-scale RPN and multi-scale RPN with RGB colour
usage (RGB colour encoded in 3D TSDF) were compared against each
other and the baselines using the NYU data set
\item 3D selective search and naive 2D to 3D conversion used as baselines
\item second experiment tested ORN with different region proposals
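\item reading the 0.25 threshold as a bound on the 3D intersection over
union (IoU) of a predicted box $B_p$ and a ground truth box $B_g$ (symbols
are assumed notation, and whether the bound is strict is not stated here), a
prediction would count as correct if
\[
  \mathrm{IoU}(B_p, B_g)
  = \frac{\mathrm{vol}(B_p \cap B_g)}{\mathrm{vol}(B_p \cup B_g)} \ge 0.25
\]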
\item works well on non-planar objects with depth information
\item 2D component helps in distinguishing similar shaped objects
\item 3D Deep Sliding Shapes outperforms chosen state-of-the-art methods
\item idea to use 3D data directly is very intriguing
\item high-level structure of region proposal followed by object recognition
is visible in more recent approaches like Frustum PointNets\cite{Qi2017}
as well
\item motivation for the chosen data sets NYUv2 and SUN RGB-D is unclear
\item no information on process of ``obtaining'' ground truth bounding boxes
from SUN RGB-D data set
\item no implementation details provided