Finished introduction (raw version)
Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
parent c9415c34fe
commit 23fce70d84
body.tex: 102 changed lines
@@ -37,13 +37,24 @@ regression and classification. Regression deals with any case
 where the goal for the network is to come close to an ideal
 function that connects all data points. Classification, however,
 describes tasks where the network is supposed to identify the
-class of any given input. In this thesis, I will focus on
-classification.
+class of any given input. In this thesis, I will work with both.
+
+\subsection*{Object Detection in Open Set Conditions}
+
+\begin{figure}
+\centering
+\includegraphics[scale=1.0]{open-set}
+\caption{Open set problem: The test set contains classes that
+were not present during training time.
+Icons in this image have been taken from the COCO data set
+website (\url{https://cocodataset.org/\#explore}) and were
+vectorized afterwards. Resembles figure 1 of Miller et al.~\cite{Miller2018}.}
+\label{fig:open-set}
+\end{figure}
 
 More specifically, I will look at object detection in the open set
-conditions. In non-technical words this effectively describes
+conditions (see figure \ref{fig:open-set}).
+In non-technical words this effectively describes
 the kind of situation you encounter with CCTV cameras or robots
 outside of a laboratory. Both use cameras that record
 images. Subsequently a neural network analyses the image
@@ -64,7 +75,7 @@ of the network as false positive.
 This goes back to the need for automatic explanation. Such a system
 should by itself recognize that the given object is unknown and
 hence mark any classification result of the network as meaningless.
-Technically there are two slightly different things that deal
+Technically there are two slightly different approaches that deal
 with this type of task: model uncertainty and novelty detection.
 
 Model uncertainty can be measured with dropout sampling.

@@ -80,11 +91,10 @@ low this signifies a low uncertainty. An unknown object is more
 likely to cause high uncertainty which allows for an identification
 of false positive cases.
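To make the dropout sampling idea concrete, the following toy sketch keeps dropout active at test time, averages the class distributions of several forward passes, and uses the entropy of the averaged distribution as the uncertainty measure. The tiny linear model, all names, and all numbers are illustrative assumptions, not the implementation used in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward_pass(x, weights, drop_rate=0.5):
    # Toy "network": one linear layer with dropout on its input.
    # Crucially, dropout stays active at test time (Monte Carlo dropout).
    mask = rng.random(x.shape) > drop_rate
    return softmax((x * mask) @ weights / (1.0 - drop_rate))

def dropout_sampling(x, weights, n_passes=10):
    # Average the softmax outputs of several stochastic forward passes.
    probs = np.mean([forward_pass(x, weights) for _ in range(n_passes)], axis=0)
    # High entropy of the averaged distribution signals high uncertainty,
    # which is the cue used to reject likely false positives.
    entropy = -float(np.sum(probs * np.log(probs + 1e-12)))
    return probs, entropy

x = rng.normal(size=4)
weights = rng.normal(size=(4, 3))
probs, entropy = dropout_sampling(x, weights)
```

The cost of this procedure is `n_passes` full forward passes per image, which is the computational overhead discussed later in this introduction.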
 
-Novelty detection is the more direct approach to solve the task.
+Novelty detection is another approach to solve the task.
 In the realm of neural networks it is usually done with the help of
-auto-encoders that essentially solve a regression task of finding an
-identity function that reconstructs on the output the given
-input~\cite{Pimentel2014}. Auto-encoders have
+auto-encoders that solve a regression task of finding an
+identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have
 internally at least two components: an encoder, and a decoder or
 generator. The job of the encoder is to find an encoding that
 compresses the input as well as possible while simultaneously

@@ -94,35 +104,44 @@ that reconstructs the input as accurately as possible. During
 training these auto-encoders learn to reproduce a certain group
 of object classes. The actual novelty detection takes place
 during testing: Given an image, and the output and loss of the
-auto-encoder, a novelty score is calculated. A low novelty
+auto-encoder, a novelty score is calculated. For some novelty
+detection approaches the reconstruction loss is exactly the novelty
+score, while others consider more factors. A low novelty
 score signals a known object. The opposite is true for a high
 novelty score.
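As a concrete illustration of reconstruction-based novelty scoring, the sketch below uses a linear (PCA-style) encoder and decoder fitted to "known" data and takes the reconstruction error directly as the novelty score, with a threshold calibrated on the known data. All names and numbers are assumptions for illustration; a real system would use a trained deep auto-encoder in place of the two linear maps.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Known" training data with correlated features (a stand-in for
# features of the known object classes).
known = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
mean = known.mean(axis=0)

# PCA-style linear auto-encoder: encode 8-D inputs to a 2-D code,
# decode back to 8-D. A trained deep auto-encoder would replace this.
_, _, vt = np.linalg.svd(known - mean, full_matrices=False)
components = vt[:2]  # (2, 8) encoder weights

def encode(x):
    return (x - mean) @ components.T

def decode(code):
    return code @ components + mean

def novelty_score(x):
    # Reconstruction error doubles as the novelty score: inputs that
    # resemble the training data reconstruct well and score low.
    return float(np.mean((x - decode(encode(x))) ** 2))

known_scores = [novelty_score(x) for x in known]
threshold = np.percentile(known_scores, 95)  # calibrated on known data
```

An input scoring above `threshold` would be flagged as novel and its classification result discarded; note that this requires only a single forward pass per input.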
 
 \subsection*{Research Question}
 
-Both presented approaches describe one way to solve the aforementioned
-problem of explanation. They can be differentiated by measuring
-their performance: the best theoretical idea is useless if it does
-not perform well. Miller et al. have shown
-some success in using dropout sampling. However, the many forward
-passes during testing for every image seem computationally expensive.
-In comparison a single run through a trained auto-encoder seems
-intuitively to be faster. This leads to the hypothesis (see below).
+Auto-encoders work well for data sets like MNIST~\cite{Deng2012}
+but perform poorly on challenging real world data sets
+like MS COCO~\cite{Lin2014}. Therefore, a comparison between
+model uncertainty and novelty detection is considered out of
+scope for this thesis.
 
-For the purpose of this thesis, I will
-use the work of Miller et al. as baseline to compare against.
-They use the SSD~\cite{Liu2016} network for object detection,
-modified by added dropout layers, and the SceneNet
-RGB-D~\cite{McCormac2017} data set using the MS COCO~\cite{Lin2014}
-classes. I will use a simple implementation of an auto-encoder and
-novelty detection to compare with the work of Miller et al.
-SSD for the object detection and SceneNet RGB-D as the data
-set are used for both approaches.
+Miller et al.~\cite{Miller2018} used an SSD pre-trained on COCO
+without further fine-tuning on the SceneNet RGB-D data
+set~\cite{McCormac2017} and reported good results regarding
+open set error for an SSD variant with dropout sampling and entropy
+thresholding.
+If their results are generalizable, it should be possible to replicate
+the relative difference between the variants on the COCO data set.
+This leads to the following hypothesis: \emph{Dropout sampling
+delivers better object detection performance under open set
+conditions compared to object detection without it.}
 
-\paragraph{Hypothesis} Novelty detection using auto-encoders
-delivers similar or better object detection performance under open set
-conditions while being less computationally expensive compared to
-dropout sampling.
+For the purpose of this thesis, I will use the vanilla SSD as
+baseline to compare against. In particular, vanilla SSD uses
+a per-class confidence threshold of 0.01, an IOU threshold of 0.45
+for the non-maximum suppression, and a top-k value of 200.
+The effect of an entropy threshold is measured against this vanilla
+SSD by applying entropy thresholds from 0.1 to 2.4 (limits taken from
+Miller et al.). Dropout sampling is compared to vanilla SSD, both
+with and without entropy thresholding. The number of forward
+passes is varied to identify its impact.
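The thresholding step described above can be sketched in a few lines: a detection survives only if its top class confidence passes the per-class threshold and the entropy of its class distribution stays below the entropy threshold. The values 0.01 and 2.4 come from the text above; the example detections and function names are made up for illustration.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + 1e-12)))

def filter_detections(class_distributions, conf_threshold=0.01,
                      entropy_threshold=2.4):
    # Keep a detection only if its winning class is confident enough and
    # its class distribution is not too uncertain (low entropy).
    return [p for p in class_distributions
            if max(p) >= conf_threshold and entropy(p) <= entropy_threshold]

confident = [0.97, 0.01, 0.01, 0.01]   # low entropy, clearly one class
uncertain = [0.25, 0.25, 0.25, 0.25]   # maximum entropy over 4 classes
kept = filter_detections([confident, uncertain], entropy_threshold=0.5)
# Only the confident detection survives an entropy threshold of 0.5.
```

Sweeping `entropy_threshold` over a range of values, as done in the experiments, then trades off how many uncertain (potentially open set) detections are discarded against how many correct detections are lost.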
 
+\paragraph{Hypothesis} Dropout sampling
+delivers better object detection performance under open set
+conditions compared to object detection without it.
 
 \paragraph{Contribution}
 The contribution of this thesis is a comparison between dropout
@@ -131,8 +150,24 @@ of both for object detection in the open set conditions using
 the SSD network for object detection and the SceneNet RGB-D data set
 with MS COCO classes.
 
+\subsection*{Reader's Guide}
+
+First, chapter \ref{chap:background} presents related works and
+provides the background for dropout sampling, a.k.a.\ Bayesian SSD.
+Afterwards, chapter \ref{chap:methods} explains how the Bayesian SSD
+works and provides details about the software and source code design.
+Chapter \ref{chap:experiments-results} presents the data sets,
+the experimental setup, and the results. This is followed by
+chapters \ref{chap:discussion} and \ref{chap:closing}, focusing on
+the discussion and closing respectively.
+
+Therefore, the contribution is found in chapters \ref{chap:methods},
+\ref{chap:experiments-results}, and \ref{chap:discussion}.
 
 \chapter{Background}
+\label{chap:background}
 
 This chapter will begin with an overview of previous works
 in the field of this thesis. Afterwards, the theoretical foundations
 of the work of Miller et al.~\cite{Miller2018} and auto-encoders will
@@ -582,6 +617,8 @@ the novelty test. Nonetheless it could be the better method.
 
 \chapter{Methods}
+
+\label{chap:methods}
 
 This chapter starts with the design of the source code; the
 source code is so much more than a means to an end. The thesis
 uses two data sets: MS COCO and SceneNet RGB-D; a section
@ -752,6 +789,8 @@ detection is out of the question under theses circumstances.
|
|||
|
||||
\chapter{Experimental Setup and Results}
|
||||
|
||||
\label{chap:experiments-results}
|
||||
|
||||
\section{Data sets}
|
||||
|
||||
\section{Experimental Setup}
|
||||
|
@ -760,6 +799,8 @@ detection is out of the question under theses circumstances.
|
|||
|
||||
\chapter{Discussion}
|
||||
|
||||
\label{chap:discussion}
|
||||
|
||||
To recap, the hypothesis is repeated here.
|
||||
|
||||
\begin{description}
|
||||
|
@ -786,3 +827,4 @@ was used.
|
|||
|
||||
|
||||
\chapter{Closing}
|
||||
\label{chap:closing}
|
||||
|