Finished introduction (raw version)

Signed-off-by: Jim Martens <github@2martens.de>
2019-08-13 12:24:56 +02:00
parent c9415c34fe
commit 23fce70d84

body.tex

@@ -37,13 +37,24 @@ regression and classification. Regression deals with any case
 where the goal for the network is to come close to an ideal
 function that connects all data points. Classification, however,
 describes tasks where the network is supposed to identify the
-class of any given input. In this thesis, I will focus on
-classification.
+class of any given input. In this thesis, I will work with both.
 \subsection*{Object Detection in Open Set Conditions}
+\begin{figure}
+\centering
+\includegraphics[scale=1.0]{open-set}
+\caption{Open set problem: The test set contains classes that
+were not present during training time.
+Icons in this image have been taken from the COCO data set
+website (\url{https://cocodataset.org/\#explore}) and were
+vectorized afterwards. Resembles figure 1 of Miller et al.~\cite{Miller2018}.}
+\label{fig:open-set}
+\end{figure}
 More specifically, I will look at object detection in the open set
-conditions. In non-technical words this effectively describes
+conditions (see figure \ref{fig:open-set}).
+In non-technical words this effectively describes
 the kind of situation you encounter with CCTV cameras or robots
 outside of a laboratory. Both use cameras that record
 images. Subsequently a neural network analyses the image
@@ -64,7 +75,7 @@ of the network as false positive.
 This goes back to the need for automatic explanation. Such a system
 should by itself recognize that the given object is unknown and
 hence mark any classification result of the network as meaningless.
-Technically there are two slightly different things that deal
+Technically there are two slightly different approaches that deal
 with this type of task: model uncertainty and novelty detection.
 Model uncertainty can be measured with dropout sampling.
@@ -80,11 +91,10 @@ low this signifies a low uncertainty. An unknown object is more
 likely to cause high uncertainty which allows for an identification
 of false positive cases.
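As an editorial aside, the sampling procedure described here can be sketched in a few lines of Python; the toy linear classifier and all names below are illustrative assumptions for the sketch, not the SSD setup of Miller et al.:

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward_with_dropout(x, weights, rate, rng):
    """Toy linear layer whose weights are randomly dropped at test time.
    This stands in for a real network with dropout layers kept active."""
    kept = [[w if rng.random() >= rate else 0.0 for w in row] for row in weights]
    logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in kept]
    return softmax(logits)

def mc_dropout(x, weights, rate=0.5, passes=30, seed=0):
    """Average the softmax over several stochastic forward passes and
    report the per-class variance as an uncertainty estimate."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, weights, rate, rng) for _ in range(passes)]
    classes = range(len(weights))
    mean = [sum(s[c] for s in samples) / passes for c in classes]
    var = [sum((s[c] - mean[c]) ** 2 for s in samples) / passes for c in classes]
    return mean, var
```

A peaked mean with low variance suggests a confident detection of a known class; widely scattered samples point to high uncertainty and thus a possibly unknown object.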
-Novelty detection is the more direct approach to solve the task.
+Novelty detection is another approach to solve the task.
 In the realm of neural networks it is usually done with the help of
-auto-encoders that essentially solve a regression task of finding an
-identity function that reconstructs on the output the given
-input~\cite{Pimentel2014}. Auto-encoders have
+auto-encoders that solve a regression task of finding an
+identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have
 internally at least two components: an encoder, and a decoder or
 generator. The job of the encoder is to find an encoding that
 compresses the input as well as possible while simultaneously
@@ -94,35 +104,44 @@ that reconstructs the input as accurately as possible. During
 training these auto-encoders learn to reproduce a certain group
 of object classes. The actual novelty detection takes place
 during testing: Given an image, and the output and loss of the
-auto-encoder, a novelty score is calculated. A low novelty
+auto-encoder, a novelty score is calculated. For some novelty
+detection approaches the reconstruction loss is exactly the novelty
+score; others consider more factors. A low novelty
 score signals a known object. The opposite is true for a high
 novelty score.
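The reconstruction-loss variant of the novelty score can be illustrated as follows; the function names and the threshold are assumptions made for this sketch, not taken from a specific implementation:

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between the input and the auto-encoder output.
    Toy example: inputs are plain lists of floats."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def is_novel(x, x_hat, threshold):
    """Treat the reconstruction error itself as the novelty score:
    a score above the (illustrative) threshold marks the input as unknown."""
    return reconstruction_error(x, x_hat) > threshold
```

An auto-encoder trained on the known classes reconstructs them well, yielding a low score; an unknown input is reconstructed poorly and produces a high score.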
 \subsection*{Research Question}
-Both presented approaches describe one way to solve the aforementioned
-problem of explanation. They can be differentiated by measuring
-their performance: the best theoretical idea is useless if it does
-not perform well. Miller et al. have shown
-some success in using dropout sampling. However, the many forward
-passes during testing for every image seem computationally expensive.
-In comparison a single run through a trained auto-encoder seems
-intuitively to be faster. This leads to the hypothesis (see below).
-For the purpose of this thesis, I will
-use the work of Miller et al. as baseline to compare against.
-They use the SSD~\cite{Liu2016} network for object detection,
-modified by added dropout layers, and the SceneNet
-RGB-D~\cite{McCormac2017} data set using the MS COCO~\cite{Lin2014}
-classes. I will use a simple implementation of an auto-encoder and
-novelty detection to compare with the work of Miller et al.
-SSD for the object detection and SceneNet RGB-D as the data
-set are used for both approaches.
-\paragraph{Hypothesis} Novelty detection using auto-encoders
-delivers similar or better object detection performance under open set
-conditions while being less computationally expensive compared to
-dropout sampling.
+Auto-encoders work well for data sets like MNIST~\cite{Deng2012}
+but perform poorly on challenging real world data sets
+like MS COCO~\cite{Lin2014}. Therefore, a comparison between
+model uncertainty and novelty detection is considered out of
+scope for this thesis.
+Miller et al.~\cite{Miller2018} used an SSD pre-trained on COCO
+without further fine-tuning on the SceneNet RGB-D data
+set~\cite{McCormac2017} and reported good results regarding
+open set error for an SSD variant with dropout sampling and entropy
+thresholding.
+If their results are generalizable, it should be possible to replicate
+the relative difference between the variants on the COCO data set.
+This leads to the following hypothesis: \emph{Dropout sampling
+delivers better object detection performance under open set
+conditions compared to object detection without it.}
+For the purpose of this thesis, I will use the vanilla SSD as
+baseline to compare against. In particular, vanilla SSD uses
+a per-class confidence threshold of 0.01, an IOU threshold of 0.45
+for the non-maximum suppression, and a top k value of 200.
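For illustration, these post-processing parameters can be sketched with a generic greedy non-maximum suppression; this is a simplified stand-in with assumed data shapes, not the exact SSD code path:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_detections(dets, conf_thresh=0.01, iou_thresh=0.45, top_k=200):
    """Drop low-confidence detections for one class, run greedy NMS,
    and keep at most top_k boxes. Each detection is a (box, score) pair.
    Defaults mirror the vanilla SSD values quoted in the text."""
    dets = sorted((d for d in dets if d[1] >= conf_thresh),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        # Keep a box only if it does not overlap a better box too strongly.
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept[:top_k]
```

The greedy pass keeps the highest-scoring box of each overlapping cluster, so duplicate detections of the same object are suppressed.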
+The effect of an entropy threshold is measured against this vanilla
+SSD by applying entropy thresholds from 0.1 to 2.4 (limits taken from
+Miller et al.). Dropout sampling is compared to vanilla SSD, both
+with and without entropy thresholding. The number of forward
+passes is varied to identify its impact.
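The entropy thresholding step can be sketched as follows; the helper names are illustrative assumptions, and only the 0.1 to 2.4 range of thresholds is taken from the text above:

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a categorical distribution,
    e.g. the per-class softmax scores of one detection."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def passes_entropy_threshold(probs, threshold):
    """Accept a detection only if its class distribution is peaked enough,
    i.e. its entropy does not exceed the threshold."""
    return entropy(probs) <= threshold
```

A detection with a near-uniform class distribution has high entropy and is rejected as likely unknown, while a peaked distribution passes the test.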
+\paragraph{Hypothesis} Dropout sampling
+delivers better object detection performance under open set
+conditions compared to object detection without it.
 \paragraph{Contribution}
 The contribution of this thesis is a comparison between dropout
@@ -131,8 +150,24 @@ of both for object detection in the open set conditions using
 the SSD network for object detection and the SceneNet RGB-D data set
 with MS COCO classes.
+\subsection*{Reader's guide}
+First, chapter \ref{chap:background} presents related works and
+provides the background for dropout sampling a.k.a.\ Bayesian SSD.
+Afterwards, chapter \ref{chap:methods} explains how the Bayesian SSD
+works, and provides details about the software and source code design.
+Chapter \ref{chap:experiments-results} presents the data sets,
+the experimental setup, and the results. This is followed by
+chapters \ref{chap:discussion} and \ref{chap:closing}, focusing on
+the discussion and closing respectively.
+Therefore, the contribution is found in chapters \ref{chap:methods},
+\ref{chap:experiments-results}, and \ref{chap:discussion}.
 \chapter{Background}
+\label{chap:background}
 This chapter will begin with an overview of previous works
 in the field of this thesis. Afterwards the theoretical foundations
 of the work of Miller et al.~\cite{Miller2018} and auto-encoders will
@@ -582,6 +617,8 @@ the novelty test. Nonetheless it could be the better method.
 \chapter{Methods}
+\label{chap:methods}
 This chapter starts with the design of the source code; the
 source code is so much more than a means to an end. The thesis
 uses two data sets: MS COCO and SceneNet RGB-D; a section
@@ -752,6 +789,8 @@ detection is out of the question under these circumstances.
 \chapter{Experimental Setup and Results}
+\label{chap:experiments-results}
 \section{Data sets}
 \section{Experimental Setup}
@@ -760,6 +799,8 @@ detection is out of the question under these circumstances.
 \chapter{Discussion}
+\label{chap:discussion}
 To recap, the hypothesis is repeated here.
 \begin{description}
@@ -786,3 +827,4 @@ was used.
 \chapter{Closing}
+\label{chap:closing}