Finished introduction (raw version)
Signed-off-by: Jim Martens <github@2martens.de>
regression and classification. Regression deals with any case
where the goal for the network is to come close to an ideal
function that connects all data points. Classification, however,
describes tasks where the network is supposed to identify the
class of any given input. In this thesis, I will work with both.

\subsection*{Object Detection in Open Set Conditions}

\begin{figure}
  \centering
  \includegraphics[scale=1.0]{open-set}
  \caption{Open set problem: the test set contains classes that
    were not present during training time.
    Icons in this image have been taken from the COCO data set
    website (\url{https://cocodataset.org/\#explore}) and were
    vectorized afterwards. Resembles figure 1 of Miller et al.~\cite{Miller2018}.}
  \label{fig:open-set}
\end{figure}

More specifically, I will look at object detection under open set
conditions (see figure \ref{fig:open-set}). In non-technical terms,
this effectively describes the kind of situation you encounter with
CCTV cameras or robots outside of a laboratory. Both use cameras
that record images. Subsequently, a neural network analyses the image
of the network as a false positive.

This goes back to the need for automatic explanation. Such a system
should by itself recognize that the given object is unknown and
hence mark any classification result of the network as meaningless.
Technically, there are two slightly different approaches that deal
with this type of task: model uncertainty and novelty detection.

Model uncertainty can be measured with dropout sampling.
low this signifies a low uncertainty. An unknown object is more
likely to cause high uncertainty, which allows for an identification
of false positive cases.

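The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the SSD pipeline of Miller et al.: \texttt{noisy\_forward} is a toy stand-in for a network with dropout kept active at test time, and the number of forward passes is arbitrary.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def dropout_uncertainty(forward, x, num_passes=10):
    """Run several stochastic forward passes (dropout active),
    average the softmax outputs, and return the averaged
    distribution plus its entropy as an uncertainty measure."""
    probs = np.stack([forward(x) for _ in range(num_passes)])
    mean_probs = probs.mean(axis=0)
    return mean_probs, entropy(mean_probs)

# Toy stand-in for a dropout network: a noisy softmax over 3 classes.
rng = np.random.default_rng(0)
def noisy_forward(x):
    logits = np.array([2.0, 0.5, 0.1]) + rng.normal(0.0, 0.3, 3)
    e = np.exp(logits - logits.max())
    return e / e.sum()

mean_probs, h = dropout_uncertainty(noisy_forward, None, num_passes=20)
# Low entropy signals a confident (known) prediction; for 3 classes
# the maximum possible entropy is log(3).
```

A known object tends to produce consistent, peaked distributions across passes (low entropy), while an unknown object produces scattered ones (high entropy).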
Novelty detection is another approach to solve the task.
In the realm of neural networks it is usually done with the help of
auto-encoders that solve a regression task of finding an
identity function that reconstructs the given
input~\cite{Pimentel2014}. Auto-encoders have
internally at least two components: an encoder, and a decoder or
generator. The job of the encoder is to find an encoding that
compresses the input as well as possible while simultaneously
that reconstructs the input as accurately as possible. During
training these auto-encoders learn to reproduce a certain group
of object classes. The actual novelty detection takes place
during testing: given an image, and the output and loss of the
auto-encoder, a novelty score is calculated. For some novelty
detection approaches the reconstruction loss is exactly the novelty
score; others consider more factors. A low novelty
score signals a known object. The opposite is true for a high
novelty score.

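The reconstruction-loss variant of the novelty score can be illustrated with a deliberately simple sketch. The linear projection below is only a stand-in for a trained encoder/decoder pair, not an architecture from the cited works; all data and names are illustrative.

```python
import numpy as np

def novelty_score(x, reconstruct):
    """Reconstruction error as novelty score: mean squared
    difference between the input and its reconstruction."""
    x_hat = reconstruct(x)
    return float(np.mean((x - x_hat) ** 2))

# Illustrative "auto-encoder": project onto the top principal
# component of the known data and back (encode + decode in one step).
rng = np.random.default_rng(1)
known = rng.normal(0.0, 1.0, (200, 2)) * np.array([3.0, 0.1])
mean = known.mean(axis=0)
_, _, vt = np.linalg.svd(known - mean)
direction = vt[0]  # learned one-dimensional "code" direction

def reconstruct(x):
    centered = x - mean
    return mean + np.dot(centered, direction) * direction

known_sample = np.array([2.5, 0.05])  # resembles the training data
novel_sample = np.array([0.0, 4.0])   # far off the learned manifold

low = novelty_score(known_sample, reconstruct)
high = novelty_score(novel_sample, reconstruct)
# A low score signals a known object; a high score signals novelty.
```

Inputs similar to the training distribution reconstruct well (low score); inputs from unseen classes do not (high score), which is exactly the decision rule described above.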
\subsection*{Research Question}

Auto-encoders work well for data sets like MNIST~\cite{Deng2012}
but perform poorly on challenging real-world data sets
like MS COCO~\cite{Lin2014}. Therefore, a comparison between
model uncertainty and novelty detection is considered out of
scope for this thesis.

Miller et al.~\cite{Miller2018} used an SSD pre-trained on COCO
without further fine-tuning on the SceneNet RGB-D data
set~\cite{McCormac2017} and reported good results regarding
open set error for an SSD variant with dropout sampling and entropy
thresholding.
If their results are generalizable, it should be possible to replicate
the relative difference between the variants on the COCO data set.
This leads to the following hypothesis: \emph{Dropout sampling
delivers better object detection performance under open set
conditions compared to object detection without it.}

For the purpose of this thesis, I will use the vanilla SSD as the
baseline to compare against. In particular, vanilla SSD uses
a per-class confidence threshold of 0.01, an IOU threshold of 0.45
for the non-maximum suppression, and a top k value of 200.
The effect of an entropy threshold is measured against this vanilla
SSD by applying entropy thresholds from 0.1 to 2.4 (limits taken from
Miller et al.). Dropout sampling is compared to vanilla SSD, both
with and without entropy thresholding. The number of forward
passes is varied to identify their impact.

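The thresholding step can be sketched as follows. The example detections, the box format, and the chosen entropy threshold of 1.0 are illustrative; only the confidence threshold of 0.01 and the entropy range 0.1 to 2.4 come from the setup above, and non-maximum suppression is omitted.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def filter_detections(detections, conf_threshold=0.01, entropy_threshold=1.0):
    """Keep detections whose best class score passes the confidence
    threshold and whose class distribution has entropy at or below
    the entropy threshold. Each detection is (box, class_probs).
    Note: the real SSD applies the confidence threshold per class;
    checking only the maximum is a simplification."""
    kept = []
    for box, probs in detections:
        probs = np.asarray(probs, dtype=float)
        if probs.max() < conf_threshold:
            continue  # too weak to keep at all
        if entropy(probs) > entropy_threshold:
            continue  # too uncertain: likely an open set error
        kept.append((box, probs))
    return kept

detections = [
    ((10, 10, 50, 50), [0.90, 0.05, 0.05]),  # peaked: low entropy
    ((20, 20, 60, 60), [0.40, 0.35, 0.25]),  # near-uniform: high entropy
]
kept = filter_detections(detections, entropy_threshold=1.0)
```

Sweeping \texttt{entropy\_threshold} over the 0.1 to 2.4 range then traces out how aggressively uncertain detections are discarded.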
\paragraph{Hypothesis} Dropout sampling
delivers better object detection performance under open set
conditions compared to object detection without it.

\paragraph{Contribution}
The contribution of this thesis is a comparison between dropout
of both for object detection under open set conditions using
the SSD network for object detection and the SceneNet RGB-D data set
with MS COCO classes.

\subsection*{Reader's guide}

First, chapter \ref{chap:background} presents related works and
provides the background for dropout sampling, a.k.a.\ Bayesian SSD.
Afterwards, chapter \ref{chap:methods} explains how the Bayesian SSD
works and provides details about the software and source code design.
Chapter \ref{chap:experiments-results} presents the data sets,
the experimental setup, and the results. This is followed by
chapters \ref{chap:discussion} and \ref{chap:closing}, focusing on
the discussion and closing respectively.

Therefore, the contribution is found in chapters \ref{chap:methods},
\ref{chap:experiments-results}, and \ref{chap:discussion}.

\chapter{Background}
\label{chap:background}

This chapter will begin with an overview of previous works
in the field of this thesis. Afterwards, the theoretical foundations
of the work of Miller et al.~\cite{Miller2018} and auto-encoders will
the novelty test. Nonetheless, it could be the better method.

\chapter{Methods}
\label{chap:methods}

This chapter starts with the design of the source code; the
source code is much more than a means to an end. The thesis
uses two data sets: MS COCO and SceneNet RGB-D; a section
detection is out of the question under these circumstances.

\chapter{Experimental Setup and Results}
\label{chap:experiments-results}

\section{Data sets}

\section{Experimental Setup}
\chapter{Discussion}
\label{chap:discussion}

To recap, the hypothesis is repeated here.

\begin{description}
was used.

\chapter{Closing}
\label{chap:closing}