From 23fce70d8443736a6668dfb127e3335d1f4a261b Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Tue, 13 Aug 2019 12:24:56 +0200
Subject: [PATCH] Finished introduction (raw version)

Signed-off-by: Jim Martens
---
 body.tex | 102 +++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 72 insertions(+), 30 deletions(-)

diff --git a/body.tex b/body.tex
index ecaeae0..56aeed3 100644
--- a/body.tex
+++ b/body.tex
@@ -37,13 +37,24 @@ regression and classification. Regression deals with
 any case where the goal for the network is to come close to an ideal
 function that connects all data points. Classification, however,
 describes tasks where the network is supposed to identify the
-class of any given input. In this thesis, I will focus on
-classification.
+class of any given input. In this thesis, I will work with both.

 \subsection*{Object Detection in Open Set Conditions}

+\begin{figure}
+  \centering
+  \includegraphics[scale=1.0]{open-set}
+  \caption{Open set problem: The test set contains classes that
+    were not present during training time.
+    Icons in this image have been taken from the COCO data set
+    website (\url{https://cocodataset.org/\#explore}) and were
+    vectorized afterwards. Resembles figure 1 of
+    Miller et al.~\cite{Miller2018}.}
+  \label{fig:open-set}
+\end{figure}
+
 More specifically, I will look at object detection in the open set
-conditions. In non-technical words this effectively describes
+conditions (see figure \ref{fig:open-set}).
+In non-technical terms, this effectively describes
 the kind of situation you encounter with CCTV cameras or robots
 outside of a laboratory. Both use cameras that record images.
 Subsequently, a neural network analyses the image
@@ -64,7 +75,7 @@ of the network as false positive. This goes back to
 the need for automatic explanation. Such a system should by itself
 recognize that the given object is unknown and hence mark any
 classification result of the network as meaningless.
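The idea of marking an uncertain classification as meaningless can be illustrated with a small sketch: when the per-class softmax scores of a detection are close to uniform, the detection is flagged as unknown rather than reported as a confident result. This is an illustrative sketch only; the function names and the threshold value of 1.0 are assumptions of this example, not part of the thesis code.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (natural log) of a softmax distribution."""
    probs = np.clip(probs, 1e-12, 1.0)  # guard against log(0)
    return float(-np.sum(probs * np.log(probs)))

def flag_detection(softmax_scores, threshold=1.0):
    """Flag a detection as unknown when its class distribution is
    too uncertain, i.e. its entropy exceeds the threshold."""
    return "unknown" if entropy(softmax_scores) > threshold else "known"

# A confident detection has low entropy, a near-uniform one high entropy.
confident = np.array([0.97, 0.01, 0.01, 0.01])
uniform = np.array([0.25, 0.25, 0.25, 0.25])
```

A uniform distribution over four classes has entropy $\ln 4 \approx 1.39$, well above the example threshold, while the confident detection stays far below it.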
-Technically there are two slightly different things that deal
+Technically there are two slightly different approaches that deal
 with this type of task: model uncertainty and novelty detection.
 Model uncertainty can be measured with dropout sampling.
@@ -80,11 +91,10 @@ low this signifies a low uncertainty. An unknown
 object is more likely to cause high uncertainty, which allows for
 an identification of false positive cases.

-Novelty detection is the more direct approach to solve the task.
+Novelty detection is another approach to solve the task.
 In the realm of neural networks, it is usually done with the help of
-auto-encoders that essentially solve a regression task of finding an
-identity function that reconstructs on the output the given
-input~\cite{Pimentel2014}. Auto-encoders have
+auto-encoders that solve a regression task of finding an
+identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have
 internally at least two components: an encoder, and a decoder or
 generator. The job of the encoder is to find an encoding that
 compresses the input as well as possible while simultaneously
@@ -94,35 +104,44 @@ that reconstructs the input as accurately as possible.
 During training, these auto-encoders learn to reproduce a certain
 group of object classes. The actual novelty detection takes place
 during testing: Given an image, and the output and loss of the
-auto-encoder, a novelty score is calculated. A low novelty
+auto-encoder, a novelty score is calculated. For some novelty
+detection approaches, the reconstruction loss is exactly the novelty
+score; others consider more factors. A low novelty
 score signals a known object. The opposite is true for a high
 novelty score.

 \subsection*{Research Question}

-Both presented approaches describe one way to solve the aforementioned
-problem of explanation. They can be differentiated by measuring
-their performance: the best theoretical idea is useless if it does
-not perform well. Miller et al. have shown
-some success in using dropout sampling. However, the many forward
-passes during testing for every image seem computationally expensive.
-In comparison a single run through a trained auto-encoder seems
-intuitively to be faster. This leads to the hypothesis (see below).
+Auto-encoders work well for data sets like MNIST~\cite{Deng2012}
+but perform poorly on challenging real-world data sets
+like MS COCO~\cite{Lin2014}. Therefore, a comparison between
+model uncertainty and novelty detection is considered out of
+scope for this thesis.

-For the purpose of this thesis, I will
-use the work of Miller et al. as baseline to compare against.
-They use the SSD~\cite{Liu2016} network for object detection,
-modified by added dropout layers, and the SceneNet
-RGB-D~\cite{McCormac2017} data set using the MS COCO~\cite{Lin2014}
-classes. I will use a simple implementation of an auto-encoder and
-novelty detection to compare with the work of Miller et al.
-SSD for the object detection and SceneNet RGB-D as the data
-set are used for both approaches.
+Miller et al.~\cite{Miller2018} used an SSD pre-trained on COCO
+without further fine-tuning on the SceneNet RGB-D data
+set~\cite{McCormac2017} and reported good results regarding
+open set error for an SSD variant with dropout sampling and entropy
+thresholding.
+If their results are generalizable, it should be possible to replicate
+the relative difference between the variants on the COCO data set.
+This leads to the following hypothesis: \emph{Dropout sampling
+delivers better object detection performance under open set
+conditions compared to object detection without it.}

-\paragraph{Hypothesis} Novelty detection using auto-encoders
-delivers similar or better object detection performance under open set
-conditions while being less computationally expensive compared to
-dropout sampling.

+For the purpose of this thesis, I will use the vanilla SSD as
+baseline to compare against.
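The dropout sampling used by Miller et al. keeps dropout active at test time and averages the softmax scores of several stochastic forward passes. The following Python sketch is a minimal illustration of that idea; `stochastic_forward` and the toy network are stand-ins for a real SSD with dropout layers and are assumptions of this sketch, not the thesis implementation.

```python
import numpy as np

def dropout_sample(stochastic_forward, image, num_passes=10):
    """Approximate the predictive distribution by averaging the
    softmax output of several forward passes with dropout active."""
    samples = np.stack([stochastic_forward(image) for _ in range(num_passes)])
    return samples.mean(axis=0)

# Toy stand-in network: a noisy softmax over three classes, where the
# noise mimics the randomness that dropout injects at test time.
rng = np.random.default_rng(0)

def toy_network(image):
    logits = np.array([2.0, 0.5, 0.1]) + rng.normal(0.0, 0.3, size=3)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

mean_scores = dropout_sample(toy_network, image=None, num_passes=20)
```

The averaged scores remain a valid distribution, and the spread across passes is what the entropy threshold later operates on; more passes give a smoother estimate at proportionally higher cost.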
+In particular, vanilla SSD uses
+a per-class confidence threshold of 0.01, an IOU threshold of 0.45
+for the non-maximum suppression, and a top-$k$ value of 200.
+The effect of an entropy threshold is measured against this vanilla
+SSD by applying entropy thresholds from 0.1 to 2.4 (limits taken from
+Miller et al.). Dropout sampling is compared to vanilla SSD, both
+with and without entropy thresholding. The number of forward
+passes is varied to identify its impact.
+
+\paragraph{Hypothesis} Dropout sampling
+delivers better object detection performance under open set
+conditions compared to object detection without it.

 \paragraph{Contribution}
 The contribution of this thesis is a comparison between dropout
@@ -131,8 +150,24 @@ of both for object detection under open set
 conditions using the SSD network for object detection and the
 SceneNet RGB-D data set with MS COCO classes.

+\subsection*{Reader's guide}
+
+First, chapter \ref{chap:background} presents related work and
+provides the background for dropout sampling, also known as the
+Bayesian SSD.
+Afterwards, chapter \ref{chap:methods} explains how the Bayesian SSD
+works and provides details about the software and source code design.
+Chapter \ref{chap:experiments-results} presents the data sets,
+the experimental setup, and the results. This is followed by
+chapters \ref{chap:discussion} and \ref{chap:closing}, focusing on
+the discussion and closing, respectively.
+
+Therefore, the contribution is found in chapters \ref{chap:methods},
+\ref{chap:experiments-results}, and \ref{chap:discussion}.
+
 \chapter{Background}
+\label{chap:background}
+
 This chapter will begin with an overview of previous work in
 the field of this thesis. Afterwards, the theoretical foundations
 of the work of Miller et al.~\cite{Miller2018} and auto-encoders will
@@ -582,6 +617,8 @@ the novelty test. Nonetheless, it could be the better method.
 \chapter{Methods}
+\label{chap:methods}
+
 This chapter starts with the design of the source code; the
 source code is so much more than a means to an end.
 The thesis uses two data sets: MS COCO and SceneNet RGB-D; a section
@@ -752,6 +789,8 @@ detection is out of the question under these circumstances.

 \chapter{Experimental Setup and Results}
+\label{chap:experiments-results}
+
 \section{Data sets}

 \section{Experimental Setup}

@@ -760,6 +799,8 @@ detection is out of the question under these circumstances.

 \chapter{Discussion}
+\label{chap:discussion}
+
 To recap, the hypothesis is repeated here.

 \begin{description}
@@ -786,3 +827,4 @@ was used.

 \chapter{Closing}
+\label{chap:closing}