Expanded methods chapter with vanilla SSD explanation
Signed-off-by: Jim Martens <github@2martens.de>
@@ -129,7 +129,7 @@ This leads to the following hypothesis: \emph{Dropout sampling
delivers better object detection performance under open set
conditions compared to object detection without it.}

For the purpose of this thesis, I will use the vanilla SSD (as in: the
original SSD) as the baseline to compare against. In particular,
vanilla SSD uses a per-class confidence threshold of 0.01, an IOU
threshold of 0.45 for the non-maximum suppression, and a top $k$ value
of 200.
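The three parameters interact during decoding roughly as follows. This is a simplified sketch, not the thesis implementation; the function and variable names are illustrative, and the NMS shown is the common greedy per-class variant:

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    iw = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.0)
    ih = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.0)
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def decode(detections, conf_thresh=0.01, nms_iou=0.45, top_k=200):
    """detections: list of (class_id, confidence, box) tuples.
    Applies the per-class confidence threshold, greedy per-class NMS,
    and finally keeps only the top k detections of the image."""
    kept = []
    for cls in {c for c, _, _ in detections}:
        # sort this class's detections by confidence, drop weak ones
        cands = sorted((d for d in detections if d[0] == cls),
                       key=lambda d: d[1], reverse=True)
        cands = [d for d in cands if d[1] >= conf_thresh]
        survivors = []
        for d in cands:  # greedy NMS: drop boxes overlapping a stronger one
            if all(iou(d[2], s[2]) < nms_iou for s in survivors):
                survivors.append(d)
        kept.extend(survivors)
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:top_k]
```

With the values above, almost every detection survives the confidence threshold (0.01 is deliberately permissive), so NMS and the top-$k$ cut do most of the filtering.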
@@ -421,16 +421,9 @@ be used to identify and reject these false positive cases.

\label{chap:methods}

This chapter explains the functionality of vanilla SSD, Bayesian SSD,
and the decoding pipelines.

\section{Vanilla SSD}

\begin{figure}
\centering
@@ -440,11 +433,29 @@ the uncertainty calculation, and implementation details.
\label{fig:vanilla-ssd}
\end{figure}

Vanilla SSD is based upon the VGG-16 network (see figure
\ref{fig:vanilla-ssd}) and adds extra feature layers. The entire
image (always of size 300x300) is divided into anchor boxes. During
training, each of these boxes is mapped to a ground truth box or to
the background. For every anchor box, the offset to the object and
the class confidences are calculated. The output of the SSD network
consists of the predictions with class confidences, offsets to the
anchor box, anchor box coordinates, and variance. The model loss is
a weighted sum of localisation and confidence loss. As the network
has a fixed number of anchor boxes, every forward pass creates the
same number of detections -- 8732 in the case of SSD 300x300.

Notably, the object proposals are made in a single run for an
image -- hence the name single shot. Other techniques like Faster
R-CNN employ region proposals and pooling. For more detailed
information on SSD, please refer to Liu et al.~\cite{Liu2016}.
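The matching step during training can be illustrated as follows. This is a minimal sketch with an illustrative IoU threshold of 0.5; the actual SSD matching strategy additionally assigns each ground truth box its best anchor even below the threshold:

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    iw = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.0)
    ih = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.0)
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_anchors(anchors, gt_boxes, iou_threshold=0.5):
    """Assign each anchor the index of the best-overlapping ground
    truth box, or -1 (background) when no overlap reaches the
    threshold. Assumes gt_boxes is non-empty."""
    labels = []
    for anchor in anchors:
        ious = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__)
        labels.append(best if ious[best] >= iou_threshold else -1)
    return labels
```

Because the anchor grid is fixed, this assignment is what turns detection into a pure per-anchor regression and classification problem, which is why every forward pass yields exactly 8732 predictions.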

\section{Bayesian SSD for Model Uncertainty}

Networks trained with dropout are a general approximate Bayesian
model~\cite{Gal2017}. As such, they can be used for everything a true
Bayesian model could be used for. This idea is applied to SSD in this
thesis: two dropout layers are added to vanilla SSD, after the layers
fc6 and fc7 respectively (see figure \ref{fig:bayesian-ssd}).
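The core mechanic of dropout sampling is to keep dropout active at inference time and run several stochastic forward passes. The sketch below uses a toy one-layer network standing in for SSD; the names and shapes are illustrative, not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, w, drop_rate=0.5):
    """One stochastic forward pass: dropout stays active at inference."""
    h = x @ w
    mask = rng.random(h.shape) >= drop_rate  # drop units at random
    h = h * mask / (1.0 - drop_rate)         # inverted dropout scaling
    e = np.exp(h - h.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)  # softmax class confidences

def dropout_sampling(x, w, num_passes=10):
    """Stack the class confidences of several stochastic forward passes."""
    return np.stack([forward(x, w) for _ in range(num_passes)])
```

The spread of the stacked confidences across passes is what carries the uncertainty signal; a confident model produces nearly identical outputs in every pass.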

\begin{figure}
\centering
@@ -454,14 +465,14 @@ two dropout layers after the fc6 and fc7 layers (see figure \ref{fig:bayesian-ss
\label{fig:bayesian-ssd}
\end{figure}

The motivation for this is model uncertainty: an uncertain model will
predict different classes for the same object on the same image across
multiple forward passes. This uncertainty is measured with entropy:
every forward pass results in predictions, which are partitioned into
observations, and subsequently their entropy is calculated. A higher
entropy indicates a more uniform distribution of confidences, whereas
a lower entropy indicates a high confidence in one class and very low
confidences in the other classes.

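The entropy of a confidence distribution $p$ is the standard Shannon entropy $H(p) = -\sum_i p_i \log p_i$. A minimal sketch of the computation on an observation's averaged class confidences:

```python
import numpy as np

def entropy(confidences):
    """Shannon entropy of a discrete distribution of class confidences."""
    p = np.asarray(confidences, dtype=float)
    p = p / p.sum()   # normalise, in case the scores do not sum to 1
    p = p[p > 0]      # convention: 0 * log 0 = 0
    return float(-(p * np.log(p)).sum())
```

A uniform distribution over four classes yields the maximum $\log 4 \approx 1.386$, while a distribution peaked on one class yields a value near zero, which is what makes a single entropy threshold usable as an uncertainty filter.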
\subsection{Implementation Details}

@@ -469,8 +480,11 @@ For this thesis, an SSD implementation based on Tensorflow~\cite{Abadi2015} and
Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
was used. It was modified to support entropy thresholding,
partitioning of observations, and dropout layers in the SSD model.
Entropy thresholding takes place before the per-class confidence
threshold is applied.

The Bayesian variant was not fine-tuned and uses the same weights
as vanilla SSD.

\section{Decoding Pipelines}
@@ -624,8 +638,8 @@ an open set condition. To this end, the weights for the last

All images of the minival2014 data set were used, but only ground
truth belonging to the first 60 classes was loaded. The remaining 20
classes were considered ``unknown'' and no ground truth bounding
boxes were provided for them during the inference phase.
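The open-set split described above amounts to a simple filter on the loaded annotations. This is only a sketch; the data layout and function name are hypothetical and do not reflect the ssd\_keras loader's API:

```python
def filter_known_ground_truth(annotations, num_known=60):
    """Drop ground truth boxes of 'unknown' classes. `annotations` is a
    per-image list of {'class_id': int, 'box': [...]} dicts; class ids
    are assumed zero-based, so ids 0..59 are the known classes."""
    return [[box for box in image_boxes if box["class_id"] < num_known]
            for image_boxes in annotations]
```

Images whose objects all belong to unknown classes stay in the data set with an empty ground truth list, so the detector is still evaluated on them.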
\section{Experimental Setup}