From 59b09c45ffe36495fe348e1f42608fccd7462df4 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Thu, 19 Sep 2019 13:56:55 +0200
Subject: [PATCH] Expanded methods chapter with vanilla SSD explanation

Signed-off-by: Jim Martens
---
 body.tex | 68 ++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/body.tex b/body.tex
index 13b1658..d02667e 100644
--- a/body.tex
+++ b/body.tex
@@ -129,7 +129,7 @@ This leads to the following hypothesis: \emph{Dropout sampling
 delivers better object detection performance under open set conditions
 compared to object detection without it.}
-For the purpose of this thesis, I will use the vanilla SSD as
+For the purpose of this thesis, I will use the vanilla SSD (that is, the original SSD) as
 baseline to compare against. In particular, vanilla SSD uses a
 per-class confidence threshold of 0.01, an IOU threshold of 0.45 for
 the non-maximum suppression, and a top k value of 200.
@@ -421,16 +421,9 @@ be used to identify and reject these false positive cases.

 \label{chap:methods}

-This chapter explains the functionality of the Bayesian SSD and the
-decoding pipelines.
+This chapter explains the functionality of vanilla SSD, Bayesian SSD, and the decoding pipelines.

-\section{Bayesian SSD for Model Uncertainty}
-
-Bayesian SSD adds dropout sampling to the vanilla SSD. First,
-the model architecture will be explained, followed by details on
-the uncertainty calculation, and implementation details.
-
-\subsection{Model Architecture}
+\section{Vanilla SSD}

 \begin{figure}
 \centering
@@ -440,11 +433,29 @@ the uncertainty calculation, and implementation details.
 \label{fig:vanilla-ssd}
 \end{figure}

-Vanilla SSD is based upon the VGG-16 network (see figure \ref{fig:vanilla-ssd}) and adds extra feature layers. These layers
-predict the offsets to the anchor boxes, which have different sizes
-and aspect ratios. The feature layers also predict the
-corresponding confidences.
-By comparison, Bayesian SSD only adds
-two dropout layers after the fc6 and fc7 layers (see figure \ref{fig:bayesian-ssd}).
+Vanilla SSD is based upon the VGG-16 network (see figure
+\ref{fig:vanilla-ssd}) and adds extra feature layers. The entire
+image (always of size $300 \times 300$) is divided into anchor boxes.
+During training, each of these boxes is matched to a ground truth box
+or to the background. For every anchor box, the offsets to the matched
+object and the class confidences are predicted. The output of the SSD
+network consists of the predictions with class confidences, offsets to
+the anchor box, anchor box coordinates, and variance. The model loss
+is a weighted sum of localisation and confidence loss. As the network
+has a fixed number of anchor boxes, every forward pass creates the
+same number of detections: 8732 in the case of SSD $300 \times 300$.
+
+Notably, all object proposals for an image are made in a single
+forward pass, hence the name single shot. Other techniques like
+Faster R-CNN employ separate region proposals and pooling. For more
+detailed information on SSD, please refer to Liu et al.~\cite{Liu2016}.
+
+\section{Bayesian SSD for Model Uncertainty}
+
+Networks trained with dropout can be treated as approximate Bayesian
+models~\cite{Gal2017}. As such, they can be used for everything a true
+Bayesian model could be used for. This idea is applied to SSD in this
+thesis: two dropout layers are added to vanilla SSD, one after the
+layer fc6 and one after fc7 (see figure \ref{fig:bayesian-ssd}).

 \begin{figure}
 \centering
@@ -454,14 +465,14 @@ two dropout layers after the fc6 and fc7 layers (see figure \ref{fig:bayesian-ss
 \label{fig:bayesian-ssd}
 \end{figure}

-\subsection{Model Uncertainty}
-
-Dropout sampling measures model uncertainty with the help of
-entropy: every forward pass creates predictions, these are
-partitioned into observations, and then their entropy is calculated.
-Entropy works to detect uncertainty because uncertain networks
-will produce different classifications for the same object in an
-image across multiple forward passes.
+The motivation for this is model uncertainty: an uncertain model will
+predict different classes for the same object on the same image across
+multiple forward passes. This uncertainty is measured with entropy:
+every forward pass results in predictions, which are partitioned into
+observations, and subsequently the entropy of every observation is
+calculated. A higher entropy indicates a more uniform distribution of
+confidences, whereas a lower entropy indicates a high confidence in
+one class and very low confidences in the other classes.

 \subsection{Implementation Details}

 For this thesis, an SSD implementation based on
 Tensorflow~\cite{Abadi2015} and
 Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
 was used. It was modified to support entropy thresholding,
 partitioning of observations, and dropout
-layers in the SSD model. %Entropy thresholding takes place before
-%the per-class confidence threshold is applied.
+layers in the SSD model. Entropy thresholding takes place before
+the per-class confidence threshold is applied.
+
+The Bayesian variant was not fine-tuned and operates with the same
+weights as vanilla SSD.

 \section{Decoding Pipelines}

@@ -624,8 +638,8 @@ an open set condition. To this end, the weights for the last
 All images of the minival2014 data set were used but only ground
 truth belonging to the first 60 classes was loaded. The remaining 20
-classes were considered "unknown" and were not presented with bounding
-boxes during the inference phase.
+classes were considered ``unknown'' and no ground truth bounding
+boxes for them were provided during the inference phase.

 \section{Experimental Setup}
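The entropy-based uncertainty measure that the patch describes (predictions from multiple dropout forward passes are grouped into observations, and each observation's entropy is compared against a threshold before the per-class confidence threshold) can be illustrated with a small sketch. This is not part of the patch or of the referenced ssd\_keras code; the function names and the threshold value are hypothetical, and the sketch assumes each observation is given as softmax confidences of shape (forward passes, classes).

```python
import numpy as np

def observation_entropy(class_probs):
    """Entropy of the mean class distribution of one observation.

    class_probs: array of shape (num_forward_passes, num_classes),
    softmax confidences for detections of the same object collected
    across multiple dropout forward passes.
    """
    mean_probs = class_probs.mean(axis=0)
    mean_probs = np.clip(mean_probs, 1e-12, 1.0)  # avoid log(0)
    return float(-np.sum(mean_probs * np.log(mean_probs)))

def keep_observation(class_probs, entropy_threshold):
    # Entropy thresholding before the per-class confidence threshold:
    # observations whose entropy is too high are rejected as uncertain.
    return observation_entropy(class_probs) <= entropy_threshold

# A certain observation: every forward pass strongly predicts class 0.
confident = np.array([[0.98, 0.01, 0.01],
                      [0.97, 0.02, 0.01]])

# An uncertain observation: the forward passes disagree on the class,
# so the averaged distribution is close to uniform over two classes.
uncertain = np.array([[0.90, 0.05, 0.05],
                      [0.05, 0.90, 0.05]])
```

With a (hypothetical) threshold of 0.5 nats, the confident observation passes while the disagreeing one is filtered out, which matches the intuition in the text: disagreement across forward passes pushes the averaged confidences toward uniformity and hence toward higher entropy.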