diff --git a/body.tex b/body.tex
index e09ac69..4bab418 100644
--- a/body.tex
+++ b/body.tex
@@ -374,12 +374,78 @@ implementation details will be presented.
 \section{Bayesian SSD for Model Uncertainty}
 
+Bayesian SSD extends vanilla SSD with dropout sampling. First,
+the model architecture is explained, followed by details of the
+uncertainty calculation and the implementation.
+
 \subsection{Model Architecture}
 
+\begin{figure}
+  \centering
+  \includegraphics[scale=1.2]{vanilla-ssd}
+  \caption{The vanilla SSD network as defined by Liu et
+  al.~\cite{Liu2016}. VGG-16 is the base network, extended with
+  extra feature layers. These predict offsets to anchor boxes of
+  different sizes and aspect ratios, as well as the corresponding
+  confidences.}
+  \label{fig:vanilla-ssd}
+\end{figure}
+
+Vanilla SSD is based on the VGG-16 network (see figure
+\ref{fig:vanilla-ssd}) and adds extra feature layers. These layers
+predict the offsets to the anchor boxes, which have different sizes
+and aspect ratios, as well as the corresponding confidences. By
+comparison, Bayesian SSD only adds two dropout layers after the
+fc6 and fc7 layers (see figure \ref{fig:bayesian-ssd}).
+
+\begin{figure}
+  \centering
+  \includegraphics[scale=1.2]{bayesian-ssd}
+  \caption{The Bayesian SSD network as defined by Miller et
+  al.~\cite{Miller2018}. It adds dropout layers after the fc6
+  and fc7 layers.}
+  \label{fig:bayesian-ssd}
+\end{figure}
+
 \subsection{Model Uncertainty}
 
+Dropout sampling measures model uncertainty with the help of
+entropy: every forward pass creates predictions, which are
+partitioned into observations; the entropy of each observation is
+then calculated. Entropy can detect uncertainty because an
+uncertain network produces different classifications for the same
+object in an image across multiple forward passes.
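The entropy-based filtering described above can be sketched as follows. This is an illustrative NumPy snippet, not the thesis implementation; the array layout, the threshold values, and the assumption that class 0 is the background class are my own:

```python
import numpy as np

def observation_entropy(scores):
    """Shannon entropy of averaged softmax scores, one value per observation."""
    eps = 1e-12  # guards against log(0)
    return -np.sum(scores * np.log(scores + eps), axis=-1)

def filter_detections(forward_passes, entropy_thresh=0.8, conf_thresh=0.5):
    """forward_passes: array of shape (num_passes, num_observations,
    num_classes) holding softmax scores from repeated forward passes
    with dropout active. Returns indices of surviving observations."""
    # Average the per-pass softmax scores within each observation.
    mean_scores = forward_passes.mean(axis=0)
    # 1) Entropy thresholding: discard uncertain observations.
    certain = observation_entropy(mean_scores) < entropy_thresh
    # 2) Per-class confidence threshold on the remaining observations
    #    (class 0 is assumed to be the background class and is skipped).
    confident = mean_scores[:, 1:].max(axis=-1) > conf_thresh
    return np.nonzero(certain & confident)[0]
```

An observation whose averaged scores concentrate on one class has low entropy and passes; an observation whose scores are spread across classes has high entropy and is rejected, mirroring the idea that an uncertain network disagrees with itself across forward passes.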
+
+The combination of a per-class confidence threshold and
+non-maximum suppression in vanilla SSD has one weakness: even if
+SSD correctly predicts all objects as the background class with
+high confidence, the per-class confidence threshold of 0.01 still
+admits predictions with very low confidences; since background
+boxes are excluded from the maxima collection, many low-confidence
+boxes of other classes can enter it. Furthermore, the same
+detection can be present in the maxima collection for multiple
+classes. In this case, the entropy threshold lets the detection
+pass because the background class has high confidence, and the low
+per-class confidence threshold does not filter it out either.
+The decoded output is therefore worse than the actual predictions
+of the network. Bayesian SSD cannot help in this situation because
+the network is not actually uncertain.
+
+SSD was developed with closed-set conditions in mind. In such a
+setting, a well-trained network does not produce many
+high-confidence background detections. In an open-set environment,
+however, background detections are the correct behaviour for
+unknown classes. To obtain useful detections from the decoding,
+a higher confidence threshold is required.
+
 \subsection{Implementation Details}
 
+For this thesis, an SSD implementation based on TensorFlow and
+Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
+was used. It was modified to support entropy thresholding and to
+add dropout layers to the SSD model. Entropy thresholding takes
+place before the per-class confidence threshold is applied.
+
 \section{Software and Source Code Design}
 
 The source code of many published papers is either not available