Expanded methods chapter with vanilla SSD explanation

Signed-off-by: Jim Martens <github@2martens.de>
2019-09-19 13:56:55 +02:00
parent 8ba241a8d7
commit 59b09c45ff


@ -129,7 +129,7 @@ This leads to the following hypothesis: \emph{Dropout sampling
delivers better object detection performance under open set
conditions compared to object detection without it.}
For the purpose of this thesis, I will use the vanilla SSD (that is, the original SSD) as
baseline to compare against. In particular, vanilla SSD uses
a per-class confidence threshold of 0.01, an IOU threshold of 0.45
for the non-maximum suppression, and a top k value of 200.
@ -421,16 +421,9 @@ be used to identify and reject these false positive cases.
\label{chap:methods}
This chapter explains the functionality of vanilla SSD, Bayesian SSD, and the decoding pipelines.
\section{Vanilla SSD}
\begin{figure}
\centering
@ -440,11 +433,29 @@ the uncertainty calculation, and implementation details.
\label{fig:vanilla-ssd}
\end{figure}
Vanilla SSD is based upon the VGG-16 network (see figure
\ref{fig:vanilla-ssd}) and adds extra feature layers. The entire
image (always of size 300x300) is covered by anchor boxes of
different sizes and aspect ratios. During training, each of these
boxes is matched to a ground truth box or to the background. For
every anchor box, the offsets to the matched object and the class
confidences are predicted. The output of the SSD network consists of
the predictions with class confidences, offsets to the anchor box,
anchor box coordinates, and variance. The model loss is a
weighted sum of localisation and confidence loss. As the network
has a fixed number of anchor boxes, every forward pass creates the same
number of detections: 8732 in the case of SSD 300x300.
Notably, all object proposals are made in a single forward pass per
image, hence the name single shot.
Other techniques like Faster R-CNN employ region proposals
and pooling. For more detailed information on SSD, please refer to
Liu et al.~\cite{Liu2016}.
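The fixed detection count of 8732 follows directly from the six feature map resolutions of SSD 300x300 and the number of anchor boxes per feature map cell given by Liu et al.~\cite{Liu2016}; a minimal sketch of the arithmetic:

```python
# Feature map sizes and anchor boxes per cell for SSD 300x300,
# from Conv4_3 down to Conv11_2, as specified by Liu et al.
feature_map_sizes = [38, 19, 10, 5, 3, 1]
boxes_per_cell = [4, 6, 6, 6, 4, 4]

total = sum(s * s * b for s, b in zip(feature_map_sizes, boxes_per_cell))
print(total)  # 8732
```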
\section{Bayesian SSD for Model Uncertainty}
Networks trained with dropout are a general approximate Bayesian
model~\cite{Gal2017}. As such, they can be used for everything a true
Bayesian model could be used for. This thesis applies that idea to SSD:
two dropout layers are added to vanilla SSD, one after the layer fc6
and one after fc7 (see figure \ref{fig:bayesian-ssd}).
\begin{figure}
\centering
@ -454,14 +465,14 @@ two dropout layers after the fc6 and fc7 layers (see figure \ref{fig:bayesian-ss
\label{fig:bayesian-ssd}
\end{figure}
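The core mechanism can be sketched with a hypothetical toy network (not the actual SSD layers or weights): dropout stays active during the forward pass, so repeated passes over the same input yield different confidence vectors, and averaging over passes approximates the predictive distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the fc6/fc7 weights; purely illustrative.
W1 = np.abs(rng.normal(size=(4, 8)))  # non-negative so the hidden layer is non-zero
W2 = rng.normal(size=(8, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, drop_rate=0.5):
    h = np.maximum(x @ W1, 0.0)
    # Monte Carlo dropout: the mask stays active at inference time.
    mask = (rng.random(h.shape) > drop_rate).astype(float)
    h = h * mask / (1.0 - drop_rate)
    return softmax(h @ W2)

x = np.abs(rng.normal(size=4))
samples = np.array([forward(x) for _ in range(10)])
# Different dropout masks yield different confidences per pass.
mean_confidences = samples.mean(axis=0)
```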
\subsection{Model Uncertainty}
Motivation for this is model uncertainty: an uncertain model will
predict different classes for the same object in the same image across
multiple forward passes. This uncertainty is measured with entropy:
every forward pass results in predictions; these are partitioned into
observations, and their entropy is then calculated.
A higher entropy indicates a more uniform distribution of confidences,
whereas a lower entropy indicates high confidence in one class
and very low confidences in the other classes.
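The entropy behaviour described above can be sketched numerically; the two confidence vectors are hypothetical, chosen only to contrast a uniform with a peaked distribution:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete confidence distribution."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

uniform = [0.25, 0.25, 0.25, 0.25]   # no class is favoured
peaked = [0.97, 0.01, 0.01, 0.01]    # confident in one class

print(entropy(uniform))  # ~1.386 (ln 4, the maximum for four classes)
print(entropy(peaked))   # ~0.168
```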
\subsection{Implementation Details}
@ -469,8 +480,11 @@ For this thesis, an SSD implementation based on Tensorflow~\cite{Abadi2015} and
Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
was used. It was modified to support entropy thresholding,
partitioning of observations, and dropout
layers in the SSD model. Entropy thresholding takes place before
the per-class confidence threshold is applied.
The Bayesian variant was not fine-tuned; it uses the same
weights as vanilla SSD.
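The ordering of the two filters, entropy thresholding before the per-class confidence threshold, can be sketched as follows; the function name, the entropy threshold value of 1.0, and the flat list-of-confidences input format are hypothetical, while the 0.01 confidence threshold is the vanilla SSD baseline value:

```python
import numpy as np

def entropy(p, eps=1e-12):
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def decode(observations, entropy_threshold=1.0, conf_threshold=0.01):
    """Sketch of the decoding order: the entropy filter runs first,
    the per-class confidence threshold second."""
    kept = []
    for confidences in observations:
        if entropy(confidences) > entropy_threshold:
            continue  # too uncertain overall: drop the whole observation
        for cls, score in enumerate(confidences):
            if score > conf_threshold:
                kept.append((cls, score))
    return kept

observations = [[0.97, 0.01, 0.01, 0.01],   # confident observation
                [0.25, 0.25, 0.25, 0.25]]   # maximally uncertain
print(decode(observations))  # [(0, 0.97)]
```

With these inputs, only the confident observation survives the entropy filter, and within it only the class whose score exceeds 0.01 is kept.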
\section{Decoding Pipelines}
@ -624,8 +638,8 @@ an open set condition. To this end, the weights for the last
All images of the minival2014 data set were used but only ground truth
belonging to the first 60 classes was loaded. The remaining 20
classes were considered ``unknown'', and no ground truth bounding
boxes were provided for them during the inference phase.
\section{Experimental Setup}