Expanded methods chapter with vanilla SSD explanation
Signed-off-by: Jim Martens <github@2martens.de>
@@ -129,7 +129,7 @@ This leads to the following hypothesis: \emph{Dropout sampling
delivers better object detection performance under open set
conditions compared to object detection without it.}

For the purpose of this thesis, I will use the vanilla SSD (as in: the
original SSD) as the baseline to compare against. In particular,
vanilla SSD uses a per-class confidence threshold of 0.01, an IOU
threshold of 0.45 for the non-maximum suppression, and a top $k$ value
of 200.
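The three parameters interact during decoding roughly as follows. This is a simplified sketch, not the thesis implementation; the function and variable names are illustrative, and the NMS shown is the common greedy per-class variant:

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    iw = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.0)
    ih = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.0)
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def decode(detections, conf_thresh=0.01, nms_iou=0.45, top_k=200):
    """detections: list of (class_id, confidence, box) tuples.
    Applies the per-class confidence threshold, greedy per-class NMS,
    and finally keeps only the top k detections of the image."""
    kept = []
    for cls in {c for c, _, _ in detections}:
        # sort this class's detections by confidence, drop weak ones
        cands = sorted((d for d in detections if d[0] == cls),
                       key=lambda d: d[1], reverse=True)
        cands = [d for d in cands if d[1] >= conf_thresh]
        survivors = []
        for d in cands:  # greedy NMS: drop boxes overlapping a stronger one
            if all(iou(d[2], s[2]) < nms_iou for s in survivors):
                survivors.append(d)
        kept.extend(survivors)
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:top_k]
```

With the values above, almost every detection survives the confidence threshold (0.01 is deliberately permissive), so NMS and the top-$k$ cut do most of the filtering.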
@@ -421,16 +421,9 @@ be used to identify and reject these false positive cases.

\label{chap:methods}

This chapter explains the functionality of vanilla SSD, Bayesian SSD,
and the decoding pipelines.

\section{Vanilla SSD}

\begin{figure}
\centering
@@ -440,11 +433,29 @@ the uncertainty calculation, and implementation details.
\label{fig:vanilla-ssd}
\end{figure}

Vanilla SSD is based upon the VGG-16 network (see figure
\ref{fig:vanilla-ssd}) and adds extra feature layers. The entire
image (always of size 300x300) is divided into anchor boxes. During
training, each of these boxes is mapped to a ground truth box or to
the background. For every anchor box, the offset to the object and
the class confidences are calculated. The output of the SSD network
consists of the predictions with class confidences, offsets to the
anchor box, anchor box coordinates, and variance. The model loss is
a weighted sum of localisation and confidence loss. As the network
has a fixed number of anchor boxes, every forward pass creates the
same number of detections -- 8732 in the case of SSD 300x300.

Notably, the object proposals are made in a single run for an
image -- hence the name single shot. Other techniques like Faster
R-CNN employ region proposals and pooling. For more detailed
information on SSD, please refer to Liu et al.~\cite{Liu2016}.
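The matching step during training can be illustrated as follows. This is a minimal sketch with an illustrative IoU threshold of 0.5; the actual SSD matching strategy additionally assigns each ground truth box its best anchor even below the threshold:

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    iw = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.0)
    ih = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.0)
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_anchors(anchors, gt_boxes, iou_threshold=0.5):
    """Assign each anchor the index of the best-overlapping ground
    truth box, or -1 (background) when no overlap reaches the
    threshold. Assumes gt_boxes is non-empty."""
    labels = []
    for anchor in anchors:
        ious = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__)
        labels.append(best if ious[best] >= iou_threshold else -1)
    return labels
```

Because the anchor grid is fixed, this assignment is what turns detection into a pure per-anchor regression and classification problem, which is why every forward pass yields exactly 8732 predictions.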

\section{Bayesian SSD for Model Uncertainty}

Networks trained with dropout are a general approximate Bayesian
model~\cite{Gal2017}. As such, they can be used for everything a true
Bayesian model could be used for. This idea is applied to SSD in this
thesis: two dropout layers are added to vanilla SSD, after the layers
fc6 and fc7 respectively (see figure \ref{fig:bayesian-ssd}).
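The core mechanic of dropout sampling is to keep dropout active at inference time and run several stochastic forward passes. The sketch below uses a toy one-layer network standing in for SSD; the names and shapes are illustrative, not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, w, drop_rate=0.5):
    """One stochastic forward pass: dropout stays active at inference."""
    h = x @ w
    mask = rng.random(h.shape) >= drop_rate  # drop units at random
    h = h * mask / (1.0 - drop_rate)         # inverted dropout scaling
    e = np.exp(h - h.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)  # softmax class confidences

def dropout_sampling(x, w, num_passes=10):
    """Stack the class confidences of several stochastic forward passes."""
    return np.stack([forward(x, w) for _ in range(num_passes)])
```

The spread of the stacked confidences across passes is what carries the uncertainty signal; a confident model produces nearly identical outputs in every pass.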

\begin{figure}
\centering
@@ -454,14 +465,14 @@ two dropout layers after the fc6 and fc7 layers (see figure \ref{fig:bayesian-ss
\label{fig:bayesian-ssd}
\end{figure}

The motivation for this is model uncertainty: an uncertain model will
predict different classes for the same object on the same image across
multiple forward passes. This uncertainty is measured with entropy:
every forward pass results in predictions, which are partitioned into
observations, and subsequently their entropy is calculated. A higher
entropy indicates a more uniform distribution of confidences, whereas
a lower entropy indicates a high confidence in one class and very low
confidences in the other classes.

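The entropy of a confidence distribution $p$ is the standard Shannon entropy $H(p) = -\sum_i p_i \log p_i$. A minimal sketch of the computation on an observation's averaged class confidences:

```python
import numpy as np

def entropy(confidences):
    """Shannon entropy of a discrete distribution of class confidences."""
    p = np.asarray(confidences, dtype=float)
    p = p / p.sum()   # normalise, in case the scores do not sum to 1
    p = p[p > 0]      # convention: 0 * log 0 = 0
    return float(-(p * np.log(p)).sum())
```

A uniform distribution over four classes yields the maximum $\log 4 \approx 1.386$, while a distribution peaked on one class yields a value near zero, which is what makes a single entropy threshold usable as an uncertainty filter.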
\subsection{Implementation Details}

@@ -469,8 +480,11 @@ For this thesis, an SSD implementation based on Tensorflow~\cite{Abadi2015} and
Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
was used. It was modified to support entropy thresholding,
partitioning of observations, and dropout layers in the SSD model.
Entropy thresholding takes place before the per-class confidence
threshold is applied.

The Bayesian variant was not fine-tuned and uses the same weights
as vanilla SSD.

\section{Decoding Pipelines}
@@ -624,8 +638,8 @@ an open set condition. To this end, the weights for the last

All images of the minival2014 data set were used, but only ground
truth belonging to the first 60 classes was loaded. The remaining 20
classes were considered ``unknown'' and no ground truth bounding
boxes were provided for them during the inference phase.
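The open-set split described above amounts to a simple filter on the loaded annotations. This is only a sketch; the data layout and function name are hypothetical and do not reflect the ssd\_keras loader's API:

```python
def filter_known_ground_truth(annotations, num_known=60):
    """Drop ground truth boxes of 'unknown' classes. `annotations` is a
    per-image list of {'class_id': int, 'box': [...]} dicts; class ids
    are assumed zero-based, so ids 0..59 are the known classes."""
    return [[box for box in image_boxes if box["class_id"] < num_known]
            for image_boxes in annotations]
```

Images whose objects all belong to unknown classes stay in the data set with an empty ground truth list, so the detector is still evaluated on them.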
\section{Experimental Setup}