diff --git a/body.tex b/body.tex
index 618608d..286e97a 100644
--- a/body.tex
+++ b/body.tex
@@ -115,15 +115,15 @@ novelty score.
Auto-encoders work well for data sets like MNIST~\cite{Deng2012}
but perform poorly on challenging real world data sets
like MS COCO~\cite{Lin2014}, complicating any potential comparison between
-them and object detection networks like SSD.
+them and object detection networks like \gls{SSD}.
Therefore, a comparison between model uncertainty with a network like
SSD and novelty detection with auto-encoders is considered out of scope
for this thesis.
-Miller et al.~\cite{Miller2018} used an SSD pre-trained on COCO
+Miller et al.~\cite{Miller2018} used an \gls{SSD} pre-trained on COCO
without further fine-tuning on the SceneNet RGB-D data
set~\cite{McCormac2017} and reported good results regarding
-open set error for an SSD variant with dropout sampling and entropy
+open set error for an \gls{SSD} variant with dropout sampling and entropy
thresholding.
If their results are generalisable it should be possible to replicate
the relative difference between the variants on the COCO data set.
@@ -131,15 +131,15 @@ This leads to the following hypothesis: \emph{Dropout sampling
delivers better object detection performance under open set
conditions compared to object detection without it.}
-For the purpose of this thesis, I will use the vanilla SSD (as in: the original SSD) as
-baseline to compare against. In particular, vanilla SSD uses
+For the purpose of this thesis, I will use the \gls{vanilla} \gls{SSD} (as in: the original SSD) as the
+baseline to compare against. In particular, \gls{vanilla} \gls{SSD} uses
a per-class confidence threshold of 0.01, an IOU threshold of 0.45
-for the non-maximum suppression, and a top \(k\) value of 200. For this
+for the \gls{NMS}, and a top \(k\) value of 200. For this
thesis, the top \(k\) value was changed to 20 and the confidence threshold
of 0.2 was tried as well.
-The effect of an entropy threshold is measured against this vanilla
+The effect of an entropy threshold is measured against this \gls{vanilla}
SSD by applying entropy thresholds from 0.1 to 2.4 inclusive (limits taken from
-Miller et al.). Dropout sampling is compared to vanilla SSD
+Miller et al.). Dropout sampling is compared to \gls{vanilla} SSD
with and without entropy thresholding.
\paragraph{Hypothesis} Dropout sampling
@@ -150,8 +150,8 @@ conditions compared to object detection without it.
First, chapter \ref{chap:background} presents related works and
provides the background for dropout sampling.
-Afterwards, chapter \ref{chap:methods} explains how vanilla SSD works, how
-Bayesian SSD extends vanilla SSD, and how the decoding pipelines are
+Afterwards, chapter \ref{chap:methods} explains how \gls{vanilla} \gls{SSD} works, how
+Bayesian \gls{SSD} extends \gls{vanilla} SSD, and how the decoding pipelines are
structured.
Chapter \ref{chap:experiments-results} presents the data sets,
the experimental setup, and the results. This is followed by
@@ -421,19 +421,19 @@ be used to identify and reject these false positive cases.
\label{chap:methods}
-This chapter explains the functionality of vanilla SSD, Bayesian SSD, and the decoding pipelines.
+This chapter explains the functionality of \gls{vanilla} SSD, Bayesian SSD, and the decoding pipelines.
\section{Vanilla SSD}
\begin{figure}
\centering
\includegraphics[scale=1.2]{vanilla-ssd}
- \caption{The vanilla SSD network as defined by Liu et al.~\cite{Liu2016}. VGG-16 is the base network, extended with extra feature layers. These predict offsets to anchor boxes with different sizes and aspect ratios. Furthermore, they predict the
+ \caption{The \gls{vanilla} \gls{SSD} network as defined by Liu et al.~\cite{Liu2016}. VGG-16 is the base network, extended with extra feature layers. These predict offsets to anchor boxes with different sizes and aspect ratios. Furthermore, they predict the
corresponding confidences.}
\label{fig:vanilla-ssd}
\end{figure}
-Vanilla SSD is based upon the VGG-16 network (see figure
+Vanilla \gls{SSD} is based upon the VGG-16 network (see figure
\ref{fig:vanilla-ssd}) and adds extra feature layers. The entire
image (always size 300x300) is divided up into anchor boxes. During
training, each of these boxes is mapped to a ground truth box or
@@ -443,7 +443,7 @@ SSD network are the predictions with class confidences, offsets to the
anchor box, anchor box coordinates, and variance. The model loss is a
weighted sum of localisation and confidence loss. As the network
has a fixed number of anchor boxes, every forward pass creates the same
-number of detections---8732 in the case of SSD 300x300.
+number of detections---8732 in the case of \gls{SSD} 300x300.
Notably, the object proposals are made in a single run for an image -
single shot.
@@ -454,13 +454,13 @@ Liu et al.~\cite{Liu2016}.
\section{Bayesian SSD for Model Uncertainty}
Networks trained with dropout are a general approximate Bayesian model~\cite{Gal2017}. As such, they can be used for everything a true
-Bayesian model could be used for. The idea is applied to SSD in this
-thesis: two dropout layers are added to vanilla SSD, after the layers fc6 and fc7 respectively (see figure \ref{fig:bayesian-ssd}).
+Bayesian model could be used for. The idea is applied to \gls{SSD} in this
+thesis: two dropout layers are added to \gls{vanilla} SSD, after the layers fc6 and fc7 respectively (see figure \ref{fig:bayesian-ssd}).
\begin{figure}
\centering
\includegraphics[scale=1.2]{bayesian-ssd}
- \caption{The Bayesian SSD network as defined by Miller et al.~\cite{Miller2018}. It adds dropout layers after the fc6
+ \caption{The Bayesian \gls{SSD} network as defined by Miller et al.~\cite{Miller2018}. It adds dropout layers after the fc6
and fc7 layers.}
\label{fig:bayesian-ssd}
\end{figure}
@@ -476,51 +476,52 @@ and very low confidences in other classes.
\subsection{Implementation Details}
-For this thesis, an SSD implementation based on Tensorflow~\cite{Abadi2015} and
+For this thesis, an \gls{SSD} implementation based on Tensorflow~\cite{Abadi2015} and
Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
was used. It was modified to support entropy thresholding,
partitioning of observations, and dropout
-layers in the SSD model. Entropy thresholding takes place before
+layers in the \gls{SSD} model. Entropy thresholding takes place before
the per-class confidence threshold is applied.
The Bayesian variant was not fine-tuned and operates with the same
-weights that vanilla SSD uses as well.
+weights that \gls{vanilla} \gls{SSD} uses.
\section{Decoding Pipelines}
-The raw output of SSD is not very useful: it contains thousands of
+The raw output of \gls{SSD} is not very useful: it contains thousands of
boxes per image. Among them are many boxes with very low confidences
or background classifications, those need to be filtered out to
get any meaningful output of the network. The process of
filtering is called decoding and presented for the three variants
-of SSD used in the thesis.
+of \gls{SSD} used in the thesis.
\subsection{Vanilla SSD}
Liu et al.~\cite{Liu2016} used Caffe for their original SSD
implementation. The decoding process contains largely two
phases: decoding and filtering. Decoding transforms the relative
-coordinates predicted by SSD into absolute coordinates. At this point
+coordinates predicted by \gls{SSD} into absolute coordinates. At this point
the shape of the output per batch is \((batch\_size, \#nr\_boxes, \#nr\_classes + 12)\). The last twelve elements are split into
the four bounding box offsets, the four anchor box coordinates, and
the four variances; there are 8732 boxes.
+\glslocalreset{NMS}
Filtering of these boxes is first done per class:
only the class id, confidence of that class, and the bounding box
coordinates are kept per box. The filtering consists of
-confidence thresholding and a subsequent non-maximum suppression.
-All boxes that pass non-maximum suppression are added to a
+confidence thresholding and a subsequent \gls{NMS}.
+All boxes that pass \gls{NMS} are added to a
per image maxima list. One box could make the confidence threshold
for multiple classes and, hence, be present multiple times in the
maxima list for the image. Lastly, a total of \(k\) boxes with the
highest confidences is kept per image across all classes. The
original implementation uses a confidence threshold of \(0.01\), an
-IOU threshold for non-maximum suppression of \(0.45\) and a top \(k\)
+IOU threshold for \gls{NMS} of \(0.45\) and a top \(k\)
value of 200.
-The vanilla SSD
-per-class confidence threshold and non-maximum suppression has one
-weakness: even if SSD correctly predicts all objects as the
+The \gls{vanilla} SSD
+per-class confidence threshold and \gls{NMS} have one
+weakness: even if \gls{SSD} correctly predicts all objects as the
background class with high confidence, the per-class confidence
threshold of 0.01 will consider predictions with very low
confidences; as background boxes are not present in the maxima
@@ -531,7 +532,7 @@ pass because the background class has high confidence. Subsequently,
a low per-class confidence threshold does not restrict the boxes
either. Therefore, the decoding output is worse than the actual
predictions of the network.
-Bayesian SSD cannot help in this situation because the network
+Bayesian \gls{SSD} cannot help in this situation because the network
is not actually uncertain.
SSD was developed with closed set conditions in mind. A well trained
@@ -543,8 +544,8 @@ confidence threshold is required.
\subsection{Vanilla SSD with Entropy Thresholding}
-Vanilla SSD with entropy tresholding adds an additional component
-to the filtering already done for vanilla SSD. The entropy is
+Vanilla \gls{SSD} with entropy thresholding adds an additional component
+to the filtering already done for \gls{vanilla} SSD. The entropy is
calculated from all \(\#nr\_classes\) softmax scores in a prediction.
Only predictions with a low enough entropy pass the entropy
threshold and move on to the aforementioned per class filtering.
@@ -553,7 +554,7 @@ false positive or false negative cases with high confidence values.
\subsection{Bayesian SSD with Entropy Thresholding}
-Bayesian SSD has the speciality of multiple forward passes. Based
+Bayesian \gls{SSD} has the speciality of multiple forward passes. Based
on the information in the paper, the detections of all forward passes
are grouped per image but not by forward pass. This leads
to the following shape of the network output after all
@@ -585,8 +586,8 @@ varying classifications are averaged into multiple lower confidence
values which should increase the entropy and, hence, flag an
observation for removal.
-The remainder of the filtering follows the vanilla SSD procedure: per-class
-confidence threshold, non-maximum suppression, and a top \(k\) selection
+The remainder of the filtering follows the \gls{vanilla} \gls{SSD} procedure: per-class
+confidence threshold, \gls{NMS}, and a top \(k\) selection
at the end.
\chapter{Experimental Setup and Results}
@@ -627,7 +628,7 @@ process. MS COCO contains landscape and portrait images with (640x480)
and (480x640) as the resolution. This led to a uniform distortion of the
portrait and landscape images respectively. Furthermore,
the colour channels were swapped from RGB to BGR in order to
-comply with the SSD implementation. The BGR requirement stems from
+comply with the \gls{SSD} implementation. The BGR requirement stems from
the usage of Open CV in SSD: the internal channel order for
Open CV is BGR.
@@ -653,28 +654,28 @@ between the classes in the data set.
This section explains the setup for the different conducted
experiments. Each comparison investigates one particular question.
-As a baseline, vanilla SSD with the confidence threshold of 0.01
-and a non-maximum suppression IOU threshold of 0.45 was used.
+As a baseline, \gls{vanilla} \gls{SSD} with a confidence threshold of 0.01
+and a \gls{NMS} IOU threshold of 0.45 was used.
Due to the low number of objects per image in the COCO data set,
-the top \(k\) value was set to 20. Vanilla SSD with entropy
-thresholding uses the same parameters; compared to vanilla SSD
+the top \(k\) value was set to 20. Vanilla \gls{SSD} with entropy
+thresholding uses the same parameters; compared to \gls{vanilla} SSD
without entropy thresholding, it showcases the relevance of
-entropy thresholding for vanilla SSD.
+entropy thresholding for \gls{vanilla} SSD.
-Vanilla SSD was also run with 0.2 confidence threshold and compared
-to vanilla SSD with 0.01 confidence threshold; this comparison
+Vanilla \gls{SSD} was also run with a 0.2 confidence threshold and compared
+to \gls{vanilla} \gls{SSD} with a 0.01 confidence threshold; this comparison
investigates the effect of the per class confidence threshold
on the object detection performance.
-Bayesian SSD was run with 0.2 confidence threshold and compared
-to vanilla SSD with 0.2 confidence threshold. Coupled with the
+Bayesian \gls{SSD} was run with a 0.2 confidence threshold and compared
+to \gls{vanilla} \gls{SSD} with a 0.2 confidence threshold. Coupled with the
entropy threshold, this comparison reveals how uncertain the network
is. If it is very certain the dropout sampling should have no
significant impact on the result. Furthermore, in two cases the
-dropout was turned off to isolate the impact of non-maximum suppression
+dropout was turned off to isolate the impact of \gls{NMS}
on the result.
-Both, vanilla SSD with entropy thresholding and Bayesian SSD with
+Both, \gls{vanilla} \gls{SSD} with entropy thresholding and Bayesian \gls{SSD} with
entropy thresholding, were tested for entropy thresholds ranging
from 0.1 to 2.4 inclusive as specified in Miller et al.~\cite{Miller2018}.
@@ -701,25 +702,25 @@ in the next chapter.
Forward & max & abs OSE & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{3}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.255 & 3176 & 0.214 & 0.318 \\
- vanilla SSD - 0.2 conf & \textbf{0.376} & 2939 & \textbf{0.382} & 0.372 \\
- SSD with Entropy test - 0.01 conf & 0.255 & 3168 & 0.214 & 0.318 \\
- % entropy thresh: 2.4 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.255 & 3176 & 0.214 & 0.318 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & \textbf{0.376} & 2939 & \textbf{0.382} & 0.372 \\
+ \gls{SSD} with Entropy test - 0.01 conf & 0.255 & 3168 & 0.214 & 0.318 \\
+ % entropy thresh: 2.4 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.209 & 2709 & 0.300 & 0.161 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.371 & \textbf{2335} & 0.365 & \textbf{0.378} \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.359 & 2584 & 0.363 & 0.357 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.325 & 2759 & 0.342 & 0.311 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.209 & 2709 & 0.300 & 0.161 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.371 & \textbf{2335} & 0.365 & \textbf{0.378} \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.359 & 2584 & 0.363 & 0.357 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.325 & 2759 & 0.342 & 0.311 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% 0.5 for Bayesian - 6, 1.4 for 7, 1.4 for 8, 1.3 for 9
\hline
\end{tabular}
- \caption{Rounded results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
- their best performing entropy threshold with respect to \(F_1\) score. Vanilla SSD with Entropy test performed best with an
- entropy threshold of 2.4, Bayesian SSD without non-maximum suppression performed best for 1.0,
- and Bayesian SSD with non-maximum suppression performed best for 1.4 as entropy
+ \caption{Rounded results for micro averaging. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
+ their best performing entropy threshold with respect to \(F_1\) score. Vanilla \gls{SSD} with Entropy test performed best with an
+ entropy threshold of 2.4, Bayesian \gls{SSD} without \gls{NMS} performed best for 1.0,
+ and Bayesian \gls{SSD} with \gls{NMS} performed best for 1.4 as entropy
threshold.
- Bayesian SSD with dropout enabled and 0.9 keep ratio performed
+ Bayesian \gls{SSD} with dropout enabled and 0.9 keep ratio performed
best for 1.4 as entropy threshold, the run with 0.5 keep ratio performed
best for 1.3 as threshold.}
\label{tab:results-micro}
@@ -739,26 +740,26 @@ in the next chapter.
\end{minipage}
\end{figure}
-Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+Vanilla \gls{SSD} with a per-class confidence threshold of 0.2 performs best (see
table \ref{tab:results-micro}) with respect to the maximum \(F_1\) score
(0.376) and recall at the maximum \(F_1\) point (0.382). In comparison, neither
-the vanilla SSD variant with a confidence threshold of 0.01 nor the SSD with
-an entropy test can outperform the 0.2 variant. Among the vanilla SSD variants,
+the \gls{vanilla} \gls{SSD} variant with a confidence threshold of 0.01 nor the \gls{SSD} with
+an entropy test can outperform the 0.2 variant. Among the \gls{vanilla} \gls{SSD} variants,
the 0.2 variant also has the lowest number of open set errors (2939) and the
highest precision (0.372).
-The comparison of the vanilla SSD variants with a confidence threshold of 0.01
+The comparison of the \gls{vanilla} \gls{SSD} variants with a confidence threshold of 0.01
shows no significant impact of an entropy test. Only the open set errors
are lower but in an insignificant way. The rest of the performance metrics is
identical after rounding.
-Bayesian SSD with disabled dropout and without non-maximum suppression
-has the worst performance of all tested variants (vanilla and Bayesian)
+Bayesian \gls{SSD} with disabled dropout and without \gls{NMS}
+has the worst performance of all tested variants (\gls{vanilla} and Bayesian)
with respect to \(F_1\) score (0.209) and precision (0.161). The precision is not only the worst but also significantly lower compared to all other variants.
In comparison to all variants with 0.2 confidence threshold, it has the worst recall (0.300) as well.
-With 2335 open set errors, the Bayesian SSD variant with disabled dropout and
-enabled non-maximum suppression offers the best performance with respect
+With 2335 open set errors, the Bayesian \gls{SSD} variant with disabled dropout and
+enabled \gls{NMS} offers the best performance with respect
to open set errors. It also has the best precision (0.378) of all tested
variants. Furthermore, it provides the best performance among all variants
with multiple forward passes.
@@ -768,11 +769,11 @@ in the lower \(F_1\) scores, higher open set errors, and lower precision
values. Both dropout variants have worse recall (0.363 and 0.342) than
the variant with disabled dropout.
However, all variants with multiple forward passes have lower open set
-errors than all vanilla SSD variants.
+errors than all \gls{vanilla} \gls{SSD} variants.
The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-micro}. Precision-recall curves for all variants
-can be seen in figure \ref{fig:precision-recall-micro}. Both vanilla SSD
+can be seen in figure \ref{fig:precision-recall-micro}. Both \gls{vanilla} SSD
variants with 0.01 confidence threshold reach much higher open set errors
and a higher recall. This behaviour is expected as more and worse predictions
are included.
@@ -787,25 +788,25 @@ reported figures, such as the ones in Miller et al.~\cite{Miller2018}
Forward & max & abs OSE & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{3}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.370 & 1426 & 0.328 & 0.424 \\
- vanilla SSD - 0.2 conf & \textbf{0.375} & 1218 & \textbf{0.338} & 0.424 \\
- SSD with Entropy test - 0.01 conf & 0.370 & 1373 & 0.329 & \textbf{0.425} \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.370 & 1426 & 0.328 & 0.424 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & \textbf{0.375} & 1218 & \textbf{0.338} & 0.424 \\
+ \gls{SSD} with Entropy test - 0.01 conf & 0.370 & 1373 & 0.329 & \textbf{0.425} \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.226 & \textbf{809} & 0.229 & 0.224 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.363 & 1057 & 0.321 & 0.420 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.355 & 1137 & 0.320 & 0.399 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.322 & 1264 & 0.307 & 0.340 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.226 & \textbf{809} & 0.229 & 0.224 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.363 & 1057 & 0.321 & 0.420 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.355 & 1137 & 0.320 & 0.399 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.322 & 1264 & 0.307 & 0.340 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
- their best performing entropy threshold with respect to \(F_1\) score. Vanilla SSD with Entropy test performed best with an
- entropy threshold of 1.7, Bayesian SSD without non-maximum suppression performed best for 1.5,
- and Bayesian SSD with non-maximum suppression performed best for 1.5 as entropy
- threshold. Bayesian SSD with dropout enabled and 0.9 keep ratio performed
+ \caption{Rounded results for macro averaging. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
+ their best performing entropy threshold with respect to \(F_1\) score. Vanilla \gls{SSD} with Entropy test performed best with an
+ entropy threshold of 1.7, Bayesian \gls{SSD} without \gls{NMS} performed best for 1.5,
+ and Bayesian \gls{SSD} with \gls{NMS} performed best for 1.5 as entropy
+ threshold. Bayesian \gls{SSD} with dropout enabled and 0.9 keep ratio performed
best for 1.7 as entropy threshold, the run with 0.5 keep ratio performed
best for 2.0 as threshold.}
\label{tab:results-macro}
@@ -825,36 +826,36 @@ reported figures, such as the ones in Miller et al.~\cite{Miller2018}
\end{minipage}
\end{figure}
-Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+Vanilla \gls{SSD} with a per-class confidence threshold of 0.2 performs best (see
table \ref{tab:results-macro}) with respect to the maximum \(F_1\) score
(0.375) and recall at the maximum \(F_1\) point (0.338). In comparison, the SSD
with an entropy test slightly outperforms the 0.2 variant with respect to
precision (0.425). Additionally, this is the best precision overall. Among
-the vanilla SSD variants, the 0.2 variant also has the lowest
+the \gls{vanilla} \gls{SSD} variants, the 0.2 variant also has the lowest
number of open set errors (1218).
-The comparison of the vanilla SSD variants with a confidence threshold of 0.01
+The comparison of the \gls{vanilla} \gls{SSD} variants with a confidence threshold of 0.01
shows no significant impact of an entropy test. Only the open set errors
are lower but in an insignificant way. The rest of the performance metrics is
almost identical after rounding.
-The results for Bayesian SSD show a significant impact of non-maximum suppression or the lack thereof: maximum \(F_1\) score of 0.363 (with NMS) to 0.226
+The results for Bayesian \gls{SSD} show a significant impact of \gls{NMS} or the lack thereof: the maximum \(F_1\) score drops from 0.363 (with NMS) to 0.226
(without NMS). Dropout was disabled in both cases, making them effectively a
-vanilla SSD run with multiple forward passes.
+\gls{vanilla} \gls{SSD} run with multiple forward passes.
-With 809 open set errors, the Bayesian SSD variant with disabled dropout and
-without non-maximum suppression offers the best performance with respect
-to open set errors. The variant without dropout and enabled non-maximum suppression has the best \(F_1\) score (0.363), the best
+With 809 open set errors, the Bayesian \gls{SSD} variant with disabled dropout and
+without \gls{NMS} offers the best performance with respect
+to open set errors. The variant without dropout and with enabled \gls{NMS} has the best \(F_1\) score (0.363), the best
precision (0.420) and the best recall (0.321) of all Bayesian variants.
Dropout decreases the performance of the network, this can be seen
in the lower \(F_1\) scores, higher open set errors, and lower precision and
-recall values. However, all variants with multiple forward passes have lower open set errors than all vanilla SSD
+recall values. However, all variants with multiple forward passes have lower open set errors than all \gls{vanilla} SSD
variants.
The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-macro}. Precision-recall curves for all variants
-can be seen in figure \ref{fig:precision-recall-macro}. Both vanilla SSD
+can be seen in figure \ref{fig:precision-recall-macro}. Both \gls{vanilla} SSD
variants with 0.01 confidence threshold reach much higher open set errors
and a higher recall. This behaviour is expected as more and worse predictions
are included.
@@ -884,35 +885,35 @@ they had the exact same performance before rounding.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.460 & \textbf{0.405} & 0.532 \\
- vanilla SSD - 0.2 conf & \textbf{0.460} & \textbf{0.405} & \textbf{0.533} \\
- SSD with Entropy test - 0.01 conf & 0.460 & 0.405 & 0.532 \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.460 & \textbf{0.405} & 0.532 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & \textbf{0.460} & \textbf{0.405} & \textbf{0.533} \\
+ \gls{SSD} with Entropy test - 0.01 conf & 0.460 & 0.405 & 0.532 \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.272 & 0.292 & 0.256 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.451 & 0.403 & 0.514 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.447 & 0.401 & 0.505 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.410 & 0.368 & 0.465 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.272 & 0.292 & 0.256 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.451 & 0.403 & 0.514 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.447 & 0.401 & 0.505 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.410 & 0.368 & 0.465 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for persons class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for persons class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score.}
\label{tab:results-persons}
\end{table}
It is clearly visible that the overall trend continues in the individual
-classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two vanilla SSD variants with only 0.01 confidence
+classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two \gls{vanilla} \gls{SSD} variants with only 0.01 confidence
threshold perform better than in the averaged results presented earlier.
-Only in the chairs class, a Bayesian SSD variant performs better (in
-precision) than any of the vanilla SSD variants. Moreover, there are
-multiple classes where two or all of the vanilla SSD variants perform
+Only in the chairs class, a Bayesian \gls{SSD} variant performs better (in
+precision) than any of the \gls{vanilla} \gls{SSD} variants. Moreover, there are
+multiple classes where two or all of the \gls{vanilla} \gls{SSD} variants perform
equally well. When compared with the macro averaged results,
giraffes and persons perform better across the board. Cars have a higher
precision than average but lower recall values for all but the Bayesian
-SSD variant without NMS and dropout. Chairs and bottles perform
+SSD variant without \gls{NMS} and dropout. Chairs and bottles perform
worse than average.
\begin{table}[tbp]
@@ -921,21 +922,21 @@ worse than average.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.364 & \textbf{0.305} & 0.452 \\
- vanilla SSD - 0.2 conf & 0.363 & 0.294 & \textbf{0.476} \\
- SSD with Entropy test - 0.01 conf & \textbf{0.364} & \textbf{0.305} & 0.453 \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.364 & \textbf{0.305} & 0.452 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & 0.363 & 0.294 & \textbf{0.476} \\
+ \gls{SSD} with Entropy test - 0.01 conf & \textbf{0.364} & \textbf{0.305} & 0.453 \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.236 & 0.244 & 0.229 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.336 & 0.266 & 0.460 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.332 & 0.262 & 0.454 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.309 & 0.264 & 0.374 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.236 & 0.244 & 0.229 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.336 & 0.266 & 0.460 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.332 & 0.262 & 0.454 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.309 & 0.264 & 0.374 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for cars class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for cars class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score. }
\label{tab:results-cars}
\end{table}
@@ -946,21 +947,21 @@ worse than average.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.287 & \textbf{0.251} & 0.335 \\
- vanilla SSD - 0.2 conf & 0.283 & 0.242 & 0.341 \\
- SSD with Entropy test - 0.01 conf & \textbf{0.288} & \textbf{0.251} & 0.338 \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.287 & \textbf{0.251} & 0.335 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & 0.283 & 0.242 & 0.341 \\
+ \gls{SSD} with Entropy test - 0.01 conf & \textbf{0.288} & \textbf{0.251} & 0.338 \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.172 & 0.168 & 0.178 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.280 & 0.229 & \textbf{0.360} \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.274 & 0.228 & 0.343 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.240 & 0.220 & 0.265 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.172 & 0.168 & 0.178 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.280 & 0.229 & \textbf{0.360} \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.274 & 0.228 & 0.343 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.240 & 0.220 & 0.265 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for chairs class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for chairs class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score. }
\label{tab:results-chairs}
\end{table}
@@ -972,21 +973,21 @@ worse than average.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.233 & \textbf{0.175} & 0.348 \\
- vanilla SSD - 0.2 conf & 0.231 & 0.173 & \textbf{0.350} \\
- SSD with Entropy test - 0.01 conf & \textbf{0.233} & \textbf{0.175} & 0.350 \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.233 & \textbf{0.175} & 0.348 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & 0.231 & 0.173 & \textbf{0.350} \\
+ \gls{SSD} with Entropy test - 0.01 conf & \textbf{0.233} & \textbf{0.175} & 0.350 \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.160 & 0.140 & 0.188 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.224 & 0.170 & 0.328 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.220 & 0.170 & 0.311 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.202 & 0.172 & 0.245 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.160 & 0.140 & 0.188 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.224 & 0.170 & 0.328 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.220 & 0.170 & 0.311 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.202 & 0.172 & 0.245 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for bottles class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for bottles class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score. }
\label{tab:results-bottles}
\end{table}
@@ -997,21 +998,21 @@ worse than average.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
- vanilla SSD - 0.2 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
- SSD with Entropy test - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+ \gls{SSD} with Entropy test - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.415 & 0.414 & 0.417 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.647 & 0.642 & 0.654 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.637 & 0.634 & 0.642 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.586 & 0.578 & 0.596 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.415 & 0.414 & 0.417 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.647 & 0.642 & 0.654 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.637 & 0.634 & 0.642 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.586 & 0.578 & 0.596 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for giraffe class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for giraffe class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score. }
\label{tab:results-giraffes}
\end{table}
@@ -1020,47 +1021,47 @@ worse than average.
% TODO: expand
-This subsection compares vanilla SSD
-with Bayesian SSD with respect to specific images that illustrate
+This subsection compares \gls{vanilla} SSD
+with Bayesian \gls{SSD} with respect to specific images that illustrate
similarities and differences between both approaches. For this
comparison, a 0.2 confidence threshold is applied. Furthermore, Bayesian
-SSD uses non-maximum suppression and dropout with 0.9 keep ratio.
+SSD uses \gls{NMS} and dropout with 0.9 keep ratio.
\begin{figure}
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000336587_bboxes_vanilla}
- \caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from vanilla SSD.}
+ \caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from \gls{vanilla} SSD.}
\label{fig:stop-sign-truck-vanilla}
\end{minipage}%
\hfill
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000336587_bboxes_bayesian}
- \caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from Bayesian SSD with 0.9 keep ratio.}
+ \caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from Bayesian \gls{SSD} with 0.9 keep ratio.}
\label{fig:stop-sign-truck-bayesian}
\end{minipage}
\end{figure}
-The ground truth only contains a stop sign and a truck. The differences between vanilla SSD and Bayesian SSD are almost not visible
-(see figures \ref{fig:stop-sign-truck-vanilla} and \ref{fig:stop-sign-truck-bayesian}): the truck is neither detected by vanilla nor Bayesian SSD, instead both detected a pottet plant and a traffic light. The stop sign is detected by both variants.
+The ground truth contains only a stop sign and a truck. The differences between \gls{vanilla} \gls{SSD} and Bayesian \gls{SSD} are barely visible
+(see figures \ref{fig:stop-sign-truck-vanilla} and \ref{fig:stop-sign-truck-bayesian}): the truck is detected by neither \gls{vanilla} nor Bayesian SSD; instead, both detect a potted plant and a traffic light. The stop sign is detected by both variants.
This behaviour implies problems with detecting objects at the edge
that overwhelmingly lie outside the image frame. Furthermore, the predictions are usually identical.
\begin{figure}
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000403817_bboxes_vanilla}
- \caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from vanilla SSD.}
+ \caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from \gls{vanilla} SSD.}
\label{fig:cat-laptop-vanilla}
\end{minipage}%
\hfill
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000403817_bboxes_bayesian}
- \caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from Bayesian SSD with 0.9 keep ratio.}
+ \caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from Bayesian \gls{SSD} with 0.9 keep ratio.}
\label{fig:cat-laptop-bayesian}
\end{minipage}
\end{figure}
Another example (see figures \ref{fig:cat-laptop-vanilla} and \ref{fig:cat-laptop-bayesian}) is a cat with a laptop/TV in the background on the right
-side. Both variants detect a cat but the vanilla variant detects a dog as well. The laptop and TV are not detected but this is expected since
+side. Both variants detect a cat but the \gls{vanilla} variant detects a dog as well. The laptop and TV are not detected but this is expected since
these classes were not trained.
\chapter{Discussion and Outlook}
@@ -1073,7 +1074,7 @@ questions will be addressed.
\section*{Discussion}
The results clearly do not support the hypothesis: \textit{Dropout sampling delivers better object detection performance under open set conditions compared to object detection without it}. With the exception of open set errors, there
-is no area where dropout sampling performs better than vanilla SSD. In the
+is no area where dropout sampling performs better than \gls{vanilla} SSD. In the
remainder of the section the individual results will be interpreted.
\subsection*{Impact of Averaging}
@@ -1085,8 +1086,8 @@ of the plot in both the \(F_1\) versus absolute open set error graph (see figure
the precision-recall curve (see figure \ref{fig:precision-recall-micro}).
This behaviour is caused by a large imbalance of detections between
-the classes. For vanilla SSD with 0.2 confidence threshold there are
-a total of 36,863 detections after non-maximum suppression and top \(k\).
+the classes. For \gls{vanilla} \gls{SSD} with 0.2 confidence threshold there are
+a total of 36,863 detections after \gls{NMS} and top \(k\).
The persons class contributes 14,640 detections or around 40\% to that number. Another strong class is cars with 2,252 detections or around
6\%. In third place come chairs with 1,352 detections or around 4\%. This means that these three classes together have roughly as many detections
as the remaining 57 classes combined.
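As a check on these proportions, the counts quoted above can be summed directly (the class counts and the total are taken from this paragraph; only the arithmetic is added):
\[
  14{,}640 + 2{,}252 + 1{,}352 = 18{,}244, \qquad
  \frac{18{,}244}{36{,}863} \approx 0.495 ,
\]
which leaves \(36{,}863 - 18{,}244 = 18{,}619\) detections for the remaining 57 classes.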
@@ -1119,7 +1120,7 @@ averaging was not reported in their paper.
\subsection*{Impact of Entropy}
There is no visible impact of entropy thresholding on the object detection
-performance for vanilla SSD. This indicates that the network has almost no
+performance for \gls{vanilla} SSD. This indicates that the network has almost no
uniform or close to uniform predictions: the vast majority of predictions
have a high confidence in one class, including the background.
However, the entropy plays a larger role for the Bayesian variants---as
@@ -1144,52 +1145,52 @@ threshold indicates a worse performance.
variant & before & after & after \\
& entropy/NMS & entropy/NMS & top \(k\) \\
\hline
- Bay. SSD, no dropout, no NMS & 155,251 & 122,868 & 72,207 \\
- no dropout, NMS & 155,250 & 36,061 & 33,827 \\
+ Bay. SSD, no dropout, no \gls{NMS} & 155,251 & 122,868 & 72,207 \\
+ no dropout, \gls{NMS} & 155,250 & 36,061 & 33,827 \\
\hline
\end{tabular}
- \caption{Comparison of Bayesian SSD variants without dropout with
+ \caption{Comparison of Bayesian \gls{SSD} variants without dropout with
respect to the number of detections before the entropy threshold,
- after it and/or non-maximum suppression, and after top \(k\). The
+ after it and/or \gls{NMS}, and after top \(k\). The
entropy threshold 1.5 was used for both.}
\label{tab:effect-nms}
\end{table}
-Miller et al.~\cite{Miller2018} supposedly did not use non-maximum suppression
-in their implementation of dropout sampling. Therefore, a variant with disabled
-non-maximum suppression (NMS) was tested. The results are somewhat expected:
-non-maximum suppression removes all non-maximum detections that overlap
+Miller et al.~\cite{Miller2018} supposedly did not use \gls{NMS}
+in their implementation of dropout sampling. Therefore, a variant with disabled \glslocalreset{NMS}
+\gls{NMS} was tested. The results are somewhat expected:
+\gls{NMS} removes all non-maximum detections that overlap
with a maximum one. This reduces the number of multiple detections per
ground truth bounding box and therefore the false positives. Without it,
a lot more false positives remain and have a negative impact on precision.
In combination with top \(k\) selection, recall can be affected:
duplicate detections could stay and maxima boxes could be removed.
-The number of observations was measured before and after the combination of entropy threshold and NMS filter: both Bayesian SSD without
-NMS and dropout, and Bayesian SSD with NMS and disabled dropout
-have the same number of observations everywhere before the entropy threshold. After the entropy threshold (the value 1.5 was used for both) and NMS, the variant with NMS has roughly 23\% of its observations left
+The number of observations was measured before and after the combination of entropy threshold and \gls{NMS} filter: both Bayesian \gls{SSD} without
+NMS and dropout, and Bayesian \gls{SSD} with \gls{NMS} and disabled dropout
+have the same number of observations everywhere before the entropy threshold. After the entropy threshold (the value 1.5 was used for both) and NMS, the variant with \gls{NMS} has roughly 23\% of its observations left
(see table \ref{tab:effect-nms} for absolute numbers).
-Without NMS 79\% of observations are left. Irrespective of the absolute
-number, this discrepancy clearly shows the impact of non-maximum suppression and also explains a higher count of false positives:
-more than 50\% of the original observations were removed with NMS and
+Without \gls{NMS} 79\% of observations are left. Irrespective of the absolute
+number, this discrepancy clearly shows the impact of \gls{NMS} and also explains a higher count of false positives:
+more than 50\% of the original observations were removed with \gls{NMS} and
stayed without---all of these are very likely to be false positives.
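The percentages above follow from the counts in table \ref{tab:effect-nms}; as a short check of the arithmetic:
\[
  \frac{36{,}061}{155{,}250} \approx 0.232, \qquad
  \frac{122{,}868}{155{,}251} \approx 0.791, \qquad
  0.791 - 0.232 = 0.559 .
\]
Roughly 56\% of the original observations are therefore kept without NMS but removed with it.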
A clear distinction between micro and macro averaging can be observed:
recall is hardly affected with micro averaging (0.300) but goes down equally with macro averaging (0.229). For micro averaging, it does
not matter which class the true positives belong to: every detection
counts the same way. This also means that top \(k\) will have only
-a marginal effect: some true positives might be removed without NMS but overall that does not have a big impact. With macro averaging, however,
+a marginal effect: some true positives might be removed without \gls{NMS} but overall that does not have a big impact. With macro averaging, however,
the class of the true positives matters a lot: for example, if two
true positives are removed from a class with only a few true positives
to begin with, then their removal will have a drastic influence on
the class recall value and hence the overall result.
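This difference can be made explicit with the standard definitions of recall under micro and macro averaging. The symbols below are introduced here for illustration only, and the numbers in the example are hypothetical: \(\mathit{TP}_c\) and \(\mathit{FN}_c\) denote the true positives and false negatives of class \(c\), and \(C\) is the number of classes.
\[
  R_{\mathrm{micro}} = \frac{\sum_{c=1}^{C} \mathit{TP}_c}{\sum_{c=1}^{C} \left(\mathit{TP}_c + \mathit{FN}_c\right)}, \qquad
  R_{\mathrm{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{\mathit{TP}_c}{\mathit{TP}_c + \mathit{FN}_c}
\]
Consider two classes, one with 800 of 1,000 ground truth objects detected and one with 4 of 4. Removing two true positives from the small class changes the micro averaged recall only from \(804/1004 \approx 0.801\) to \(802/1004 \approx 0.799\), whereas the macro averaged recall drops from \((0.8 + 1.0)/2 = 0.9\) to \((0.8 + 0.5)/2 = 0.65\).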
The impact of top \(k\) was measured by counting the number of observations
-after top \(k\) has been applied: the variant with NMS keeps about 94\%
-of the observations left after NMS, without NMS only about 59\% of observations
+after top \(k\) has been applied: the variant with \gls{NMS} keeps about 94\%
+of the observations left after NMS; without \gls{NMS}, only about 59\% of observations
are kept. This shows a significant impact on the result by top \(k\)
-in the case of disabled non-maximum suppression. Furthermore, some
+in the case of disabled \gls{NMS}. Furthermore, some
classes are hit harder by top \(k\) than others: for example,
dogs keep around 82\% of the observations but persons only 57\%.
This indicates that detected dogs are mostly on images with few detections
@@ -1211,12 +1212,12 @@ recall.
variant & after & after \\
& prediction & observation grouping \\
\hline
- Bay. SSD, no dropout, NMS & 1,677,050 & 155,250 \\
- keep rate 0.9, NMS & 1,617,675 & 549,166 \\
+ Bay. SSD, no dropout, \gls{NMS} & 1,677,050 & 155,250 \\
+ keep rate 0.9, \gls{NMS} & 1,617,675 & 549,166 \\
\hline
\end{tabular}
- \caption{Comparison of Bayesian SSD variants without dropout and with
+ \caption{Comparison of Bayesian \gls{SSD} variants without dropout and with
0.9 keep ratio of dropout with
respect to the number of detections directly after the network
predictions and after the observation grouping.}
@@ -1229,7 +1230,7 @@ dropout and the weights are not prepared for it.
Gal~\cite{Gal2017}
showed that networks \textbf{trained} with dropout are approximate Bayesian
-models. The Bayesian variants of SSD implemented in this thesis are not fine-tuned or trained with dropout, therefore, they are not guaranteed to be such approximate models.
+models. The Bayesian variants of \gls{SSD} implemented in this thesis are not fine-tuned or trained with dropout; therefore, they are not guaranteed to be such approximate models.
But dropout alone does not explain the difference in results. Both variants
with and without dropout have the exact same number of detections coming
@@ -1252,9 +1253,9 @@ has slightly fewer predictions left compared to the one without dropout.
After the grouping, the variant without dropout has on average between
10 and 11 detections grouped into an observation. This is expected as every
forward pass creates the exact same result and these 10 identical detections
-per vanilla SSD detection perfectly overlap. The fact that slightly more than
+per \gls{vanilla} \gls{SSD} detection perfectly overlap. The fact that slightly more than
10 detections are grouped together could explain the marginally better precision
-of the Bayesian variant without dropout compared to vanilla SSD.
+of the Bayesian variant without dropout compared to \gls{vanilla} SSD.
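For the variant without dropout, this average follows from the detection counts after prediction and after observation grouping reported in the table above (only the division is added here):
\[
  \frac{1{,}677{,}050}{155{,}250} \approx 10.8 .
\]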
However, on average only three detections are grouped together into an
observation if dropout with 0.9 keep ratio is enabled. This does not
negatively impact recall as true positives do not disappear but offers
@@ -1276,7 +1277,7 @@ from Miller et al. The complete source code or otherwise exhaustive
implementation details of Miller et al. would be required to attempt an answer.
Future work could explore the performance of this implementation when used
-on an SSD variant that was fine-tuned or trained with dropout. In this case, it
+on an \gls{SSD} variant that was fine-tuned or trained with dropout. In this case, it
should also look into the impact of training with both dropout and batch
normalisation.
Other avenues include the application to other data sets or object detection
diff --git a/glossary.tex b/glossary.tex
new file mode 100644
index 0000000..41c1542
--- /dev/null
+++ b/glossary.tex
@@ -0,0 +1,12 @@
+% acronyms
+\newacronym{NMS}{NMS}{non-maximum suppression}
+\newacronym{SSD}{SSD}{Single Shot MultiBox Detector}
+
+% terms
+\newglossaryentry{vanilla}
+{
+ name={vanilla},
+ description={
+ is used to describe the original state of something
+ }
+}
diff --git a/masterthesis.sty b/masterthesis.sty
index 60f25ae..7e20f9d 100644
--- a/masterthesis.sty
+++ b/masterthesis.sty
@@ -102,7 +102,8 @@
\usepackage{makeidx}
\makeindex
-\usepackage[xindy]{glossaries} % for \printglossary
+\usepackage[xindy,toc]{glossaries} % for \printglossary
+\setacronymstyle{long-short}
\makeglossaries
%%% conditional includes
@@ -183,7 +184,7 @@
\newcommand{\finish}{%
%\clearpage
- \printglossary
+ \printglossaries
%\clearpage
\printindex
diff --git a/thesis.tex b/thesis.tex
index 1cc4453..c5506e9 100644
--- a/thesis.tex
+++ b/thesis.tex
@@ -33,6 +33,8 @@
% specify bib resource
\addbibresource{ma.bib}
+\input{glossary.tex}
+
\makeatletter
\g@addto@macro\appendix{%
\renewcommand*{\chapterformat}{%