From 452d97b4b25a36957dd2763b529613303e935614 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Fri, 27 Sep 2019 16:02:59 +0200
Subject: [PATCH] Added glossary
Signed-off-by: Jim Martens
---
body.tex | 365 ++++++++++++++++++++++++-----------------------
glossary.tex | 12 ++
masterthesis.sty | 5 +-
thesis.tex | 2 +
4 files changed, 200 insertions(+), 184 deletions(-)
create mode 100644 glossary.tex
diff --git a/body.tex b/body.tex
index 618608d..286e97a 100644
--- a/body.tex
+++ b/body.tex
@@ -115,15 +115,15 @@ novelty score.
Auto-encoders work well for data sets like MNIST~\cite{Deng2012}
but perform poorly on challenging real world data sets
like MS COCO~\cite{Lin2014}, complicating any potential comparison between
-them and object detection networks like SSD.
+them and object detection networks like \gls{SSD}.
Therefore, a comparison between model uncertainty with a network like
SSD and novelty detection with auto-encoders is considered out of scope
for this thesis.
-Miller et al.~\cite{Miller2018} used an SSD pre-trained on COCO
+Miller et al.~\cite{Miller2018} used an \gls{SSD} pre-trained on COCO
without further fine-tuning on the SceneNet RGB-D data
set~\cite{McCormac2017} and reported good results regarding
-open set error for an SSD variant with dropout sampling and entropy
+open set error for an \gls{SSD} variant with dropout sampling and entropy
thresholding.
If their results are generalisable, it should be possible to replicate
the relative difference between the variants on the COCO data set.
@@ -131,15 +131,15 @@ This leads to the following hypothesis: \emph{Dropout sampling
delivers better object detection performance under open set
conditions compared to object detection without it.}
-For the purpose of this thesis, I will use the vanilla SSD (as in: the original SSD) as
-baseline to compare against. In particular, vanilla SSD uses
+For the purpose of this thesis, I will use the \gls{vanilla} \gls{SSD} (as in: the original SSD) as
+baseline to compare against. In particular, \gls{vanilla} \gls{SSD} uses
a per-class confidence threshold of 0.01, an IOU threshold of 0.45
-for the non-maximum suppression, and a top \(k\) value of 200. For this
+for the \gls{NMS}, and a top \(k\) value of 200. For this
thesis, the top \(k\) value was changed to 20, and a confidence threshold
of 0.2 was tried as well.
-The effect of an entropy threshold is measured against this vanilla
+The effect of an entropy threshold is measured against this \gls{vanilla}
SSD by applying entropy thresholds from 0.1 to 2.4 inclusive (limits taken from
-Miller et al.). Dropout sampling is compared to vanilla SSD
+Miller et al.). Dropout sampling is compared to \gls{vanilla} SSD
with and without entropy thresholding.
\paragraph{Hypothesis} Dropout sampling
@@ -150,8 +150,8 @@ conditions compared to object detection without it.
First, chapter \ref{chap:background} presents related works and
provides the background for dropout sampling.
-Afterwards, chapter \ref{chap:methods} explains how vanilla SSD works, how
-Bayesian SSD extends vanilla SSD, and how the decoding pipelines are
+Afterwards, chapter \ref{chap:methods} explains how \gls{vanilla} \gls{SSD} works, how
+Bayesian \gls{SSD} extends \gls{vanilla} SSD, and how the decoding pipelines are
structured.
Chapter \ref{chap:experiments-results} presents the data sets,
the experimental setup, and the results. This is followed by
@@ -421,19 +421,19 @@ be used to identify and reject these false positive cases.
\label{chap:methods}
-This chapter explains the functionality of vanilla SSD, Bayesian SSD, and the decoding pipelines.
+This chapter explains the functionality of \gls{vanilla} SSD, Bayesian SSD, and the decoding pipelines.
\section{Vanilla SSD}
\begin{figure}
\centering
\includegraphics[scale=1.2]{vanilla-ssd}
- \caption{The vanilla SSD network as defined by Liu et al.~\cite{Liu2016}. VGG-16 is the base network, extended with extra feature layers. These predict offsets to anchor boxes with different sizes and aspect ratios. Furthermore, they predict the
+ \caption{The \gls{vanilla} \gls{SSD} network as defined by Liu et al.~\cite{Liu2016}. VGG-16 is the base network, extended with extra feature layers. These predict offsets to anchor boxes with different sizes and aspect ratios. Furthermore, they predict the
corresponding confidences.}
\label{fig:vanilla-ssd}
\end{figure}
-Vanilla SSD is based upon the VGG-16 network (see figure
+Vanilla \gls{SSD} is based upon the VGG-16 network (see figure
\ref{fig:vanilla-ssd}) and adds extra feature layers. The entire
image (always size 300x300) is divided up into anchor boxes. During
training, each of these boxes is mapped to a ground truth box or
@@ -443,7 +443,7 @@ SSD network are the predictions with class confidences, offsets to the
anchor box, anchor box coordinates, and variance. The model loss is a
weighted sum of localisation and confidence loss. As the network
has a fixed number of anchor boxes, every forward pass creates the same
-number of detections---8732 in the case of SSD 300x300.
+number of detections---8732 in the case of \gls{SSD} 300x300.
Notably, the object proposals are made in a single run per image,
hence the name \emph{single shot}.
@@ -454,13 +454,13 @@ Liu et al.~\cite{Liu2016}.
\section{Bayesian SSD for Model Uncertainty}
Networks trained with dropout are a general approximate Bayesian model~\cite{Gal2017}. As such, they can be used for everything a true
-Bayesian model could be used for. The idea is applied to SSD in this
-thesis: two dropout layers are added to vanilla SSD, after the layers fc6 and fc7 respectively (see figure \ref{fig:bayesian-ssd}).
+Bayesian model could be used for. The idea is applied to \gls{SSD} in this
+thesis: two dropout layers are added to \gls{vanilla} SSD, after the layers fc6 and fc7 respectively (see figure \ref{fig:bayesian-ssd}).
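The sampling behind this idea can be sketched with a toy forward pass; the layer sizes, the Bernoulli keep mask, and the averaging below are illustrative stand-ins, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_pass(features, w, keep_ratio):
    """One stochastic forward pass: dropout stays active at inference."""
    mask = rng.random(features.shape) < keep_ratio  # Bernoulli keep mask
    dropped = features * mask / keep_ratio          # inverted-dropout scaling
    return dropped @ w

features = rng.random(8)   # toy activations standing in for an fc6/fc7 output
w = rng.random((8, 3))     # toy weights of the following layer
passes = np.stack([forward_pass(features, w, keep_ratio=0.9)
                   for _ in range(10)])

mean = passes.mean(axis=0)    # averaged prediction over the forward passes
spread = passes.std(axis=0)   # disagreement between passes, i.e. uncertainty
```

Repeating the pass with different masks is what turns one deterministic prediction into a distribution whose spread approximates model uncertainty.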
\begin{figure}
\centering
\includegraphics[scale=1.2]{bayesian-ssd}
- \caption{The Bayesian SSD network as defined by Miller et al.~\cite{Miller2018}. It adds dropout layers after the fc6
+ \caption{The Bayesian \gls{SSD} network as defined by Miller et al.~\cite{Miller2018}. It adds dropout layers after the fc6
and fc7 layers.}
\label{fig:bayesian-ssd}
\end{figure}
@@ -476,51 +476,52 @@ and very low confidences in other classes.
\subsection{Implementation Details}
-For this thesis, an SSD implementation based on Tensorflow~\cite{Abadi2015} and
+For this thesis, an \gls{SSD} implementation based on Tensorflow~\cite{Abadi2015} and
Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
was used. It was modified to support entropy thresholding,
partitioning of observations, and dropout
-layers in the SSD model. Entropy thresholding takes place before
+layers in the \gls{SSD} model. Entropy thresholding takes place before
the per-class confidence threshold is applied.
The Bayesian variant was not fine-tuned and operates with the same
-weights that vanilla SSD uses as well.
+weights as \gls{vanilla} \gls{SSD}.
\section{Decoding Pipelines}
-The raw output of SSD is not very useful: it contains thousands of
+The raw output of \gls{SSD} is not very useful: it contains thousands of
boxes per image. Among them are many boxes with very low confidences
or background classifications; these need to be filtered out to
obtain any meaningful output from the network. The process of
filtering is called decoding and presented for the three variants
-of SSD used in the thesis.
+of \gls{SSD} used in the thesis.
\subsection{Vanilla SSD}
Liu et al.~\cite{Liu2016} used Caffe for their original SSD
implementation. The decoding process largely consists of two
phases: decoding and filtering. Decoding transforms the relative
-coordinates predicted by SSD into absolute coordinates. At this point
+coordinates predicted by \gls{SSD} into absolute coordinates. At this point
the shape of the output per batch is \((batch\_size, \#nr\_boxes, \#nr\_classes + 12)\). The last twelve elements are split into
the four bounding box offsets, the four anchor box coordinates, and
the four variances; there are 8732 boxes.
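The split of the last twelve elements can be illustrated with plain array slicing; the box count of 8732 comes from the text, while the class count of 81 (80 COCO classes plus background) is an assumption:

```python
import numpy as np

batch_size, nr_boxes, nr_classes = 2, 8732, 81  # 81 classes is an assumption
raw = np.zeros((batch_size, nr_boxes, nr_classes + 12))  # decoded output shape

class_conf = raw[..., :nr_classes]                 # per-class confidences
offsets = raw[..., nr_classes:nr_classes + 4]      # offsets to the anchor box
anchors = raw[..., nr_classes + 4:nr_classes + 8]  # anchor box coordinates
variances = raw[..., nr_classes + 8:]              # the four variances
```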
+\glslocalreset{NMS}
Filtering of these boxes is first done per class:
only the class id, confidence of that class, and the bounding box
coordinates are kept per box. The filtering consists of
-confidence thresholding and a subsequent non-maximum suppression.
-All boxes that pass non-maximum suppression are added to a
+confidence thresholding and a subsequent \gls{NMS}.
+All boxes that pass \gls{NMS} are added to a
per-image maxima list. One box could pass the confidence threshold
for multiple classes and, hence, be present multiple times in the
maxima list for the image. Lastly, a total of \(k\) boxes with the
highest confidences is kept per image across all classes. The
original implementation uses a confidence threshold of \(0.01\), an
-IOU threshold for non-maximum suppression of \(0.45\) and a top \(k\)
+IOU threshold for \gls{NMS} of \(0.45\) and a top \(k\)
value of 200.
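The filtering phase can be sketched as follows, with a toy IOU and a greedy NMS; the default thresholds match the values given in the text, while the background-class index of 0 and the exact data layout are assumptions:

```python
import numpy as np

def iou(a, b):
    """IOU of two boxes in (xmin, ymin, xmax, ymax) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression, highest score first."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep

def decode_image(class_conf, boxes, conf_thresh=0.01, iou_thresh=0.45,
                 top_k=200):
    """Per-class confidence threshold -> NMS -> per-image top k."""
    maxima = []                               # per-image maxima list
    for c in range(1, class_conf.shape[1]):   # class 0 assumed background
        mask = class_conf[:, c] > conf_thresh # per-class confidence threshold
        cls_boxes, cls_scores = boxes[mask], class_conf[mask, c]
        for i in nms(cls_boxes, cls_scores, iou_thresh):
            maxima.append((c, float(cls_scores[i]), cls_boxes[i]))
    maxima.sort(key=lambda d: d[1], reverse=True)
    return maxima[:top_k]                     # top k across all classes

boxes = np.array([[0., 0., 10., 10.], [1., 1., 11., 11.],
                  [50., 50., 60., 60.]])
conf = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])  # background, one class
detections = decode_image(conf, boxes)  # NMS merges the two overlapping boxes
```

The same box reappearing under several classes, as described above, corresponds to one row of `boxes` contributing several entries to `maxima`.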
-The vanilla SSD
-per-class confidence threshold and non-maximum suppression has one
-weakness: even if SSD correctly predicts all objects as the
+The \gls{vanilla} SSD
+per-class confidence threshold and \gls{NMS} have one
+weakness: even if \gls{SSD} correctly predicts all objects as the
background class with high confidence, the per-class confidence
threshold of 0.01 will consider predictions with very low
confidences; as background boxes are not present in the maxima
@@ -531,7 +532,7 @@ pass because the background class has high confidence. Subsequently,
a low per-class confidence threshold does not restrict the boxes
either. Therefore, the decoding output is worse than the actual
predictions of the network.
-Bayesian SSD cannot help in this situation because the network
+Bayesian \gls{SSD} cannot help in this situation because the network
is not actually uncertain.
SSD was developed with closed set conditions in mind. A well trained
@@ -543,8 +544,8 @@ confidence threshold is required.
\subsection{Vanilla SSD with Entropy Thresholding}
-Vanilla SSD with entropy tresholding adds an additional component
-to the filtering already done for vanilla SSD. The entropy is
+Vanilla \gls{SSD} with entropy thresholding adds an additional component
+to the filtering already done for \gls{vanilla} SSD. The entropy is
calculated from all \(\#nr\_classes\) softmax scores in a prediction.
Only predictions with a low enough entropy pass the entropy
threshold and move on to the aforementioned per-class filtering.
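A minimal sketch of the entropy test; the concrete threshold of 1.0 is only an illustrative value from the swept range of 0.1 to 2.4:

```python
import numpy as np

def entropy(softmax_scores):
    """Shannon entropy over all class scores of one prediction."""
    p = np.clip(softmax_scores, 1e-12, 1.0)  # guard against log(0)
    return float(-(p * np.log(p)).sum())

confident = np.array([0.97, 0.01, 0.01, 0.01])  # peaked prediction, low entropy
uniform = np.full(4, 0.25)                      # maximally unsure, high entropy

threshold = 1.0                           # illustrative value from 0.1 .. 2.4
kept = entropy(confident) < threshold     # passes the entropy test
rejected = entropy(uniform) >= threshold  # filtered out before class filtering
```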
@@ -553,7 +554,7 @@ false positive or false negative cases with high confidence values.
\subsection{Bayesian SSD with Entropy Thresholding}
-Bayesian SSD has the speciality of multiple forward passes. Based
+Bayesian \gls{SSD} has the distinctive feature of multiple forward passes. Based
on the information in the paper, the detections of all forward passes
are grouped per image but not by forward pass. This leads
to the following shape of the network output after all
@@ -585,8 +586,8 @@ varying classifications are averaged into multiple lower confidence
values, which should increase the entropy and, hence, flag an
observation for removal.
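This effect of averaging disagreeing passes can be verified numerically; the two softmax vectors below are made up for illustration:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a softmax vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

pass_a = np.array([0.9, 0.1])     # one forward pass: confident in class 0
pass_b = np.array([0.1, 0.9])     # another pass: confident in class 1
averaged = (pass_a + pass_b) / 2  # observation-level average: [0.5, 0.5]
# The average is less peaked than either pass, so its entropy is higher
# and the observation is more likely to fail the entropy test.
```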
-The remainder of the filtering follows the vanilla SSD procedure: per-class
-confidence threshold, non-maximum suppression, and a top \(k\) selection
+The remainder of the filtering follows the \gls{vanilla} \gls{SSD} procedure: per-class
+confidence threshold, \gls{NMS}, and a top \(k\) selection
at the end.
\chapter{Experimental Setup and Results}
@@ -627,7 +628,7 @@ process. MS COCO contains landscape and portrait images with (640x480)
and (480x640) as the resolution. This led to a uniform distortion of the
portrait and landscape images respectively. Furthermore,
the colour channels were swapped from RGB to BGR in order to
-comply with the SSD implementation. The BGR requirement stems from
+comply with the \gls{SSD} implementation. The BGR requirement stems from
the usage of OpenCV in SSD: the internal channel order of
OpenCV is BGR.
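The channel swap itself is a single reversal of the last axis, sketched here with a toy array:

```python
import numpy as np

rgb = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)  # toy 2x2 RGB image
bgr = rgb[..., ::-1]                                  # reverse the channel axis
```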
@@ -653,28 +654,28 @@ between the classes in the data set.
This section explains the setup for the different conducted
experiments. Each comparison investigates one particular question.
-As a baseline, vanilla SSD with the confidence threshold of 0.01
-and a non-maximum suppression IOU threshold of 0.45 was used.
+As a baseline, \gls{vanilla} \gls{SSD} with the confidence threshold of 0.01
+and a \gls{NMS} IOU threshold of 0.45 was used.
Due to the low number of objects per image in the COCO data set,
-the top \(k\) value was set to 20. Vanilla SSD with entropy
-thresholding uses the same parameters; compared to vanilla SSD
+the top \(k\) value was set to 20. Vanilla \gls{SSD} with entropy
+thresholding uses the same parameters; compared to \gls{vanilla} SSD
without entropy thresholding, it showcases the relevance of
-entropy thresholding for vanilla SSD.
+entropy thresholding for \gls{vanilla} SSD.
-Vanilla SSD was also run with 0.2 confidence threshold and compared
-to vanilla SSD with 0.01 confidence threshold; this comparison
+Vanilla \gls{SSD} was also run with 0.2 confidence threshold and compared
+to \gls{vanilla} \gls{SSD} with 0.01 confidence threshold; this comparison
investigates the effect of the per-class confidence threshold
on the object detection performance.
-Bayesian SSD was run with 0.2 confidence threshold and compared
-to vanilla SSD with 0.2 confidence threshold. Coupled with the
+Bayesian \gls{SSD} was run with 0.2 confidence threshold and compared
+to \gls{vanilla} \gls{SSD} with 0.2 confidence threshold. Coupled with the
entropy threshold, this comparison reveals how uncertain the network
is. If it is very certain, dropout sampling should have no
significant impact on the result. Furthermore, in two cases the
-dropout was turned off to isolate the impact of non-maximum suppression
+dropout was turned off to isolate the impact of \gls{NMS}
on the result.
-Both, vanilla SSD with entropy thresholding and Bayesian SSD with
+Both \gls{vanilla} \gls{SSD} with entropy thresholding and Bayesian \gls{SSD} with
entropy thresholding, were tested for entropy thresholds ranging
from 0.1 to 2.4 inclusive as specified in Miller et al.~\cite{Miller2018}.
@@ -701,25 +702,25 @@ in the next chapter.
Forward & max & abs OSE & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{3}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.255 & 3176 & 0.214 & 0.318 \\
- vanilla SSD - 0.2 conf & \textbf{0.376} & 2939 & \textbf{0.382} & 0.372 \\
- SSD with Entropy test - 0.01 conf & 0.255 & 3168 & 0.214 & 0.318 \\
- % entropy thresh: 2.4 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.255 & 3176 & 0.214 & 0.318 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & \textbf{0.376} & 2939 & \textbf{0.382} & 0.372 \\
+ \gls{SSD} with Entropy test - 0.01 conf & 0.255 & 3168 & 0.214 & 0.318 \\
+ % entropy thresh: 2.4 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.209 & 2709 & 0.300 & 0.161 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.371 & \textbf{2335} & 0.365 & \textbf{0.378} \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.359 & 2584 & 0.363 & 0.357 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.325 & 2759 & 0.342 & 0.311 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.209 & 2709 & 0.300 & 0.161 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.371 & \textbf{2335} & 0.365 & \textbf{0.378} \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.359 & 2584 & 0.363 & 0.357 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.325 & 2759 & 0.342 & 0.311 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% 0.5 for Bayesian - 6, 1.4 for 7, 1.4 for 8, 1.3 for 9
\hline
\end{tabular}
- \caption{Rounded results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
- their best performing entropy threshold with respect to \(F_1\) score. Vanilla SSD with Entropy test performed best with an
- entropy threshold of 2.4, Bayesian SSD without non-maximum suppression performed best for 1.0,
- and Bayesian SSD with non-maximum suppression performed best for 1.4 as entropy
+ \caption{Rounded results for micro averaging. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
+ their best performing entropy threshold with respect to \(F_1\) score. Vanilla \gls{SSD} with Entropy test performed best with an
+ entropy threshold of 2.4, Bayesian \gls{SSD} without \gls{NMS} performed best for 1.0,
+ and Bayesian \gls{SSD} with \gls{NMS} performed best for 1.4 as entropy
threshold.
- Bayesian SSD with dropout enabled and 0.9 keep ratio performed
+ Bayesian \gls{SSD} with dropout enabled and 0.9 keep ratio performed
best for 1.4 as entropy threshold, the run with 0.5 keep ratio performed
best for 1.3 as threshold.}
\label{tab:results-micro}
@@ -739,26 +740,26 @@ in the next chapter.
\end{minipage}
\end{figure}
-Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+Vanilla \gls{SSD} with a per-class confidence threshold of 0.2 performs best (see
table \ref{tab:results-micro}) with respect to the maximum \(F_1\) score
(0.376) and recall at the maximum \(F_1\) point (0.382). In comparison, neither
-the vanilla SSD variant with a confidence threshold of 0.01 nor the SSD with
-an entropy test can outperform the 0.2 variant. Among the vanilla SSD variants,
+the \gls{vanilla} \gls{SSD} variant with a confidence threshold of 0.01 nor the \gls{SSD} with
+an entropy test can outperform the 0.2 variant. Among the \gls{vanilla} \gls{SSD} variants,
the 0.2 variant also has the lowest number of open set errors (2939) and the
highest precision (0.372).
-The comparison of the vanilla SSD variants with a confidence threshold of 0.01
+The comparison of the \gls{vanilla} \gls{SSD} variants with a confidence threshold of 0.01
shows no significant impact of an entropy test. Only the open set errors
are lower, and only insignificantly so. The remaining performance metrics
are identical after rounding.
-Bayesian SSD with disabled dropout and without non-maximum suppression
-has the worst performance of all tested variants (vanilla and Bayesian)
+Bayesian \gls{SSD} with disabled dropout and without \gls{NMS}
+has the worst performance of all tested variants (\gls{vanilla} and Bayesian)
with respect to \(F_1\) score (0.209) and precision (0.161). The precision is not only the worst overall but also significantly lower than that of all other variants.
In comparison to all variants with 0.2 confidence threshold, it has the worst recall (0.300) as well.
-With 2335 open set errors, the Bayesian SSD variant with disabled dropout and
-enabled non-maximum suppression offers the best performance with respect
+With 2335 open set errors, the Bayesian \gls{SSD} variant with disabled dropout and
+enabled \gls{NMS} offers the best performance with respect
to open set errors. It also has the best precision (0.378) of all tested
variants. Furthermore, it provides the best performance among all variants
with multiple forward passes.
@@ -768,11 +769,11 @@ in the lower \(F_1\) scores, higher open set errors, and lower precision
values. Both dropout variants have worse recall (0.363 and 0.342) than
the variant with disabled dropout.
However, all variants with multiple forward passes have lower open set
-errors than all vanilla SSD variants.
+errors than all \gls{vanilla} \gls{SSD} variants.
The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-micro}. Precision-recall curves for all variants
-can be seen in figure \ref{fig:precision-recall-micro}. Both vanilla SSD
+can be seen in figure \ref{fig:precision-recall-micro}. Both \gls{vanilla} SSD
variants with 0.01 confidence threshold reach much higher open set errors
and a higher recall. This behaviour is expected as more and worse predictions
are included.
@@ -787,25 +788,25 @@ reported figures, such as the ones in Miller et al.~\cite{Miller2018}
Forward & max & abs OSE & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{3}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.370 & 1426 & 0.328 & 0.424 \\
- vanilla SSD - 0.2 conf & \textbf{0.375} & 1218 & \textbf{0.338} & 0.424 \\
- SSD with Entropy test - 0.01 conf & 0.370 & 1373 & 0.329 & \textbf{0.425} \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.370 & 1426 & 0.328 & 0.424 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & \textbf{0.375} & 1218 & \textbf{0.338} & 0.424 \\
+ \gls{SSD} with Entropy test - 0.01 conf & 0.370 & 1373 & 0.329 & \textbf{0.425} \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.226 & \textbf{809} & 0.229 & 0.224 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.363 & 1057 & 0.321 & 0.420 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.355 & 1137 & 0.320 & 0.399 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.322 & 1264 & 0.307 & 0.340 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.226 & \textbf{809} & 0.229 & 0.224 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.363 & 1057 & 0.321 & 0.420 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.355 & 1137 & 0.320 & 0.399 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.322 & 1264 & 0.307 & 0.340 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
- their best performing entropy threshold with respect to \(F_1\) score. Vanilla SSD with Entropy test performed best with an
- entropy threshold of 1.7, Bayesian SSD without non-maximum suppression performed best for 1.5,
- and Bayesian SSD with non-maximum suppression performed best for 1.5 as entropy
- threshold. Bayesian SSD with dropout enabled and 0.9 keep ratio performed
+ \caption{Rounded results for macro averaging. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
+ their best performing entropy threshold with respect to \(F_1\) score. Vanilla \gls{SSD} with Entropy test performed best with an
+ entropy threshold of 1.7, Bayesian \gls{SSD} without \gls{NMS} performed best for 1.5,
+ and Bayesian \gls{SSD} with \gls{NMS} performed best for 1.5 as entropy
+ threshold. Bayesian \gls{SSD} with dropout enabled and 0.9 keep ratio performed
best for 1.7 as entropy threshold, the run with 0.5 keep ratio performed
best for 2.0 as threshold.}
\label{tab:results-macro}
@@ -825,36 +826,36 @@ reported figures, such as the ones in Miller et al.~\cite{Miller2018}
\end{minipage}
\end{figure}
-Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+Vanilla \gls{SSD} with a per-class confidence threshold of 0.2 performs best (see
table \ref{tab:results-macro}) with respect to the maximum \(F_1\) score
(0.375) and recall at the maximum \(F_1\) point (0.338). In comparison, the SSD
with an entropy test slightly outperforms the 0.2 variant with respect to
precision (0.425). Additionally, this is the best precision overall. Among
-the vanilla SSD variants, the 0.2 variant also has the lowest
+the \gls{vanilla} \gls{SSD} variants, the 0.2 variant also has the lowest
number of open set errors (1218).
-The comparison of the vanilla SSD variants with a confidence threshold of 0.01
+The comparison of the \gls{vanilla} \gls{SSD} variants with a confidence threshold of 0.01
shows no significant impact of an entropy test. Only the open set errors
are lower, and only insignificantly so. The remaining performance metrics
are almost identical after rounding.
-The results for Bayesian SSD show a significant impact of non-maximum suppression or the lack thereof: maximum \(F_1\) score of 0.363 (with NMS) to 0.226
+The results for Bayesian \gls{SSD} show a significant impact of \gls{NMS} or the lack thereof: a maximum \(F_1\) score of 0.363 (with NMS) versus 0.226
(without NMS). Dropout was disabled in both cases, making them effectively a
-vanilla SSD run with multiple forward passes.
+\gls{vanilla} \gls{SSD} run with multiple forward passes.
-With 809 open set errors, the Bayesian SSD variant with disabled dropout and
-without non-maximum suppression offers the best performance with respect
-to open set errors. The variant without dropout and enabled non-maximum suppression has the best \(F_1\) score (0.363), the best
+With 809 open set errors, the Bayesian \gls{SSD} variant with disabled dropout and
+without \gls{NMS} offers the best performance with respect
+to open set errors. The variant without dropout and enabled \gls{NMS} has the best \(F_1\) score (0.363), the best
precision (0.420) and the best recall (0.321) of all Bayesian variants.
Dropout decreases the performance of the network; this can be seen
in the lower \(F_1\) scores, higher open set errors, and lower precision and
-recall values. However, all variants with multiple forward passes have lower open set errors than all vanilla SSD
+recall values. However, all variants with multiple forward passes have lower open set errors than all \gls{vanilla} SSD
variants.
The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-macro}. Precision-recall curves for all variants
-can be seen in figure \ref{fig:precision-recall-macro}. Both vanilla SSD
+can be seen in figure \ref{fig:precision-recall-macro}. Both \gls{vanilla} SSD
variants with 0.01 confidence threshold reach much higher open set errors
and a higher recall. This behaviour is expected as more and worse predictions
are included.
@@ -884,35 +885,35 @@ they had the exact same performance before rounding.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.460 & \textbf{0.405} & 0.532 \\
- vanilla SSD - 0.2 conf & \textbf{0.460} & \textbf{0.405} & \textbf{0.533} \\
- SSD with Entropy test - 0.01 conf & 0.460 & 0.405 & 0.532 \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.460 & \textbf{0.405} & 0.532 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & \textbf{0.460} & \textbf{0.405} & \textbf{0.533} \\
+ \gls{SSD} with Entropy test - 0.01 conf & 0.460 & 0.405 & 0.532 \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.272 & 0.292 & 0.256 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.451 & 0.403 & 0.514 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.447 & 0.401 & 0.505 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.410 & 0.368 & 0.465 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.272 & 0.292 & 0.256 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.451 & 0.403 & 0.514 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.447 & 0.401 & 0.505 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.410 & 0.368 & 0.465 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for persons class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for persons class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score.}
\label{tab:results-persons}
\end{table}
It is clearly visible that the overall trend continues in the individual
-classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two vanilla SSD variants with only 0.01 confidence
+classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two \gls{vanilla} \gls{SSD} variants with only 0.01 confidence
threshold perform better than in the averaged results presented earlier.
-Only in the chairs class, a Bayesian SSD variant performs better (in
-precision) than any of the vanilla SSD variants. Moreover, there are
-multiple classes where two or all of the vanilla SSD variants perform
+Only in the chairs class does a Bayesian \gls{SSD} variant perform better (in
+precision) than any of the \gls{vanilla} \gls{SSD} variants. Moreover, there are
+multiple classes where two or all of the \gls{vanilla} \gls{SSD} variants perform
equally well. When compared with the macro averaged results,
giraffes and persons perform better across the board. Cars have a higher
precision than average but lower recall values for all but the Bayesian
-SSD variant without NMS and dropout. Chairs and bottles perform
+SSD variant without \gls{NMS} and dropout. Chairs and bottles perform
worse than average.
\begin{table}[tbp]
@@ -921,21 +922,21 @@ worse than average.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.364 & \textbf{0.305} & 0.452 \\
- vanilla SSD - 0.2 conf & 0.363 & 0.294 & \textbf{0.476} \\
- SSD with Entropy test - 0.01 conf & \textbf{0.364} & \textbf{0.305} & 0.453 \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.364 & \textbf{0.305} & 0.452 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & 0.363 & 0.294 & \textbf{0.476} \\
+ \gls{SSD} with Entropy test - 0.01 conf & \textbf{0.364} & \textbf{0.305} & 0.453 \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.236 & 0.244 & 0.229 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.336 & 0.266 & 0.460 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.332 & 0.262 & 0.454 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.309 & 0.264 & 0.374 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.236 & 0.244 & 0.229 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.336 & 0.266 & 0.460 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.332 & 0.262 & 0.454 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.309 & 0.264 & 0.374 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for cars class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for cars class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score. }
\label{tab:results-cars}
\end{table}
@@ -946,21 +947,21 @@ worse than average.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.287 & \textbf{0.251} & 0.335 \\
- vanilla SSD - 0.2 conf & 0.283 & 0.242 & 0.341 \\
- SSD with Entropy test - 0.01 conf & \textbf{0.288} & \textbf{0.251} & 0.338 \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.287 & \textbf{0.251} & 0.335 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & 0.283 & 0.242 & 0.341 \\
+ \gls{SSD} with Entropy test - 0.01 conf & \textbf{0.288} & \textbf{0.251} & 0.338 \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.172 & 0.168 & 0.178 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.280 & 0.229 & \textbf{0.360} \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.274 & 0.228 & 0.343 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.240 & 0.220 & 0.265 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.172 & 0.168 & 0.178 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.280 & 0.229 & \textbf{0.360} \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.274 & 0.228 & 0.343 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.240 & 0.220 & 0.265 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for chairs class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for chairs class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score. }
\label{tab:results-chairs}
\end{table}
@@ -972,21 +973,21 @@ worse than average.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & 0.233 & \textbf{0.175} & 0.348 \\
- vanilla SSD - 0.2 conf & 0.231 & 0.173 & \textbf{0.350} \\
- SSD with Entropy test - 0.01 conf & \textbf{0.233} & \textbf{0.175} & 0.350 \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & 0.233 & \textbf{0.175} & 0.348 \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & 0.231 & 0.173 & \textbf{0.350} \\
+ \gls{SSD} with Entropy test - 0.01 conf & \textbf{0.233} & \textbf{0.175} & 0.350 \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.160 & 0.140 & 0.188 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.224 & 0.170 & 0.328 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.220 & 0.170 & 0.311 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.202 & 0.172 & 0.245 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.160 & 0.140 & 0.188 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.224 & 0.170 & 0.328 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.220 & 0.170 & 0.311 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.202 & 0.172 & 0.245 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for bottles class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for bottles class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score. }
\label{tab:results-bottles}
\end{table}
@@ -997,21 +998,21 @@ worse than average.
Forward & max & Recall & Precision\\
Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
\hline
- vanilla SSD - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
- vanilla SSD - 0.2 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
- SSD with Entropy test - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
- % entropy thresh: 1.7 for vanilla SSD is best
+ \gls{vanilla} \gls{SSD} - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+ \gls{vanilla} \gls{SSD} - 0.2 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+ \gls{SSD} with Entropy test - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+ % entropy thresh: 1.7 for \gls{vanilla} \gls{SSD} is best
\hline
- Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.415 & 0.414 & 0.417 \\
- no dropout - 0.2 conf - NMS \; 10 & 0.647 & 0.642 & 0.654 \\
- 0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.637 & 0.634 & 0.642 \\
- 0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.586 & 0.578 & 0.596 \\
+ Bay. \gls{SSD} - no DO - 0.2 conf - no \gls{NMS} \; 10 & 0.415 & 0.414 & 0.417 \\
+ no dropout - 0.2 conf - \gls{NMS} \; 10 & 0.647 & 0.642 & 0.654 \\
+ 0.9 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.637 & 0.634 & 0.642 \\
+ 0.5 keep ratio - 0.2 conf - \gls{NMS} \; 10 & 0.586 & 0.578 & 0.596 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
- \caption{Rounded results for giraffe class. SSD with Entropy test and Bayesian SSD are represented with
+ \caption{Rounded results for giraffe class. \gls{SSD} with Entropy test and Bayesian \gls{SSD} are represented with
their best performing macro averaging entropy threshold with respect to \(F_1\) score. }
\label{tab:results-giraffes}
\end{table}
@@ -1020,47 +1021,47 @@ worse than average.
% TODO: expand
-This subsection compares vanilla SSD
-with Bayesian SSD with respect to specific images that illustrate
+This subsection compares \gls{vanilla} \gls{SSD}
+with Bayesian \gls{SSD} with respect to specific images that illustrate
similarities and differences between both approaches. For this
comparison, a 0.2 confidence threshold is applied. Furthermore, Bayesian
-SSD uses non-maximum suppression and dropout with 0.9 keep ratio.
+\gls{SSD} uses \gls{NMS} and dropout with a 0.9 keep ratio.
\begin{figure}
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000336587_bboxes_vanilla}
- \caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from vanilla SSD.}
+ \caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from \gls{vanilla} \gls{SSD}.}
\label{fig:stop-sign-truck-vanilla}
\end{minipage}%
\hfill
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000336587_bboxes_bayesian}
- \caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from Bayesian SSD with 0.9 keep ratio.}
+ \caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from Bayesian \gls{SSD} with 0.9 keep ratio.}
\label{fig:stop-sign-truck-bayesian}
\end{minipage}
\end{figure}
-The ground truth only contains a stop sign and a truck. The differences between vanilla SSD and Bayesian SSD are almost not visible
-(see figures \ref{fig:stop-sign-truck-vanilla} and \ref{fig:stop-sign-truck-bayesian}): the truck is neither detected by vanilla nor Bayesian SSD, instead both detected a pottet plant and a traffic light. The stop sign is detected by both variants.
+The ground truth contains only a stop sign and a truck. The differences between \gls{vanilla} \gls{SSD} and Bayesian \gls{SSD} are barely visible
+(see figures \ref{fig:stop-sign-truck-vanilla} and \ref{fig:stop-sign-truck-bayesian}): the truck is detected by neither \gls{vanilla} nor Bayesian \gls{SSD}; instead, both detect a potted plant and a traffic light. The stop sign is detected by both variants.
This behaviour implies problems with detecting objects at the edge
that overwhelmingly lie outside the image frame. Furthermore, the predictions are usually identical.
\begin{figure}
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000403817_bboxes_vanilla}
- \caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from vanilla SSD.}
+ \caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from \gls{vanilla} \gls{SSD}.}
\label{fig:cat-laptop-vanilla}
\end{minipage}%
\hfill
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000403817_bboxes_bayesian}
- \caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from Bayesian SSD with 0.9 keep ratio.}
+ \caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red, and rounded to three digits. Predictions are from Bayesian \gls{SSD} with 0.9 keep ratio.}
\label{fig:cat-laptop-bayesian}
\end{minipage}
\end{figure}
Another example (see figures \ref{fig:cat-laptop-vanilla} and \ref{fig:cat-laptop-bayesian}) is a cat with a laptop/TV in the background on the right
-side. Both variants detect a cat but the vanilla variant detects a dog as well. The laptop and TV are not detected but this is expected since
+side. Both variants detect a cat, but the \gls{vanilla} variant also detects a dog. The laptop and TV are not detected, but this is expected since
these classes were not trained.
\chapter{Discussion and Outlook}
@@ -1073,7 +1074,7 @@ questions will be addressed.
\section*{Discussion}
The results clearly do not support the hypothesis: \textit{Dropout sampling delivers better object detection performance under open set conditions compared to object detection without it}. With the exception of open set errors, there
-is no area where dropout sampling performs better than vanilla SSD. In the
+is no area where dropout sampling performs better than \gls{vanilla} \gls{SSD}. In the
remainder of the section the individual results will be interpreted.
\subsection*{Impact of Averaging}
@@ -1085,8 +1086,8 @@ of the plot in both the \(F_1\) versus absolute open set error graph (see figure
the precision-recall curve (see figure \ref{fig:precision-recall-micro}).
This behaviour is caused by a large imbalance of detections between
-the classes. For vanilla SSD with 0.2 confidence threshold there are
-a total of 36,863 detections after non-maximum suppression and top \(k\).
+the classes. For \gls{vanilla} \gls{SSD} with 0.2 confidence threshold there are
+a total of 36,863 detections after \gls{NMS} and top \(k\).
The persons class contributes 14,640 detections or around 40\% to that number. Another strong class is cars with 2,252 detections or around
6\%. In third place come chairs with 1352 detections or around 4\%. This means that three classes have together roughly as many detections
as the remaining 57 classes combined.
@@ -1119,7 +1120,7 @@ averaging was not reported in their paper.
\subsection*{Impact of Entropy}
There is no visible impact of entropy thresholding on the object detection
-performance for vanilla SSD. This indicates that the network has almost no
+performance for \gls{vanilla} \gls{SSD}. This indicates that the network has almost no
uniform or close to uniform predictions, the vast majority of predictions
has a high confidence in one class---including the background.
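The entropy test discussed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the threshold value 1.7 is taken from the tables above, while the 61-way distribution (60 classes plus background) and the function names are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def passes_entropy_test(probs, threshold=1.7):
    """Keep a prediction only if its entropy is at or below the threshold."""
    return entropy(probs) <= threshold

# A confident prediction has low entropy and passes the test;
# a near-uniform prediction over 61 classes exceeds the threshold.
confident = [0.97] + [0.03 / 60] * 60
near_uniform = [1.0 / 61] * 61
```

Since, as noted above, the network's predictions are almost always confident in one class, nearly every prediction behaves like `confident` here, which is consistent with the threshold having no visible effect on vanilla SSD.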
However, the entropy plays a larger role for the Bayesian variants---as
@@ -1144,52 +1145,52 @@ threshold indicates a worse performance.
variant & before & after & after \\
& entropy/NMS & entropy/NMS & top \(k\) \\
\hline
- Bay. SSD, no dropout, no NMS & 155,251 & 122,868 & 72,207 \\
- no dropout, NMS & 155,250 & 36,061 & 33,827 \\
+ Bay. \gls{SSD}, no dropout, no \gls{NMS} & 155,251 & 122,868 & 72,207 \\
+ no dropout, \gls{NMS} & 155,250 & 36,061 & 33,827 \\
\hline
\end{tabular}
- \caption{Comparison of Bayesian SSD variants without dropout with
+ \caption{Comparison of Bayesian \gls{SSD} variants without dropout with
respect to the number of detections before the entropy threshold,
- after it and/or non-maximum suppression, and after top \(k\). The
+ after it and/or \gls{NMS}, and after top \(k\). The
entropy threshold 1.5 was used for both.}
\label{tab:effect-nms}
\end{table}
-Miller et al.~\cite{Miller2018} supposedly did not use non-maximum suppression
-in their implementation of dropout sampling. Therefore, a variant with disabled
-non-maximum suppression (NMS) was tested. The results are somewhat expected:
-non-maximum suppression removes all non-maximum detections that overlap
+Miller et al.~\cite{Miller2018} supposedly did not use \gls{NMS}
+in their implementation of dropout sampling. Therefore, a variant with disabled \glslocalreset{NMS}
+\gls{NMS} was tested. The results are somewhat expected:
+\gls{NMS} removes all non-maximum detections that overlap
with a maximum one. This reduces the number of multiple detections per
ground truth bounding box and therefore the false positives. Without it,
a lot more false positives remain and have a negative impact on precision.
In combination with top \(k\) selection, recall can be affected:
duplicate detections could stay and maxima boxes could be removed.
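The NMS behaviour described above (keep each maximum, drop overlapping non-maxima) can be sketched as a minimal greedy procedure. This is a hedged illustration, not the thesis code: the box format `[x1, y1, x2, y2, score]` and the 0.45 IoU threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, iou_threshold=0.45):
    """Greedy NMS: keep the best-scoring box, suppress overlapping ones."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) < iou_threshold]
    return kept
```

Disabling this step leaves every suppressed duplicate in place, which is exactly the surplus of false positives discussed above.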
-The number of observations was measured before and after the combination of entropy threshold and NMS filter: both Bayesian SSD without
-NMS and dropout, and Bayesian SSD with NMS and disabled dropout
-have the same number of observations everywhere before the entropy threshold. After the entropy threshold (the value 1.5 was used for both) and NMS, the variant with NMS has roughly 23\% of its observations left
+The number of observations was measured before and after the combination of entropy threshold and \gls{NMS} filter: both Bayesian \gls{SSD} without
+\gls{NMS} and dropout, and Bayesian \gls{SSD} with \gls{NMS} and disabled dropout
+have the same number of observations everywhere before the entropy threshold. After the entropy threshold (the value 1.5 was used for both) and \gls{NMS}, the variant with \gls{NMS} has roughly 23\% of its observations left
(see table \ref{tab:effect-nms} for absolute numbers).
-Without NMS 79\% of observations are left. Irrespective of the absolute
-number, this discrepancy clearly shows the impact of non-maximum suppression and also explains a higher count of false positives:
-more than 50\% of the original observations were removed with NMS and
+Without \gls{NMS} 79\% of observations are left. Irrespective of the absolute
+number, this discrepancy clearly shows the impact of \gls{NMS} and also explains a higher count of false positives:
+more than 50\% of the original observations were removed with \gls{NMS} and
stayed without---all of these are very likely to be false positives.
A clear distinction between micro and macro averaging can be observed:
recall is hardly affected with micro averaging (0.300) but goes down equally with macro averaging (0.229). For micro averaging, it does
not matter which class the true positives belong to: every detection
counts the same way. This also means that top \(k\) will have only
-a marginal effect: some true positives might be removed without NMS but overall that does not have a big impact. With macro averaging, however,
+a marginal effect: some true positives might be removed without \gls{NMS} but overall that does not have a big impact. With macro averaging, however,
the class of the true positives matters a lot: for example, if two
true positives are removed from a class with only few true positives
to begin with, then their removal will have a drastic influence on
the class recall value and hence the overall result.
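The micro versus macro contrast argued above can be made concrete with a toy computation. The per-class counts below are invented for illustration and are not the thesis results.

```python
# Micro averaging pools all classes before computing recall;
# macro averaging computes recall per class and averages the results.

def micro_recall(per_class):
    """Pool all true positives and false negatives across classes."""
    tp = sum(c["tp"] for c in per_class.values())
    fn = sum(c["fn"] for c in per_class.values())
    return tp / (tp + fn)

def macro_recall(per_class):
    """Average per-class recalls: a rare class weighs as much as a dominant one."""
    recalls = [c["tp"] / (c["tp"] + c["fn"]) for c in per_class.values()]
    return sum(recalls) / len(recalls)

counts = {
    "person": {"tp": 900, "fn": 100},  # dominant class, recall 0.9
    "bottle": {"tp": 2, "fn": 8},      # rare class, recall 0.2
}
```

Removing the two true positives of the rare class barely moves the micro value (902/1010 to 900/1010) but drops the macro average from 0.55 to 0.45, which mirrors the argument above.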
The impact of top \(k\) was measured by counting the number of observations
-after top \(k\) has been applied: the variant with NMS keeps about 94\%
-of the observations left after NMS, without NMS only about 59\% of observations
+after top \(k\) has been applied: the variant with \gls{NMS} keeps about 94\%
+of the observations left after \gls{NMS}; without \gls{NMS}, only about 59\% of observations
are kept. This shows a significant impact on the result by top \(k\)
-in the case of disabled non-maximum suppression. Furthermore, some
+in the case of disabled \gls{NMS}. Furthermore, some
classes are hit harder by top \(k\) than others: for example,
dogs keep around 82\% of the observations but persons only 57\%.
This indicates that detected dogs are mostly on images with few detections
@@ -1211,12 +1212,12 @@ recall.
variant & after & after \\
& prediction & observation grouping \\
\hline
- Bay. SSD, no dropout, NMS & 1,677,050 & 155,250 \\
- keep rate 0.9, NMS & 1,617,675 & 549,166 \\
+ Bay. \gls{SSD}, no dropout, \gls{NMS} & 1,677,050 & 155,250 \\
+ keep rate 0.9, \gls{NMS} & 1,617,675 & 549,166 \\
\hline
\end{tabular}
- \caption{Comparison of Bayesian SSD variants without dropout and with
+ \caption{Comparison of Bayesian \gls{SSD} variants without dropout and with
0.9 keep ratio of dropout with
respect to the number of detections directly after the network
predictions and after the observation grouping.}
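The observation grouping the table refers to can be sketched as a greedy merge of strongly overlapping detections across forward passes. This is a hypothetical sketch of the idea only: the box format `[x1, y1, x2, y2]`, the 0.95 IoU threshold, and the first-match strategy are assumptions, not the thesis implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def group_observations(detections, iou_threshold=0.95):
    """Assign each detection to the first observation it overlaps strongly."""
    observations = []
    for det in detections:
        for obs in observations:
            if iou(det, obs[0]) >= iou_threshold:
                obs.append(det)
                break
        else:
            observations.append([det])
    return observations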
@@ -1229,7 +1230,7 @@ dropout and the weights are not prepared for it.
Gal~\cite{Gal2017}
showed that networks \textbf{trained} with dropout are approximate Bayesian
-models. The Bayesian variants of SSD implemented in this thesis are not fine-tuned or trained with dropout, therefore, they are not guaranteed to be such approximate models.
+models. The Bayesian variants of \gls{SSD} implemented in this thesis are not fine-tuned or trained with dropout; therefore, they are not guaranteed to be such approximate models.
But dropout alone does not explain the difference in results. Both variants
with and without dropout have the exact same number of detections coming
@@ -1252,9 +1253,9 @@ has slightly fewer predictions left compared to the one without dropout.
After the grouping, the variant without dropout has on average between
10 and 11 detections grouped into an observation. This is expected as every
forward pass creates the exact same result and these 10 identical detections
-per vanilla SSD detection perfectly overlap. The fact that slightly more than
+per \gls{vanilla} \gls{SSD} detection perfectly overlap. The fact that slightly more than
10 detections are grouped together could explain the marginally better precision
-of the Bayesian variant without dropout compared to vanilla SSD.
+of the Bayesian variant without dropout compared to \gls{vanilla} \gls{SSD}.
However, on average only three detections are grouped together into an
observation if dropout with 0.9 keep ratio is enabled. This does not
negatively impact recall as true positives do not disappear but offers
@@ -1276,7 +1277,7 @@ from Miller et al. The complete source code or otherwise exhaustive
implementation details of Miller et al. would be required to attempt an answer.
Future work could explore the performance of this implementation when used
-on an SSD variant that was fine-tuned or trained with dropout. In this case, it
+on an \gls{SSD} variant that was fine-tuned or trained with dropout. In this case, it
should also look into the impact of training with both dropout and batch
normalisation.
Other avenues include the application to other data sets or object detection
diff --git a/glossary.tex b/glossary.tex
new file mode 100644
index 0000000..41c1542
--- /dev/null
+++ b/glossary.tex
@@ -0,0 +1,12 @@
+% acronyms
+\newacronym{NMS}{NMS}{non-maximum suppression}
+\newacronym{SSD}{SSD}{Single Shot MultiBox Detector}
+
+% terms
+\newglossaryentry{vanilla}
+{
+ name={vanilla},
+ description={
+ describes the original, unmodified state of something
+ }
+}
diff --git a/masterthesis.sty b/masterthesis.sty
index 60f25ae..7e20f9d 100644
--- a/masterthesis.sty
+++ b/masterthesis.sty
@@ -102,7 +102,8 @@
\usepackage{makeidx}
\makeindex
-\usepackage[xindy]{glossaries} % for \printglossary
+\usepackage[xindy,toc]{glossaries} % for \printglossary
+\setacronymstyle{long-short}
\makeglossaries
%%% conditional includes
@@ -183,7 +184,7 @@
\newcommand{\finish}{%
%\clearpage
- \printglossary
+ \printglossaries
%\clearpage
\printindex
diff --git a/thesis.tex b/thesis.tex
index 1cc4453..c5506e9 100644
--- a/thesis.tex
+++ b/thesis.tex
@@ -33,6 +33,8 @@
% specify bib resource
\addbibresource{ma.bib}
+\input{glossary.tex}
+
\makeatletter
\g@addto@macro\appendix{%
\renewcommand*{\chapterformat}{%