Improved a variety of smaller things

Signed-off-by: Jim Martens <github@2martens.de>
Jim Martens 2019-09-24 12:22:13 +02:00
parent 7813cafc32
commit 2f9623e3d5
1 changed file with 54 additions and 57 deletions

body.tex

@@ -7,9 +7,9 @@ providing technical details.
\subsection*{Motivation}
Famous examples like the automatic soap dispenser, which does not
recognise the hand of a black person but dispenses soap when presented
with a paper towel, raise the question of bias in computer
systems~\cite{Friedman1996}. Related to this ethical question regarding
the design of so-called algorithms is the question of
algorithmic accountability~\cite{Diakopoulos2014}.
@@ -132,26 +132,28 @@ conditions compared to object detection without it.}
For the purpose of this thesis, I will use the vanilla SSD (as in: the original SSD) as
baseline to compare against. In particular, vanilla SSD uses
a per-class confidence threshold of 0.01, an IOU threshold of 0.45
for the non-maximum suppression, and a top \(k\) value of 200. For this
thesis, the top \(k\) value was changed to 20 and a confidence threshold
of 0.2 was tried as well.
The effect of an entropy threshold is measured against this vanilla
SSD by applying entropy thresholds from 0.1 to 2.4 inclusive (limits taken from
Miller et al.). Dropout sampling is compared to vanilla SSD
with and without entropy thresholding.
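The entropy thresholding described above can be sketched as follows. This is a minimal illustration, not the thesis code: it treats each detection's per-class softmax output as a discrete distribution and, under the assumption that detections above the threshold count as too uncertain, drops them (the dictionary key `probs` and both function names are hypothetical):

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # by convention 0 * log(0) = 0
    return float(-np.sum(p * np.log(p)))

def entropy_filter(detections, threshold):
    """Keep only detections whose class distribution is certain enough,
    i.e. whose entropy stays below the threshold."""
    return [d for d in detections if entropy(d["probs"]) < threshold]

# A peaked softmax output has low entropy, a near-uniform one high entropy.
certain = {"probs": [0.97, 0.01, 0.01, 0.01]}
uncertain = {"probs": [0.25, 0.25, 0.25, 0.25]}
kept = entropy_filter([certain, uncertain], threshold=1.0)
```

Sweeping `threshold` from 0.1 to 2.4 then trades recall (low thresholds also drop correct but unconfident detections) against precision.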
\paragraph{Hypothesis} Dropout sampling
delivers better object detection performance under open set
conditions compared to object detection without it.
\subsection*{Reader's Guide}
First, chapter \ref{chap:background} presents related works and
provides the background for dropout sampling.
Afterwards, chapter \ref{chap:methods} explains how vanilla SSD works, how
Bayesian SSD extends vanilla SSD, and how the decoding pipelines are
structured.
Chapter \ref{chap:experiments-results} presents the data sets,
the experimental setup, and the results. This is followed by
chapter \ref{chap:discussion}, focusing on the discussion and closing.
Therefore, the contribution is found in chapters \ref{chap:methods},
\ref{chap:experiments-results}, and \ref{chap:discussion}.
@@ -162,8 +164,7 @@ Therefore, the contribution is found in chapters \ref{chap:methods},
This chapter will begin with an overview of previous works
in the field of this thesis. Afterwards, the theoretical foundations
of dropout sampling will be explained.
\section{Related Works}
@@ -176,7 +177,7 @@ reconstruction-based novelty detection as it deals only with neural network
approaches. Therefore, the other types of novelty detection will only be
briefly introduced.
\subsection{Overview of Types of Novelty Detection}
Probabilistic approaches estimate the generative probability density function (pdf)
of the data. It is assumed that the training data is generated from an underlying
@@ -208,7 +209,7 @@ difference in the metric when removed from the data set. This subset is consider
to consist of novel data. For example, Filippone and Sanguinetti \cite{Filippone2011} provide
a recent approach.
\subsection{Reconstruction-based Novelty Detection}
Reconstruction-based approaches use the reconstruction error in one form
or another to calculate the novelty score. This can be auto-encoders that
@@ -224,7 +225,7 @@ Novelty detection for object detection is intricately linked with
open set conditions: the test data can contain unknown classes.
Bishop~\cite{Bishop1994} investigated the correlation between
the degree of novel input data and the reliability of network
outputs, and introduced a quantitative way to measure novelty.
The Bayesian approach provides a theoretical foundation for
modelling uncertainty \cite{Ghahramani2015}.
@@ -259,20 +260,17 @@ Li et al.~\cite{Li2019} investigated the problem of poor performance
when combining dropout and batch normalisation: Dropout shifts the variance
of a neural unit when switching from train to test, whereas batch normalisation
does not change the variance. This inconsistency leads to a variance shift which
can have a larger or smaller impact based on the network used.
Non-Bayesian approaches have been developed as well. Usually, they compare with
MC dropout and show better performance.
Postels et al.~\cite{Postels2019} provided a sampling-free approach for
uncertainty estimation that does not affect training and approximates the
sampling at test time. They compared it to MC dropout and found less computational
overhead with better results.
Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
implemented a predictive uncertainty estimation using deep ensembles.
Compared to MC dropout, it showed better results.
Geifman et al.~\cite{Geifman2018}
introduced an uncertainty estimation algorithm for non-Bayesian deep
neural classification that estimates the uncertainty of highly
@@ -288,10 +286,10 @@ are important as well. Mukhoti and Gal~\cite{Mukhoti2018}
contributed metrics to measure uncertainty for semantic
segmentation. Wu et al.~\cite{Wu2019} introduced two innovations
that turn variational Bayes into a robust tool for Bayesian
networks: first, a novel deterministic method to approximate
moments in neural networks which eliminates gradient variance, and
second, a hierarchical prior for parameters and an empirical Bayes
procedure to select prior variances.
\section{Background for Dropout Sampling}
@@ -342,7 +340,7 @@ over the network weights, for example a Gaussian prior distribution:
\(\mathbf{W}\) are the weights and \(I\) symbolises that every
weight is drawn from independent and identical distributions. The
training of the network determines a plausible set of weights by
evaluating the probability output (posterior) over the weights given
the training data \(\mathbf{T}\): \(p(\mathbf{W}|\mathbf{T})\).
However, this
evaluation cannot be performed in any reasonable
@@ -369,7 +367,7 @@ training data \(\mathbf{T}\):
p(y|\mathcal{I}, \mathbf{T}) = \int p(y|\mathcal{I}, \mathbf{W}) \cdot p(\mathbf{W}|\mathbf{T})d\mathbf{W} \approx \frac{1}{n} \sum_{i=1}^{n}\mathbf{s}_i
\end{equation}
With this dropout sampling technique, \(n\) model weights
\(\widetilde{\mathbf{W}}_i\) are sampled from the posterior
\(p(\mathbf{W}|\mathbf{T})\). The class probability
\(p(y|\mathcal{I}, \mathbf{T})\) is a probability vector
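The approximation above can be sketched in code: keep dropout active at test time, run \(n\) forward passes, and average the resulting softmax vectors \(\mathbf{s}_i\). The toy single-layer network below is a stand-in for SSD, purely to illustrate the sampling (all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_pass_with_dropout(x, W, keep_ratio=0.9):
    """One stochastic forward pass: dropout stays active at test time,
    which effectively samples a weight set W~_i for this pass."""
    mask = rng.random(W.shape) < keep_ratio
    logits = x @ (W * mask) / keep_ratio  # inverted dropout scaling
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax score vector s_i

def dropout_sampling(x, W, n=10):
    """Approximate p(y|I,T) by averaging the n sampled softmax vectors."""
    samples = [forward_pass_with_dropout(x, W) for _ in range(n)]
    return np.mean(samples, axis=0)

x = rng.random(8)       # stand-in for the input image features
W = rng.random((8, 4))  # stand-in weights, 4 classes
q = dropout_sampling(x, W, n=10)
```

Since every sample is a probability vector, the average is one as well, and its spread across the samples carries the uncertainty information.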
@@ -437,8 +435,8 @@ Vanilla SSD is based upon the VGG-16 network (see figure
\ref{fig:vanilla-ssd}) and adds extra feature layers. The entire
image (always size 300x300) is divided up into anchor boxes. During
training, each of these boxes is mapped to a ground truth box or
background. For every anchor box both the offset to
the object and the class confidences are calculated. The output of the
SSD network consists of the predictions with class confidences, offsets to the
anchor box, anchor box coordinates, and variance. The model loss is a
weighted sum of localisation and confidence loss. As the network
@@ -567,7 +565,7 @@ Additionally, all detections with a background prediction of 0.8 or higher are d
The remaining detections are partitioned into observations to
further reduce the size of the output, and
to identify uncertainty. This is accomplished by calculating the
mutual IOU score of every detection with all other detections. Detections
with a mutual IOU score of 0.95 or higher are partitioned into an
observation. Next, the softmax scores and bounding box coordinates of
all detections in an observation are averaged.
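The partitioning described above can be sketched as follows. This is a simplified greedy grouping under the assumption of axis-aligned boxes `(x_min, y_min, x_max, y_max)`; the helper names are hypothetical and the real implementation may resolve the mutual-IOU partitioning differently:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def partition_into_observations(boxes, scores, iou_threshold=0.95):
    """Greedily group detections whose mutual IOU >= threshold, then
    average the boxes and softmax scores of each group."""
    observations = []
    unassigned = list(range(len(boxes)))
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed]
        group += [i for i in unassigned if iou(boxes[seed], boxes[i]) >= iou_threshold]
        unassigned = [i for i in unassigned if i not in group]
        observations.append((np.mean([boxes[i] for i in group], axis=0),
                             np.mean([scores[i] for i in group], axis=0)))
    return observations

# Two nearly identical detections collapse into one observation;
# the distant third detection forms its own observation.
boxes = np.array([[10, 10, 50, 50], [11, 11, 50, 50], [200, 200, 240, 240]], dtype=float)
scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
obs = partition_into_observations(boxes, scores)
```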
@@ -596,7 +594,7 @@ at the end.
This chapter explains the used data sets, how the experiments were
set up, and what the results are.
\section{Data Sets}
This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
80 classes, ranging from airplanes to toothbrushes.
@@ -615,7 +613,7 @@ impossible values: bounding box height or width lower than zero,
\(x_{max}\) and \(y_{max}\) coordinates lower than or equal to zero, \(x_{min}\) greater than \(x_{max}\),
\(y_{min}\) greater than \(y_{max}\), image width lower than \(x_{max}\),
and image height lower than \(y_{max}\). In the last two cases the
bounding box width and height were set to (image width - \(x_{min}\)) and
(image height - \(y_{min}\)) respectively;
in the other cases the annotation was skipped.
If the bounding box width or height afterwards is
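The sanitisation rules above can be sketched as a single function. This is a minimal version, not the actual preprocessing code; the `bbox` key and the function signature are hypothetical:

```python
def sanitise_annotation(ann, img_width, img_height):
    """Apply the sanity checks described above: skip impossible boxes,
    clamp boxes that extend past the right/bottom image border.
    Returns the corrected annotation, or None if it must be skipped."""
    x_min, y_min, x_max, y_max = ann["bbox"]
    # impossible values: skip the annotation entirely
    if x_max <= 0 or y_max <= 0 or x_min > x_max or y_min > y_max:
        return None
    # box exceeds the image: width/height become image extent minus minimum
    if img_width < x_max:
        x_max = img_width
    if img_height < y_max:
        y_max = img_height
    # skip if the remaining box is degenerate
    if x_max - x_min <= 0 or y_max - y_min <= 0:
        return None
    return {"bbox": (x_min, y_min, x_max, y_max)}
```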
@@ -623,8 +621,9 @@ lower than or equal to zero the annotation was skipped.
SSD accepts 300x300 input images, so the MS COCO data set images were
resized to this resolution; the aspect ratio was not kept in the
process. MS COCO contains landscape and portrait images with (640x480)
and (480x640) as the resolution. This led to a uniform distortion of the
landscape and portrait images, respectively. Furthermore,
the colour channels were swapped from RGB to BGR in order to
comply with the SSD implementation. The BGR requirement stems from
the usage of OpenCV in SSD: the internal channel order for
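The two preprocessing steps above can be sketched with NumPy alone; this is a nearest-neighbour stand-in (the actual pipeline would use an image library such as OpenCV for the resize):

```python
import numpy as np

def preprocess(image):
    """Resize an (H, W, 3) RGB image to 300x300 without keeping the
    aspect ratio, then swap the colour channels from RGB to BGR."""
    h, w, _ = image.shape
    rows = np.arange(300) * h // 300  # nearest-neighbour source rows
    cols = np.arange(300) * w // 300  # nearest-neighbour source columns
    resized = image[rows][:, cols]
    return resized[:, :, ::-1]  # RGB -> BGR

rgb = np.zeros((480, 640, 3), dtype=np.uint8)
rgb[:, :, 0] = 255  # pure red input image
bgr = preprocess(rgb)
```

After the swap, the red values sit in the last channel, matching OpenCV's internal BGR order.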
@@ -661,7 +660,7 @@ on the object detection performance.
Bayesian SSD was run with a 0.2 confidence threshold and compared
to vanilla SSD with a 0.2 confidence threshold. Coupled with the
entropy threshold, this comparison reveals how uncertain the network
is. If it is very certain, the dropout sampling should have no
significant impact on the result. Furthermore, in two cases the
dropout was turned off to isolate the impact of non-maximum suppression
@@ -701,7 +700,7 @@ in the next chapter.
\hline
Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.209 & 2709 & 0.300 & 0.161 \\
no dropout - 0.2 conf - NMS \; 10 & 0.371 & \textbf{2335} & 0.365 & \textbf{0.378} \\
0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.359 & 2584 & 0.363 & 0.357 \\
0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.325 & 2759 & 0.342 & 0.311 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% 0.5 for Bayesian - 6, 1.4 for 7, 1.4 for 8, 1.3 for 9
@@ -754,15 +753,14 @@ With 2335 open set errors, the Bayesian SSD variant with disabled dropout and
enabled non-maximum suppression offers the best performance with respect
to open set errors. It also has the best precision (0.378) of all tested
variants. Furthermore, it provides the best performance among all variants
with multiple forward passes.
Dropout decreases the performance of the network; this can be seen
in the lower \(F_1\) scores, higher open set errors, and lower precision
values. Both dropout variants have worse recall (0.363 and 0.342) than
the variant with disabled dropout.
However, all variants with multiple forward passes have lower open set
errors than all vanilla SSD variants.
The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-micro}. Precision-recall curves for all variants
@@ -788,7 +786,7 @@ reported figures, such as the ones in Miller et al.~\cite{Miller2018}
\hline
Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.226 & \textbf{809} & 0.229 & 0.224 \\
no dropout - 0.2 conf - NMS \; 10 & 0.363 & 1057 & 0.321 & 0.420 \\
0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.355 & 1137 & 0.320 & 0.399 \\
0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.322 & 1264 & 0.307 & 0.340 \\
% entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
% entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
@@ -838,14 +836,12 @@ vanilla SSD run with multiple forward passes.
With 809 open set errors, the Bayesian SSD variant with disabled dropout and
without non-maximum suppression offers the best performance with respect
to open set errors. The variant without dropout and enabled non-maximum suppression has the best \(F_1\) score (0.363), the best
precision (0.420), and the best recall (0.321) of all Bayesian variants.
Dropout decreases the performance of the network; this can be seen
in the lower \(F_1\) scores, higher open set errors, and lower precision and
recall values. However, all variants with multiple forward passes have lower open set errors than all vanilla SSD
variants.
The relation of \(F_1\) score to absolute open set error can be observed
@@ -864,19 +860,19 @@ reported figures, such as the ones in Miller et al.~\cite{Miller2018}
This subsection compares vanilla SSD
with Bayesian SSD with respect to specific images that illustrate
similarities and differences between both approaches. For this
comparison, a 0.2 confidence threshold is applied. Furthermore, Bayesian
SSD uses non-maximum suppression and dropout with a 0.9 keep ratio.
\begin{figure}
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000336587_bboxes_vanilla}
\caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red and rounded to three digits. Predictions are from vanilla SSD.}
\label{fig:stop-sign-truck-vanilla}
\end{minipage}%
\hfill
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000336587_bboxes_bayesian}
\caption{Image with stop sign and truck at right edge. Ground truth in blue, predictions in red and rounded to three digits. Predictions are from Bayesian SSD with 0.9 keep ratio.}
\label{fig:stop-sign-truck-bayesian}
\end{minipage}
\end{figure}
@@ -889,13 +885,13 @@ that overwhelmingly lie outside the image frame. Furthermore, the predictions ar
\begin{figure}
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000403817_bboxes_vanilla}
\caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red and rounded to three digits. Predictions are from vanilla SSD.}
\label{fig:cat-laptop-vanilla}
\end{minipage}%
\hfill
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{COCO_val2014_000000403817_bboxes_bayesian}
\caption{Image with a cat and laptop/TV. Ground truth in blue, predictions in red and rounded to three digits. Predictions are from Bayesian SSD with 0.9 keep ratio.}
\label{fig:cat-laptop-bayesian}
\end{minipage}
\end{figure}
@@ -917,7 +913,7 @@ The results clearly do not support the hypothesis: \textit{Dropout sampling deli
is no area where dropout sampling performs better than vanilla SSD. In the
remainder of the section the individual results will be interpreted.
\subsection*{Impact of Averaging}
Micro and macro averaging create largely similar results. Notably, micro
averaging has a significant performance increase towards the end
@@ -945,7 +941,7 @@ threshold is not the largest threshold tested. A lower threshold likely
eliminated some false positives from the result set. On the other hand,
too low a threshold likely eliminated true positives as well.
\subsection*{Non-Maximum Suppression and Top \(k\)}
Miller et al.~\cite{Miller2018} supposedly did not use non-maximum suppression
in their implementation of dropout sampling. Therefore, a variant with disabled
@@ -957,7 +953,7 @@ a lot more false positives remain and have a negative impact on precision.
In combination with top \(k\) selection, recall can be affected:
duplicate detections could stay and maxima boxes could be removed.
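For reference, the greedy non-maximum suppression discussed here can be sketched as follows. This is the standard algorithm, not the exact SSD implementation (which applies it per class); function names are illustrative:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_maximum_suppression(boxes, confidences, iou_threshold=0.45):
    """Greedy NMS: keep the highest-confidence box, drop every remaining
    box that overlaps it by more than the IOU threshold, repeat."""
    order = np.argsort(confidences)[::-1]
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        ious = np.array([iou(boxes[best], boxes[i]) for i in rest])
        order = rest[ious <= iou_threshold]
    return keep

# The second box duplicates the first and is suppressed; the third survives.
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
confidences = np.array([0.9, 0.8, 0.7])
keep = non_maximum_suppression(boxes, confidences)
```

Without this step, both overlapping boxes would remain and compete for the limited top \(k\) slots.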
The number of observations was measured before and after the combination of entropy threshold and NMS filter: both Bayesian SSD without
NMS and dropout, and Bayesian SSD with NMS and disabled dropout
have the same number of observations everywhere before the entropy threshold. After the entropy threshold (the value 1.5 was used for both) and NMS, the variant with NMS has roughly 23\% of its observations left.
Without NMS, 79\% of observations are left. Irrespective of the absolute
@@ -988,7 +984,8 @@ kept by top \(k\). However, persons are likely often on images
with many detections and/or have too low confidences.
In this example, the likelihood for true positives to be removed in
the person category is quite high. For dogs, the probability is far lower.
This is a good example of micro and macro averaging and their impact on
recall.
\subsection*{Dropout Sampling and Observations}
@@ -1043,7 +1040,7 @@ questions that cannot be answered in this thesis. This thesis offers
one possible implementation of dropout sampling that technically works.
However, this thesis cannot answer why this implementation differs significantly
from Miller et al. The complete source code or otherwise exhaustive
implementation details of Miller et al. would be required to attempt an answer.
Future work could explore the performance of this implementation when used
on an SSD variant that was fine-tuned or trained with dropout. In this case, it