Improved thesis based upon feedback

Signed-off-by: Jim Martens <github@2martens.de>

parent bca14cd8b4
commit dc976932f8

@@ -1,12 +1,12 @@
 \clearpage
 \section*{Acknowledgement}
 
-I would like to thank for the continued support, suggestions, and advise
-from my super-visor Prof. Dr. Simone Frintrop and co-supervisor Dr.
+I would like to thank for the continued support, suggestions, and advice
+from my supervisor Prof. Dr. Simone Frintrop and co-supervisor Dr.
 Mikko Lauri.
 
-Additionally, I would like to thank my friends and family for the continued
-support and sometimes helpful questions. Especially in some hard times
+Additionally, I would like to thank my friends and family for their continued
+support and helpful questions. Especially during some hard times
 their support was invaluable.
 
 Furthermore, I am grateful for the Fridays for Future movement
appendix.tex (22 changed lines)
@@ -1,23 +1,23 @@
 \chapter{Software and Source Code Design}
 
 The source code of many published papers is either not available
-or seems like an afterthought: it is poorly documented, difficult
+or is of bad quality: it is poorly documented, difficult
 to integrate into your own work, and often does not follow common
 software development best practices. Moreover, with Tensorflow,
 PyTorch, and Caffe there are at least three machine learning
-frameworks. Every research team seems to prefer another framework
-and sometimes even develops their own; this makes it difficult
+frameworks. Every research team seems to prefer another framework,
+and, occasionally, even develops their own; this makes it difficult
 to combine the work of different authors.
-In addition to all this, most papers do not contain proper information
-regarding the implementation details, making it difficult to
-accurately replicate them if their source code is not available.
+In addition to this, most papers do not contain proper information
+regarding implementation details, making it difficult to
+accurately replicate their results, if their source code is not available.
 
-Therefore, it was clear to me: I will release my source code and
-make it available as Python package on the PyPi package index.
+Therefore, I will release my source code and
+make it available as a Python package on the PyPi package index.
 This makes it possible for other researchers to simply install
 a package and use the API to interact with my code. Additionally,
-the code has been designed to be future proof and work with
-the announced Tensorflow 2.0 by supporting eager mode.
+the code has been designed to be future proof, and work with
+the announced Tensorflow 2.0, by supporting eager mode.
 
 Furthermore, it is configurable, well documented, and conforms largely
 to the clean code guidelines: evolvability and extendability among
@@ -38,7 +38,7 @@ can be found in plotting.py, and the ssd.py module contains
 code to train the SSD and later predict with it.
 
 Lastly, the SSD implementation from a third party repository
-has been modified to work inside a Python package architecture and
+has been modified to work inside a Python package architecture, and
 with eager mode. It is stored as a Git submodule inside the package
 repository.
 
body.tex (159 changed lines)
@@ -21,7 +21,7 @@ black boxes and prevents any answers to questions of causality.
 
 However, these questions of causality are of enormous consequence when
 results of neural networks are used to make life changing decisions:
-Is a correlation enough to bring forth negative consequences
+is a correlation enough to bring forth negative consequences
 for a particular person? And if so, what is the possible defence
 against math? Similar questions can be raised when looking at computer
 vision networks that might be used together with so called smart
@@ -29,14 +29,14 @@ vision networks that might be used together with so called smart
 
 This leads to the need for neural networks to explain their results.
 Such an explanation must come from the network or an attached piece
-of technology to allow adoption in mass. Obviously, this setting
-poses the question, how such an endeavour can be achieved.
+of technology to allow mass adoption. Obviously, this setting
+poses the question of how such an endeavour can be achieved.
 
-For neural networks there are fundamentally two types of tasks:
+For neural networks there are fundamentally two types of problems:
 regression and classification. Regression deals with any case
 where the goal for the network is to come close to an ideal
 function that connects all data points. Classification, however,
-describes tasks where the network is supposed to identify the
+describes problems where the network is supposed to identify the
 class of any given input. In this thesis, I will work with both.
 
 \subsection*{Object Detection in Open Set Conditions}
@@ -54,53 +54,51 @@ class of any given input. In this thesis, I will work with both.
 
 More specifically, I will look at object detection in the open set
 conditions (see figure \ref{fig:open-set}).
-In non-technical words this effectively describes
-the kind of situation you encounter with \gls{CCTV} or robots
-outside of a laboratory. Both use cameras that record
-images. Subsequently, a neural network analyses the image
-and returns a list of detected and classified objects that it
-found in the image. The problem here is that networks can only
+In non-technical terms this effectively describes
+the conditions \gls{CCTV} and robots outside of a laboratory operate in. In both cases images are recorded with cameras. In order to detect objects, a neural network has to analyse the images
+and return a list of detected and classified objects that it
+finds in the images. The problem here is that networks can only
 classify what they know. If presented with an object type that
 the network was not trained with, as happens frequently in real
 environments, it will still classify the object and might even
 have a high confidence in doing so. This is an example for a
 false positive. Anyone who uses the results of
-such a network could falsely assume that a high confidence always
+such a network could falsely assume that a high confidence
 means the classification is very likely correct. If one uses
 a proprietary system one might not even be able to find out
 that the network was never trained on a particular type of object.
 Therefore, it would be impossible for one to identify the output
-of the network as false positive.
+of the network as a false positive.
 
 This reaffirms the need for automatic explanation. Such a system
 should recognise by itself that the given object is unknown and
-hence mark any classification result of the network as meaningless.
+mark any classification result of the network as meaningless.
 Technically there are two slightly different approaches that deal
 with this type of task: model uncertainty and novelty detection.
 
 Model uncertainty can be measured, for example, with dropout sampling.
-Dropout layers are usually used only during training but
-Miller et al.~\cite{Miller2018} use them also during testing
-to achieve different results for the same image making use of
+Dropout layers are usually used only during training, but
+Miller et al.~\cite{Miller2018} also use them during testing
+to achieve different results for the same image---making use of
 multiple forward passes. The output scores for the forward passes
 of the same image are then averaged. If the averaged class
 probabilities resemble a uniform distribution (every class has
 the same probability) this symbolises maximum uncertainty. Conversely,
 if there is one very high probability with every other being very
-low this signifies a low uncertainty. An unknown object is more
-likely to cause high uncertainty which allows for an identification
+low, this signifies a low uncertainty. An unknown object is more
+likely to cause high uncertainty, which allows for an identification
 of false positive cases.
 
-Novelty detection is another approach to solve the task.
+Novelty detection is another approach to solve the problem.
 In the realm of neural networks it is usually done with the help of
-auto-encoders that solve a regression task of finding an
-identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have
-internally at least two components: an encoder, and a decoder or
+auto-encoders that try to solve a regression problem of finding an
+identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have,
+internally, at least two components: an encoder, and a decoder or
 generator. The job of the encoder is to find an encoding that
-compresses the input as good as possible while simultaneously
+compresses the input as well as possible, while simultaneously
 being as loss-free as possible. The decoder takes this latent
-representation of the input and has to find a decompression
-that reconstructs the input as accurate as possible. During
+representation of the input, and has to find a decompression
+that reconstructs the input as accurately as possible. During
 training these auto-encoders learn to reproduce a certain group
 of object classes. The actual novelty detection takes place
 during testing: given an image, and the output and loss of the
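As an editor's illustration of the dropout-sampling procedure described in this hunk (multiple forward passes with dropout active, averaged softmax scores, uniform distribution meaning maximum uncertainty), a minimal sketch; `forward_pass` and the pass count are placeholders, not the thesis implementation:

```python
import numpy as np

def mc_dropout_scores(forward_pass, image, n_passes=10):
    """Average softmax scores over n stochastic forward passes.

    forward_pass: callable returning one softmax vector; stands in for
    a network evaluated with dropout kept active at test time.
    """
    scores = np.stack([forward_pass(image) for _ in range(n_passes)])
    return scores.mean(axis=0)

def predictive_entropy(mean_scores):
    """Entropy of the averaged class probabilities: maximal for a
    uniform distribution, low when one class dominates."""
    p = np.clip(mean_scores, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())
```

An unknown object would tend to produce a higher `predictive_entropy` over the averaged scores than a well-known one.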
@@ -148,25 +146,24 @@ conditions compared to object detection without it.
 
 \subsection*{Reader's Guide}
 
-First, chapter \ref{chap:background} presents related works and
+First, chapter \ref{chap:background} presents related works, and
 provides the background for dropout sampling.
-Afterwards, chapter \ref{chap:methods} explains how \gls{vanilla} \gls{SSD} works, how
+Thereafter, chapter \ref{chap:methods} explains how \gls{vanilla} \gls{SSD} works, how
 Bayesian \gls{SSD} extends \gls{vanilla} \gls{SSD}, and how the decoding pipelines are
 structured.
 Chapter \ref{chap:experiments-results} presents the data sets,
 the experimental setup, and the results. This is followed by
 chapter \ref{chap:discussion}, focusing on the discussion and closing.
 
-Therefore, the contribution is found in chapters \ref{chap:methods},
+The contribution of this thesis is found in chapters \ref{chap:methods},
 \ref{chap:experiments-results}, and \ref{chap:discussion}.
 
 \chapter{Background}
 
 \label{chap:background}
 
-This chapter begins with an overview over previous works
-in the field of this thesis. Afterwards the theoretical foundations
-of dropout sampling are explained.
+This chapter begins with an overview of previous works, followed by an explanation of the theoretical
+foundations of dropout sampling.
 
 \section{Related Works}
 
@@ -176,7 +173,7 @@ methods published over the previous decade. They showcase probabilistic,
 distance-based, reconstruction-based, domain-based, and information-theoretic
 novelty detection. Based on their categorisation, this thesis falls under
 reconstruction-based novelty detection as it deals only with neural network
-approaches. Therefore, the other types of novelty detection will only be
+approaches. The other types of novelty detection will, therefore, only be
 introduced briefly.
 
 \subsection{Overview over types of Novelty Detection}
@@ -197,16 +194,16 @@ Both methods are similar to estimating the
 \gls{pdf} of data, they use well-defined distance metrics to compute the distance
 between two data points.
 
-Domain-based novelty detection describes the boundary of the known data, rather
-than the data itself. Unknown data is identified by its position relative to
-the boundary. A common implementation for this are support vector machines
-(e.g. implemented by Song et al. \cite{Song2002}).
+Domain-based novelty detection describes the boundary of the known data,
+rather than the data itself. Unknown data is identified by its position
+relative to the boundary. Support vector machines (e.g. implemented by
+Song et al. \cite{Song2002}) are a common implementation of this.
 
 Information-theoretic novelty detection computes the information content
 of a data set, for example, with metrics like \gls{entropy}. Such metrics assume
 that novel data inside the data set significantly alters the information
 content of an otherwise normal data set. First, the metrics are calculated over the
-whole data set. Afterwards, a subset is identified that causes the biggest
+whole data set. Second, a subset is identified that causes the biggest
 difference in the metric when removed from the data set. This subset is considered
 to consist of novel data. For example, Filippone and Sanguinetti \cite{Filippone2011} provide
 a recent approach.
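The information-theoretic scheme in this hunk (compute a metric such as entropy over the whole data set, then find the subset whose removal changes it most) can be sketched for the simplest case of single-element subsets; the function names and the label-entropy metric are illustrative assumptions, not the cited methods:

```python
import math
from collections import Counter

def dataset_entropy(labels):
    """Shannon entropy of the label distribution of a data set."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def most_novel_index(labels):
    """Index whose removal changes the data-set entropy the most:
    a one-element version of the subset search described above."""
    base = dataset_entropy(labels)
    diffs = [abs(base - dataset_entropy(labels[:i] + labels[i + 1:]))
             for i in range(len(labels))]
    return max(range(len(labels)), key=diffs.__getitem__)
```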
@@ -214,7 +211,7 @@ a recent approach.
 \subsection{Reconstruction-based Novelty Detection}
 
 Reconstruction-based approaches use the reconstruction error in one form
-or another to calculate the novelty score. This can be auto-encoders that
+or another to calculate the novelty score. These can be auto-encoders that
 literally reconstruct the input but it also includes \gls{MLP} networks which try
 to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014} differentiate
 between neural network-based approaches and subspace methods. The first are
@@ -242,7 +239,7 @@ Gal and Ghahramani~\cite{Gal2016} show that dropout training is a
 Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
 shows that dropout training actually corresponds to a general approximate
 Bayesian model. This means every network trained with dropout is an
-approximate Bayesian model. During inference the dropout remains active,
+approximate Bayesian model. During inference the dropout remains active:
 this form of inference is called \gls{MCDO}.
 Miller et al.~\cite{Miller2018} build upon the work of Gal and Ghahramani: they
 use \gls{MCDO} under open-set conditions for object detection.
@@ -261,14 +258,13 @@ Consequently, this technique can be applied to any network that utilises
 standard batch normalisation.
 Li et al.~\cite{Li2019} investigate the problem of poor performance
 when combining dropout and batch normalisation: dropout shifts the variance
-of a neural unit when switching from train to test, batch normalisation
+of a neural unit when switching from train to test; batch normalisation
 does not change the variance. This inconsistency leads to a variance shift which
 can have a larger or smaller impact based on the network used.
 
-Non-Bayesian approaches have been developed as well. Usually, they compare with
-\gls{MCDO} and show better performance.
+Non-Bayesian approaches have also been developed. Usually they are compared with \gls{MCDO} and show better performance.
 Postels et al.~\cite{Postels2019} provide a sampling-free approach for
-uncertainty estimation that does not affect training and approximates the
+uncertainty estimation that does not affect training, and approximates the
 sampling at test time. They compare it to \gls{MCDO} and find less computational
 overhead with better results.
 Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
@@ -279,7 +275,7 @@ introduce an uncertainty estimation algorithm for non-Bayesian deep
 neural classification that estimates the uncertainty of highly
 confident points using earlier snapshots of the trained model and improves,
 among others, the approach introduced by Lakshminarayanan et al.
-Sensoy et al.~\cite{Sensoy2018} explicitely model prediction uncertainty:
+Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
 a \gls{Dirichlet distribution} is placed over the class probabilities. Consequently,
 the predictions of a neural network are treated as subjective opinions.
 
@@ -348,21 +344,21 @@ training of the network determines a plausible set of weights by
 evaluating the probability output (\gls{posterior}) over the weights given
 the training data \(\mathbf{T}\): \(p(\mathbf{W}|\mathbf{T})\).
 However, this
-evaluation cannot be performed in any reasonable
+evaluation cannot be performed in any reasonable amount of
 time. Therefore approximation techniques are
 required. In those techniques the \gls{posterior} is fitted with a
 simple distribution \(q^{*}_{\theta}(\mathbf{W})\). The original
 and intractable problem of averaging over all weights in the network
-is replaced with an optimisation task, where the parameters of the
-simple distribution are optimised over~\cite{Kendall2017}.
+is replaced with an optimisation task: the parameters of the
+simple distribution are optimised~\cite{Kendall2017}.
 
 \subsubsection*{Dropout Variational Inference}
 
 Kendall and Gal~\cite{Kendall2017} show an approximation for
 classfication and recognition tasks. Dropout variational inference
 is a practical approximation technique by adding dropout layers
-in front of every weight layer and using them also during test
-time to sample from the approximate \gls{posterior}. Effectively, this
+in front of every weight layer and also using them during test
+time to sample from the approximate \gls{posterior}. In effect, this
 results in the approximation of the class probability
 \(p(y|\mathcal{I}, \mathbf{T})\) by performing \(n\) forward
 passes through the network and averaging the so obtained softmax
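The averaging over \(n\) forward passes described in this hunk can be written out as the approximate predictive distribution; the notation follows the hunk, while the sampled-weights symbol \(\widehat{\mathbf{W}}_i\) is an editorial addition for clarity:

```latex
p(y \mid \mathcal{I}, \mathbf{T}) \approx \frac{1}{n}
  \sum_{i=1}^{n} \operatorname{softmax}\!\left(f^{\widehat{\mathbf{W}}_i}(\mathcal{I})\right),
\qquad \widehat{\mathbf{W}}_i \sim q^{*}_{\theta}(\mathbf{W})
```

Each forward pass draws an effective weight configuration from the approximate posterior \(q^{*}_{\theta}(\mathbf{W})\) via the active dropout layers.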
@@ -479,7 +475,7 @@ and very low confidences in other classes.
 \subsection{Implementation Details}
 
 For this thesis, an \gls{SSD} implementation based on Tensorflow~\cite{Abadi2015} and
-Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
+Keras~\cite{Chollet2015}
 is used. It has been modified to support \gls{entropy} thresholding,
 partitioning of observations, and dropout
 layers in the \gls{SSD} model. Entropy thresholding takes place before
@@ -517,7 +513,7 @@ confidence thresholding and a subsequent \gls{NMS}.
 All boxes that pass \gls{NMS} are added to a
 per image maxima list. One box could make the confidence threshold
 for multiple classes and, hence, be present multiple times in the
-maxima list for the image. Lastly, a total of \(k\) boxes with the
+maxima list for the image. In the end, a total of \(k\) boxes with the
 highest confidences is kept per image across all classes. The
 original implementation uses a confidence threshold of \(0.01\), an
 IOU threshold for \gls{NMS} of \(0.45\) and a top \(k\)
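The final top-\(k\) step in this hunk (keep the \(k\) highest-confidence boxes per image across all classes) can be sketched as follows; the function name is a placeholder and the sketch omits the preceding confidence thresholding and NMS:

```python
import numpy as np

def top_k_detections(confidences, k):
    """Keep the k highest-confidence detections of one image across
    all classes. `confidences` holds one score per surviving box."""
    if len(confidences) <= k:
        return np.arange(len(confidences))
    # indices of the k largest confidences, highest first
    return np.argsort(confidences)[::-1][:k]
```

In the decoding pipeline described above, this would run after confidence thresholding (0.01) and NMS (IOU 0.45).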
@@ -548,7 +544,7 @@ confidence threshold is required.
 
 \subsection{Vanilla SSD with Entropy Thresholding}
 
-Vanilla \gls{SSD} with \gls{entropy} tresholding adds an additional component
+Vanilla \gls{SSD} with \gls{entropy} thresholding adds an additional component
 to the filtering already done for \gls{vanilla} \gls{SSD}. The \gls{entropy} is
 calculated from all \(\#nr\_classes\) softmax scores in a prediction.
 Only predictions with a low enough \gls{entropy} pass the \gls{entropy}
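The entropy thresholding described in this hunk (entropy over all \(\#nr\_classes\) softmax scores; only low-entropy predictions pass) can be sketched as below; the function names are placeholders, not the thesis code:

```python
import numpy as np

def entropy(softmax_scores):
    """Shannon entropy of one prediction's softmax scores
    (all #nr_classes values, including the background class)."""
    p = np.clip(np.asarray(softmax_scores, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def entropy_filter(predictions, threshold):
    """Keep only predictions whose entropy is below the threshold;
    near-uniform (high-entropy) predictions are discarded."""
    return [p for p in predictions if entropy(p) < threshold]
```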
@@ -558,8 +554,8 @@ false positive or false negative cases with high confidence values.
 
 \subsection{Bayesian SSD with Entropy Thresholding}
 
-Bayesian \gls{SSD} has the speciality of multiple forward passes. Based
-on the information in the paper, the detections of all forward passes
+Bayesian \gls{SSD} uses multiple forward passes. Based
+on the information from Miller et al.~\cite{Miller2018}, the detections of all forward passes
 are grouped per image but not by forward pass. This leads
 to the following shape of the network output after all
 forward passes: \((batch\_size, \#nr\_boxes \, \cdot \, \#nr\_forward\_passes, \#nr\_classes + 12)\). The size of the output
@@ -576,7 +572,7 @@ mutual IOU score of every detection with all other detections. Detections
 with a mutual IOU score of 0.95 or higher are partitioned into an
 observation. Next, the softmax scores and bounding box coordinates of
 all detections in an observation are averaged.
-There can be a different number of observations for every image which
+There can be a different number of observations for every image, which
 destroys homogenity and prevents batch-wise calculation of the
 results. The shape of the results is per image: \((\#nr\_observations,\#nr\_classes + 4)\).
 
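The partitioning described in this hunk (detections with mutual IOU of 0.95 or higher grouped into an observation, then softmax scores and box coordinates averaged) can be sketched as follows; the greedy grouping strategy and the function names are assumptions, as the thesis does not specify the exact partitioning algorithm here:

```python
import numpy as np

def iou(a, b):
    """IOU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def group_observations(boxes, scores, threshold=0.95):
    """Greedily partition detections whose mutual IOU is >= threshold
    into observations; average boxes and softmax scores per group."""
    remaining = list(range(len(boxes)))
    observations = []
    while remaining:
        group = [remaining.pop(0)]
        for i in remaining[:]:
            if all(iou(boxes[i], boxes[j]) >= threshold for j in group):
                group.append(i)
                remaining.remove(i)
        observations.append((boxes[group].mean(axis=0),
                             scores[group].mean(axis=0)))
    return observations
```

The per-image result then has one averaged box and score vector per observation, matching the \((\#nr\_observations, \#nr\_classes + 4)\) shape given above.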
@@ -598,14 +594,14 @@ at the end.
 
 \label{chap:experiments-results}
 
-This chapter explains the used data sets, how the experiments have been
-set up, and what the results are.
+This chapter explains the data sets used, and how the experiments have been
+set up. Furthermore, it presents the results.
 
 \section{Data Sets}
 
 This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
 80 classes, their range is illustrated by two classes: airplanes and toothbrushes.
-The images are taken by camera from the real world, ground truth
+The images are real world images, ground truth
 is provided for all images. The data set supports object detection,
 keypoint detection, and panoptic segmentation (scene segmentation).
 
@@ -779,7 +775,7 @@ The relation of \(F_1\) score to absolute open set error can be observed
 in figure \ref{fig:ose-f1-micro}. Precision-recall curves for all variants
 can be seen in figure \ref{fig:precision-recall-micro}. Both \gls{vanilla} \gls{SSD}
 variants with 0.01 confidence threshold reach a much higher open set error
-and a higher recall. This behaviour is expected as more and worse predictions
+and a higher recall. This behaviour is to be expected as more and worse predictions
 are included.
 All plotted variants show a similar behaviour that is in line with previously
 reported figures, such as the ones in Miller et al.~\cite{Miller2018}
@@ -861,7 +857,7 @@ The relation of \(F_1\) score to absolute open set error can be observed
 in figure \ref{fig:ose-f1-macro}. Precision-recall curves for all variants
 can be seen in figure \ref{fig:precision-recall-macro}. Both \gls{vanilla} \gls{SSD}
 variants with 0.01 confidence threshold reach a much higher open set error
-and a higher recall. This behaviour is expected as more and worse predictions
+and a higher recall. This behaviour is to be expected as more and worse predictions
 are included.
 All plotted variants show a similar behaviour that is in line with previously
 reported figures, such as the ones in Miller et al.~\cite{Miller2018}
@@ -878,9 +874,9 @@ only 0.7\% of the ground truth. With this share, it is below
 the average of roughly 0.9\% for each of the 56 classes that make up the
 second half of the ground truth.
 
-In some cases, multiple variants have seemingly the same performance
-but only one or some of them are marked bold. This is informed by
-differences prior to rounding. If two or more variants are marked bold
+In some cases, multiple variants have apparently the same performance
+but only one or some of them are marked bold. This is caused by
+differences prior to rounding: if two or more variants are marked bold
 they had the exact same performance before rounding.
 
 \begin{table}[tbp]
@@ -909,11 +905,9 @@ they had the exact same performance before rounding.
 \end{table}
 
 The vanilla \gls{SSD} variant with 0.2 per class confidence threshold performs
-best in the persons class with a max \(F_1\) score of 0.460, as well as
-recall of 0.405 and precision of 0.533 at the max \(F_1\) score.
-It shares the first place in recall with the \gls{vanilla} \gls{SSD}
-variant using 0.01 confidence threshold. All Bayesian \gls{SSD} variants
-perform worse than the \gls{vanilla} \gls{SSD} variants (see table
+best in the persons class: it has a max \(F_1\) score of 0.460, consisting of a recall of 0.405 and a precision of 0.533.
+The variant shares the first place in recall with the \gls{vanilla} \gls{SSD}
+variant that uses a 0.01 confidence threshold. All Bayesian \gls{SSD} variants perform worse than the \gls{vanilla} \gls{SSD} variants (see table
 \ref{tab:results-persons}). With respect to the macro averaged result,
 all variants perform better than the average of all classes.
 
@@ -951,7 +945,7 @@ variant with \gls{NMS} and disabled dropout, and the one with 0.9 keep
 ratio have a better precision (0.460 and 0.454 respectively) than the
 \gls{vanilla} \gls{SSD} variants with 0.01 confidence threshold (0.452 and
 0.453). With respect to the macro averaged result, all variants have
-a better precision than the average and the Bayesian variant without
+a better precision than the average. The Bayesian variant without
 \gls{NMS} and dropout also has a better recall and \(F_1\) score.
 
 \begin{table}[tbp]
@@ -983,7 +977,7 @@ The best \(F_1\) score (0.288) and recall (0.251) for the chairs class
 belongs to \gls{vanilla} \gls{SSD} with \gls{entropy} threshold. Precision
 is mastered by Bayesian \gls{SSD} with \gls{NMS} and disabled dropout (0.360).
 The variant with 0.9 keep ratio has the second-highest precision (0.343)
-of all variants. Both in \(F_1\) score and recall all Bayesian variants
+of all variants. Both in \(F_1\) score and recall, all Bayesian variants
 are worse than the \gls{vanilla} variants. Compared with the macro averaged
 results, all variants perform worse than the average.
 
@@ -1077,7 +1071,7 @@ ratio.
 \end{figure}
 
 The ground truth only contains a stop sign and a truck. The differences between \gls{vanilla} \gls{SSD} and Bayesian \gls{SSD} are almost not visible
-(see figures \ref{fig:stop-sign-truck-vanilla} and \ref{fig:stop-sign-truck-bayesian}): the truck is neither detected by \gls{vanilla} nor Bayesian \gls{SSD}, instead both detected a pottet plant and a traffic light. The stop sign is detected by both variants.
+(see figures \ref{fig:stop-sign-truck-vanilla} and \ref{fig:stop-sign-truck-bayesian}): the truck is neither detected by \gls{vanilla} nor Bayesian \gls{SSD}, instead both detected a "potted plant" and a traffic light. The stop sign is detected by both variants.
 This behaviour implies problems with detecting objects at the edge
 that overwhelmingly lie outside the image frame. Furthermore, the predictions are usually identical.
 
@@ -1095,9 +1089,11 @@ that overwhelmingly lie outside the image frame. Furthermore, the predictions are usually identical.
 \end{minipage}
 \end{figure}
 
-Another example (see figures \ref{fig:cat-laptop-vanilla} and \ref{fig:cat-laptop-bayesian}) is a cat with a laptop/TV in the background on the right
-side. Both variants detect a cat but the \gls{vanilla} variant detects a dog as well. The laptop and TV are not detected but this is expected since
-these classes have not been trained.
+Another example (see figures \ref{fig:cat-laptop-vanilla} and
+\ref{fig:cat-laptop-bayesian}) is a cat with a laptop/TV in the background
+on the right side. Both variants detect a cat but the \gls{vanilla}
+variant detects a dog as well. The laptop and TV are not detected but this
+is to be expected since these classes have not been trained.
 
 \chapter{Discussion and Outlook}
 
@@ -1153,7 +1149,7 @@ open set error continues to rise a bit.
 There is no visible impact of \gls{entropy} thresholding on the object detection
 performance for \gls{vanilla} \gls{SSD}. This indicates that the network has almost no
 uniform or close to uniform predictions, the vast majority of predictions
-has a high confidence in one class---including the background.
+have a high confidence in one class---including the background.
 However, the \gls{entropy} plays a larger role for the Bayesian variants---as
 expected: the best performing thresholds are 1.0, 1.3, and 1.4 for micro averaging,
 and 1.5, 1.7, and 2.0 for macro averaging. In all of these cases the best
@@ -1190,7 +1186,7 @@ threshold indicates a worse performance.
 
 Miller et al.~\cite{Miller2018} supposedly do not use \gls{NMS}
 in their implementation of dropout sampling. Therefore, a variant with disabled \glslocalreset{NMS}
-\gls{NMS} has been tested. The results are somewhat expected:
+\gls{NMS} has been tested. The results are somewhat as expected:
 \gls{NMS} removes all non-maximum detections that overlap
 with a maximum one. This reduces the number of multiple detections per
 ground truth bounding box and therefore the false positives. Without it,
@@ -1208,7 +1204,7 @@ more than 50\% of the original observations are removed with \gls{NMS} and
 stay without---all of these are very likely to be false positives.
 
 A clear distinction between micro and macro averaging can be observed:
-recall is hardly effected with micro averaging (0.300) but goes down equally with macro averaging (0.229). For micro averaging, it does
+recall is hardly affected with micro averaging (0.300) but goes down noticeably with macro averaging (0.229). For micro averaging, it does
 not matter which class the true positives belong to: every detection
 counts the same way. This also means that top \(k\) will have only
 a marginal effect: some true positives might be removed without \gls{NMS} but overall that does not have a big impact. With macro averaging, however,
@@ -1256,7 +1252,7 @@ recall.
 \end{table}
 
 The dropout variants have largely worse performance than the Bayesian variants
-without dropout. This is expected as the network was not trained with
+without dropout. This is to be expected as the network was not trained with
 dropout and the weights are not prepared for it.
 
 Gal~\cite{Gal2017}
@@ -1282,7 +1278,7 @@ more than 430 million detections remain (see table \ref{tab:effect-dropout} for
 has slightly fewer predictions left compared to the one without dropout.
 
 After the grouping, the variant without dropout has on average between
-10 and 11 detections grouped into an observation. This is expected as every
+10 and 11 detections grouped into an observation. This is to be expected as every
 forward pass creates the exact same result and these ten identical detections
 per \gls{vanilla} \gls{SSD} detection perfectly overlap. The fact that slightly more than
 ten detections are grouped together could explain the marginally better precision
@@ -1316,5 +1312,4 @@ networks.
 
 To facilitate future work based on this thesis, the source code will be
 made available and an installable Python package will be uploaded to the
-PyPi package index. In the appendices can be found more details about the
-source code implementation as well as more figures.
+PyPi package index. More details about the source code implementation and additional figures can be found in the appendices.
ma.bib (9 changed lines)
@@ -909,4 +909,13 @@ to construct explicit models for non-normal classes. Application includes infere
   timestamp = {2019.09.09},
 }
 
+@Misc{Chollet2015,
+  author       = {Chollet, Fran\c{c}ois and others},
+  title        = {Keras},
+  year         = {2015},
+  howpublished = {\url{https://keras.io}},
+  owner        = {jim},
+  timestamp    = {2019.10.04},
+}
+
 @Comment{jabref-meta: databaseType:biblatex;}