Improved thesis based upon feedback
Signed-off-by: Jim Martens <github@2martens.de>
parent bca14cd8b4
commit dc976932f8
@@ -1,12 +1,12 @@
 \clearpage
 \section*{Acknowledgement}
 
-I would like to thank for the continued support, suggestions, and advise
-from my super-visor Prof. Dr. Simone Frintrop and co-supervisor Dr.
+I would like to thank for the continued support, suggestions, and advice
+from my supervisor Prof. Dr. Simone Frintrop and co-supervisor Dr.
 Mikko Lauri.
 
-Additionally, I would like to thank my friends and family for the continued
-support and sometimes helpful questions. Especially in some hard times
+Additionally, I would like to thank my friends and family for their continued
+support and helpful questions. Especially during some hard times
 their support was invaluable.
 
 Furthermore, I am grateful for the Fridays for Future movement
appendix.tex (22 changed lines)
@@ -1,23 +1,23 @@
 \chapter{Software and Source Code Design}
 
 The source code of many published papers is either not available
-or seems like an afterthought: it is poorly documented, difficult
+or is of bad quality: it is poorly documented, difficult
 to integrate into your own work, and often does not follow common
 software development best practices. Moreover, with Tensorflow,
 PyTorch, and Caffe there are at least three machine learning
-frameworks. Every research team seems to prefer another framework
-and sometimes even develops their own; this makes it difficult
+frameworks. Every research team seems to prefer another framework,
+and, occasionally, even develops their own; this makes it difficult
 to combine the work of different authors.
-In addition to all this, most papers do not contain proper information
-regarding the implementation details, making it difficult to
-accurately replicate them if their source code is not available.
+In addition to this, most papers do not contain proper information
+regarding implementation details, making it difficult to
+accurately replicate their results, if their source code is not available.
 
-Therefore, it was clear to me: I will release my source code and
-make it available as Python package on the PyPi package index.
+Therefore, I will release my source code and
+make it available as a Python package on the PyPi package index.
 This makes it possible for other researchers to simply install
 a package and use the API to interact with my code. Additionally,
-the code has been designed to be future proof and work with
-the announced Tensorflow 2.0 by supporting eager mode.
+the code has been designed to be future proof, and work with
+the announced Tensorflow 2.0, by supporting eager mode.
 
 Furthermore, it is configurable, well documented, and conforms largely
 to the clean code guidelines: evolvability and extendability among
@@ -38,7 +38,7 @@ can be found in plotting.py, and the ssd.py module contains
 code to train the SSD and later predict with it.
 
 Lastly, the SSD implementation from a third party repository
-has been modified to work inside a Python package architecture and
+has been modified to work inside a Python package architecture, and
 with eager mode. It is stored as a Git submodule inside the package
 repository.
 
body.tex (159 changed lines)
@@ -21,7 +21,7 @@ black boxes and prevents any answers to questions of causality.
 
 However, these questions of causality are of enormous consequence when
 results of neural networks are used to make life changing decisions:
-Is a correlation enough to bring forth negative consequences
+is a correlation enough to bring forth negative consequences
 for a particular person? And if so, what is the possible defence
 against math? Similar questions can be raised when looking at computer
 vision networks that might be used together with so called smart
@@ -29,14 +29,14 @@ vision networks that might be used together with so called smart
 
 This leads to the need for neural networks to explain their results.
 Such an explanation must come from the network or an attached piece
-of technology to allow adoption in mass. Obviously, this setting
-poses the question, how such an endeavour can be achieved.
+of technology to allow mass adoption. Obviously, this setting
+poses the question of how such an endeavour can be achieved.
 
-For neural networks there are fundamentally two types of tasks:
+For neural networks there are fundamentally two types of problems:
 regression and classification. Regression deals with any case
 where the goal for the network is to come close to an ideal
 function that connects all data points. Classification, however,
-describes tasks where the network is supposed to identify the
+describes problems where the network is supposed to identify the
 class of any given input. In this thesis, I will work with both.
 
 \subsection*{Object Detection in Open Set Conditions}
@@ -54,53 +54,51 @@ class of any given input. In this thesis, I will work with both.
 
 More specifically, I will look at object detection in the open set
 conditions (see figure \ref{fig:open-set}).
-In non-technical words this effectively describes
-the kind of situation you encounter with \gls{CCTV} or robots
-outside of a laboratory. Both use cameras that record
-images. Subsequently, a neural network analyses the image
-and returns a list of detected and classified objects that it
-found in the image. The problem here is that networks can only
+In non-technical terms this effectively describes
+the conditions \gls{CCTV} and robots outside of a laboratory operate in. In both cases images are recorded with cameras. In order to detect objects, a neural network has to analyse the images
+and return a list of detected and classified objects that it
+finds in the images. The problem here is that networks can only
 classify what they know. If presented with an object type that
 the network was not trained with, as happens frequently in real
 environments, it will still classify the object and might even
 have a high confidence in doing so. This is an example for a
 false positive. Anyone who uses the results of
-such a network could falsely assume that a high confidence always
+such a network could falsely assume that a high confidence
 means the classification is very likely correct. If one uses
 a proprietary system one might not even be able to find out
 that the network was never trained on a particular type of object.
 Therefore, it would be impossible for one to identify the output
-of the network as false positive.
+of the network as a false positive.
 
 This reaffirms the need for automatic explanation. Such a system
 should recognise by itself that the given object is unknown and
-hence mark any classification result of the network as meaningless.
+mark any classification result of the network as meaningless.
 Technically there are two slightly different approaches that deal
 with this type of task: model uncertainty and novelty detection.
 
 Model uncertainty can be measured, for example, with dropout sampling.
-Dropout layers are usually used only during training but
-Miller et al.~\cite{Miller2018} use them also during testing
-to achieve different results for the same image making use of
+Dropout layers are usually used only during training, but
+Miller et al.~\cite{Miller2018} also use them during testing
+to achieve different results for the same image---making use of
 multiple forward passes. The output scores for the forward passes
 of the same image are then averaged. If the averaged class
 probabilities resemble a uniform distribution (every class has
 the same probability) this symbolises maximum uncertainty. Conversely,
 if there is one very high probability with every other being very
-low this signifies a low uncertainty. An unknown object is more
-likely to cause high uncertainty which allows for an identification
+low, this signifies a low uncertainty. An unknown object is more
+likely to cause high uncertainty, which allows for an identification
 of false positive cases.
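The averaging over forward passes described here can be sketched as follows. This is a minimal illustration, not the thesis implementation; `forward_pass_fn` is a hypothetical stand-in for a network run with dropout kept active:

```python
import numpy as np

def mc_dropout_uncertainty(forward_pass_fn, image, n_passes=10):
    """Average softmax scores over several stochastic forward passes.

    forward_pass_fn is a hypothetical callable that runs the network
    with dropout active and returns per-class softmax scores.
    """
    scores = np.stack([forward_pass_fn(image) for _ in range(n_passes)])
    mean_scores = scores.mean(axis=0)  # averaged class probabilities
    # Entropy is maximal for a uniform distribution (high uncertainty)
    # and low when one class dominates (low uncertainty).
    entropy = -np.sum(mean_scores * np.log(mean_scores + 1e-12))
    return mean_scores, entropy
```

A near-uniform averaged distribution then yields a high entropy value, flagging the detection as a likely false positive.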
|
|
-Novelty detection is another approach to solve the task.
+Novelty detection is another approach to solve the problem.
 In the realm of neural networks it is usually done with the help of
-auto-encoders that solve a regression task of finding an
-identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have
-internally at least two components: an encoder, and a decoder or
+auto-encoders that try to solve a regression problem of finding an
+identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have,
+internally, at least two components: an encoder, and a decoder or
 generator. The job of the encoder is to find an encoding that
-compresses the input as good as possible while simultaneously
+compresses the input as well as possible, while simultaneously
 being as loss-free as possible. The decoder takes this latent
-representation of the input and has to find a decompression
-that reconstructs the input as accurate as possible. During
+representation of the input, and has to find a decompression
+that reconstructs the input as accurately as possible. During
 training these auto-encoders learn to reproduce a certain group
 of object classes. The actual novelty detection takes place
 during testing: given an image, and the output and loss of the
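The reconstruction-error idea behind this can be sketched minimally; `encode` and `decode` are assumed stand-ins for a trained auto-encoder's two components, introduced here only for illustration:

```python
import numpy as np

def novelty_score(x, encode, decode):
    """Reconstruction-error novelty score: inputs the auto-encoder
    was trained on reconstruct well (low score), novel inputs do not.
    encode/decode are hypothetical stand-ins for trained components."""
    reconstruction = decode(encode(x))
    return float(np.mean((x - reconstruction) ** 2))
```

A detection would then be flagged as novel when its score exceeds a threshold calibrated on known data.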
@@ -148,25 +146,24 @@ conditions compared to object detection without it.
 
 \subsection*{Reader's Guide}
 
-First, chapter \ref{chap:background} presents related works and
+First, chapter \ref{chap:background} presents related works, and
 provides the background for dropout sampling.
-Afterwards, chapter \ref{chap:methods} explains how \gls{vanilla} \gls{SSD} works, how
+Thereafter, chapter \ref{chap:methods} explains how \gls{vanilla} \gls{SSD} works, how
 Bayesian \gls{SSD} extends \gls{vanilla} \gls{SSD}, and how the decoding pipelines are
 structured.
 Chapter \ref{chap:experiments-results} presents the data sets,
 the experimental setup, and the results. This is followed by
 chapter \ref{chap:discussion}, focusing on the discussion and closing.
 
-Therefore, the contribution is found in chapters \ref{chap:methods},
+The contribution of this thesis is found in chapters \ref{chap:methods},
 \ref{chap:experiments-results}, and \ref{chap:discussion}.
 
 \chapter{Background}
 
 \label{chap:background}
 
-This chapter begins with an overview over previous works
-in the field of this thesis. Afterwards the theoretical foundations
-of dropout sampling are explained.
+This chapter begins with an overview of previous works, followed by an explanation of the theoretical
+foundations of dropout sampling.
 
 \section{Related Works}
 
@@ -176,7 +173,7 @@ methods published over the previous decade. They showcase probabilistic,
 distance-based, reconstruction-based, domain-based, and information-theoretic
 novelty detection. Based on their categorisation, this thesis falls under
 reconstruction-based novelty detection as it deals only with neural network
-approaches. Therefore, the other types of novelty detection will only be
+approaches. The other types of novelty detection will, therefore, only be
 introduced briefly.
 
 \subsection{Overview over types of Novelty Detection}
@@ -197,16 +194,16 @@ Both methods are similar to estimating the
 \gls{pdf} of data, they use well-defined distance metrics to compute the distance
 between two data points.
 
-Domain-based novelty detection describes the boundary of the known data, rather
-than the data itself. Unknown data is identified by its position relative to
-the boundary. A common implementation for this are support vector machines
-(e.g. implemented by Song et al. \cite{Song2002}).
+Domain-based novelty detection describes the boundary of the known data,
+rather than the data itself. Unknown data is identified by its position
+relative to the boundary. Support vector machines (e.g. implemented by
+Song et al. \cite{Song2002}) are a common implementation of this.
 
 Information-theoretic novelty detection computes the information content
 of a data set, for example, with metrics like \gls{entropy}. Such metrics assume
 that novel data inside the data set significantly alters the information
 content of an otherwise normal data set. First, the metrics are calculated over the
-whole data set. Afterwards, a subset is identified that causes the biggest
+whole data set. Second, a subset is identified that causes the biggest
 difference in the metric when removed from the data set. This subset is considered
 to consist of novel data. For example, Filippone and Sanguinetti \cite{Filippone2011} provide
 a recent approach.
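The two-step procedure (measure the metric, then find the subset whose removal changes it most) can be illustrated with a toy sketch. This is not the method of Filippone and Sanguinetti; it only demonstrates the principle on single items with Shannon entropy:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def most_informative_item(labels):
    """Toy information-theoretic novelty detection: return the index
    of the single item whose removal changes the entropy the most."""
    base = entropy(labels)
    best_i, best_delta = None, -1.0
    for i in range(len(labels)):
        rest = labels[:i] + labels[i + 1:]
        delta = abs(base - entropy(rest))
        if delta > best_delta:
            best_i, best_delta = i, delta
    return best_i
```

On a label set such as `["a", "a", "a", "b"]`, removing the lone `"b"` collapses the entropy to zero, so that item is flagged as the novel one.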
@@ -214,7 +211,7 @@ a recent approach.
 \subsection{Reconstruction-based Novelty Detection}
 
 Reconstruction-based approaches use the reconstruction error in one form
-or another to calculate the novelty score. This can be auto-encoders that
+or another to calculate the novelty score. These can be auto-encoders that
 literally reconstruct the input but it also includes \gls{MLP} networks which try
 to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014} differentiate
 between neural network-based approaches and subspace methods. The first are
@@ -242,7 +239,7 @@ Gal and Ghahramani~\cite{Gal2016} show that dropout training is a
 Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
 shows that dropout training actually corresponds to a general approximate
 Bayesian model. This means every network trained with dropout is an
-approximate Bayesian model. During inference the dropout remains active,
+approximate Bayesian model. During inference the dropout remains active:
 this form of inference is called \gls{MCDO}.
 Miller et al.~\cite{Miller2018} build upon the work of Gal and Ghahramani: they
 use \gls{MCDO} under open-set conditions for object detection.
@@ -261,14 +258,13 @@ Consequently, this technique can be applied to any network that utilises
 standard batch normalisation.
 Li et al.~\cite{Li2019} investigate the problem of poor performance
 when combining dropout and batch normalisation: dropout shifts the variance
-of a neural unit when switching from train to test, batch normalisation
+of a neural unit when switching from train to test; batch normalisation
 does not change the variance. This inconsistency leads to a variance shift which
 can have a larger or smaller impact based on the network used.
 
-Non-Bayesian approaches have been developed as well. Usually, they compare with
-\gls{MCDO} and show better performance.
+Non-Bayesian approaches have also been developed. Usually they are compared with \gls{MCDO} and show better performance.
 Postels et al.~\cite{Postels2019} provide a sampling-free approach for
-uncertainty estimation that does not affect training and approximates the
+uncertainty estimation that does not affect training, and approximates the
 sampling at test time. They compare it to \gls{MCDO} and find less computational
 overhead with better results.
 Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
@@ -279,7 +275,7 @@ introduce an uncertainty estimation algorithm for non-Bayesian deep
 neural classification that estimates the uncertainty of highly
 confident points using earlier snapshots of the trained model and improves,
 among others, the approach introduced by Lakshminarayanan et al.
-Sensoy et al.~\cite{Sensoy2018} explicitely model prediction uncertainty:
+Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
 a \gls{Dirichlet distribution} is placed over the class probabilities. Consequently,
 the predictions of a neural network are treated as subjective opinions.
 
@@ -348,21 +344,21 @@ training of the network determines a plausible set of weights by
 evaluating the probability output (\gls{posterior}) over the weights given
 the training data \(\mathbf{T}\): \(p(\mathbf{W}|\mathbf{T})\).
 However, this
-evaluation cannot be performed in any reasonable
+evaluation cannot be performed in any reasonable amount of
 time. Therefore approximation techniques are
 required. In those techniques the \gls{posterior} is fitted with a
 simple distribution \(q^{*}_{\theta}(\mathbf{W})\). The original
 and intractable problem of averaging over all weights in the network
-is replaced with an optimisation task, where the parameters of the
-simple distribution are optimised over~\cite{Kendall2017}.
+is replaced with an optimisation task: the parameters of the
+simple distribution are optimised~\cite{Kendall2017}.
 
 \subsubsection*{Dropout Variational Inference}
 
 Kendall and Gal~\cite{Kendall2017} show an approximation for
 classfication and recognition tasks. Dropout variational inference
 is a practical approximation technique by adding dropout layers
-in front of every weight layer and using them also during test
-time to sample from the approximate \gls{posterior}. Effectively, this
+in front of every weight layer and also using them during test
+time to sample from the approximate \gls{posterior}. In effect, this
 results in the approximation of the class probability
 \(p(y|\mathcal{I}, \mathbf{T})\) by performing \(n\) forward
 passes through the network and averaging the so obtained softmax
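Written out, the averaged class probability over the \(n\) forward passes takes the following form; the notation \(\widehat{\mathbf{W}}_i \sim q^{*}_{\theta}(\mathbf{W})\) for the weights sampled in pass \(i\) is assumed here to match the surrounding definitions:

```latex
\[
p(y|\mathcal{I}, \mathbf{T}) \approx \frac{1}{n} \sum_{i=1}^{n}
  \mathrm{softmax}\!\left(f^{\widehat{\mathbf{W}}_i}(\mathcal{I})\right)
\]
```

Each term of the sum is one stochastic forward pass through the network with a freshly sampled dropout mask.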
@@ -479,7 +475,7 @@ and very low confidences in other classes.
 \subsection{Implementation Details}
 
 For this thesis, an \gls{SSD} implementation based on Tensorflow~\cite{Abadi2015} and
-Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
+Keras~\cite{Chollet2015}
 is used. It has been modified to support \gls{entropy} thresholding,
 partitioning of observations, and dropout
 layers in the \gls{SSD} model. Entropy thresholding takes place before
@@ -517,7 +513,7 @@ confidence thresholding and a subsequent \gls{NMS}.
 All boxes that pass \gls{NMS} are added to a
 per image maxima list. One box could make the confidence threshold
 for multiple classes and, hence, be present multiple times in the
-maxima list for the image. Lastly, a total of \(k\) boxes with the
+maxima list for the image. In the end, a total of \(k\) boxes with the
 highest confidences is kept per image across all classes. The
 original implementation uses a confidence threshold of \(0.01\), an
 IOU threshold for \gls{NMS} of \(0.45\) and a top \(k\)
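The per-image top-\(k\) selection described here can be sketched in one line. The entry layout `(confidence, class_id, box)` is an assumption for illustration, not the implementation's actual data structure:

```python
def top_k_detections(maxima, k):
    """Keep the k highest-confidence boxes per image across all classes.

    maxima: per-image list of (confidence, class_id, box) entries that
    already passed confidence thresholding and NMS; one box may appear
    once per class it passed the threshold for.
    """
    return sorted(maxima, key=lambda m: m[0], reverse=True)[:k]
```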
@@ -548,7 +544,7 @@ confidence threshold is required.
 
 \subsection{Vanilla SSD with Entropy Thresholding}
 
-Vanilla \gls{SSD} with \gls{entropy} tresholding adds an additional component
+Vanilla \gls{SSD} with \gls{entropy} thresholding adds an additional component
 to the filtering already done for \gls{vanilla} \gls{SSD}. The \gls{entropy} is
 calculated from all \(\#nr\_classes\) softmax scores in a prediction.
 Only predictions with a low enough \gls{entropy} pass the \gls{entropy}
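The thresholding step can be sketched as follows, assuming a prediction's softmax scores are available as a vector over all classes (a minimal sketch, not the thesis code):

```python
import numpy as np

def passes_entropy_threshold(softmax_scores, threshold):
    """Keep a prediction only if the entropy of its softmax scores is
    below the threshold; low entropy means a confident prediction."""
    p = np.asarray(softmax_scores)
    entropy = -np.sum(p * np.log(p + 1e-12))
    return entropy < threshold
```

A near-uniform score vector yields a high entropy and is filtered out, while a sharply peaked one passes.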
@@ -558,8 +554,8 @@ false positive or false negative cases with high confidence values.
 
 \subsection{Bayesian SSD with Entropy Thresholding}
 
-Bayesian \gls{SSD} has the speciality of multiple forward passes. Based
-on the information in the paper, the detections of all forward passes
+Bayesian \gls{SSD} uses multiple forward passes. Based
+on the information from Miller et al.~\cite{Miller2018}, the detections of all forward passes
 are grouped per image but not by forward pass. This leads
 to the following shape of the network output after all
 forward passes: \((batch\_size, \#nr\_boxes \, \cdot \, \#nr\_forward\_passes, \#nr\_classes + 12)\). The size of the output
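The stated shape can be spelled out as a small helper; the names are purely illustrative and mirror the shape given in the text (class scores plus 12 box and anchor values per detection):

```python
def merged_output_shape(batch_size, nr_boxes, nr_forward_passes, nr_classes):
    """Shape of the network output once the detections of all forward
    passes are grouped per image rather than per pass."""
    return (batch_size, nr_boxes * nr_forward_passes, nr_classes + 12)
```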
@@ -576,7 +572,7 @@ mutual IOU score of every detection with all other detections. Detections
 with a mutual IOU score of 0.95 or higher are partitioned into an
 observation. Next, the softmax scores and bounding box coordinates of
 all detections in an observation are averaged.
-There can be a different number of observations for every image which
+There can be a different number of observations for every image, which
 destroys homogenity and prevents batch-wise calculation of the
 results. The shape of the results is per image: \((\#nr\_observations,\#nr\_classes + 4)\).
 
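The partitioning and averaging can be sketched with a greedy grouping. This is an assumed simplification for illustration; the actual implementation's grouping strategy may differ:

```python
import numpy as np

def iou(a, b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def partition_observations(detections, iou_threshold=0.95):
    """Group detections (box, softmax_scores) whose mutual IOU is at
    least the threshold into observations, then average the boxes and
    scores of each observation."""
    observations = []
    for box, scores in detections:
        for obs in observations:
            if all(iou(box, b) >= iou_threshold for b, _ in obs):
                obs.append((box, scores))
                break
        else:  # no existing observation matched: start a new one
            observations.append([(box, scores)])
    averaged = []
    for obs in observations:
        mean_box = np.mean([b for b, _ in obs], axis=0)
        mean_scores = np.mean([s for _, s in obs], axis=0)
        averaged.append((mean_box, mean_scores))
    return averaged
```

Since images yield different numbers of observations, the result is a per-image list rather than a homogeneous batch tensor, matching the shape given in the text.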
@@ -598,14 +594,14 @@ at the end.

\label{chap:experiments-results}

This chapter explains the data sets used and how the experiments have been
set up. Furthermore, it presents the results.

\section{Data Sets}

This thesis uses the MS COCO~\cite{Lin2014} data set. It contains 80 classes,
whose range is illustrated by two examples: airplanes and toothbrushes. The
images are real-world images, and ground truth is provided for all of them.
The data set supports object detection, keypoint detection, and panoptic
segmentation (scene segmentation).

@@ -779,7 +775,7 @@ The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-micro}. Precision-recall curves for all variants can
be seen in figure \ref{fig:precision-recall-micro}. Both \gls{vanilla}
\gls{SSD} variants with 0.01 confidence threshold reach a much higher open set
error and a higher recall. This behaviour is to be expected as more and worse
predictions are included. All plotted variants show a similar behaviour that
is in line with previously reported figures, such as the ones in
Miller et al.~\cite{Miller2018}
@@ -861,7 +857,7 @@ The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-macro}. Precision-recall curves for all variants can
be seen in figure \ref{fig:precision-recall-macro}. Both \gls{vanilla}
\gls{SSD} variants with 0.01 confidence threshold reach a much higher open set
error and a higher recall. This behaviour is to be expected as more and worse
predictions are included. All plotted variants show a similar behaviour that
is in line with previously reported figures, such as the ones in
Miller et al.~\cite{Miller2018}
@@ -878,9 +874,9 @@ only 0.7\% of the ground truth. With this share, it is below
the average of roughly 0.9\% for each of the 56 classes that make up the
second half of the ground truth.

In some cases, multiple variants have apparently the same performance but only
one or some of them are marked bold. This is caused by differences prior to
rounding: if two or more variants are marked bold, they had the exact same
performance before rounding.

\begin{table}[tbp]

@@ -909,11 +905,9 @@ they had the exact same performance before rounding.
\end{table}

The vanilla \gls{SSD} variant with 0.2 per class confidence threshold performs
best in the persons class: it has a max \(F_1\) score of 0.460, consisting of
a recall of 0.405 and a precision of 0.533. The variant shares the first place
in recall with the \gls{vanilla} \gls{SSD} variant that uses a 0.01 confidence
threshold. All Bayesian \gls{SSD} variants perform worse than the
\gls{vanilla} \gls{SSD} variants (see table \ref{tab:results-persons}). With
respect to the macro averaged result, all variants perform better than the
average of all classes.

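The reported max \(F_1\) score is consistent with its precision and recall, using the standard formula \(F_1 = 2PR/(P+R)\); a quick numeric check:

```python
# Persons class, vanilla SSD with 0.2 per class confidence threshold.
precision, recall = 0.533, 0.405
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.3f}")  # 0.460
```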
@@ -951,7 +945,7 @@ variant with \gls{NMS} and disabled dropout, and the one with 0.9 keep
ratio have a better precision (0.460 and 0.454 respectively) than the
\gls{vanilla} \gls{SSD} variants with 0.01 confidence threshold (0.452 and
0.453). With respect to the macro averaged result, all variants have a better
precision than the average. The Bayesian variant without \gls{NMS} and dropout
also has a better recall and \(F_1\) score.

\begin{table}[tbp]

@@ -983,7 +977,7 @@ The best \(F_1\) score (0.288) and recall (0.251) for the chairs class
belongs to \gls{vanilla} \gls{SSD} with \gls{entropy} threshold. The best
precision is achieved by Bayesian \gls{SSD} with \gls{NMS} and disabled
dropout (0.360). The variant with 0.9 keep ratio has the second-highest
precision (0.343) of all variants. Both in \(F_1\) score and recall, all
Bayesian variants are worse than the \gls{vanilla} variants. Compared with the
macro averaged results, all variants perform worse than the average.

@@ -1077,7 +1071,7 @@ ratio.
\end{figure}

The ground truth only contains a stop sign and a truck. The differences
between \gls{vanilla} \gls{SSD} and Bayesian \gls{SSD} are hardly visible
(see figures \ref{fig:stop-sign-truck-vanilla} and
\ref{fig:stop-sign-truck-bayesian}): the truck is detected by neither
\gls{vanilla} nor Bayesian \gls{SSD}; instead, both detect a ``potted plant''
and a traffic light. The stop sign is detected by both variants. This
behaviour implies problems with detecting objects at the edge that
overwhelmingly lie outside the image frame. Furthermore, the predictions are
usually identical.

@@ -1095,9 +1089,11 @@ that overwhelmingly lie outside the image frame. Furthermore, the predictions are
\end{minipage}
\end{figure}

Another example (see figures \ref{fig:cat-laptop-vanilla} and
\ref{fig:cat-laptop-bayesian}) is a cat with a laptop/TV in the background on
the right side. Both variants detect a cat but the \gls{vanilla} variant
detects a dog as well. The laptop and TV are not detected but this is to be
expected since these classes have not been trained.

\chapter{Discussion and Outlook}

@@ -1153,7 +1149,7 @@ open set error continues to rise a bit.

There is no visible impact of \gls{entropy} thresholding on the object
detection performance for \gls{vanilla} \gls{SSD}. This indicates that the
network has almost no uniform or close to uniform predictions; the vast
majority of predictions have a high confidence in one class---including the
background. However, the \gls{entropy} plays a larger role for the Bayesian
variants---as expected: the best performing thresholds are 1.0, 1.3, and 1.4
for micro averaging, and 1.5, 1.7, and 2.0 for macro averaging. In all of
these cases the best

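Entropy thresholding as discussed above can be sketched as follows. This assumes natural-log entropy over the softmax scores including background; the class count and the 1.4 threshold are illustrative:

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a softmax distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

nr_classes = 81  # 80 COCO classes + background (assumed)
uniform = [1.0 / nr_classes] * nr_classes
confident = [0.99] + [0.01 / (nr_classes - 1)] * (nr_classes - 1)

print(round(entropy(uniform), 2))    # maximal: ln(81), roughly 4.39
print(round(entropy(confident), 2))  # roughly 0.10, far below the threshold

# Keep only predictions whose entropy stays below the threshold.
kept = [p for p in (uniform, confident) if entropy(p) < 1.4]
```

A confident one-class prediction therefore always survives the threshold, which matches the observation that thresholding barely affects vanilla SSD.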
@@ -1190,7 +1186,7 @@ threshold indicates a worse performance.

Miller et al.~\cite{Miller2018} supposedly do not use \gls{NMS} in their
implementation of dropout sampling. Therefore, a variant with disabled
\glslocalreset{NMS}
\gls{NMS} has been tested. The results are somewhat as expected:
\gls{NMS} removes all non-maximum detections that overlap with a maximum one.
This reduces the number of multiple detections per ground truth bounding box
and therefore the false positives. Without it,

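The \gls{NMS} step described above can be sketched as a generic greedy suppression; the 0.45 overlap threshold and the \((x_1, y_1, x_2, y_2)\) box format are assumptions, not necessarily the thesis' exact settings:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, overlap=0.45):
    """Keep the best-scoring box, drop non-maxima that overlap it, repeat."""
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < overlap]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the near-duplicate box 1 is suppressed
```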
@@ -1208,7 +1204,7 @@ more than 50\% of the original observations are removed with \gls{NMS} and
stay without---all of these are very likely to be false positives.

A clear distinction between micro and macro averaging can be observed: recall
is hardly affected with micro averaging (0.300) but goes down noticeably with
macro averaging (0.229). For micro averaging, it does not matter which class
the true positives belong to: every detection counts the same way. This also
means that top \(k\) will have only a marginal effect: some true positives
might be removed without \gls{NMS} but overall that does not have a big
impact. With macro averaging, however,

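The difference between the two averaging schemes can be illustrated with hypothetical per-class counts; the numbers are made up, only the mechanism matches the text:

```python
# (true positives, ground-truth instances) per class -- hypothetical values.
per_class = {"person": (81, 200), "chair": (5, 100), "toothbrush": (1, 10)}

# Micro averaging pools all detections; frequent classes dominate.
tp = sum(t for t, _ in per_class.values())
gt = sum(g for _, g in per_class.values())
micro_recall = tp / gt
print(round(micro_recall, 3))  # 0.281

# Macro averaging weighs every class equally, rare classes included.
macro_recall = sum(t / g for t, g in per_class.values()) / len(per_class)
print(round(macro_recall, 3))  # 0.185
```

Losing a handful of true positives in a rare class barely moves the micro average but visibly drags down the macro average, which is the behaviour described above.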
@@ -1256,7 +1252,7 @@ recall.
\end{table}

The dropout variants have largely worse performance than the Bayesian variants
without dropout. This is to be expected as the network was not trained with
dropout and the weights are not prepared for it.

Gal~\cite{Gal2017}

@@ -1282,7 +1278,7 @@ more than 430 million detections remain (see table \ref{tab:effect-dropout} for
has slightly fewer predictions left compared to the one without dropout.

After the grouping, the variant without dropout has on average between 10 and
11 detections grouped into an observation. This is to be expected as every
forward pass creates the exact same result and these ten identical detections
per \gls{vanilla} \gls{SSD} detection perfectly overlap. The fact that
slightly more than ten detections are grouped together could explain the
marginally better precision

@@ -1316,5 +1312,4 @@ networks.

To facilitate future work based on this thesis, the source code will be made
available and an installable Python package will be uploaded to the PyPI
package index. More details about the source code implementation and
additional figures can be found in the appendices.

ma.bib
@@ -909,4 +909,13 @@ to construct explicit models for non-normal classes. Application includes infere
  timestamp = {2019.09.09},
}

@Misc{Chollet2015,
  author       = {Chollet, Fran\c{c}ois and others},
  title        = {Keras},
  year         = {2015},
  howpublished = {\url{https://keras.io}},
  owner        = {jim},
  timestamp    = {2019.10.04},
}

@Comment{jabref-meta: databaseType:biblatex;}