Finished raw version of related works

Signed-off-by: Jim Martens <github@2martens.de>
2019-08-27 13:57:51 +02:00
parent b53b3673bf
commit a98f9e8d55

body.tex

\section{Related Works}
The task of novelty detection can be accomplished in a variety of ways.
Pimentel et al.~\cite{Pimentel2014} provide a review of novelty detection
methods published over the previous decade. They showcase probabilistic,
distance-based, reconstruction-based, domain-based, and information-theoretic
novelty detection. Based on their categorisation, this thesis falls under
reconstruction-based novelty detection as it deals only with neural network
approaches. Therefore, the other types of novelty detection will only be
briefly introduced.
\subsection{Overview of types of novelty detection}
Probabilistic approaches estimate the generative probability density function (pdf)
of the data. It is assumed that the training data is generated from an underlying
probability distribution \(D\). This distribution can be estimated from the
training data; the estimate is defined as \(\hat D\) and represents a model
of normality. A novelty threshold is applied to \(\hat D\) in a way that
allows a probabilistic interpretation. Pidhorskyi et al.~\cite{Pidhorskyi2018}
combine a probabilistic approach to novelty detection with auto-encoders.
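To make the probabilistic route concrete, a minimal sketch follows: a one-dimensional Gaussian fitted to invented training data stands in for the estimate \(\hat D\), and a likelihood threshold (the value of \texttt{TAU} is made up for illustration) acts as the novelty criterion.

```python
from statistics import NormalDist

# Toy model of normality: fit a 1-D Gaussian (the estimate of D) to
# "normal" training data -- a stand-in for a full density estimate.
train = [4.8, 5.0, 5.1, 4.9, 5.2, 5.0, 4.7, 5.3]
d_hat = NormalDist.from_samples(train)

# Novelty threshold on the estimated density: a point whose likelihood
# under the fitted model falls below TAU is flagged as novel.
TAU = 0.05

def is_novel(x: float) -> bool:
    return d_hat.pdf(x) < TAU

normal_flag = is_novel(5.0)    # close to the training data
novel_flag = is_novel(42.0)    # far outside the training data
```

Real probabilistic approaches use far richer density estimates, but the structure — estimate \(\hat D\), then threshold the likelihood — is the same.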
Distance-based novelty detection uses either nearest neighbour-based approaches
(e.g. ) % TODO citations
or clustering-based approaches
(e.g. ). % TODO citations
Both methods are similar to estimating the pdf of the data: they use
well-defined distance metrics to compute the distance between two data points.
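A toy nearest neighbour-based sketch illustrates the idea (the 1-D data and the choice of \(k\) are invented): the distance to the \(k\)-th closest training point serves as the novelty score.

```python
# Toy nearest neighbour-based novelty score: the distance from a query
# point to its k-th nearest training point; large distances signal novelty.
def knn_novelty_score(x, train, k=2):
    dists = sorted(abs(x - t) for t in train)
    return dists[k - 1]

train = [1.0, 1.1, 0.9, 1.2, 1.05]
near = knn_novelty_score(1.0, train)   # small: close to known data
far = knn_novelty_score(9.0, train)    # large: likely novel
```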
Domain-based novelty detection describes the boundary of the known data, rather
than the data itself. Unknown data is identified by its position relative to
the boundary. A common implementation of this approach is the support vector machine
(e.g. implemented by ). % TODO citations
Information-theoretic novelty detection computes the information content
of a data set, for example, with metrics like entropy. Such metrics assume
that novel data inside the data set significantly alters the information
content of an otherwise normal data set. First, the metrics are calculated over the
whole data set. Afterwards, a subset is identified that causes the biggest
difference in the metric when removed from the data set. This subset is considered
to consist of novel data. For example, xyz provide a recent approach.
% TODO citations
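The subset-removal idea can be demonstrated with a small sketch (labels and data invented): the Shannon entropy of a label set drops sharply once the suspected-novel subset is removed.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of the label distribution in the data set
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

data = ["normal"] * 15 + ["odd"]
before = entropy(data)
# Removing the subset that causes the biggest drop in the metric
# identifies it as the novel part of the data set.
after = entropy([x for x in data if x != "odd"])
```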
\subsection{Reconstruction-based novelty detection}
Reconstruction-based approaches use the reconstruction error in one form
or another to calculate the novelty score. These can be auto-encoders that
literally reconstruct the input, but the category also includes MLP networks
that try to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014}
differentiated between neural network-based approaches and subspace methods.
The former were further divided into MLPs, Hopfield networks, autoassociative
networks, radial basis function networks, and self-organising networks.
The remainder of this section focuses on MLP-based works, with a particular
focus on the task of object detection and Bayesian networks.
Novelty detection for object detection is intricately linked with
open set conditions: the test data can contain unknown classes.
Bishop~\cite{Bishop1994} investigated the correlation between
the degree of novel input data and the reliability of network
outputs.
The Bayesian approach provides a theoretical foundation for
modelling uncertainty \cite{Ghahramani2015}.
MacKay~\cite{MacKay1992} provided a practical Bayesian
framework for backpropagation networks. Neal~\cite{Neal1996} built upon
the work of MacKay and explored Bayesian learning for neural networks.
However, these Bayesian neural networks do not scale well. Over the course
of time, two major Bayesian approximations were introduced: one based
on dropout and one based on batch normalisation.
Gal and Ghahramani~\cite{Gal2016} showed that dropout training is a
Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
showed that dropout training actually corresponds to a general approximate
Bayesian model. This means every network trained with dropout is an
approximate Bayesian model. During inference the dropout remains active;
this form of inference is called Monte Carlo Dropout (MCDO).
Miller et al.~\cite{Miller2018} built upon the work of Gal and Ghahramani: they
used MC dropout under open-set conditions for object detection.
In a second paper~\cite{Miller2018a}, Miller et al. continued their work and
compared merging strategies for sampling-based uncertainty techniques in
object detection.
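The mechanics of MC dropout can be sketched with a toy one-layer ``network'' (weights, input, and dropout rate are invented for illustration, not taken from any cited work): dropout stays active at test time, and the spread of repeated stochastic forward passes serves as the uncertainty estimate.

```python
import random
import statistics

W = [0.5, -0.3, 0.8]   # toy weights of a one-layer "network"
X = [1.0, 2.0, 3.0]    # toy input
P_DROP = 0.5           # dropout probability

def forward(x):
    # Inverted dropout, kept active at inference time (the core of MCDO)
    kept = [xi / (1 - P_DROP) if random.random() >= P_DROP else 0.0 for xi in x]
    return sum(w * k for w, k in zip(W, kept))

random.seed(0)
samples = [forward(X) for _ in range(1000)]
mc_mean = statistics.fmean(samples)     # predictive mean
mc_var = statistics.pvariance(samples)  # spread = uncertainty estimate
```

The mean of the samples approximates the deterministic prediction, while the variance across passes quantifies the model's uncertainty.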
Teye et al.~\cite{Teye2018} make the point that most modern networks have
adopted other regularisation techniques. Ioffe and Szegedy~\cite{Ioffe2015}
introduced batch normalisation, which has been widely adopted. Teye et al.
showed how batch normalisation training is similar to dropout and can be
viewed as approximate Bayesian inference. Estimates of the model uncertainty
can be gained with a technique named Monte Carlo Batch Normalisation (MCBN).
Consequently, this technique can be applied to any network that utilises
standard batch normalisation.
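MCBN can be sketched analogously (a toy example in which the ``network'' is just a single batch normalisation step; the data, batch size, and number of passes are invented): at test time the normalisation statistics are drawn from random training mini-batches, so repeated passes yield a distribution over outputs.

```python
import random
import statistics

random.seed(1)
train = [random.gauss(0.0, 1.0) for _ in range(256)]

def bn_forward(x, batch):
    # Normalise with mini-batch statistics, exactly as in training mode
    mu = statistics.fmean(batch)
    sigma = statistics.pstdev(batch)
    return (x - mu) / sigma

# T stochastic passes, each with a freshly sampled training mini-batch
X_TEST = 0.5
outs = [bn_forward(X_TEST, random.sample(train, 32)) for _ in range(200)]
bn_mean = statistics.fmean(outs)
bn_var = statistics.pvariance(outs)
```

The variation across passes, caused solely by the changing batch statistics, plays the same role as the dropout-induced variation in MCDO.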
Li et al.~\cite{Li2019} investigated the problem of poor performance
when combining dropout and batch normalisation: dropout shifts the variance
of a neural unit when switching from training to test, whereas batch
normalisation does not change the variance. This inconsistency leads to a
variance shift whose impact depends on the network used. For example,
adding dropout layers to SSD \cite{Liu2016} and applying MC dropout, as
Miller et al.~\cite{Miller2018} did, causes such a problem because SSD uses
batch normalisation.
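The variance shift can be reproduced numerically with a toy unit (unit-variance Gaussian input and a dropout rate of 0.5, both invented for illustration): in train mode, inverted dropout preserves the mean of a zero-mean input but inflates its variance by a factor of \(1/(1-p)\), while in test mode the variance is unchanged — exactly the train/test inconsistency that batch normalisation is not calibrated for.

```python
import random
import statistics

random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(20000)]
P = 0.5

# Train mode: inverted dropout scales the variance of a zero-mean
# input by 1 / (1 - p) -- here a factor of 2.
train_out = [xi / (1 - P) if random.random() >= P else 0.0 for xi in x]
# Test mode: dropout is disabled, the unit passes x through unchanged.
test_out = x

train_var = statistics.pvariance(train_out)  # roughly 2.0
test_var = statistics.pvariance(test_out)    # roughly 1.0
```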
Non-Bayesian approaches have been developed as well. Usually, they compare with
MC dropout and show better performance.
Postels et al.~\cite{Postels2019} provided a sampling-free approach for
uncertainty estimation that does not affect training and approximates the
sampling at test time. They compared it to MC dropout and found less
computational overhead with better results.
Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
implemented a predictive uncertainty estimation using deep ensembles.
Compared to MC dropout, it showed better results.
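The ensemble idea can be sketched with toy linear models (the weights, noise levels, and ensemble size are invented; they stand in for independently initialised and trained networks): the ensemble mean is the prediction, and the disagreement between members is the uncertainty estimate.

```python
import random
import statistics

random.seed(3)

def make_model():
    # Stand-in for an independent training run: a slightly different linear fit
    w = 2.0 + random.gauss(0.0, 0.1)
    b = random.gauss(0.0, 0.1)
    return lambda x: w * x + b

ensemble = [make_model() for _ in range(5)]

def predict(x):
    preds = [m(x) for m in ensemble]
    # Mean of members = prediction; spread across members = uncertainty
    return statistics.fmean(preds), statistics.pstdev(preds)

ens_mean, ens_spread = predict(10.0)
```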
Geifman et al.~\cite{Geifman2018}
introduced an uncertainty estimation algorithm for non-Bayesian deep
neural classification that estimates the uncertainty of highly
confident points using earlier snapshots of the trained model and improves,
among others, the approach introduced by Lakshminarayanan et al.
Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
a Dirichlet distribution is placed over the class probabilities. Consequently,
the predictions of a neural network are treated as subjective opinions.
The trained predictor for a multi-class classification is also a
Dirichlet distribution.
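The Dirichlet construction can be illustrated in the spirit of Sensoy et al. (the evidence values below are invented): non-negative per-class evidence \(e_k\) parameterises a Dirichlet via \(\alpha_k = e_k + 1\); the expected class probabilities are \(\alpha_k / S\) and the vacuity is \(u = K / S\), where \(S = \sum_k \alpha_k\) and \(K\) is the number of classes.

```python
def dirichlet_opinion(evidence):
    # evidence: non-negative per-class outputs of the network
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    s = sum(alpha)
    probs = [a / s for a in alpha]  # expected class probabilities
    u = k / s                       # vacuity: high when evidence is scarce
    return probs, u

probs_conf, u_conf = dirichlet_opinion([9.0, 1.0, 0.0])  # confident sample
probs_none, u_none = dirichlet_opinion([0.0, 0.0, 0.0])  # no evidence: u = 1
```

With no evidence at all, the opinion is maximally uncertain and the expected probabilities fall back to uniform.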
In addition to the aforementioned Bayesian and non-Bayesian works,
there are some Bayesian works that do not quite fit with the rest but
are important as well. Mukhoti and Gal~\cite{Mukhoti2018}
contributed metrics to measure uncertainty for semantic
segmentation. Wu et al.~\cite{Wu2019} introduced two innovations
that turn variational Bayes into a robust tool for Bayesian
networks: a novel deterministic method to approximate
moments in neural networks which eliminates gradient variance, and
a hierarchical prior for parameters and an empirical Bayes procedure to select
prior variances.
\section{Background for Bayesian SSD}