\section{Related Works}
The task of novelty detection can be accomplished in a variety of ways.
Pimentel et al.~\cite{Pimentel2014} provide a review of novelty detection
methods published over the previous decade. They showcase probabilistic,
distance-based, reconstruction-based, domain-based, and information-theoretic
novelty detection. Based on their categorisation, this thesis falls under
reconstruction-based novelty detection, as it deals only with neural network
approaches. Therefore, the other types of novelty detection will only be
briefly introduced.

\subsection{Overview of the types of novelty detection}

Probabilistic approaches estimate the generative probability density function (pdf)
of the data. It is assumed that the training data is generated from an underlying
probability distribution \(D\). This distribution can be estimated with the
training data; the estimate is defined as \(\hat D\) and represents a model
of normality. A novelty threshold is applied to \(\hat D\) in a way that
allows a probabilistic interpretation. Pidhorskyi et al.~\cite{Pidhorskyi2018}
combine a probabilistic approach to novelty detection with auto-encoders.

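As a minimal illustration of the probabilistic approach, the following pure-Python sketch estimates \(\hat D\) with a Gaussian kernel density estimate over one-dimensional data and applies a novelty threshold to it. The data set, bandwidth, and threshold are invented for illustration and are not taken from any of the cited works.

```python
import math

def gaussian_kde(train, bandwidth=0.5):
    """Return an estimate of the generative pdf from the training data."""
    norm = 1.0 / (len(train) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2)
                          for t in train)
    return density

# Normal data clustered around 0; the threshold on the estimated pdf
# gives a probabilistic interpretation: low density => novel.
train = [-0.3, -0.1, 0.0, 0.2, 0.4]
d_hat = gaussian_kde(train)
threshold = 0.05

def is_novel(x):
    return d_hat(x) < threshold

print(is_novel(0.1))  # in-distribution point
print(is_novel(5.0))  # far from all training data => novel
```

A point near the training data receives high estimated density and is accepted, while a distant point falls below the threshold and is flagged as novel.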
Distance-based novelty detection uses either nearest neighbour-based approaches
(e.g. ) % TODO citations
or clustering-based approaches
(e.g. ). % TODO citations
Both methods are similar to estimating the
pdf of the data; they use well-defined distance metrics to compute the distance
between two data points.

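A nearest neighbour-based variant can be sketched in a few lines: the mean distance to the \(k\) nearest training points serves as the novelty score. The data and the choice \(k = 3\) are illustrative assumptions, not taken from the cited works.

```python
def knn_novelty_score(x, train, k=3):
    """Mean distance to the k nearest training points; larger => more novel."""
    dists = sorted(abs(x - t) for t in train)
    return sum(dists[:k]) / k

train = [0.0, 0.1, 0.2, 0.3, 0.4]
print(knn_novelty_score(0.2, train))  # small for in-distribution points
print(knn_novelty_score(3.0, train))  # large for distant (novel) points
```

Thresholding this score yields the same kind of accept/reject decision as the probabilistic approach, but without an explicit density model.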
Domain-based novelty detection describes the boundary of the known data, rather
than the data itself. Unknown data is identified by its position relative to
the boundary. Support vector machines are a common implementation of this
(e.g. implemented by ). % TODO citations

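The cited implementations use support vector machines; the sketch below substitutes a much cruder boundary (a hypersphere enclosing the training data) purely to illustrate the domain-based decision rule, in which only the position relative to the boundary matters.

```python
import math

def fit_boundary(train):
    """Fit a minimal boundary description: centroid plus the radius that
    encloses all training points (a crude stand-in for an SVM boundary)."""
    centre = [sum(c) / len(train) for c in zip(*train)]
    radius = max(math.dist(p, centre) for p in train)
    return centre, radius

def outside_boundary(x, centre, radius):
    # Only the position relative to the boundary is used, not the density.
    return math.dist(x, centre) > radius

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centre, radius = fit_boundary(train)
print(outside_boundary((0.5, 0.5), centre, radius))  # inside the boundary
print(outside_boundary((4.0, 4.0), centre, radius))  # outside => novel
```

Note that, unlike the probabilistic approach, no statement is made about how likely a point inside the boundary is; the boundary alone decides.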
Information-theoretic novelty detection computes the information content
of a data set, for example, with metrics like entropy. Such metrics assume
that novel data inside the data set significantly alters the information
content of an otherwise normal data set. First, the metrics are calculated over the
whole data set. Afterwards, a subset is identified that causes the biggest
difference in the metric when removed from the data set. This subset is considered
to consist of novel data. For example, xyz provide a recent approach.
% TODO citations

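The remove-a-subset-and-compare procedure described above can be sketched with Shannon entropy over a discrete label set. The data set and candidate subsets are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def entropy_change(data, subset):
    """Change in information content when `subset` is removed from `data`."""
    remaining = list(data)
    for s in subset:
        remaining.remove(s)
    return entropy(data) - entropy(remaining)

# A mostly regular data set with a few rare labels mixed in.
data = ["a"] * 50 + ["b"] * 50 + ["x"] * 3
# Removing the rare labels changes the entropy more than removing an
# equally sized subset of normal labels, so it is considered novel.
print(entropy_change(data, ["x", "x", "x"]))
print(entropy_change(data, ["a", "a", "a"]))
```

In a full method the subset would be searched for rather than given, but the decision criterion (largest change in the metric upon removal) is the same.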
\subsection{Reconstruction-based novelty detection}
Reconstruction-based approaches use the reconstruction error in one form
or another to calculate the novelty score. This can be auto-encoders that
literally reconstruct the input, but it also includes MLP networks which try
to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014} differentiated
between neural network-based approaches and subspace methods. The former were
further differentiated into MLPs, Hopfield networks, autoassociative networks,
radial basis function networks, and self-organising networks.
The remainder of this section focuses on MLP-based works; a particular focus will
be on the task of object detection and Bayesian networks.

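The reconstruction-error principle can be illustrated with the subspace family that Pimentel et al. mention: fit a low-dimensional subspace to the normal data and score each point by how badly it reconstructs from that subspace. The sketch below uses a one-component PCA obtained by power iteration on two-dimensional toy data; it is a simplification, not the method of any cited work.

```python
import math

def principal_axis(data, iters=100):
    """First principal component of 2-D data via power iteration."""
    n = len(data)
    mean = [sum(c) / n for c in zip(*data)]
    centred = [[x - m for x, m in zip(p, mean)] for p in data]
    # 2x2 covariance matrix of the centred data.
    cov = [[sum(p[i] * p[j] for p in centred) / n for j in range(2)]
           for i in range(2)]
    v = [1.0, 1.0]
    for _ in range(iters):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(*w)
        v = [w[0] / norm, w[1] / norm]
    return mean, v

def reconstruction_error(p, mean, v):
    """Project onto the principal subspace; the residual is the novelty score."""
    centred = [p[0] - mean[0], p[1] - mean[1]]
    coeff = centred[0] * v[0] + centred[1] * v[1]
    recon = [coeff * v[0], coeff * v[1]]
    return math.hypot(centred[0] - recon[0], centred[1] - recon[1])

# Normal data lies along the line y = x; points off that line reconstruct badly.
train = [(i, i + 0.01 * (-1) ** i) for i in range(10)]
mean, v = principal_axis(train)
print(reconstruction_error((5.0, 5.0), mean, v))   # small: fits the subspace
print(reconstruction_error((5.0, -5.0), mean, v))  # large: novel
```

An auto-encoder plays the same role as the projection here: normal inputs reconstruct well, novel inputs do not, and the residual becomes the novelty score.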
Novelty detection for object detection is intricately linked with
open set conditions: the test data can contain unknown classes.
Bishop~\cite{Bishop1994} investigated the correlation between
the degree of novel input data and the reliability of network
outputs.

The Bayesian approach provides a theoretical foundation for
modelling uncertainty \cite{Ghahramani2015}.
MacKay~\cite{MacKay1992} provided a practical Bayesian
framework for backpropagation networks. Neal~\cite{Neal1996} built upon
the work of MacKay and explored Bayesian learning for neural networks.
However, these Bayesian neural networks do not scale well. Over the course
of time, two major Bayesian approximations were introduced: one based
on dropout and one based on batch normalisation.

Gal and Ghahramani~\cite{Gal2016} showed that dropout training is a
Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
showed that dropout training actually corresponds to a general approximate
Bayesian model. This means every network trained with dropout is an
approximate Bayesian model. During inference the dropout remains active;
this form of inference is called Monte Carlo Dropout (MCDO).
Miller et al.~\cite{Miller2018} built upon the work of Gal and Ghahramani: they
use MC dropout under open-set conditions for object detection.
In a second paper \cite{Miller2018a}, Miller et al. continued their work and
compared merging strategies for sampling-based uncertainty techniques in
object detection.

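The MC dropout procedure itself is simple to sketch: keep dropout active at inference, run several stochastic forward passes, and use the spread of the outputs as the uncertainty estimate. The toy single-layer "network" below is an invented stand-in for the real models used in the cited works.

```python
import random

WEIGHTS = [0.5, -0.2, 0.8, 0.1]  # toy single-layer "network"

def forward(x, rng, p_drop=0.5):
    """One stochastic forward pass with dropout kept active at inference."""
    kept = [w * x if rng.random() > p_drop else 0.0 for w in WEIGHTS]
    # Inverted dropout scaling keeps the expected output unchanged.
    return sum(kept) / (1 - p_drop)

def mc_dropout(x, samples=200, seed=0):
    """Monte Carlo Dropout: sample several stochastic passes; the mean is
    the prediction and the variance is the uncertainty estimate."""
    rng = random.Random(seed)
    outs = [forward(x, rng) for _ in range(samples)]
    mean = sum(outs) / samples
    var = sum((o - mean) ** 2 for o in outs) / samples
    return mean, var

mean, var = mc_dropout(1.0)
print(mean)  # close to the deterministic output sum(WEIGHTS) = 1.2
print(var)   # non-zero: the dropout noise carries the uncertainty signal
```

In the open-set detection setting of Miller et al., such per-detection variances are what the merging strategies operate on.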
Teye et al.~\cite{Teye2018} make the point that most modern networks have
adopted other regularisation techniques. Ioffe and Szegedy~\cite{Ioffe2015}
introduced batch normalisation, which has been widely adopted. Teye et al.
showed how batch normalisation training is similar to dropout and can be
viewed as approximate Bayesian inference. Estimates of the model uncertainty
can be gained with a technique named Monte Carlo Batch Normalisation (MCBN).
Consequently, this technique can be applied to any network that utilises
standard batch normalisation.
Li et al.~\cite{Li2019} investigated the problem of poor performance
when combining dropout and batch normalisation: dropout shifts the variance
of a neural unit when switching from train to test, whereas batch normalisation
does not change the variance. This inconsistency leads to a variance shift which
can have a larger or smaller impact depending on the network used. For example,
adding dropout layers to SSD \cite{Liu2016} and applying MC dropout, as
Miller et al.~\cite{Miller2018} did, causes such a problem because SSD uses
batch normalisation.

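The stochasticity that MCBN exploits can be sketched as follows: at test time, normalise the input with the statistics of a randomly sampled training mini-batch instead of fixed population statistics, repeat this several times, and read the spread of the predictions as uncertainty. The "network" after normalisation is the identity here, a deliberately toy assumption.

```python
import random
import statistics

def batchnorm_predict(x, batch):
    # Normalise x with the statistics of one training mini-batch; the
    # model applied after normalisation is the identity (toy assumption).
    mu = statistics.mean(batch)
    sigma = statistics.pstdev(batch) or 1.0
    return (x - mu) / sigma

def mcbn(x, train, batch_size=8, samples=50, seed=0):
    """MCBN sketch: stochastic mini-batch statistics at test time yield a
    distribution of predictions whose spread estimates model uncertainty."""
    rng = random.Random(seed)
    preds = [batchnorm_predict(x, rng.sample(train, batch_size))
             for _ in range(samples)]
    return statistics.mean(preds), statistics.pstdev(preds)

data_rng = random.Random(1)
train = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
mean, spread = mcbn(2.0, train)
print(mean)    # prediction averaged over sampled batch statistics
print(spread)  # non-zero spread: the uncertainty estimate
```

Because only the standard batch-normalisation machinery is reused, this recipe applies to any network that already contains such layers, which is the point Teye et al. make.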
Non-Bayesian approaches have been developed as well. Usually, they compare with
MC dropout and show better performance.
Postels et al.~\cite{Postels2019} provided a sampling-free approach for
uncertainty estimation that does not affect training and approximates the
sampling at test time. They compared it to MC dropout and found less computational
overhead with better results.
Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
implemented a predictive uncertainty estimation using deep ensembles.
Compared to MC dropout, it showed better results.
Geifman et al.~\cite{Geifman2018}
introduced an uncertainty estimation algorithm for non-Bayesian deep
neural classification that estimates the uncertainty of highly
confident points using earlier snapshots of the trained model and improves,
among others, the approach introduced by Lakshminarayanan et al.
Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
a Dirichlet distribution is placed over the class probabilities. Consequently,
the predictions of a neural network are treated as subjective opinions.

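The Dirichlet construction of Sensoy et al. can be sketched for a single prediction: the network outputs non-negative evidence \(e_k\) per class, the Dirichlet parameters are \(\alpha_k = e_k + 1\), and with \(S = \sum_k \alpha_k\) the total uncertainty mass is \(u = K / S\). The evidence vectors below are invented inputs, not outputs of any trained network.

```python
def dirichlet_uncertainty(evidence):
    """Dirichlet parameters alpha_k = e_k + 1 over K classes; the
    uncertainty mass u = K / S shrinks as evidence accumulates."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    s = sum(alpha)
    probs = [a / s for a in alpha]  # expected class probabilities
    uncertainty = k / s
    return probs, uncertainty

# Little evidence: near-uniform probabilities and high uncertainty.
print(dirichlet_uncertainty([0.0, 0.0, 0.0]))
# Strong evidence for class 0: uncertainty shrinks.
print(dirichlet_uncertainty([20.0, 1.0, 1.0]))
```

With zero evidence the uncertainty mass is 1 and the expected probabilities are uniform, which is exactly the "subjective opinion" reading: the network abstains rather than committing to a class.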
In addition to the aforementioned Bayesian and non-Bayesian works,
there are some Bayesian works that do not quite fit with the rest but
are important as well. Mukhoti and Gal~\cite{Mukhoti2018}
contributed metrics to measure uncertainty for semantic
segmentation. Wu et al.~\cite{Wu2019} introduced two innovations
that turn variational Bayes into a robust tool for Bayesian
networks: a novel deterministic method to approximate
moments in neural networks, which eliminates gradient variance, and
a hierarchical prior for parameters and an empirical Bayes procedure to select
prior variances.

\section{Background for Bayesian SSD}