Finished raw version of related works
Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in: parent b53b3673bf, commit a98f9e8d55
165 body.tex
@@ -175,68 +175,131 @@ be explained.
\section{Related Works}

The task of novelty detection can be accomplished in a variety of ways.
Pimentel et al.~\cite{Pimentel2014} provide a review of novelty detection
methods published over the previous decade. They showcase probabilistic,
distance-based, reconstruction-based, domain-based, and information-theoretic
novelty detection. Based on their categorisation, this thesis falls under
reconstruction-based novelty detection as it deals only with neural network
approaches. Therefore, the other types of novelty detection will only be
briefly introduced.

\subsection{Overview of types of novelty detection}

Probabilistic approaches estimate the generative probability density function (pdf)
of the data. It is assumed that the training data is generated from an underlying
probability distribution \(D\). This distribution can be estimated from the
training data; the estimate is denoted \(\hat D\) and represents a model
of normality. A novelty threshold is applied to \(\hat D\) in a way that
allows a probabilistic interpretation. Pidhorskyi et al.~\cite{Pidhorskyi2018}
combine a probabilistic approach to novelty detection with auto-encoders.

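To make the probabilistic idea concrete, the following minimal sketch estimates \(\hat D\) with a one-dimensional Gaussian kernel density and applies a novelty threshold. The toy data set, the bandwidth, and the threshold value are assumptions of this example, not details from any cited work; in practice the threshold would be calibrated on held-out normal data.

```python
import math

def kde_log_density(x, train, bandwidth=0.5):
    """Estimate log p(x) with a Gaussian kernel density fitted to the training data."""
    n = len(train)
    s = sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train)
    # Small constant avoids log(0) for points far from all training data.
    return math.log(s / (n * bandwidth * math.sqrt(2 * math.pi)) + 1e-300)

def is_novel(x, train, threshold):
    # A sample is flagged as novel when its estimated density falls below the threshold.
    return kde_log_density(x, train) < threshold

train = [0.9, 1.0, 1.1, 1.2, 0.8, 1.05]  # illustrative "normal" data
threshold = -4.0                          # hand-picked for this example
print(is_novel(1.0, train, threshold))    # → False (near the training mass)
print(is_novel(10.0, train, threshold))   # → True (far from the training mass)
```

The probabilistic interpretation comes from thresholding the estimated density directly rather than an arbitrary score.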
Distance-based novelty detection uses either nearest neighbour-based approaches
(e.g. ) % TODO citations
or clustering-based approaches
(e.g. ). % TODO citations
Both methods are similar to estimating the pdf of the data, but they use
well-defined distance metrics to compute the distance between two data points.

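A minimal sketch of the nearest neighbour variant: the novelty score of a sample is its mean distance to the \(k\) nearest training points. The choice of \(k\) and the toy data are assumptions of this illustration.

```python
def knn_novelty_score(x, train, k=3):
    """Novelty score: mean distance to the k nearest training points."""
    dists = sorted(abs(x - t) for t in train)
    return sum(dists[:k]) / k

train = [0.9, 1.0, 1.1, 1.2, 0.8, 1.05]  # illustrative "normal" data
print(knn_novelty_score(1.0, train))   # small score: close to known data
print(knn_novelty_score(10.0, train))  # large score: likely novel
```

A clustering-based variant would instead measure the distance to the nearest cluster centroid, but the thresholding logic is the same.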
Domain-based novelty detection describes the boundary of the known data rather
than the data itself. Unknown data is identified by its position relative to
the boundary. A common implementation of this approach is the support vector machine
(e.g. implemented by ). % TODO citations

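The defining property, i.e. that only the position relative to a boundary matters, can be illustrated with a crude stand-in for a one-class SVM: a centroid-plus-radius boundary fitted around the known data. This is a sketch of the principle, not the SVM formulation itself.

```python
def fit_boundary(train):
    """Fit a crude spherical boundary (centroid plus maximum radius) around the known data."""
    centre = sum(train) / len(train)
    radius = max(abs(t - centre) for t in train)
    return centre, radius

def outside_boundary(x, centre, radius):
    # Only the position relative to the boundary matters, not the data density.
    return abs(x - centre) > radius

centre, radius = fit_boundary([0.9, 1.0, 1.1, 1.2, 0.8, 1.05])
print(outside_boundary(1.0, centre, radius))  # → False
print(outside_boundary(5.0, centre, radius))  # → True
```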
Information-theoretic novelty detection computes the information content
of a data set, for example, with metrics like entropy. Such metrics assume
that novel data inside the data set significantly alters the information
content of an otherwise normal data set. First, the metrics are calculated over the
whole data set. Afterwards, a subset is identified that causes the biggest
difference in the metric when removed from the data set. This subset is considered
to consist of novel data. For example, xyz provide a recent approach. % TODO citations

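The two-step procedure above can be sketched as follows, simplified to candidate subsets of size one: compute the entropy of the whole data set, then find the item whose removal changes the entropy the most. The data set is illustrative.

```python
import math
from collections import Counter

def entropy(data):
    """Shannon entropy of the value distribution in the data set."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def most_novel_item(data):
    """Return the single item whose removal changes the entropy the most."""
    base = entropy(data)
    def delta(i):
        rest = data[:i] + data[i + 1:]
        return abs(entropy(rest) - base)
    return data[max(range(len(data)), key=delta)]

# One rare value among an otherwise balanced data set.
data = ["a"] * 5 + ["b"] * 5 + ["z"]
print(most_novel_item(data))  # prints "z"
```

Real methods search over larger subsets; the exhaustive single-item search here only conveys the idea.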
\subsection{Reconstruction-based novelty detection}

Reconstruction-based approaches use the reconstruction error in one form
or another to calculate the novelty score. This includes auto-encoders that
literally reconstruct the input, but also MLP networks that try
to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014} differentiated
between neural network-based approaches and subspace methods. The former were
further differentiated into MLPs, Hopfield networks, autoassociative networks,
radial basis function networks, and self-organising networks.
The remainder of this section focuses on MLP-based works, with a particular focus
on the task of object detection and Bayesian networks.

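The reconstruction-error principle can be sketched with a toy encoder/decoder pair. Here the "learned" manifold is assumed, purely for illustration, to be the line \(y = x\): normal points lie on it and reconstruct perfectly, while off-manifold points incur a large reconstruction error.

```python
def encode(p):
    """Toy encoder: project a 2-D point onto the assumed normal manifold y = x."""
    return (p[0] + p[1]) / 2

def decode(z):
    """Toy decoder: reconstruct a point on the learned manifold."""
    return (z, z)

def novelty_score(p):
    # Reconstruction error: squared distance between input and its reconstruction.
    rec = decode(encode(p))
    return (p[0] - rec[0]) ** 2 + (p[1] - rec[1]) ** 2

print(novelty_score((2.0, 2.0)))   # → 0.0 (on the manifold)
print(novelty_score((2.0, -2.0)))  # → 8.0 (off the manifold)
```

An actual auto-encoder learns the encoder and decoder from data; the thresholding of the reconstruction error works the same way.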
Novelty detection for object detection is intricately linked with
open set conditions: the test data can contain unknown classes.
Bishop~\cite{Bishop1994} investigated the correlation between
the degree of novel input data and the reliability of network
outputs.

There are two primary pathways that deal with novelty: novelty
detection using auto-encoders and uncertainty estimation with
Bayesian networks.
The Bayesian approach provides a theoretical foundation for
modelling uncertainty \cite{Ghahramani2015}.
MacKay~\cite{MacKay1992} provided a practical Bayesian
framework for backpropagation networks. Neal~\cite{Neal1996} built upon
the work of MacKay and explored Bayesian learning for neural networks.
However, these Bayesian neural networks do not scale well. Over the course
of time, two major Bayesian approximations were introduced: one based
on dropout and one based on batch normalisation.

Japkowicz et al.~\cite{Japkowicz1995} introduce a novelty detection
method based on the hippocampus model of Gluck and Myers~\cite{Gluck1993}
and use an auto-encoder to recognise novel instances.
Thompson et al.~\cite{Thompson2002} show that auto-encoders
can learn ``normal'' system behaviour implicitly.
Goodfellow et al.~\cite{Goodfellow2014} introduce adversarial
networks: a generator attempts to trick the discriminator
by generating samples indistinguishable from the real data.
Makhzani et al.~\cite{Makhzani2015} build on the work of Goodfellow
and propose adversarial auto-encoders. Richter and
Roy~\cite{Richter2017} use an auto-encoder to detect novelty.

Gal and Ghahramani~\cite{Gal2016} showed that dropout training is a
Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
showed that dropout training actually corresponds to a general approximate
Bayesian model. This means every network trained with dropout is an
approximate Bayesian model. During inference the dropout remains active;
this form of inference is called Monte Carlo dropout (MCDO).
Miller et al.~\cite{Miller2018} built upon the work of Gal and Ghahramani: they
use MC dropout under open-set conditions for object detection.
In a second paper~\cite{Miller2018a}, Miller et al. continued their work and
compared merging strategies for sampling-based uncertainty techniques in
object detection.

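A minimal sketch of MC dropout: dropout stays active at test time, several stochastic forward passes are sampled, and their mean and spread serve as prediction and uncertainty estimate. The one-layer toy network, dropout rate, and sample count are assumptions of this example, not the architecture used by Miller et al.

```python
import random
import statistics

def forward(x, weights, p_drop=0.5, rng=random):
    """One stochastic forward pass: each hidden unit is dropped with probability p_drop."""
    hidden = []
    for w in weights:
        keep = rng.random() >= p_drop
        # Inverted dropout: scale kept activations by 1/(1 - p_drop).
        hidden.append((w * x) / (1 - p_drop) if keep else 0.0)
    return sum(hidden) / len(hidden)

def mc_dropout_predict(x, weights, n_samples=100):
    """MC dropout: keep dropout active at test time and sample n forward passes."""
    samples = [forward(x, weights) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)

random.seed(0)
mean, std = mc_dropout_predict(1.0, weights=[0.2, 0.4, 0.6, 0.8])
print(mean, std)  # predictive mean and an uncertainty estimate
```

The standard deviation over the sampled passes is the quantity that rises for inputs the model is uncertain about.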
Wang et al.~\cite{Wang2018} build upon Goodfellow's work and
use a generative adversarial network for novelty detection.
Sabokrou et al.~\cite{Sabokrou2018} implement an end-to-end
architecture for one-class classification: it consists of two
deep networks, with one being the novelty detector and the other
enhancing inliers and distorting outliers.
Pidhorskyi et al.~\cite{Pidhorskyi2018} take a probabilistic approach
and compute how likely it is that a sample is generated by the
inlier distribution.

Teye et al.~\cite{Teye2018} make the point that most modern networks have
adopted other regularisation techniques. Ioffe and Szegedy~\cite{Ioffe2015}
introduced batch normalisation, which has been adopted widely. Teye et al.
showed how batch normalisation training is similar to dropout and can be
viewed as approximate Bayesian inference. Estimates of the model uncertainty
can be gained with a technique named Monte Carlo batch normalisation (MCBN).
Consequently, this technique can be applied to any network that utilises
standard batch normalisation.
Li et al.~\cite{Li2019} investigated the problem of poor performance
when combining dropout and batch normalisation: dropout shifts the variance
of a neural unit when switching from training to test, whereas batch
normalisation retains its variance. This inconsistency leads to a variance
shift, which can have a larger or smaller impact depending on the network used.
For example, adding dropout layers to SSD~\cite{Liu2016} and applying MC dropout,
as Miller et al.~\cite{Miller2018} did, causes such a problem because SSD uses
batch normalisation.

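The MCBN idea described above can be sketched as follows: instead of the fixed running statistics, each test-time forward pass normalises with the statistics of a freshly sampled training mini-batch, and the spread over passes yields an uncertainty estimate. The single normalised unit, batch size, and sample count are assumptions of this illustration, not details of Teye et al.'s implementation.

```python
import random
import statistics

def bn_forward(x, batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise x with the statistics of a sampled training mini-batch."""
    mu = statistics.mean(batch)
    var = statistics.pvariance(batch)
    return gamma * (x - mu) / (var + eps) ** 0.5 + beta

def mcbn_predict(x, train, batch_size=4, n_samples=100, rng=random):
    """MCBN: each forward pass uses the statistics of a freshly sampled mini-batch."""
    samples = [bn_forward(x, rng.sample(train, batch_size)) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)

random.seed(0)
train = [0.8, 0.9, 1.0, 1.05, 1.1, 1.2, 0.95, 1.15]  # illustrative training data
mean, std = mcbn_predict(1.0, train)
print(mean, std)  # predictive mean and uncertainty from batch-statistics sampling
```

Because only the already-present batch normalisation layers are reused stochastically, no retraining is needed, which is the point made above.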
Kendall and Gal~\cite{Kendall2017} provide a Bayesian deep learning
framework that combines input-dependent
aleatoric\footnote{captures noise inherent in observations}
uncertainty with epistemic\footnote{uncertainty in the model}
uncertainty.

Non-Bayesian approaches have been developed as well. Usually, they compare
themselves with MC dropout and show better performance.
Postels et al.~\cite{Postels2019} provided a sampling-free approach for
uncertainty estimation that does not affect training and approximates the
sampling at test time. They compared it to MC dropout and found less computational
overhead with better results.
Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
implemented a predictive uncertainty estimation using deep ensembles.
Compared to MC dropout, it showed better results.
Geifman et al.~\cite{Geifman2018}
introduced an uncertainty estimation algorithm for non-Bayesian deep
neural classification that estimates the uncertainty of highly
confident points using earlier snapshots of the trained model and improves,
among others, the approach introduced by Lakshminarayanan et al.
Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
a Dirichlet distribution is placed over the class probabilities. Consequently,
the predictions of a neural network are treated as subjective opinions.

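The deep-ensemble principle is simple enough to sketch: every independently trained member predicts, and the disagreement between members serves as the uncertainty estimate. The linear toy members below are hypothetical stand-ins, not Lakshminarayanan et al.'s networks; real members would be trained from different random initialisations.

```python
import statistics

def ensemble_predict(x, models):
    """Predict with each ensemble member; aggregate the mean and the spread."""
    preds = [m(x) for m in models]
    return statistics.mean(preds), statistics.stdev(preds)

# Hypothetical members: slightly different slopes stand in for
# independently trained models that agree near the data and diverge away from it.
models = [lambda x, a=a: a * x for a in (0.9, 1.0, 1.1)]

mean_in, std_in = ensemble_predict(1.0, models)
mean_out, std_out = ensemble_predict(10.0, models)
print(std_in, std_out)  # member disagreement grows with input magnitude here
```

No sampling at inference beyond one pass per member is needed, which is one reason ensembles are often compared favourably against MC dropout.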
In addition to the aforementioned Bayesian and non-Bayesian works,
there are some Bayesian works that do not quite fit with the rest but
are important as well. Mukhoti and Gal~\cite{Mukhoti2018}
contributed metrics to measure uncertainty for semantic
segmentation. Wu et al.~\cite{Wu2019} introduced two innovations
that turn variational Bayes into a robust tool for Bayesian
networks: a novel deterministic method to approximate
moments in neural networks, which eliminates gradient variance, and
a hierarchical prior for parameters together with an empirical Bayes
procedure to select prior variances.

\section{Background for Bayesian SSD}