Finished raw version of related works

Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
Jim Martens 2019-08-27 13:57:51 +02:00
parent b53b3673bf
commit a98f9e8d55
1 changed file with 114 additions and 51 deletions

body.tex

@@ -175,68 +175,131 @@ be explained.
\section{Related Works}
The task of novelty detection can be accomplished in a variety of ways.
Pimentel et al.~\cite{Pimentel2014} provide a review of novelty detection
methods published over the previous decade. They showcase probabilistic,
distance-based, reconstruction-based, domain-based, and information-theoretic
novelty detection. Based on their categorisation, this thesis falls under
reconstruction-based novelty detection as it deals only with neural network
approaches. Therefore, the other types of novelty detection will only be
briefly introduced.
\subsection{Overview of the types of novelty detection}
Probabilistic approaches estimate the generative probability density function (pdf)
of the data. It is assumed that the training data is generated from an underlying
probability distribution \(D\). This distribution can be estimated from the
training data; the estimate, denoted \(\hat D\), represents a model
of normality. A novelty threshold is applied to \(\hat D\) in a way that
allows a probabilistic interpretation. Pidhorskyi et al.~\cite{Pidhorskyi2018}
combine a probabilistic approach to novelty detection with auto-encoders.
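The thresholding idea can be sketched in a few lines. This is an illustrative NumPy sketch, not taken from any of the cited works; the Gaussian kernel density estimate, the bandwidth, and the percentile-based threshold are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=200)  # assumed samples from the normal class

def kde_density(xs, data, bandwidth=0.3):
    """Gaussian kernel density estimate of the model of normality at points xs."""
    diffs = (xs[None, :] - data[:, None]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=0)

# Novelty threshold: flag points whose estimated density falls below the
# 5th percentile of the densities of the training points themselves.
threshold = np.percentile(kde_density(train, train), 5)

def is_novel(x):
    return bool(kde_density(np.array([x]), train)[0] < threshold)

near_flag = is_novel(0.1)  # a typical in-distribution point
far_flag = is_novel(8.0)   # a point far from the training data
print(near_flag, far_flag)
```

The percentile threshold gives the probabilistic interpretation mentioned above: roughly 5\% of normal data would be flagged by construction.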
Distance-based novelty detection uses either nearest neighbour-based approaches
(e.g. ) %TODO citations
or clustering-based approaches
(e.g. ). % TODO citations
Both are conceptually close to estimating the
pdf of the data; they use well-defined distance metrics to compute the distance
between two data points.
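A nearest neighbour-based variant can be sketched as follows. This is illustrative NumPy code; the choice of k = 5 and the percentile threshold are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=(100, 2))  # known data points in 2-D

def knn_novelty_score(x, data, k=5):
    """Distance to the k-th nearest training point as a novelty score."""
    dists = np.linalg.norm(data - x, axis=1)
    return float(np.sort(dists)[k - 1])

# Threshold: 95th percentile of the scores of the training points themselves.
threshold = np.percentile([knn_novelty_score(p, train) for p in train], 95)

inlier = knn_novelty_score(np.array([0.0, 0.0]), train)   # dense region
outlier = knn_novelty_score(np.array([6.0, 6.0]), train)  # far away
print(inlier <= threshold, outlier > threshold)
```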
Domain-based novelty detection describes the boundary of the known data, rather
than the data itself. Unknown data is identified by its position relative to
the boundary. A common implementation for this are support vector machines
(e.g. implemented by ). % TODO citations
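The boundary idea can be illustrated with a hypersphere around the data centroid. This is a crude geometric stand-in for an SVM-learned boundary, purely for illustration; the 95\% coverage level is an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
train = rng.normal(0.0, 1.0, size=(200, 2))  # known data

# Describe the boundary of the known data rather than the data itself:
# a hypersphere around the centroid covering 95% of the training points.
centre = train.mean(axis=0)
radius = np.percentile(np.linalg.norm(train - centre, axis=1), 95)

def is_novel(x):
    """A point is novel if it lies outside the learned boundary."""
    return bool(np.linalg.norm(x - centre) > radius)

inside_flag = is_novel(np.array([0.2, -0.1]))  # inside the boundary
outside_flag = is_novel(np.array([5.0, 5.0]))  # outside the boundary
print(inside_flag, outside_flag)
```

A one-class SVM replaces the fixed sphere with a boundary learned in a kernel-induced feature space, but the decision rule, position relative to the boundary, is the same.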
Information-theoretic novelty detection computes the information content
of a data set, for example, with metrics like entropy. Such metrics assume
that novel data inside the data set significantly alters the information
content of an otherwise normal data set. First, the metrics are calculated over the
whole data set. Afterwards, a subset is identified that causes the biggest
difference in the metric when removed from the data set. This subset is considered
to consist of novel data. For example, xyz provide a recent approach.
% TODO citations
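The removal procedure can be sketched as follows. This is illustrative; using Shannon entropy over class labels and normalising the entropy change per removed element are both assumptions:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Two known classes plus a small suspected-novel subset.
data = ["a"] * 50 + ["b"] * 45 + ["odd"] * 5
base = entropy(data)

# Remove each candidate subset (all items of one label) and measure the
# entropy change per removed element; the subset that alters the
# information content most per element is flagged as novel.
scores = {}
for lab in set(data):
    rest = [x for x in data if x != lab]
    scores[lab] = (base - entropy(rest)) / (len(data) - len(rest))

flagged = max(scores, key=scores.get)
print(flagged)
```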
\subsection{Reconstruction-based novelty detection}
Reconstruction-based approaches use the reconstruction error in one form
or another to calculate the novelty score. These can be auto-encoders that
literally reconstruct the input, but the category also includes MLP networks
that try to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014}
differentiated between neural network-based approaches and subspace methods. The
former were further divided into MLPs, Hopfield networks, autoassociative
networks, radial basis function networks, and self-organising networks.
The remainder of this section focuses on MLP-based works, with particular
attention to the task of object detection and to Bayesian networks.
Novelty detection for object detection is intricately linked with
open set conditions: the test data can contain unknown classes.
Bishop~\cite{Bishop1994} investigated the correlation between
the degree of novel input data and the reliability of network
outputs.
There are two primary pathways that deal with novelty: novelty
detection using auto-encoders and uncertainty estimation with
Bayesian networks.
The Bayesian approach provides a theoretical foundation for
modelling uncertainty \cite{Ghahramani2015}.
MacKay~\cite{MacKay1992} provided a practical Bayesian
framework for backpropagation networks. Neal~\cite{Neal1996} built upon
the work of MacKay and explored Bayesian learning for neural networks.
However, these Bayesian neural networks do not scale well. Over the course
of time, two major Bayesian approximations were introduced: one based
on dropout and one based on batch normalisation.
Japkowicz et al.~\cite{Japkowicz1995} introduce a novelty detection
method based on the hippocampal model of Gluck and Meyers~\cite{Gluck1993}
and use an auto-encoder to recognise novel instances.
Thompson et al.~\cite{Thompson2002} show that auto-encoders
can learn ``normal'' system behaviour implicitly.
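The reconstruction-error principle behind these auto-encoder approaches can be illustrated with a linear stand-in (a PCA projection acting as encoder and decoder); the data, the one-dimensional code, and the threshold are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# Normal data lies near a one-dimensional subspace of the 2-D input space.
t = rng.normal(0.0, 1.0, size=200)
train = np.stack([t, 2.0 * t + rng.normal(0.0, 0.05, size=200)], axis=1)

# A linear stand-in for an auto-encoder: project onto the top principal
# component (encoder) and map back (decoder).
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
component = vt[0]

def reconstruction_error(x):
    code = (x - mean) @ component      # encode to 1-D
    recon = mean + code * component    # decode back to 2-D
    return float(np.linalg.norm(x - recon))

# Novelty score: reconstruction error, thresholded on the training data.
threshold = np.percentile([reconstruction_error(p) for p in train], 95)
err_in = reconstruction_error(np.array([1.0, 2.0]))    # on the learned manifold
err_out = reconstruction_error(np.array([2.0, -4.0]))  # off the manifold
print(err_in <= threshold, err_out > threshold)
```

A trained non-linear auto-encoder replaces the projection with learned encoder and decoder networks, but the novelty score is the same reconstruction error.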
Goodfellow et al.~\cite{Goodfellow2014} introduce generative adversarial
networks: a generator attempts to trick a discriminator by generating
samples that are indistinguishable from the real data.
Makhzani et al.~\cite{Makhzani2015} build on the work of Goodfellow
and propose adversarial auto-encoders. Richter and
Roy~\cite{Richter2017} use an auto-encoder to detect novelty.
Gal and Ghahramani~\cite{Gal2016} showed that dropout training is a
Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
showed that dropout training actually corresponds to a general approximate
Bayesian model. This means every network trained with dropout is an
approximate Bayesian model. During inference the dropout remains active;
this form of inference is called Monte Carlo Dropout (MCDO).
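The sampling procedure of MC dropout can be sketched as follows. This is illustrative NumPy code with an untrained toy network; the dropout rate and the number of samples are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# A toy two-layer network with fixed (untrained) weights; the point is the
# sampling procedure, not the model quality.
W1 = rng.normal(size=(2, 16))
W2 = rng.normal(size=(16, 3))

def forward(x, drop_rate=0.5):
    h = np.maximum(x @ W1, 0.0)
    # Dropout stays active at test time -- this is what makes it MC dropout.
    mask = rng.random(h.shape) > drop_rate
    h = h * mask / (1.0 - drop_rate)
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax over 3 classes

# T stochastic forward passes with dropout active.
x = np.array([0.5, -1.0])
samples = np.stack([forward(x) for _ in range(100)])
mean = samples.mean(axis=0)  # approximate predictive distribution
var = samples.var(axis=0)    # per-class uncertainty estimate
print(mean.round(3), var.max() > 0.0)
```

The spread of the samples, not just their mean, carries the uncertainty information that the open-set object detection works exploit.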
Miller et al.~\cite{Miller2018} built upon the work of Gal and Ghahramani: they
use MC dropout under open-set conditions for object detection.
In a second paper \cite{Miller2018a}, Miller et al. continued their work and
compared merging strategies for sampling-based uncertainty techniques in
object detection.
Wang et al.~\cite{Wang2018} build upon Goodfellow's work and
use a generative adversarial network for novelty detection.
Sabokrou et al.~\cite{Sabokrou2018} implement an end-to-end
architecture for one-class classification: it consists of two
deep networks, with one being the novelty detector and the other
enhancing inliers and distorting outliers.
Pidhorskyi et al.~\cite{Pidhorskyi2018} take a probabilistic approach
and compute how likely it is that a sample is generated by the
inlier distribution.
Teye et al.~\cite{Teye2018} make the point that most modern networks have
adopted other regularisation techniques. Ioffe and Szegedy~\cite{Ioffe2015}
introduced batch normalisation, which has been widely adopted. Teye et al.
showed that batch normalisation training is similar to dropout and can be
viewed as approximate Bayesian inference. Estimates of the model uncertainty
can be gained with a technique named Monte Carlo Batch Normalisation (MCBN).
Consequently, this technique can be applied to any network that utilises
standard batch normalisation.
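The MCBN sampling step can be sketched as follows. This is illustrative: a single linear layer behind a batch-normalisation step, with the mini-batch size and number of samples as assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
train = rng.normal(0.0, 1.0, size=(500, 4))  # stored training data
W = rng.normal(size=(4, 2))                  # fixed (untrained) weights

def bn_forward(x, batch):
    """Normalise x with the statistics of a training mini-batch, then project."""
    mu, sigma = batch.mean(axis=0), batch.std(axis=0) + 1e-5
    return ((x - mu) / sigma) @ W

# Monte Carlo Batch Normalisation: at test time, sample T random training
# mini-batches and reuse their batch statistics instead of the fixed
# running averages, yielding a distribution over outputs.
x = rng.normal(size=4)
T, batch_size = 50, 32
samples = np.stack([
    bn_forward(x, train[rng.choice(len(train), batch_size, replace=False)])
    for _ in range(T)
])
mean, var = samples.mean(axis=0), samples.var(axis=0)
print(var.max() > 0.0)  # batch-to-batch variation yields an uncertainty estimate
```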
Li et al.~\cite{Li2019} investigated the problem of poor performance
when combining dropout and batch normalisation: dropout shifts the variance
of a neural unit when switching from training to testing, whereas batch
normalisation keeps the variance unchanged. The resulting variance shift
can have a larger or smaller impact based on the network used. For example,
adding dropout layers to SSD \cite{Liu2016} and applying MC dropout, like
Miller et al.~\cite{Miller2018} did, causes such a problem because SSD uses
batch normalisation.
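The variance shift itself can be demonstrated numerically. This is an illustrative sketch using inverted dropout with rate 0.5 on unit-variance activations:

```python
import numpy as np

rng = np.random.default_rng(7)
h = rng.normal(0.0, 1.0, size=100000)  # unit-variance activations

# Inverted dropout at training time: mask and rescale by 1/(1-p).
p = 0.5
mask = rng.random(h.size) > p
train_out = h * mask / (1.0 - p)
test_out = h  # dropout disabled at test time

# The mean is preserved, but the variance differs between training and
# testing (here roughly 2 vs. 1) -- the statistics that a subsequent
# batch-normalisation layer learned during training no longer match.
print(round(train_out.var(), 2), round(test_out.var(), 2))
```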
Kendall and Gal~\cite{Kendall2017} provide a Bayesian deep learning
framework that combines input-dependent
aleatoric\footnote{captures noise inherent in observations}
uncertainty with epistemic\footnote{uncertainty in the model}
uncertainty.
Non-Bayesian approaches have been developed as well; they are usually
compared against MC dropout and show better performance.
Postels et al.~\cite{Postels2019} provided a sampling-free approach for
uncertainty estimation that does not affect training and approximates the
sampling at test time. Compared to MC dropout, it achieved better results
with less computational overhead.
Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
implemented a predictive uncertainty estimation using deep ensembles.
Compared to MC dropout, it showed better results.
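The ensemble principle can be sketched in miniature. This is illustrative: linear least-squares fits on random data subsets stand in for independently trained networks, and the data is made up:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(0.0, 0.1, size=100)  # noisy linear target

# A deep ensemble in miniature: M independently "trained" members
# (least-squares fits on random subsets stand in for networks).
members = []
for _ in range(5):
    idx = rng.choice(len(X), 60, replace=False)
    A = np.hstack([X[idx], np.ones((60, 1))])  # design matrix with bias
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    members.append(coef)

def ensemble_predict(x):
    """Mean prediction and member disagreement (an uncertainty proxy)."""
    preds = np.array([c[0] * x + c[1] for c in members])
    return float(preds.mean()), float(preds.var())

mean, var = ensemble_predict(0.5)
print(round(mean, 2), var >= 0.0)
```

Disagreement between members plays the role that sampling variance plays in MC dropout.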
Geifman et al.~\cite{Geifman2018}
introduced an uncertainty estimation algorithm for non-Bayesian deep
neural classification that estimates the uncertainty of highly
confident points using earlier snapshots of the trained model; it improves,
among others, upon the approach introduced by Lakshminarayanan et al.
Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
a Dirichlet distribution is placed over the class probabilities. Consequently,
the predictions of a neural network are treated as subjective opinions.
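The subjective-opinion reading of a Dirichlet distribution can be sketched as follows, using the standard evidential formulation with Dirichlet parameters \(\alpha = e + 1\); the evidence values are illustrative:

```python
import numpy as np

def dirichlet_opinion(evidence):
    """Treat per-class evidence as a Dirichlet over class probabilities."""
    alpha = np.asarray(evidence, dtype=float) + 1.0  # Dirichlet parameters
    strength = alpha.sum()
    belief = (alpha - 1.0) / strength    # subjective-logic belief masses
    uncertainty = len(alpha) / strength  # remaining uncertainty mass
    expected_p = alpha / strength        # expected class probabilities
    return belief, uncertainty, expected_p

# Strong evidence for class 0 -> low uncertainty; no evidence -> maximal.
_, u_conf, p_conf = dirichlet_opinion([40, 1, 1])
_, u_unsure, _ = dirichlet_opinion([0, 0, 0])
print(round(u_conf, 3), round(u_unsure, 3), int(p_conf.argmax()))
```

The uncertainty mass shrinks as total evidence grows, which is exactly what lets the predictor say ``I do not know'' for novel inputs.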
In addition to the aforementioned Bayesian and non-Bayesian works,
some further Bayesian works do not quite fit with the rest but
are important as well. Mukhoti and Gal~\cite{Mukhoti2018}
contributed metrics to measure uncertainty for semantic
segmentation. Wu et al.~\cite{Wu2019} introduced two innovations
that turn variational Bayes into a robust tool for Bayesian
networks: a novel deterministic method to approximate
moments in neural networks which eliminates gradient variance, and
a hierarchical prior for parameters together with an empirical Bayes
procedure to select prior variances.
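Uncertainty metrics of the kind Mukhoti and Gal use, predictive entropy and mutual information, can be computed directly from Monte Carlo samples. The sample values below are made up for illustration:

```python
import numpy as np

def uncertainty_metrics(samples):
    """Predictive entropy and mutual information from T softmax samples.

    samples: array of shape (T, num_classes), e.g. from MC dropout.
    """
    mean = samples.mean(axis=0)
    predictive_entropy = float(-np.sum(mean * np.log(mean + 1e-12)))
    expected_entropy = float(
        -np.mean(np.sum(samples * np.log(samples + 1e-12), axis=1)))
    # Mutual information isolates the epistemic (model) part of the uncertainty.
    mutual_information = predictive_entropy - expected_entropy
    return predictive_entropy, mutual_information

# Agreeing samples -> low mutual information; confidently disagreeing
# samples -> high mutual information.
agree = np.array([[0.9, 0.1], [0.88, 0.12], [0.91, 0.09]])
disagree = np.array([[0.95, 0.05], [0.05, 0.95], [0.9, 0.1]])
pe_a, mi_a = uncertainty_metrics(agree)
pe_d, mi_d = uncertainty_metrics(disagree)
print(mi_a < mi_d)
```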
\section{Background for Bayesian SSD}