\section{Related Works}
The task of novelty detection can be accomplished in a variety of ways.
Pimentel et al.~\cite{Pimentel2014} provide a review of novelty detection
methods published over the previous decade. They showcase probabilistic,
distance-based, reconstruction-based, domain-based, and information-theoretic
novelty detection. Based on their categorisation, this thesis falls under
reconstruction-based novelty detection, as it deals only with neural network
approaches. Therefore, the other types of novelty detection will only be
briefly introduced.

\subsection{Overview of the types of novelty detection}

Probabilistic approaches estimate the generative probability density function (pdf)
of the data. It is assumed that the training data is generated from an underlying
probability distribution \(D\). This distribution can be estimated with the
training data; the estimate is defined as \(\hat D\) and represents a model
of normality. A novelty threshold is applied to \(\hat D\) in a way that
allows a probabilistic interpretation. Pidhorskyi et al.~\cite{Pidhorskyi2018}
combine a probabilistic approach to novelty detection with auto-encoders.

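As a minimal illustration of the probabilistic approach, the following pure-Python sketch estimates \(\hat D\) with a Gaussian kernel density estimate over one-dimensional data and applies a novelty threshold to it. The data set, bandwidth, and threshold are invented for illustration and are not taken from any of the cited works.

```python
import math

def gaussian_kde(train, bandwidth=0.5):
    """Return an estimate of the generative pdf from the training data."""
    norm = 1.0 / (len(train) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2)
                          for t in train)
    return density

# Normal data clustered around 0; the threshold on the estimated pdf
# gives a probabilistic interpretation: low density => novel.
train = [-0.3, -0.1, 0.0, 0.2, 0.4]
d_hat = gaussian_kde(train)
threshold = 0.05

def is_novel(x):
    return d_hat(x) < threshold

print(is_novel(0.1))  # in-distribution point
print(is_novel(5.0))  # far from all training data => novel
```

A point near the training data receives high estimated density and is accepted, while a distant point falls below the threshold and is flagged as novel.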
Distance-based novelty detection uses either nearest neighbour-based approaches
(e.g. ) % TODO citations
or clustering-based approaches
(e.g. ). % TODO citations
Both methods are similar to estimating the
pdf of the data; they use well-defined distance metrics to compute the distance
between two data points.

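A nearest neighbour-based variant can be sketched in a few lines: the mean distance to the \(k\) nearest training points serves as the novelty score. The data and the choice \(k = 3\) are illustrative assumptions, not taken from the cited works.

```python
def knn_novelty_score(x, train, k=3):
    """Mean distance to the k nearest training points; larger => more novel."""
    dists = sorted(abs(x - t) for t in train)
    return sum(dists[:k]) / k

train = [0.0, 0.1, 0.2, 0.3, 0.4]
print(knn_novelty_score(0.2, train))  # small for in-distribution points
print(knn_novelty_score(3.0, train))  # large for distant (novel) points
```

Thresholding this score yields the same kind of accept/reject decision as the probabilistic approach, but without an explicit density model.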
Domain-based novelty detection describes the boundary of the known data, rather
than the data itself. Unknown data is identified by its position relative to
the boundary. Support vector machines are a common implementation of this
(e.g. implemented by ). % TODO citations

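The cited implementations use support vector machines; the sketch below substitutes a much cruder boundary (a hypersphere enclosing the training data) purely to illustrate the domain-based decision rule, in which only the position relative to the boundary matters.

```python
import math

def fit_boundary(train):
    """Fit a minimal boundary description: centroid plus the radius that
    encloses all training points (a crude stand-in for an SVM boundary)."""
    centre = [sum(c) / len(train) for c in zip(*train)]
    radius = max(math.dist(p, centre) for p in train)
    return centre, radius

def outside_boundary(x, centre, radius):
    # Only the position relative to the boundary is used, not the density.
    return math.dist(x, centre) > radius

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centre, radius = fit_boundary(train)
print(outside_boundary((0.5, 0.5), centre, radius))  # inside the boundary
print(outside_boundary((4.0, 4.0), centre, radius))  # outside => novel
```

Note that, unlike the probabilistic approach, no statement is made about how likely a point inside the boundary is; the boundary alone decides.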
Information-theoretic novelty detection computes the information content
of a data set, for example, with metrics like entropy. Such metrics assume
that novel data inside the data set significantly alters the information
content of an otherwise normal data set. First, the metrics are calculated over the
whole data set. Afterwards, a subset is identified that causes the biggest
difference in the metric when removed from the data set. This subset is considered
to consist of novel data. For example, xyz provide a recent approach.
% TODO citations

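The remove-a-subset-and-compare procedure described above can be sketched with Shannon entropy over a discrete label set. The data set and candidate subsets are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def entropy_change(data, subset):
    """Change in information content when `subset` is removed from `data`."""
    remaining = list(data)
    for s in subset:
        remaining.remove(s)
    return entropy(data) - entropy(remaining)

# A mostly regular data set with a few rare labels mixed in.
data = ["a"] * 50 + ["b"] * 50 + ["x"] * 3
# Removing the rare labels changes the entropy more than removing an
# equally sized subset of normal labels, so it is considered novel.
print(entropy_change(data, ["x", "x", "x"]))
print(entropy_change(data, ["a", "a", "a"]))
```

In a full method the subset would be searched for rather than given, but the decision criterion (largest change in the metric upon removal) is the same.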
\subsection{Reconstruction-based novelty detection}
Reconstruction-based approaches use the reconstruction error in one form
or another to calculate the novelty score. This can be auto-encoders that
literally reconstruct the input, but it also includes MLP networks which try
to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014} differentiated
between neural network-based approaches and subspace methods. The former were
further differentiated into MLPs, Hopfield networks, autoassociative networks,
radial basis function networks, and self-organising networks.
The remainder of this section focuses on MLP-based works; a particular focus will
be on the task of object detection and Bayesian networks.

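The reconstruction-error principle can be illustrated with the subspace family that Pimentel et al. mention: fit a low-dimensional subspace to the normal data and score each point by how badly it reconstructs from that subspace. The sketch below uses a one-component PCA obtained by power iteration on two-dimensional toy data; it is a simplification, not the method of any cited work.

```python
import math

def principal_axis(data, iters=100):
    """First principal component of 2-D data via power iteration."""
    n = len(data)
    mean = [sum(c) / n for c in zip(*data)]
    centred = [[x - m for x, m in zip(p, mean)] for p in data]
    # 2x2 covariance matrix of the centred data.
    cov = [[sum(p[i] * p[j] for p in centred) / n for j in range(2)]
           for i in range(2)]
    v = [1.0, 1.0]
    for _ in range(iters):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(*w)
        v = [w[0] / norm, w[1] / norm]
    return mean, v

def reconstruction_error(p, mean, v):
    """Project onto the principal subspace; the residual is the novelty score."""
    centred = [p[0] - mean[0], p[1] - mean[1]]
    coeff = centred[0] * v[0] + centred[1] * v[1]
    recon = [coeff * v[0], coeff * v[1]]
    return math.hypot(centred[0] - recon[0], centred[1] - recon[1])

# Normal data lies along the line y = x; points off that line reconstruct badly.
train = [(i, i + 0.01 * (-1) ** i) for i in range(10)]
mean, v = principal_axis(train)
print(reconstruction_error((5.0, 5.0), mean, v))   # small: fits the subspace
print(reconstruction_error((5.0, -5.0), mean, v))  # large: novel
```

An auto-encoder plays the same role as the projection here: normal inputs reconstruct well, novel inputs do not, and the residual becomes the novelty score.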
Novelty detection for object detection is intricately linked with
open set conditions: the test data can contain unknown classes.
Bishop~\cite{Bishop1994} investigated the correlation between
the degree of novel input data and the reliability of network
outputs.

The Bayesian approach provides a theoretical foundation for
modelling uncertainty \cite{Ghahramani2015}.
MacKay~\cite{MacKay1992} provided a practical Bayesian
framework for backpropagation networks. Neal~\cite{Neal1996} built upon
the work of MacKay and explored Bayesian learning for neural networks.
However, these Bayesian neural networks do not scale well. Over the course
of time, two major Bayesian approximations were introduced: one based
on dropout and one based on batch normalisation.

Gal and Ghahramani~\cite{Gal2016} showed that dropout training is a
Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
showed that dropout training actually corresponds to a general approximate
Bayesian model. This means every network trained with dropout is an
approximate Bayesian model. During inference the dropout remains active;
this form of inference is called Monte Carlo Dropout (MCDO).
Miller et al.~\cite{Miller2018} built upon the work of Gal and Ghahramani: they
use MC dropout under open-set conditions for object detection.
In a second paper \cite{Miller2018a}, Miller et al. continued their work and
compared merging strategies for sampling-based uncertainty techniques in
object detection.

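The MC dropout procedure itself is simple to sketch: keep dropout active at inference, run several stochastic forward passes, and use the spread of the outputs as the uncertainty estimate. The toy single-layer "network" below is an invented stand-in for the real models used in the cited works.

```python
import random

WEIGHTS = [0.5, -0.2, 0.8, 0.1]  # toy single-layer "network"

def forward(x, rng, p_drop=0.5):
    """One stochastic forward pass with dropout kept active at inference."""
    kept = [w * x if rng.random() > p_drop else 0.0 for w in WEIGHTS]
    # Inverted dropout scaling keeps the expected output unchanged.
    return sum(kept) / (1 - p_drop)

def mc_dropout(x, samples=200, seed=0):
    """Monte Carlo Dropout: sample several stochastic passes; the mean is
    the prediction and the variance is the uncertainty estimate."""
    rng = random.Random(seed)
    outs = [forward(x, rng) for _ in range(samples)]
    mean = sum(outs) / samples
    var = sum((o - mean) ** 2 for o in outs) / samples
    return mean, var

mean, var = mc_dropout(1.0)
print(mean)  # close to the deterministic output sum(WEIGHTS) = 1.2
print(var)   # non-zero: the dropout noise carries the uncertainty signal
```

In the open-set detection setting of Miller et al., such per-detection variances are what the merging strategies operate on.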
Teye et al.~\cite{Teye2018} make the point that most modern networks have
adopted other regularisation techniques. Ioffe and Szegedy~\cite{Ioffe2015}
introduced batch normalisation, which has been widely adopted. Teye et al.
showed how batch normalisation training is similar to dropout and can be
viewed as approximate Bayesian inference. Estimates of the model uncertainty
can be gained with a technique named Monte Carlo Batch Normalisation (MCBN).
Consequently, this technique can be applied to any network that utilises
standard batch normalisation.
Li et al.~\cite{Li2019} investigated the problem of poor performance
when combining dropout and batch normalisation: dropout shifts the variance
of a neural unit when switching from train to test, whereas batch normalisation
does not change the variance. This inconsistency leads to a variance shift which
can have a larger or smaller impact depending on the network used. For example,
adding dropout layers to SSD \cite{Liu2016} and applying MC dropout, as
Miller et al.~\cite{Miller2018} did, causes such a problem because SSD uses
batch normalisation.

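The stochasticity that MCBN exploits can be sketched as follows: at test time, normalise the input with the statistics of a randomly sampled training mini-batch instead of fixed population statistics, repeat this several times, and read the spread of the predictions as uncertainty. The "network" after normalisation is the identity here, a deliberately toy assumption.

```python
import random
import statistics

def batchnorm_predict(x, batch):
    # Normalise x with the statistics of one training mini-batch; the
    # model applied after normalisation is the identity (toy assumption).
    mu = statistics.mean(batch)
    sigma = statistics.pstdev(batch) or 1.0
    return (x - mu) / sigma

def mcbn(x, train, batch_size=8, samples=50, seed=0):
    """MCBN sketch: stochastic mini-batch statistics at test time yield a
    distribution of predictions whose spread estimates model uncertainty."""
    rng = random.Random(seed)
    preds = [batchnorm_predict(x, rng.sample(train, batch_size))
             for _ in range(samples)]
    return statistics.mean(preds), statistics.pstdev(preds)

data_rng = random.Random(1)
train = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
mean, spread = mcbn(2.0, train)
print(mean)    # prediction averaged over sampled batch statistics
print(spread)  # non-zero spread: the uncertainty estimate
```

Because only the standard batch-normalisation machinery is reused, this recipe applies to any network that already contains such layers, which is the point Teye et al. make.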
Non-Bayesian approaches have been developed as well. Usually, they compare with
MC dropout and show better performance.
Postels et al.~\cite{Postels2019} provided a sampling-free approach for
uncertainty estimation that does not affect training and approximates the
sampling at test time. They compared it to MC dropout and found less computational
overhead with better results.
Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
implemented a predictive uncertainty estimation using deep ensembles.
Compared to MC dropout, it showed better results.
Geifman et al.~\cite{Geifman2018}
introduced an uncertainty estimation algorithm for non-Bayesian deep
neural classification that estimates the uncertainty of highly
confident points using earlier snapshots of the trained model and improves,
among others, the approach introduced by Lakshminarayanan et al.
Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
a Dirichlet distribution is placed over the class probabilities. Consequently,
the predictions of a neural network are treated as subjective opinions.

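The Dirichlet construction of Sensoy et al. can be sketched for a single prediction: the network outputs non-negative evidence \(e_k\) per class, the Dirichlet parameters are \(\alpha_k = e_k + 1\), and with \(S = \sum_k \alpha_k\) the total uncertainty mass is \(u = K / S\). The evidence vectors below are invented inputs, not outputs of any trained network.

```python
def dirichlet_uncertainty(evidence):
    """Dirichlet parameters alpha_k = e_k + 1 over K classes; the
    uncertainty mass u = K / S shrinks as evidence accumulates."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    s = sum(alpha)
    probs = [a / s for a in alpha]  # expected class probabilities
    uncertainty = k / s
    return probs, uncertainty

# Little evidence: near-uniform probabilities and high uncertainty.
print(dirichlet_uncertainty([0.0, 0.0, 0.0]))
# Strong evidence for class 0: uncertainty shrinks.
print(dirichlet_uncertainty([20.0, 1.0, 1.0]))
```

With zero evidence the uncertainty mass is 1 and the expected probabilities are uniform, which is exactly the "subjective opinion" reading: the network abstains rather than committing to a class.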
In addition to the aforementioned Bayesian and non-Bayesian works,
there are some Bayesian works that do not quite fit with the rest but
are important as well. Mukhoti and Gal~\cite{Mukhoti2018}
contributed metrics to measure uncertainty for semantic
segmentation. Wu et al.~\cite{Wu2019} introduced two innovations
that turn variational Bayes into a robust tool for Bayesian
networks: a novel deterministic method to approximate
moments in neural networks, which eliminates gradient variance, and
a hierarchical prior for parameters and an empirical Bayes procedure to select
prior variances.

\section{Background for Bayesian SSD}