From a98f9e8d5528d37416cd9835574cae142d42e26f Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Tue, 27 Aug 2019 13:57:51 +0200
Subject: [PATCH] Finished raw version of related works

Signed-off-by: Jim Martens
---
 body.tex | 165 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 114 insertions(+), 51 deletions(-)

diff --git a/body.tex b/body.tex
index 1f54e2e..6b01d03 100644
--- a/body.tex
+++ b/body.tex
@@ -175,68 +175,131 @@ be explained.
 
 \section{Related Works}
 
+The task of novelty detection can be accomplished in a variety of ways.
+Pimentel et al.~\cite{Pimentel2014} provide a review of novelty detection
+methods published over the previous decade. They showcase probabilistic,
+distance-based, reconstruction-based, domain-based, and information-theoretic
+novelty detection. Based on their categorisation, this thesis falls under
+reconstruction-based novelty detection as it deals only with neural network
+approaches. Therefore, the other types of novelty detection will only be
+briefly introduced.
+
+\subsection{Overview of the types of novelty detection}
+
+Probabilistic approaches estimate the generative probability density function (pdf)
+of the data. It is assumed that the training data is generated from an underlying
+probability distribution \(D\). This distribution can be estimated from the
+training data; the estimate is denoted \(\hat D\) and represents a model
+of normality. A novelty threshold is applied to \(\hat D\) in a way that
+allows a probabilistic interpretation. Pidhorskyi et al.~\cite{Pidhorskyi2018}
+combine a probabilistic approach to novelty detection with auto-encoders.
+
+Distance-based novelty detection uses either nearest neighbour-based approaches
+(e.g. ) %TODO citations)
+or clustering-based approaches
+(e.g. ). % TODO citations
+Both methods are closely related to estimating the
+pdf of the data: they use well-defined distance metrics to compute the distance
+between two data points.
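As an aside to the distance-based category above, the nearest neighbour idea can be sketched in a few lines: the novelty score of a point is its mean distance to the \(k\) nearest training points, and a threshold turns the score into a decision. The toy data, the choice of \(k\), and the threshold below are illustrative assumptions, not taken from any cited work.

```python
import math

def knn_novelty_score(train, x, k=3):
    """Novelty score of point x: mean Euclidean distance to its
    k nearest neighbours in the training set."""
    dists = sorted(math.dist(x, t) for t in train)
    return sum(dists[:k]) / k

# Toy 2-D training data clustered around the origin.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]

normal = knn_novelty_score(train, (0.05, 0.05))
novel = knn_novelty_score(train, (5.0, 5.0))

threshold = 1.0  # hypothetical novelty threshold
print(novel > normal)      # True: the distant point scores far higher
print(novel > threshold)   # True: it would be flagged as novel
```

Clustering-based variants follow the same pattern, except that the distance is measured to cluster prototypes rather than to individual training points.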
+
+Domain-based novelty detection describes the boundary of the known data, rather
+than the data itself. Unknown data is identified by its position relative to
+the boundary. A common implementation of this approach is the support vector
+machine (e.g. implemented by ). % TODO citations
+
+Information-theoretic novelty detection computes the information content
+of a data set, for example, with metrics like entropy. Such metrics assume
+that novel data inside the data set significantly alters the information
+content of an otherwise normal data set. First, the metrics are calculated over the
+whole data set. Afterwards, the subset is identified whose removal from the
+data set causes the biggest difference in the metric. This subset is considered
+to consist of novel data. For example, xyz provide a recent approach.
+% TODO citations
+
+\subsection{Reconstruction-based novelty detection}
+
+Reconstruction-based approaches use the reconstruction error in one form
+or another to calculate the novelty score. This includes auto-encoders, which
+literally reconstruct the input, but also MLP networks, which try
+to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014} differentiated
+between neural network-based approaches and subspace methods. The former were
+further divided into MLPs, Hopfield networks, autoassociative networks,
+radial basis function networks, and self-organising networks.
+The remainder of this section focuses on MLP-based works, with a particular
+focus on the task of object detection and on Bayesian networks.
+
 Novelty detection for object detection is intricately linked
 with open set conditions: the test data can contain unknown
 classes.
-Bishop~\cite{Bishop1994} investigates the correlation between
+Bishop~\cite{Bishop1994} investigated the correlation between
 the degree of novel input data and the reliability of network
-outputs. Pimentel et al.~\cite{Pimentel2014} provide a review
-of novelty detection methods published over the previous decade.
+outputs.
 
-There are two primary pathways that deal with novelty: novelty
-detection using auto-encoders and uncertainty estimation with
-bayesian networks.
+The Bayesian approach provides a theoretical foundation for
+modelling uncertainty \cite{Ghahramani2015}.
+MacKay~\cite{MacKay1992} provided a practical Bayesian
+framework for backpropagation networks. Neal~\cite{Neal1996} built upon
+the work of MacKay and explored Bayesian learning for neural networks.
+However, these Bayesian neural networks do not scale well. Over
+time, two major Bayesian approximations were introduced: one based
+on dropout and one based on batch normalisation.
 
-Japkowicz et al.~\cite{Japkowicz1995} introduce a novelty detection
-method based on the hippocampus of Gluck and Meyers~\cite{Gluck1993}
-and use an auto-encoder to recognize novel instances.
-Thompson et al.~\cite{Thompson2002} show that auto-encoders
-can learn "normal" system behaviour implicitly.
-Goodfellow et al.~\cite{Goodfellow2014} introduce adversarial
-networks: a generator that attempts to trick the discriminator
-by generating samples indistinguishable from the real data.
-Makhzani et al.~\cite{Makhzani2015} build on the work of Goodfellow
-and propose adversarial auto-encoders. Richter and
-Roy~\cite{Richter2017} use an auto-encoder to detect novelty.
+Gal and Ghahramani~\cite{Gal2016} showed that dropout training is a
+Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
+showed that dropout training actually corresponds to a general approximate
+Bayesian model. This means that every network trained with dropout is an
+approximate Bayesian model. During inference, the dropout remains active;
+this form of inference is called Monte Carlo Dropout (MCDO).
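Outside the patch text, the MCDO procedure just described can be sketched concretely: dropout stays active at test time, and the spread over several stochastic forward passes yields a predictive mean and an uncertainty estimate. The tiny linear model, its weights, and the number of samples below are illustrative assumptions, not taken from the cited works.

```python
import random
import statistics

def dropout_forward(x, weights, p=0.5):
    """One stochastic forward pass of a toy linear model with
    inverted dropout kept active (each unit dropped with prob. p,
    survivors scaled by 1/(1-p))."""
    return sum(
        (w / (1.0 - p)) * xi
        for w, xi in zip(weights, x)
        if random.random() >= p
    )

def mc_dropout_predict(x, weights, T=1000, p=0.5):
    """Monte Carlo Dropout: draw T stochastic passes and report the
    predictive mean and variance as an uncertainty estimate."""
    samples = [dropout_forward(x, weights, p) for _ in range(T)]
    return statistics.fmean(samples), statistics.pvariance(samples)

random.seed(0)
weights = [0.5, -0.2, 0.8]  # hypothetical trained weights
mean, var = mc_dropout_predict([1.0, 2.0, 3.0], weights)
# The mean approaches the deterministic output (2.5 for these
# weights); the nonzero variance quantifies model uncertainty.
print(abs(mean - 2.5) < 0.5, var > 0.0)
```

The same recipe applies unchanged to a deep network: one simply keeps the dropout layers active at inference and averages the sampled outputs.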
+
+Miller et al.~\cite{Miller2018} built upon the work of Gal and Ghahramani: they
+used MC dropout under open-set conditions for object detection.
+In a second paper \cite{Miller2018a}, Miller et al. continued their work and
+compared merging strategies for sampling-based uncertainty techniques in
+object detection.
 
-Wang et al.~\cite{Wang2018} base upon Goodfellow's work and
-use a generative adversarial network for novelty detection.
-Sabokrou et al.~\cite{Sabokrou2018} implement an end-to-end
-architecture for one-class classification: it consists of two
-deep networks, with one being the novelty detector and the other
-enhancing inliers and distorting outliers.
-Pidhorskyi et al.~\cite{Pidhorskyi2018} take a probabilistic approach
-and compute how likely it is that a sample is generated by the
-inlier distribution.
+Teye et al.~\cite{Teye2018} made the point that most modern networks have
+adopted other regularisation techniques. Ioffe and Szegedy~\cite{Ioffe2015}
+introduced batch normalisation, which has been widely adopted. Teye et al.
+showed how batch normalisation training is similar to dropout and can be
+viewed as approximate Bayesian inference. Estimates of the model uncertainty
+can be gained with a technique named Monte Carlo Batch Normalisation (MCBN).
+Consequently, this technique can be applied to any network that utilises
+standard batch normalisation.
+Li et al.~\cite{Li2019} investigated the problem of poor performance
+when combining dropout and batch normalisation: dropout shifts the variance
+of a neural unit when switching from training to testing, whereas batch
+normalisation does not change the variance. This inconsistency leads to a
+variance shift whose impact depends on the network used. For example,
+adding dropout layers to SSD \cite{Liu2016} and applying MC dropout, as
+Miller et al.~\cite{Miller2018} did, causes such a problem because SSD uses
+batch normalisation.
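To make the MCBN idea concrete, here is a toy, single-feature sketch (illustrative only, not code from Teye et al.): at test time the batch statistics are recomputed from freshly sampled training mini-batches instead of the fixed moving averages, and the spread of the resulting outputs serves as the uncertainty estimate.

```python
import random
import statistics

def bn_forward(x, batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a scalar input with the statistics of a sampled
    training mini-batch instead of fixed moving averages."""
    mu = statistics.fmean(batch)
    var = statistics.pvariance(batch)
    return gamma * (x - mu) / (var + eps) ** 0.5 + beta

def mcbn_predict(x, train_data, T=50, batch_size=4):
    """MCBN: repeat the forward pass with freshly sampled batch
    statistics; the mean and variance of the T outputs give the
    prediction and its uncertainty."""
    outputs = [bn_forward(x, random.sample(train_data, batch_size))
               for _ in range(T)]
    return statistics.fmean(outputs), statistics.pvariance(outputs)

random.seed(1)
train_data = [random.gauss(0.0, 1.0) for _ in range(100)]
mean, var = mcbn_predict(1.5, train_data)
print(var > 0.0)  # stochastic batch statistics yield nonzero predictive variance
```

In a real network the same sampling would be applied to every batch normalisation layer jointly, which is why the technique carries over to any architecture that already uses standard batch normalisation.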
-Kendall and Gal~\cite{Kendall2017} provide a Bayesian deep learning
-framework that combines input-dependent
-aleatoric\footnote{captures noise inherent in observations}
-uncertainty with epistemic\footnote{uncertainty in the model}
-uncertainty. Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
-implement a predictive uncertainty estimation using deep ensembles
-rather than Bayesian networks. Geifman et al.~\cite{Geifman2018}
-introduce an uncertainty estimation algorithm for non-Bayesian deep
+Non-Bayesian approaches have been developed as well. Usually, they are compared
+with MC dropout and show better performance.
+Postels et al.~\cite{Postels2019} provided a sampling-free approach for
+uncertainty estimation that does not affect training and approximates the
+sampling at test time. They compared it to MC dropout and found lower
+computational overhead with better results.
+Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
+implemented a predictive uncertainty estimation using deep ensembles.
+Compared to MC dropout, it showed better results.
+Geifman et al.~\cite{Geifman2018}
+introduced an uncertainty estimation algorithm for non-Bayesian deep
 neural classification that estimates the uncertainty of highly
-confident points using earlier snapshots of the trained model.
-Miller et al.~\cite{Miller2018a} compare merging strategies
-for sampling-based uncertainty techniques in object detection.
-Sensoy et al.~\cite{Sensoy2018} treat prediction confidence
-as subjective opinions: they place a Dirichlet distribution on it.
-The trained predictor for a multi-class classification is also a
-Dirichlet distribution.
+confident points using earlier snapshots of the trained model and improves,
+among others, upon the approach introduced by Lakshminarayanan et al.
+Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
+a Dirichlet distribution is placed over the class probabilities. Consequently,
+the predictions of a neural network are treated as subjective opinions.
 
-Gal and Ghahramani~\cite{Gal2016} show how dropout can be used
-as a Bayesian approximation. Miller et al.~\cite{Miller2018}
-build upon the work of Miller et al.~\cite{Miller2018a} and
-Gal and Ghahramani: they use dropout sampling under open-set
-conditions for object detection. Mukhoti and Gal~\cite{Mukhoti2018}
-contribute metrics to measure uncertainty for semantic
-segmentation. Wu et al.~\cite{Wu2019} introduce two innovations
+In addition to the aforementioned Bayesian and non-Bayesian works,
+there are some Bayesian works that do not quite fit with the rest but
+are important as well. Mukhoti and Gal~\cite{Mukhoti2018}
+contributed metrics to measure uncertainty for semantic
+segmentation. Wu et al.~\cite{Wu2019} introduced two innovations
 that turn variational Bayes into a robust tool for Bayesian
-networks: they introduce a novel deterministic method to approximate
+networks: a novel deterministic method to approximate
 moments in neural networks which eliminates gradient variance, and
-they introduce a hierarchical prior for parameters and an
-Empirical Bayes procedure to select prior variances.
+a hierarchical prior for parameters together with an empirical Bayes
+procedure to select prior variances.
 
 \section{Background for Bayesian SSD}