From a98f9e8d5528d37416cd9835574cae142d42e26f Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Tue, 27 Aug 2019 13:57:51 +0200
Subject: [PATCH] Finished raw version of related works

Signed-off-by: Jim Martens
---
 body.tex | 165 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 114 insertions(+), 51 deletions(-)

diff --git a/body.tex b/body.tex
index 1f54e2e..6b01d03 100644
--- a/body.tex
+++ b/body.tex
@@ -175,68 +175,131 @@ be explained.
 
 \section{Related Works}
 
+The task of novelty detection can be accomplished in a variety of ways.
+Pimentel et al.~\cite{Pimentel2014} provide a review of novelty detection
+methods published over the previous decade. They showcase probabilistic,
+distance-based, reconstruction-based, domain-based, and information-theoretic
+novelty detection. Based on their categorisation, this thesis falls under
+reconstruction-based novelty detection as it deals only with neural network
+approaches. Therefore, the other types of novelty detection will only be
+briefly introduced.
+
+\subsection{Overview of the types of novelty detection}
+
+Probabilistic approaches estimate the generative probability density function (pdf)
+of the data. It is assumed that the training data is generated from an underlying
+probability distribution \(D\). This distribution can be estimated from the
+training data; the estimate is denoted \(\hat D\) and represents a model
+of normality. A novelty threshold is applied to \(\hat D\) in a way that
+allows a probabilistic interpretation. Pidhorskyi et al.~\cite{Pidhorskyi2018}
+combine a probabilistic approach to novelty detection with auto-encoders.
+
+Distance-based novelty detection uses either nearest neighbour-based approaches
+(e.g. ) %TODO citations)
+or clustering-based approaches
+(e.g. ). % TODO citations
+Both methods are closely related to estimating the
+pdf of the data: they use well-defined distance metrics to compute the distance
+between two data points.
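As an aside to the distance-based category above, the nearest neighbour idea can be sketched in a few lines: the novelty score of a point is its mean distance to the \(k\) nearest training points, and a threshold turns the score into a decision. The toy data, the choice of \(k\), and the threshold below are illustrative assumptions, not taken from any cited work.

```python
import math

def knn_novelty_score(train, x, k=3):
    """Novelty score of point x: mean Euclidean distance to its
    k nearest neighbours in the training set."""
    dists = sorted(math.dist(x, t) for t in train)
    return sum(dists[:k]) / k

# Toy 2-D training data clustered around the origin.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]

normal = knn_novelty_score(train, (0.05, 0.05))
novel = knn_novelty_score(train, (5.0, 5.0))

threshold = 1.0  # hypothetical novelty threshold
print(novel > normal)      # True: the distant point scores far higher
print(novel > threshold)   # True: it would be flagged as novel
```

Clustering-based variants follow the same pattern, except that the distance is measured to cluster prototypes rather than to individual training points.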
+
+Domain-based novelty detection describes the boundary of the known data, rather
+than the data itself. Unknown data is identified by its position relative to
+the boundary. A common implementation of this approach is the support vector
+machine (e.g. implemented by ). % TODO citations
+
+Information-theoretic novelty detection computes the information content
+of a data set, for example, with metrics like entropy. Such metrics assume
+that novel data inside the data set significantly alters the information
+content of an otherwise normal data set. First, the metrics are calculated over the
+whole data set. Afterwards, the subset is identified whose removal from the
+data set causes the biggest difference in the metric. This subset is considered
+to consist of novel data. For example, xyz provide a recent approach.
+% TODO citations
+
+\subsection{Reconstruction-based novelty detection}
+
+Reconstruction-based approaches use the reconstruction error in one form
+or another to calculate the novelty score. This includes auto-encoders, which
+literally reconstruct the input, but also MLP networks, which try
+to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014} differentiated
+between neural network-based approaches and subspace methods. The former were
+further divided into MLPs, Hopfield networks, autoassociative networks,
+radial basis function networks, and self-organising networks.
+The remainder of this section focuses on MLP-based works, with a particular
+focus on the task of object detection and on Bayesian networks.
+
 Novelty detection for object detection is intricately linked
 with open set conditions: the test data can contain unknown
 classes.
-Bishop~\cite{Bishop1994} investigates the correlation between
+Bishop~\cite{Bishop1994} investigated the correlation between
 the degree of novel input data and the reliability of network
-outputs. Pimentel et al.~\cite{Pimentel2014} provide a review
-of novelty detection methods published over the previous decade.
+outputs.
 
-There are two primary pathways that deal with novelty: novelty
-detection using auto-encoders and uncertainty estimation with
-bayesian networks.
+The Bayesian approach provides a theoretical foundation for
+modelling uncertainty \cite{Ghahramani2015}.
+MacKay~\cite{MacKay1992} provided a practical Bayesian
+framework for backpropagation networks. Neal~\cite{Neal1996} built upon
+the work of MacKay and explored Bayesian learning for neural networks.
+However, these Bayesian neural networks do not scale well. Over
+time, two major Bayesian approximations were introduced: one based
+on dropout and one based on batch normalisation.
 
-Japkowicz et al.~\cite{Japkowicz1995} introduce a novelty detection
-method based on the hippocampus of Gluck and Meyers~\cite{Gluck1993}
-and use an auto-encoder to recognize novel instances.
-Thompson et al.~\cite{Thompson2002} show that auto-encoders
-can learn "normal" system behaviour implicitly.
-Goodfellow et al.~\cite{Goodfellow2014} introduce adversarial
-networks: a generator that attempts to trick the discriminator
-by generating samples indistinguishable from the real data.
-Makhzani et al.~\cite{Makhzani2015} build on the work of Goodfellow
-and propose adversarial auto-encoders. Richter and
-Roy~\cite{Richter2017} use an auto-encoder to detect novelty.
+Gal and Ghahramani~\cite{Gal2016} showed that dropout training is a
+Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
+showed that dropout training actually corresponds to a general approximate
+Bayesian model. This means that every network trained with dropout is an
+approximate Bayesian model. During inference, the dropout remains active;
+this form of inference is called Monte Carlo Dropout (MCDO).
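Outside the patch text, the MCDO procedure just described can be sketched concretely: dropout stays active at test time, and the spread over several stochastic forward passes yields a predictive mean and an uncertainty estimate. The tiny linear model, its weights, and the number of samples below are illustrative assumptions, not taken from the cited works.

```python
import random
import statistics

def dropout_forward(x, weights, p=0.5):
    """One stochastic forward pass of a toy linear model with
    inverted dropout kept active (each unit dropped with prob. p,
    survivors scaled by 1/(1-p))."""
    return sum(
        (w / (1.0 - p)) * xi
        for w, xi in zip(weights, x)
        if random.random() >= p
    )

def mc_dropout_predict(x, weights, T=1000, p=0.5):
    """Monte Carlo Dropout: draw T stochastic passes and report the
    predictive mean and variance as an uncertainty estimate."""
    samples = [dropout_forward(x, weights, p) for _ in range(T)]
    return statistics.fmean(samples), statistics.pvariance(samples)

random.seed(0)
weights = [0.5, -0.2, 0.8]  # hypothetical trained weights
mean, var = mc_dropout_predict([1.0, 2.0, 3.0], weights)
# The mean approaches the deterministic output (2.5 for these
# weights); the nonzero variance quantifies model uncertainty.
print(abs(mean - 2.5) < 0.5, var > 0.0)
```

The same recipe applies unchanged to a deep network: one simply keeps the dropout layers active at inference and averages the sampled outputs.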
+
+Miller et al.~\cite{Miller2018} built upon the work of Gal and Ghahramani: they
+used MC dropout under open-set conditions for object detection.
+In a second paper \cite{Miller2018a}, Miller et al. continued their work and
+compared merging strategies for sampling-based uncertainty techniques in
+object detection.
 
-Wang et al.~\cite{Wang2018} base upon Goodfellow's work and
-use a generative adversarial network for novelty detection.
-Sabokrou et al.~\cite{Sabokrou2018} implement an end-to-end
-architecture for one-class classification: it consists of two
-deep networks, with one being the novelty detector and the other
-enhancing inliers and distorting outliers.
-Pidhorskyi et al.~\cite{Pidhorskyi2018} take a probabilistic approach
-and compute how likely it is that a sample is generated by the
-inlier distribution.
+Teye et al.~\cite{Teye2018} made the point that most modern networks have
+adopted other regularisation techniques. Ioffe and Szegedy~\cite{Ioffe2015}
+introduced batch normalisation, which has been widely adopted. Teye et al.
+showed how batch normalisation training is similar to dropout and can be
+viewed as approximate Bayesian inference. Estimates of the model uncertainty
+can be gained with a technique named Monte Carlo Batch Normalisation (MCBN).
+Consequently, this technique can be applied to any network that utilises
+standard batch normalisation.
+Li et al.~\cite{Li2019} investigated the problem of poor performance
+when combining dropout and batch normalisation: dropout shifts the variance
+of a neural unit when switching from training to testing, whereas batch
+normalisation does not change the variance. This inconsistency leads to a
+variance shift whose impact depends on the network used. For example,
+adding dropout layers to SSD \cite{Liu2016} and applying MC dropout, as
+Miller et al.~\cite{Miller2018} did, causes such a problem because SSD uses
+batch normalisation.
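To make the MCBN idea concrete, here is a toy, single-feature sketch (illustrative only, not code from Teye et al.): at test time the batch statistics are recomputed from freshly sampled training mini-batches instead of the fixed moving averages, and the spread of the resulting outputs serves as the uncertainty estimate.

```python
import random
import statistics

def bn_forward(x, batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a scalar input with the statistics of a sampled
    training mini-batch instead of fixed moving averages."""
    mu = statistics.fmean(batch)
    var = statistics.pvariance(batch)
    return gamma * (x - mu) / (var + eps) ** 0.5 + beta

def mcbn_predict(x, train_data, T=50, batch_size=4):
    """MCBN: repeat the forward pass with freshly sampled batch
    statistics; the mean and variance of the T outputs give the
    prediction and its uncertainty."""
    outputs = [bn_forward(x, random.sample(train_data, batch_size))
               for _ in range(T)]
    return statistics.fmean(outputs), statistics.pvariance(outputs)

random.seed(1)
train_data = [random.gauss(0.0, 1.0) for _ in range(100)]
mean, var = mcbn_predict(1.5, train_data)
print(var > 0.0)  # stochastic batch statistics yield nonzero predictive variance
```

In a real network the same sampling would be applied to every batch normalisation layer jointly, which is why the technique carries over to any architecture that already uses standard batch normalisation.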
-Kendall and Gal~\cite{Kendall2017} provide a Bayesian deep learning
-framework that combines input-dependent
-aleatoric\footnote{captures noise inherent in observations}
-uncertainty with epistemic\footnote{uncertainty in the model}
-uncertainty. Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
-implement a predictive uncertainty estimation using deep ensembles
-rather than Bayesian networks. Geifman et al.~\cite{Geifman2018}
-introduce an uncertainty estimation algorithm for non-Bayesian deep
+Non-Bayesian approaches have been developed as well. Usually, they are compared
+with MC dropout and show better performance.
+Postels et al.~\cite{Postels2019} provided a sampling-free approach for
+uncertainty estimation that does not affect training and approximates the
+sampling at test time. They compared it to MC dropout and found lower
+computational overhead with better results.
+Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
+implemented a predictive uncertainty estimation using deep ensembles.
+Compared to MC dropout, it showed better results.
+Geifman et al.~\cite{Geifman2018}
+introduced an uncertainty estimation algorithm for non-Bayesian deep
 neural classification that estimates the uncertainty of highly
-confident points using earlier snapshots of the trained model.
-Miller et al.~\cite{Miller2018a} compare merging strategies
-for sampling-based uncertainty techniques in object detection.
-Sensoy et al.~\cite{Sensoy2018} treat prediction confidence
-as subjective opinions: they place a Dirichlet distribution on it.
-The trained predictor for a multi-class classification is also a
-Dirichlet distribution.
+confident points using earlier snapshots of the trained model and improves,
+among others, upon the approach introduced by Lakshminarayanan et al.
+Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
+a Dirichlet distribution is placed over the class probabilities. Consequently,
+the predictions of a neural network are treated as subjective opinions.
 
-Gal and Ghahramani~\cite{Gal2016} show how dropout can be used
-as a Bayesian approximation. Miller et al.~\cite{Miller2018}
-build upon the work of Miller et al.~\cite{Miller2018a} and
-Gal and Ghahramani: they use dropout sampling under open-set
-conditions for object detection. Mukhoti and Gal~\cite{Mukhoti2018}
-contribute metrics to measure uncertainty for semantic
-segmentation. Wu et al.~\cite{Wu2019} introduce two innovations
+In addition to the aforementioned Bayesian and non-Bayesian works,
+there are some Bayesian works that do not quite fit with the rest but
+are important as well. Mukhoti and Gal~\cite{Mukhoti2018}
+contributed metrics to measure uncertainty for semantic
+segmentation. Wu et al.~\cite{Wu2019} introduced two innovations
 that turn variational Bayes into a robust tool for Bayesian
-networks: they introduce a novel deterministic method to approximate
+networks: a novel deterministic method to approximate
 moments in neural networks which eliminates gradient variance, and
-they introduce a hierarchical prior for parameters and an
-Empirical Bayes procedure to select prior variances.
+a hierarchical prior for parameters together with an empirical Bayes
+procedure to select prior variances.
 
 \section{Background for Bayesian SSD}