Added GPND section to background

Signed-off-by: Jim Martens <github@2martens.de>
2019-03-08 20:22:43 +01:00
parent bcf3e21b75
commit d795c10c87


@@ -235,10 +235,6 @@ detections for unknown object classes have a higher label
uncertainty. A threshold on the entropy \(H(\mathbf{q}_i)\) can then
be used to identify and reject these false positive cases.
\section{Generative Probabilistic Novelty Detection}
% TODO Write about GPND in understandable terms
\section{Adversarial Auto-encoder}
This section will explain the adversarial auto-encoder used by
@@ -360,6 +356,136 @@ to give it a higher probability. Therefore, using only one
auto-encoder fulfils the task of differentiating between
known and unknown classes.
\section{Generative Probabilistic Novelty Detection}
It is still unclear how the novelty score is calculated. This
section will clarify this in terms that are as understandable as
possible. The name ``Generative Probabilistic Novelty Detection''
already signals that probability theory plays a central role.
Furthermore, this section makes use of some mathematical terms
which cannot be explained in great detail here. Moreover, the
previous section already introduced many required components,
which will not be explained here again.

For the purpose of this explanation a trained auto-encoder
is assumed. In that case the generator function \(g\) describes
the model that the auto-encoder uses for the novelty detection.
The task of training is to make this model come as close as
possible to the real model of the training and testing data.
In mathematical terms, the model of the auto-encoder is a
parameterized manifold \(\mathcal{M} \equiv g(\Omega)\) of
dimension \(n\), where the generator \(g: \Omega \rightarrow \mathbb{R}^m\)
maps the \(n\)-dimensional latent space \(\Omega \subset \mathbb{R}^n\)
into the \(m\)-dimensional data space. The set of training or
testing data points can then be described in the following way:
\begin{equation} \label{eq:train-set}
x_i = g(z_i) + \xi_i \quad i \in \mathbb{N},
\end{equation}
\noindent
where \(\xi_i\) represents noise. It may be confusing, but
for the purpose of this novelty test the ``truth'' is what
the generator function generates from a set of \(z_i \in \Omega\),
not the ground truth from the data set. Furthermore,
the previously introduced encoder function \(e\) is assumed
to work as an exact inverse of \(g\) for every \(x \in \mathcal{M}\):
for such \(x\) it follows that \(x = g(e(x))\).
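
To make this setup concrete, the following sketch generates such a
data set for a toy generator in plain Python with NumPy. The
generator \(g\), the dimensions and the noise level are arbitrary
choices for illustration, not the actual auto-encoder.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def g(z):
    # Toy generator: maps a scalar latent coordinate z in Omega
    # onto a one-dimensional manifold embedded in R^2.
    return np.array([np.cos(z), np.sin(z)])

# x_i = g(z_i) + xi_i, as in the equation above
z = rng.uniform(0.0, np.pi, size=100)        # latent coordinates z_i
xi = rng.normal(scale=0.05, size=(100, 2))   # noise xi_i
x = np.stack([g(z_i) for z_i in z]) + xi     # noisy data points x_i
\end{verbatim}
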
Let \(\overline{x} \in \mathbb{R}^m\) be a data point from the test
data. The remainder of the section will explain how the novelty
test is performed for this \(\overline{x}\). It is important
to note that this data point is not necessarily part of the
auto-encoder model. Therefore, \(g(e(\overline{x})) = \overline{x}\)
cannot be assumed. However, it can be observed that \(\overline{x}\)
can be non-linearly projected onto
\(\overline{x}^{\|} \in \mathcal{M}\)
by computing \(\overline{x}^{\|} = g(\overline{z})\) with \(\overline{z} = e(\overline{x})\).
It is assumed that \(g\) is smooth enough to perform a linearization
based on the first-order Taylor expansion:
\begin{equation} \label{eq:taylor-expanse}
g(z) = g(\overline{z}) + J_g(\overline{z}) (z - \overline{z}) + \mathcal{O}(\| z - \overline{z} \|^2),
\end{equation}
\noindent
where \(J_g(\overline{z})\) is the Jacobi matrix of \(g\) computed
at \(\overline{z}\). It is assumed that the Jacobi matrix of \(g\)
has full rank at every point of the manifold. A Jacobi matrix
contains all first-order partial derivatives of a function.
\(\| \cdot \|\) is the \(\mathbf{L}_2\) norm, which measures the
length of a vector as the square root of the sum of the squares
of its components. Lastly, \(\mathcal{O}\) is the Big-O notation
and here denotes the remainder term of the Taylor expansion.
Because this remainder is of second order in \(\| z - \overline{z} \|\),
it becomes negligible for \(z\) close to \(\overline{z}\), which
justifies the linear approximation.
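
The Jacobi matrix in equation (\ref{eq:taylor-expanse}) can be
obtained numerically. The following hedged sketch approximates
\(J_g(\overline{z})\) with central finite differences for a toy
generator; in practice the Jacobian of a decoder network would
rather be computed with automatic differentiation.
\begin{verbatim}
import numpy as np

def g(z):
    # Toy generator: 2-D latent vector -> point in R^3 (not the real decoder).
    return np.array([np.cos(z[0]), np.sin(z[0]) * z[1], z[1] ** 2])

def jacobian(g, z_bar, eps=1e-5):
    # Central finite differences: one column of J_g per latent dimension.
    z_bar = np.asarray(z_bar, dtype=float)
    cols = []
    for j in range(z_bar.size):
        step = np.zeros_like(z_bar)
        step[j] = eps
        cols.append((g(z_bar + step) - g(z_bar - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)

J = jacobian(g, np.array([0.3, 1.2]))   # shape (m, n) = (3, 2), full rank
\end{verbatim}
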
Next the tangent space of \(g\) at \(\overline{x}^{\|}\), which
is spanned by the \(n\) independent column vectors of the Jacobi
matrix \(J_g(\overline{z})\), is defined as
\(\mathcal{T} = \text{span}(J_g(\overline{z}))\). The tangent space
at a point of the manifold contains all directions in which the
manifold can be traversed while passing through this point.
The Jacobi matrix can be decomposed into three matrices using
singular value decomposition: \(J_g(\overline{z}) = U^{\|}SV^{*}\).
It follows that \(\mathcal{T}\) is also spanned by the column
vectors of \(U^{\|}\): \(\mathcal{T} = \text{span}(U^{\|})\).
\(U^{\|}\) contains the left-singular vectors, \(S\) contains the
singular values, and \(V^{*}\) is the conjugate transpose of the
matrix \(V\), which contains the right-singular vectors.
\(U^{\bot}\) is defined in such a way that
\(U = [U^{\|}\,U^{\bot}]\) is a unitary matrix.
\(\mathcal{T}^{\bot}\) is the orthogonal complement of
\(\mathcal{T}\). With this preparation \(\overline{x}\) can be
represented with respect to the local coordinates that define
\(\mathcal{T}\) and \(\mathcal{T}^{\bot}\). This representation
can be achieved by computing
\begin{equation} \label{eq:w-definition}
\overline{w} = U^{\top} \overline{x} = \left[\begin{matrix}
U^{\|^{\top}} \overline{x} \\
U^{\bot^{\top}} \overline{x}
\end{matrix}\right] = \left[\begin{matrix}
\overline{w}^{\|} \\
\overline{w}^{\bot}
\end{matrix}\right],
\end{equation}
\noindent
where the rotated coordinates \(\overline{w}\), which express the
data point in the coordinate system aligned with the tangent space,
are decomposed into \(\overline{w}^{\|}\), which is parallel to
\(\mathcal{T}\), and \(\overline{w}^{\bot}\), which is orthogonal
to \(\mathcal{T}\).
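
The decomposition of equation (\ref{eq:w-definition}) maps directly
onto a full singular value decomposition, as the following hedged
sketch illustrates. The Jacobi matrix \(J\) and the test point
\(\overline{x}\) are random placeholders here; in the actual method
they come from the decoder and from the test data.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2                      # data and latent dimensions (placeholders)
J = rng.normal(size=(m, n))      # stands in for J_g(z_bar)
x_bar = rng.normal(size=m)       # stands in for the test point

# Full SVD: J = U S V*, with U = [U_par  U_perp] unitary.
U, S, Vt = np.linalg.svd(J, full_matrices=True)
U_par, U_perp = U[:, :n], U[:, n:]

w_par = U_par.T @ x_bar          # rotated coordinates parallel to T
w_perp = U_perp.T @ x_bar        # rotated coordinates orthogonal to T
\end{verbatim}
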
The last step to define the novelty test involves probability
density functions (PDFs), which are now introduced. The PDF \(p_X(x)\)
describes the random variable \(X\), from which the training and
testing data points are drawn. In addition, \(p_W(w)\) is the
probability density function of the random variable \(W\),
which represents \(X\) after the change of coordinates. Since this
change of coordinates is merely a rotation, both describe the same
distribution. Furthermore, it is assumed that the coordinates
\(W^{\|}\), which are parallel to \(\mathcal{T}\), and the coordinates
\(W^{\bot}\), which are orthogonal to \(\mathcal{T}\), are
statistically independent. With this assumption the following holds:
\begin{equation} \label{eq:pdf-x}
p_X(x) = p_W(w) = p_W(w^{\|}, w^{\bot}) = p_{W^{\|}}(w^{\|}) p_{W^{\bot}}(w^{\bot})
\end{equation}

The previously introduced noise comes into play again. In formula
(\ref{eq:train-set}) it is assumed that the noise \(\xi\)
predominantly moves the point \(x\) away from the manifold
\(\mathcal{M}\) in a direction orthogonal to \(\mathcal{T}\).
As a consequence, \(W^{\bot}\) is mainly responsible for the noise
effects. Since the noise and the position on the manifold are
statistically independent, \(W^{\|}\) and \(W^{\bot}\) are also
independent.
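
Equation (\ref{eq:pdf-x}) leaves open how the two factor densities
are obtained in practice. One simple possibility, sketched below with
kernel density estimates from SciPy, is to estimate them from the
rotated coordinates of the training data; this is only an illustration
under that assumption, not necessarily the exact procedure of the
GPND authors.
\begin{verbatim}
import numpy as np
from scipy.stats import gaussian_kde

# Rotated coordinates of N training points (placeholders):
# rows of W_par lie parallel to T, rows of W_perp orthogonal to T.
rng = np.random.default_rng(2)
N, n, m = 500, 2, 3
W_par = rng.normal(size=(N, n))
W_perp = 0.05 * rng.normal(size=(N, m - n))   # small, noise-like components

# Independent density estimates for the two factors of p_X.
p_par = gaussian_kde(W_par.T)    # estimates the parallel factor
p_perp = gaussian_kde(W_perp.T)  # estimates the orthogonal factor
\end{verbatim}
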
Finally, referring back to the data point \(\overline{x}\), the
novelty test is defined as follows:
\begin{equation} \label{eq:novelty-test}
p_X(\overline{x}) = p_{W^{\|}}(\overline{w}^{\|})\,p_{W^{\bot}}(\overline{w}^{\bot})
\begin{cases}
\geq \gamma & \Longrightarrow \text{Inlier} \\
< \gamma & \Longrightarrow \text{Outlier}
\end{cases}
\end{equation}
\noindent
where \(\gamma\) is a suitable threshold.
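
Putting the pieces together, the novelty test of equation
(\ref{eq:novelty-test}) amounts to evaluating the two densities at
the rotated coordinates of \(\overline{x}\) and comparing their
product with \(\gamma\). The following hedged sketch combines the
placeholder components from the previous sketches; the encoder,
the Jacobian function and the density estimates are passed in as
arguments, and \(\gamma\) would have to be chosen on held-out data.
\begin{verbatim}
import numpy as np

def is_inlier(x_bar, e, jacobian_fn, p_par, p_perp, gamma):
    # e: encoder, maps x_bar to the latent coordinates z_bar
    # jacobian_fn: returns the (m x n) Jacobi matrix of g at z_bar
    # p_par, p_perp: density estimates for the rotated coordinates
    # gamma: threshold separating inliers from outliers
    z_bar = np.asarray(e(x_bar))
    U, _, _ = np.linalg.svd(jacobian_fn(z_bar), full_matrices=True)
    n = z_bar.size
    w_par = U[:, :n].T @ x_bar     # parallel to the tangent space
    w_perp = U[:, n:].T @ x_bar    # orthogonal to the tangent space
    p_x = np.asarray(p_par(w_par)).item() * np.asarray(p_perp(w_perp)).item()
    return p_x >= gamma            # True: inlier, False: outlier
\end{verbatim}
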
At this point it is clear that understanding the novelty test of
the GPND approach requires considerably more mathematical background
than dropout sampling. Nonetheless, it may still prove to be the
better method.
\section{Contribution}
This section will outline what exactly the scientific as well as