Added GPND section to background

Signed-off-by: Jim Martens <github@2martens.de>
2019-03-08 20:22:43 +01:00
parent bcf3e21b75
commit d795c10c87


@@ -235,10 +235,6 @@ detections for unknown object classes have a higher label
uncertainty. A threshold on the entropy \(H(\mathbf{q}_i)\) can then
be used to identify and reject these false positive cases.
\section{Generative Probabilistic Novelty Detection}
% TODO Write about GPND in understandable terms
\section{Adversarial Auto-encoder}
This section will explain the adversarial auto-encoder used by
@@ -360,6 +356,136 @@ to give it a higher probability. Therefore, using only one
auto-encoder fulfils the task of differentiating between
known and unknown classes.
\section{Generative Probabilistic Novelty Detection}
It is still unclear how the novelty score is calculated. This
section will clarify this in terms that are as understandable as
possible. The name ``Generative Probabilistic Novelty Detection''
already signals that probability theory plays a central role.
Furthermore, this section makes use of some mathematical terms
which cannot be explained in great detail here. Moreover, the
previous section already introduced many required components,
which will not be explained here again.

For the purpose of this explanation a trained auto-encoder
is assumed. In that case the generator function \(g\) describes
the model that the auto-encoder uses for the novelty detection.
The task of training is to make this model come as close as
possible to the real model of the training and testing data.
In mathematical terms, the model of the auto-encoder is a
parameterized manifold \(\mathcal{M} \equiv g(\Omega)\) of
dimension \(n\), where the generator \(g: \Omega \rightarrow \mathbb{R}^m\)
maps the \(n\)-dimensional latent space \(\Omega \subset \mathbb{R}^n\)
into the \(m\)-dimensional data space. The set of training or
testing data points can then be described in the following way:
\begin{equation} \label{eq:train-set}
x_i = g(z_i) + \xi_i \quad i \in \mathbb{N},
\end{equation}
\noindent
where \(\xi_i\) represents noise. It may be confusing, but
for the purpose of this novelty test the ``truth'' is what
the generator function generates from a set of \(z_i \in \Omega\),
not the ground truth from the data set. Furthermore,
the previously introduced encoder function \(e\) is assumed
to work as an exact inverse of \(g\) for every \(x \in \mathcal{M}\):
for such \(x\) it follows that \(x = g(e(x))\).
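
To make this setup concrete, the following sketch generates such a
data set for a toy generator in plain Python with NumPy. The
generator \(g\), the dimensions and the noise level are arbitrary
choices for illustration, not the actual auto-encoder.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def g(z):
    # Toy generator: maps a scalar latent coordinate z in Omega
    # onto a one-dimensional manifold embedded in R^2.
    return np.array([np.cos(z), np.sin(z)])

# x_i = g(z_i) + xi_i, as in the equation above
z = rng.uniform(0.0, np.pi, size=100)        # latent coordinates z_i
xi = rng.normal(scale=0.05, size=(100, 2))   # noise xi_i
x = np.stack([g(z_i) for z_i in z]) + xi     # noisy data points x_i
\end{verbatim}
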
Let \(\overline{x} \in \mathbb{R}^m\) be a data point from the test
data. The remainder of the section will explain how the novelty
test is performed for this \(\overline{x}\). It is important
to note that this data point is not necessarily part of the
auto-encoder model. Therefore, \(g(e(\overline{x})) = \overline{x}\)
cannot be assumed. However, it can be observed that \(\overline{x}\)
can be non-linearly projected onto
\(\overline{x}^{\|} \in \mathcal{M}\)
by computing \(\overline{x}^{\|} = g(\overline{z})\) with \(\overline{z} = e(\overline{x})\).
It is assumed that \(g\) is smooth enough to perform a linearization
based on the first-order Taylor expansion:
\begin{equation} \label{eq:taylor-expanse}
g(z) = g(\overline{z}) + J_g(\overline{z}) (z - \overline{z}) + \mathcal{O}(\| z - \overline{z} \|^2),
\end{equation}
\noindent
where \(J_g(\overline{z})\) is the Jacobi matrix of \(g\) computed
at \(\overline{z}\). It is assumed that the Jacobi matrix of \(g\)
has full rank at every point of the manifold. A Jacobi matrix
contains all first-order partial derivatives of a function.
\(\| \cdot \|\) is the \(\mathbf{L}_2\) norm, which measures the
length of a vector as the square root of the sum of the squares
of its components. Lastly, \(\mathcal{O}\) is the Big-O notation
and here denotes the remainder term of the Taylor expansion.
Because this remainder is of second order in \(\| z - \overline{z} \|\),
it becomes negligible for \(z\) close to \(\overline{z}\), which
justifies the linear approximation.
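
The Jacobi matrix in equation (\ref{eq:taylor-expanse}) can be
obtained numerically. The following hedged sketch approximates
\(J_g(\overline{z})\) with central finite differences for a toy
generator; in practice the Jacobian of a decoder network would
rather be computed with automatic differentiation.
\begin{verbatim}
import numpy as np

def g(z):
    # Toy generator: 2-D latent vector -> point in R^3 (not the real decoder).
    return np.array([np.cos(z[0]), np.sin(z[0]) * z[1], z[1] ** 2])

def jacobian(g, z_bar, eps=1e-5):
    # Central finite differences: one column of J_g per latent dimension.
    z_bar = np.asarray(z_bar, dtype=float)
    cols = []
    for j in range(z_bar.size):
        step = np.zeros_like(z_bar)
        step[j] = eps
        cols.append((g(z_bar + step) - g(z_bar - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)

J = jacobian(g, np.array([0.3, 1.2]))   # shape (m, n) = (3, 2), full rank
\end{verbatim}
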
Next the tangent space of \(g\) at \(\overline{x}^{\|}\), which
is spanned by the \(n\) independent column vectors of the Jacobi
matrix \(J_g(\overline{z})\), is defined as
\(\mathcal{T} = \text{span}(J_g(\overline{z}))\). The tangent space
at a point of the manifold contains all directions in which the
manifold can be traversed while passing through this point.
The Jacobi matrix can be decomposed into three matrices using
singular value decomposition: \(J_g(\overline{z}) = U^{\|}SV^{*}\).
It follows that \(\mathcal{T}\) is also spanned by the column
vectors of \(U^{\|}\): \(\mathcal{T} = \text{span}(U^{\|})\).
\(U^{\|}\) contains the left-singular vectors, \(S\) contains the
singular values, and \(V^{*}\) is the conjugate transpose of the
matrix \(V\), which contains the right-singular vectors.
\(U^{\bot}\) is defined in such a way that
\(U = [U^{\|}\,U^{\bot}]\) is a unitary matrix.
\(\mathcal{T}^{\bot}\) is the orthogonal complement of
\(\mathcal{T}\). With this preparation \(\overline{x}\) can be
represented with respect to the local coordinates that define
\(\mathcal{T}\) and \(\mathcal{T}^{\bot}\). This representation
can be achieved by computing
\begin{equation} \label{eq:w-definition}
\overline{w} = U^{\top} \overline{x} = \left[\begin{matrix}
U^{\|^{\top}} \overline{x} \\
U^{\bot^{\top}} \overline{x}
\end{matrix}\right] = \left[\begin{matrix}
\overline{w}^{\|} \\
\overline{w}^{\bot}
\end{matrix}\right],
\end{equation}
\noindent
where the rotated coordinates \(\overline{w}\), which express the
data point in the coordinate system aligned with the tangent space,
are decomposed into \(\overline{w}^{\|}\), which is parallel to
\(\mathcal{T}\), and \(\overline{w}^{\bot}\), which is orthogonal
to \(\mathcal{T}\).
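
The decomposition of equation (\ref{eq:w-definition}) maps directly
onto a full singular value decomposition, as the following hedged
sketch illustrates. The Jacobi matrix \(J\) and the test point
\(\overline{x}\) are random placeholders here; in the actual method
they come from the decoder and from the test data.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2                      # data and latent dimensions (placeholders)
J = rng.normal(size=(m, n))      # stands in for J_g(z_bar)
x_bar = rng.normal(size=m)       # stands in for the test point

# Full SVD: J = U S V*, with U = [U_par  U_perp] unitary.
U, S, Vt = np.linalg.svd(J, full_matrices=True)
U_par, U_perp = U[:, :n], U[:, n:]

w_par = U_par.T @ x_bar          # rotated coordinates parallel to T
w_perp = U_perp.T @ x_bar        # rotated coordinates orthogonal to T
\end{verbatim}
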
The last step to define the novelty test involves probability
density functions (PDFs), which are now introduced. The PDF \(p_X(x)\)
describes the random variable \(X\), from which the training and
testing data points are drawn. In addition, \(p_W(w)\) is the
probability density function of the random variable \(W\),
which represents \(X\) after the change of coordinates. Since this
change of coordinates is merely a rotation, both describe the same
distribution. Furthermore, it is assumed that the coordinates
\(W^{\|}\), which are parallel to \(\mathcal{T}\), and the coordinates
\(W^{\bot}\), which are orthogonal to \(\mathcal{T}\), are
statistically independent. With this assumption the following holds:
\begin{equation} \label{eq:pdf-x}
p_X(x) = p_W(w) = p_W(w^{\|}, w^{\bot}) = p_{W^{\|}}(w^{\|}) p_{W^{\bot}}(w^{\bot})
\end{equation}

The previously introduced noise comes into play again. In formula
(\ref{eq:train-set}) it is assumed that the noise \(\xi\)
predominantly moves the point \(x\) away from the manifold
\(\mathcal{M}\) in a direction orthogonal to \(\mathcal{T}\).
As a consequence, \(W^{\bot}\) is mainly responsible for the noise
effects. Since the noise and the position on the manifold are
statistically independent, \(W^{\|}\) and \(W^{\bot}\) are also
independent.
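
Equation (\ref{eq:pdf-x}) leaves open how the two factor densities
are obtained in practice. One simple possibility, sketched below with
kernel density estimates from SciPy, is to estimate them from the
rotated coordinates of the training data; this is only an illustration
under that assumption, not necessarily the exact procedure of the
GPND authors.
\begin{verbatim}
import numpy as np
from scipy.stats import gaussian_kde

# Rotated coordinates of N training points (placeholders):
# rows of W_par lie parallel to T, rows of W_perp orthogonal to T.
rng = np.random.default_rng(2)
N, n, m = 500, 2, 3
W_par = rng.normal(size=(N, n))
W_perp = 0.05 * rng.normal(size=(N, m - n))   # small, noise-like components

# Independent density estimates for the two factors of p_X.
p_par = gaussian_kde(W_par.T)    # estimates the parallel factor
p_perp = gaussian_kde(W_perp.T)  # estimates the orthogonal factor
\end{verbatim}
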
Finally, referring back to the data point \(\overline{x}\), the
novelty test is defined as follows:
\begin{equation} \label{eq:novelty-test}
p_X(\overline{x}) = p_{W^{\|}}(\overline{w}^{\|})\,p_{W^{\bot}}(\overline{w}^{\bot})
\begin{cases}
\geq \gamma & \Longrightarrow \text{Inlier} \\
< \gamma & \Longrightarrow \text{Outlier}
\end{cases}
\end{equation}
\noindent
where \(\gamma\) is a suitable threshold.
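
Putting the pieces together, the novelty test of equation
(\ref{eq:novelty-test}) amounts to evaluating the two densities at
the rotated coordinates of \(\overline{x}\) and comparing their
product with \(\gamma\). The following hedged sketch combines the
placeholder components from the previous sketches; the encoder,
the Jacobian function and the density estimates are passed in as
arguments, and \(\gamma\) would have to be chosen on held-out data.
\begin{verbatim}
import numpy as np

def is_inlier(x_bar, e, jacobian_fn, p_par, p_perp, gamma):
    # e: encoder, maps x_bar to the latent coordinates z_bar
    # jacobian_fn: returns the (m x n) Jacobi matrix of g at z_bar
    # p_par, p_perp: density estimates for the rotated coordinates
    # gamma: threshold separating inliers from outliers
    z_bar = np.asarray(e(x_bar))
    U, _, _ = np.linalg.svd(jacobian_fn(z_bar), full_matrices=True)
    n = z_bar.size
    w_par = U[:, :n].T @ x_bar     # parallel to the tangent space
    w_perp = U[:, n:].T @ x_bar    # orthogonal to the tangent space
    p_x = np.asarray(p_par(w_par)).item() * np.asarray(p_perp(w_perp)).item()
    return p_x >= gamma            # True: inlier, False: outlier
\end{verbatim}
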
At this point it is clear that understanding the novelty test of
the GPND approach requires considerably more mathematical background
than dropout sampling. Nonetheless, it may still prove to be the
better method.
\section{Contribution}
This section will outline what exactly the scientific as well as