Added GPND section to background
Signed-off-by: Jim Martens <github@2martens.de>
body_expose.tex
@@ -235,10 +235,6 @@ detections for unknown object classes have a higher label
uncertainty. A threshold on the entropy \(H(\mathbf{q}_i)\) can then
be used to identify and reject these false positive cases.
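
As a brief illustration, assuming the entropy is computed with the
natural logarithm over three classes: a near-uniform output
\(\mathbf{q}_i = (0.34, 0.33, 0.33)\) gives
\(H(\mathbf{q}_i) \approx 1.10\), while a confident output
\(\mathbf{q}_i = (0.98, 0.01, 0.01)\) gives
\(H(\mathbf{q}_i) \approx 0.11\); any threshold between these two
values rejects the near-uniform detection.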
\section{Generative Probabilistic Novelty Detection}
% TODO Write about GPND in understandable terms
\section{Adversarial Auto-encoder}
This section will explain the adversarial auto-encoder used by
@@ -360,6 +356,136 @@ to give it a higher probability. Therefore, using only one
auto-encoder fulfils the task of differentiating between
known and unknown classes.

\section{Generative Probabilistic Novelty Detection}

It is still unclear how the novelty score is calculated. This
section will clear this up in terms as understandable as possible.
The name "Generative Probabilistic Novelty Detection" already
signals that probability theory plays a central part in it.
Furthermore, this section will make use of some mathematical terms
which cannot be explained in great detail here. Moreover, the
previous section already introduced many of the required
components, which will not be explained here again.

For the purpose of this explanation a trained auto-encoder is
assumed. In that case the generator function describes the model
that the auto-encoder actually uses for the novelty detection.
The task of training is to bring this model as close as possible
to the real model of the training and testing data. In
mathematical terms, the model of the auto-encoder is a
parameterized manifold \(\mathcal{M} \equiv g(\Omega)\) of
dimension \(n\). The set of training or testing data can then be
described in the following way:
\begin{equation} \label{eq:train-set}
x_i = g(z_i) + \xi_i \quad i \in \mathbb{N},
\end{equation}
\noindent
where \(\xi_i\) represents noise. It may be confusing, but for the
purpose of this novelty test the "truth" is what the generator
function generates from a set of \(z_i \in \Omega\), not the
ground truth from the data set. Furthermore, the previously
introduced encoder function \(e\) is assumed to work as an exact
inverse of \(g\) for every \(x \in \mathcal{M}\). For such \(x\)
it follows that \(x = g(e(x))\).
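
As a sketch of this setup, assume hypothetically \(n = 1\),
\(m = 2\), \(\Omega = [0, 2\pi)\) and the generator
\(g(z) = (\cos z, \sin z)^{\top}\). The manifold
\(\mathcal{M} = g(\Omega)\) is then the unit circle in
\(\mathbb{R}^2\), and a training point according to
(\ref{eq:train-set}) could be
\[
x_i = (\cos z_i, \sin z_i)^{\top} + \xi_i,
\]
a point lying close to, but because of the noise \(\xi_i\) not
exactly on, the circle.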

Let \(\overline{x} \in \mathbb{R}^m\) be a data point from the
test data. The remainder of the section will explain how the
novelty test is performed for this \(\overline{x}\). It is
important to note that this data point is not necessarily part of
the auto-encoder model. Therefore,
\(g(e(\overline{x})) = \overline{x}\) cannot be assumed. However,
it can be observed that \(\overline{x}\) can be non-linearly
projected onto \(\overline{x}^{\|} \in \mathcal{M}\) by using
\(g(\overline{z})\) with \(\overline{z} = e(\overline{x})\). It
is assumed that \(g\) is smooth enough to perform a linearization
based on the first-order Taylor expansion:
\begin{equation} \label{eq:taylor-expanse}
g(z) = g(\overline{z}) + J_g(\overline{z}) (z - \overline{z}) + \mathcal{O}(\| z - \overline{z} \|^2),
\end{equation}
\noindent
where \(J_g(\overline{z})\) is the Jacobi matrix of \(g\) computed
at \(\overline{z}\). A Jacobi matrix contains all first-order
partial derivatives of a function; it is assumed that the Jacobi
matrix of \(g\) has full rank at every point of the manifold.
\(\| \cdot \|\) is the \(\mathbf{L}_2\) norm, which calculates the
length of a vector as the square root of the sum of the squares of
its components. Lastly, \(\mathcal{O}\) is the Big-O notation,
commonly known from specifying the time complexity of algorithms.
Here it denotes the remainder of the expansion, which shrinks
quadratically in \(\| z - \overline{z} \|\) and can therefore be
neglected for \(z\) close to \(\overline{z}\).
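
Continuing the hypothetical circle example, the Jacobi matrix of
\(g(z) = (\cos z, \sin z)^{\top}\) is the single column
\(J_g(\overline{z}) = (-\sin \overline{z}, \cos \overline{z})^{\top}\),
so (\ref{eq:taylor-expanse}) approximates the circle near
\(\overline{x}^{\|} = g(\overline{z})\) by its tangent line:
\[
g(z) \approx \left[\begin{matrix} \cos \overline{z} \\ \sin \overline{z} \end{matrix}\right]
+ \left[\begin{matrix} -\sin \overline{z} \\ \cos \overline{z} \end{matrix}\right]
(z - \overline{z}).
\]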

Next the tangent space of \(g\) at \(\overline{x}^{\|}\), which
is spanned by the \(n\) independent column vectors of the Jacobi
matrix \(J_g(\overline{z})\), is defined as
\(\mathcal{T} = \text{span}(J_g(\overline{z}))\). The tangent
space at a point of the manifold contains all vectors that are
tangential to the manifold at this point. The Jacobi matrix can
be decomposed into three matrices using singular value
decomposition: \(J_g(\overline{z}) = U^{\|}SV^{*}\).
\(\mathcal{T}\) is then equally spanned by the column vectors of
\(U^{\|}\): \(\mathcal{T} = \text{span}(U^{\|})\). \(U^{\|}\)
contains the left-singular vectors and \(V^{*}\) is the conjugate
transpose of the matrix \(V\), which contains the right-singular
vectors. \(U^{\bot}\) is defined in such a way that
\(U = [U^{\|}U^{\bot}]\) is a unitary matrix.
\(\mathcal{T}^{\bot}\) is the orthogonal complement of
\(\mathcal{T}\). With this preparation \(\overline{x}\) can be
represented with respect to the local coordinates that define
\(\mathcal{T}\) and \(\mathcal{T}^{\bot}\). This representation
can be achieved by computing
\begin{equation} \label{eq:w-definition}
\overline{w} = U^{\top} \overline{x} = \left[\begin{matrix}
U^{\|^{\top}} \overline{x} \\
U^{\bot^{\top}} \overline{x}
\end{matrix}\right] = \left[\begin{matrix}
\overline{w}^{\|} \\
\overline{w}^{\bot}
\end{matrix}\right],
\end{equation}
\noindent
where the rotated coordinates \(\overline{w}\) (the data point
expressed in the coordinate system of the tangent space) are
decomposed into \(\overline{w}^{\|}\), the components parallel to
\(\mathcal{T}\), and \(\overline{w}^{\bot}\), the components
orthogonal to \(\mathcal{T}\).
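
In the hypothetical circle example this decomposition is
particularly simple: \(J_g(\overline{z})\) is a \(2 \times 1\)
matrix with unit norm, so
\(U^{\|} = (-\sin \overline{z}, \cos \overline{z})^{\top}\),
\(S = 1\), \(V^{*} = 1\), and
\(U^{\bot} = (\cos \overline{z}, \sin \overline{z})^{\top}\)
completes \(U\) to a unitary matrix. The coordinate
\(\overline{w}^{\|}\) then measures \(\overline{x}\) along the
tangent of the circle and \(\overline{w}^{\bot}\) along its
radius, i.e.\ orthogonal to the manifold.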

The last step to define the novelty test involves probability
density functions (PDFs), which are now introduced. The PDF
\(p_X(x)\) describes the random variable \(X\), from which the
training and testing data points are drawn. In addition,
\(p_W(w)\) is the probability density function of the random
variable \(W\), which represents \(X\) after the change of
coordinates. Both distributions are identical, but it is assumed
that the coordinates \(W^{\|}\), which are parallel to
\(\mathcal{T}\), and the coordinates \(W^{\bot}\), which are
orthogonal to \(\mathcal{T}\), are statistically independent.
With this assumption the following holds:
\begin{equation} \label{eq:pdf-x}
p_X(x) = p_W(w) = p_W(w^{\|}, w^{\bot}) = p_{W^{\|}}(w^{\|}) p_{W^{\bot}}(w^{\bot})
\end{equation}

The previously introduced noise comes into play again. In formula
(\ref{eq:train-set}) it is assumed that the noise \(\xi\)
predominantly deviates the point \(x\) away from the manifold
\(\mathcal{M}\) in a direction orthogonal to \(\mathcal{T}\).
As a consequence \(W^{\bot}\) is mainly responsible for the noise
effects. Since the noise and the drawing from the manifold are
statistically independent, \(W^{\|}\) and \(W^{\bot}\) are also
independent.
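
As a sketch under an additional assumption not made here: if the
noise were isotropic Gaussian,
\(\xi \sim \mathcal{N}(0, \sigma^2 I)\), then \(p_{W^{\bot}}\)
could be modelled as a zero-mean Gaussian,
\[
p_{W^{\bot}}(\overline{w}^{\bot}) \propto \exp\left(-\frac{\| \overline{w}^{\bot} \|^2}{2\sigma^2}\right),
\]
so that test points far from the manifold receive a low density
and \(p_{W^{\|}}\) only has to model the distribution on the
manifold itself.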

Finally, referring back to the data point \(\overline{x}\), the
novelty test is defined as follows:
\begin{equation} \label{eq:novelty-test}
p_X(\overline{x}) = p_{W^{\|}}(\overline{w}^{\|})p_{W^{\bot}}(\overline{w}^{\bot})
\begin{cases}
\geq \gamma & \Longrightarrow \text{Inlier} \\
< \gamma & \Longrightarrow \text{Outlier}
\end{cases}
\end{equation}
\noindent
where \(\gamma\) is a suitable threshold.
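
For instance, with a hypothetical threshold \(\gamma = 0.01\), a
test point with \(p_{W^{\|}}(\overline{w}^{\|}) = 0.5\) and
\(p_{W^{\bot}}(\overline{w}^{\bot}) = 0.004\) gives
\(p_X(\overline{x}) = 0.002 < \gamma\) and is therefore rejected
as an outlier, even though its tangential coordinates alone are
quite likely.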

At this point it is very clear that the GPND approach requires
far more mathematical background than dropout sampling to
understand the novelty test. Nonetheless, it could be the better
method.

\section{Contribution}

This section will outline what exactly the scientific as well as