diff --git a/body_expose.tex b/body_expose.tex
index 0bf3451..4b062d5 100644
--- a/body_expose.tex
+++ b/body_expose.tex
@@ -235,10 +235,6 @@
 detections for unknown object classes have a higher label uncertainty.
 A threshold on the entropy \(H(\mathbf{q}_i)\) can then be used to
 identify and reject these false positive cases.
-\section{Generative Probabilistic Novelty Detection}
-
-% TODO Write about GPND in understandable terms
-
 \section{Adversarial Auto-encoder}
 
 This section will explain the adversarial auto-encoder used by
@@ -360,6 +356,136 @@
 to give it a higher probability. Therefore, using only one
 auto-encoder fulfils the task of differentiating between known
 and unknown classes.
 
+\section{Generative Probabilistic Novelty Detection}
+
+It is still unclear how the novelty score is calculated.
+This section explains the calculation in terms that are as
+understandable as possible. The name ``Generative Probabilistic
+Novelty Detection'' already signals that probability theory plays
+a central role. Furthermore, this section makes use of some
+mathematical terms that cannot be explained in great detail here.
+Moreover, the previous section already introduced many of the
+required components, which will not be explained here again.
+
+For the purpose of this explanation a trained auto-encoder is
+assumed. In that case the generator function \(g\) describes the
+model that the auto-encoder actually uses for the novelty
+detection. The task of training is to bring this model as close
+as possible to the real model behind the training and testing
+data. In mathematical terms, the model of the auto-encoder is a
+parameterized manifold \(\mathcal{M} \equiv g(\Omega)\) of
+dimension \(n\). The set of training or testing data can then be
+described in the following way:
+\begin{equation} \label{eq:train-set}
+  x_i = g(z_i) + \xi_i \quad i \in \mathbb{N},
+\end{equation}
+\noindent
+where \(\xi_i\) represents noise. It may be confusing, but for
+the purpose of this novelty test the ``truth'' is what the
+generator function generates from a set of \(z_i \in \Omega\),
+not the ground truth from the data set. Furthermore, the
+previously introduced encoder function \(e\) is assumed to act as
+an exact inverse of \(g\) for every \(x \in \mathcal{M}\). For
+such \(x\) it follows that \(x = g(e(x))\).
+
+Let \(\overline{x} \in \mathbb{R}^m\) be a data point from the
+test data. The remainder of the section explains how the novelty
+test is performed for this \(\overline{x}\). It is important to
+note that this data point is not necessarily part of the
+auto-encoder model. Therefore, \(g(e(\overline{x})) = \overline{x}\)
+cannot be assumed. However, \(\overline{x}\) can be non-linearly
+projected onto \(\overline{x}^{\|} \in \mathcal{M}\) by computing
+\(\overline{x}^{\|} = g(\overline{z})\) with
+\(\overline{z} = e(\overline{x})\). It is assumed that \(g\) is
+smooth enough to perform a linearization based on the first-order
+Taylor expansion:
+\begin{equation} \label{eq:taylor-expanse}
+  g(z) = g(\overline{z}) + J_g(\overline{z}) (z - \overline{z}) + \mathcal{O}(\| z - \overline{z} \|^2),
+\end{equation}
+\noindent
+where \(J_g(\overline{z})\) is the Jacobian matrix of \(g\)
+computed at \(\overline{z}\); a Jacobian matrix contains all
+first-order partial derivatives of a function. It is assumed that
+the Jacobian matrix of \(g\) has full rank at every point of the
+manifold.
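+
+To make the projection and the linearization more tangible, the
+following sketch shows how \(\overline{x}^{\|}\) and
+\(J_g(\overline{z})\) could be computed for a toy decoder. The
+functions \texttt{g} and \texttt{e} are merely assumed stand-ins
+for a trained generator and encoder, and the finite-difference
+Jacobian is only one possible choice; this is not the GPND
+reference implementation.
+\begin{verbatim}
+import numpy as np
+
+# Toy stand-ins for a trained decoder g and encoder e
+# (assumption: latent dimension n = 2, data dimension m = 3).
+def g(z):
+    return np.array([z[0], z[1], z[0] * z[1]])
+
+def e(x):
+    return x[:2]
+
+def jacobian(f, z, eps=1e-6):
+    # Finite-difference approximation of the Jacobian of f at z.
+    f0 = f(z)
+    J = np.zeros((f0.size, z.size))
+    for j in range(z.size):
+        dz = np.zeros_like(z)
+        dz[j] = eps
+        J[:, j] = (f(z + dz) - f0) / eps
+    return J
+
+x_bar = np.array([0.9, 1.1, 1.3])  # test point, not exactly on M
+z_bar = e(x_bar)                   # z_bar = e(x_bar)
+x_par = g(z_bar)                   # projection x_par = g(z_bar)
+J = jacobian(g, z_bar)             # J_g(z_bar), shape (m, n)
+\end{verbatim}
+In a real system the Jacobian would more likely be obtained by
+automatic differentiation; the finite-difference version merely
+keeps the sketch free of framework dependencies.
+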
+In equation (\ref{eq:taylor-expanse}), \(\| \cdot \|\) is the
+\(\mathbf{L}_2\) norm, which calculates the length of a vector as
+the square root of the sum of its squared components. Lastly,
+\(\mathcal{O}\) is the Big-O notation, which is often used to
+specify the time complexity of an algorithm. Here it collects the
+remainder of the Taylor expansion: the omitted terms shrink at
+least quadratically in \(\| z - \overline{z} \|\), so this part of
+the term can be ignored for \(z\) close to \(\overline{z}\).
+
+Next the tangent space of \(g\) at \(\overline{x}^{\|}\), which
+is spanned by the \(n\) independent column vectors of the Jacobian
+matrix \(J_g(\overline{z})\), is defined as
+\(\mathcal{T} = \text{span}(J_g(\overline{z}))\). The tangent
+space at a point of the manifold contains all directions in which
+one can move along the manifold from that point, up to first
+order. The Jacobian matrix can be decomposed into three matrices
+using the singular value decomposition:
+\(J_g(\overline{z}) = U^{\|}SV^{*}\). \(\mathcal{T}\) is therefore
+also spanned by the column vectors of \(U^{\|}\):
+\(\mathcal{T} = \text{span}(U^{\|})\). \(U^{\|}\) contains the
+left-singular vectors, \(S\) the singular values, and \(V^{*}\) is
+the conjugate transpose of the matrix \(V\), which contains the
+right-singular vectors. \(U^{\bot}\) is defined in such a way that
+\(U = [U^{\|}\,U^{\bot}]\) is a unitary matrix, and
+\(\mathcal{T}^{\bot} = \text{span}(U^{\bot})\) is the orthogonal
+complement of \(\mathcal{T}\). With this preparation
+\(\overline{x}\) can be represented with respect to the local
+coordinates that define \(\mathcal{T}\) and \(\mathcal{T}^{\bot}\).
+This representation is achieved by computing
+\begin{equation} \label{eq:w-definition}
+  \overline{w} = U^{\top} \overline{x} = \left[\begin{matrix}
+    U^{\|^{\top}} \overline{x} \\
+    U^{\bot^{\top}} \overline{x}
+  \end{matrix}\right] = \left[\begin{matrix}
+    \overline{w}^{\|} \\
+    \overline{w}^{\bot}
+  \end{matrix}\right],
+\end{equation}
+\noindent
+where \(\overline{w}\) holds the coordinates of \(\overline{x}\)
+in the rotated basis given by \(U\). They are decomposed into
+\(\overline{w}^{\|}\), which is parallel to \(\mathcal{T}\), and
+\(\overline{w}^{\bot}\), which is orthogonal to \(\mathcal{T}\).
+
+The last step towards the novelty test involves probability
+density functions (PDFs), which are introduced now. The PDF
+\(p_X(x)\) describes the distribution of the random variable
+\(X\), from which the training and testing data points are drawn.
+In addition, \(p_W(w)\) is the probability density function of the
+random variable \(W\), which represents \(X\) after the change of
+coordinates \(w = U^{\top} x\). Because \(U\) is unitary, this
+change of coordinates preserves the density, so both values are
+identical. It is further assumed that the coordinates \(W^{\|}\),
+which are parallel to \(\mathcal{T}\), and the coordinates
+\(W^{\bot}\), which are orthogonal to \(\mathcal{T}\), are
+statistically independent. With this assumption the following
+holds:
+\begin{equation} \label{eq:pdf-x}
+  p_X(x) = p_W(w) = p_W(w^{\|}, w^{\bot}) = p_{W^{\|}}(w^{\|}) p_{W^{\bot}}(w^{\bot}).
+\end{equation}
+Here the previously introduced noise comes into play again. In
+equation (\ref{eq:train-set}) it is assumed that the noise \(\xi\)
+predominantly moves the point \(x\) away from the manifold
+\(\mathcal{M}\) in directions orthogonal to \(\mathcal{T}\). As a
+consequence \(W^{\bot}\) mainly captures the noise effects. Since
+the noise and the drawing from the manifold are statistically
+independent, \(W^{\|}\) and \(W^{\bot}\) can also be treated as
+independent.
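+
+Continuing the sketch from above (reusing \texttt{J} and
+\texttt{x\_bar}), the following lines show how the local
+coordinates and the factorized density of equation
+(\ref{eq:pdf-x}) could be evaluated. The standard normal
+densities are purely assumed placeholders for \(p_{W^{\|}}\) and
+\(p_{W^{\bot}}\); how these densities are actually estimated is
+not covered here.
+\begin{verbatim}
+def gauss_pdf(v):
+    # Standard normal density, evaluated per component.
+    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)
+
+# Full SVD of the Jacobian: U_par spans T, U_orth spans its
+# orthogonal complement.
+U, S, Vt = np.linalg.svd(J, full_matrices=True)
+n = J.shape[1]
+U_par, U_orth = U[:, :n], U[:, n:]
+
+w_par = U_par.T @ x_bar    # coordinates parallel to T
+w_orth = U_orth.T @ x_bar  # coordinates orthogonal to T
+
+# Factorized density as in eq. (pdf-x), with placeholder densities.
+p_x = np.prod(gauss_pdf(w_par)) * np.prod(gauss_pdf(w_orth))
+\end{verbatim}
+The value \texttt{p\_x} is the quantity that the novelty test
+defined next compares against a threshold.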
+
+Finally, referring back to the data point \(\overline{x}\), the
+novelty test is defined as follows:
+\begin{equation} \label{eq:novelty-test}
+  p_X(\overline{x}) = p_{W^{\|}}(\overline{w}^{\|}) p_{W^{\bot}}(\overline{w}^{\bot})
+  \begin{cases}
+    \geq \gamma & \Longrightarrow \text{Inlier} \\
+    < \gamma & \Longrightarrow \text{Outlier}
+  \end{cases}
+\end{equation}
+\noindent
+where \(\gamma\) is a suitable threshold.
+
+At this point it is clear that understanding the novelty test of
+GPND requires considerably more mathematical background than
+dropout sampling does. Nonetheless, it could be the better method.
+
 \section{Contribution}
 
 This section will outline what exactly the scientific as well as