diff --git a/body_expose.tex b/body_expose.tex
index 0bf3451..4b062d5 100644
--- a/body_expose.tex
+++ b/body_expose.tex
@@ -235,10 +235,6 @@
 detections for unknown object classes have a higher label uncertainty.
 A threshold on the entropy \(H(\mathbf{q}_i)\) can then be used to
 identify and reject these false positive cases.
-\section{Generative Probabilistic Novelty Detection}
-
-% TODO Write about GPND in understandable terms
-
 \section{Adversarial Auto-encoder}
 
 This section will explain the adversarial auto-encoder used by
@@ -360,6 +356,136 @@
 to give it a higher probability. Therefore, using only one
 auto-encoder fulfils the task of differentiating between known
 and unknown classes.
 
+\section{Generative Probabilistic Novelty Detection}
+
+It is still unclear how the novelty score is calculated.
+This section explains the calculation in terms that are as
+understandable as possible. The name ``Generative Probabilistic
+Novelty Detection'' already signals that probability theory plays
+a central role. Furthermore, this section makes use of some
+mathematical terms that cannot be explained in great detail here.
+Moreover, the previous section already introduced many of the
+required components, which will not be explained here again.
+
+For the purpose of this explanation a trained auto-encoder is
+assumed. In that case the generator function \(g\) describes the
+model that the auto-encoder actually uses for the novelty
+detection. The task of training is to bring this model as close
+as possible to the real model behind the training and testing
+data. In mathematical terms, the model of the auto-encoder is a
+parameterized manifold \(\mathcal{M} \equiv g(\Omega)\) of
+dimension \(n\). The set of training or testing data can then be
+described in the following way:
+\begin{equation} \label{eq:train-set}
+  x_i = g(z_i) + \xi_i \quad i \in \mathbb{N},
+\end{equation}
+\noindent
+where \(\xi_i\) represents noise. It may be confusing, but for
+the purpose of this novelty test the ``truth'' is what the
+generator function generates from a set of \(z_i \in \Omega\),
+not the ground truth from the data set. Furthermore, the
+previously introduced encoder function \(e\) is assumed to act as
+an exact inverse of \(g\) for every \(x \in \mathcal{M}\). For
+such \(x\) it follows that \(x = g(e(x))\).
+
+Let \(\overline{x} \in \mathbb{R}^m\) be a data point from the
+test data. The remainder of the section explains how the novelty
+test is performed for this \(\overline{x}\). It is important to
+note that this data point is not necessarily part of the
+auto-encoder model. Therefore, \(g(e(\overline{x})) = \overline{x}\)
+cannot be assumed. However, \(\overline{x}\) can be non-linearly
+projected onto \(\overline{x}^{\|} \in \mathcal{M}\) by computing
+\(\overline{x}^{\|} = g(\overline{z})\) with
+\(\overline{z} = e(\overline{x})\). It is assumed that \(g\) is
+smooth enough to perform a linearization based on the first-order
+Taylor expansion:
+\begin{equation} \label{eq:taylor-expanse}
+  g(z) = g(\overline{z}) + J_g(\overline{z}) (z - \overline{z}) + \mathcal{O}(\| z - \overline{z} \|^2),
+\end{equation}
+\noindent
+where \(J_g(\overline{z})\) is the Jacobian matrix of \(g\)
+computed at \(\overline{z}\); a Jacobian matrix contains all
+first-order partial derivatives of a function. It is assumed that
+the Jacobian matrix of \(g\) has full rank at every point of the
+manifold.
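+
+To make the projection and the linearization more tangible, the
+following sketch shows how \(\overline{x}^{\|}\) and
+\(J_g(\overline{z})\) could be computed for a toy decoder. The
+functions \texttt{g} and \texttt{e} are merely assumed stand-ins
+for a trained generator and encoder, and the finite-difference
+Jacobian is only one possible choice; this is not the GPND
+reference implementation.
+\begin{verbatim}
+import numpy as np
+
+# Toy stand-ins for a trained decoder g and encoder e
+# (assumption: latent dimension n = 2, data dimension m = 3).
+def g(z):
+    return np.array([z[0], z[1], z[0] * z[1]])
+
+def e(x):
+    return x[:2]
+
+def jacobian(f, z, eps=1e-6):
+    # Finite-difference approximation of the Jacobian of f at z.
+    f0 = f(z)
+    J = np.zeros((f0.size, z.size))
+    for j in range(z.size):
+        dz = np.zeros_like(z)
+        dz[j] = eps
+        J[:, j] = (f(z + dz) - f0) / eps
+    return J
+
+x_bar = np.array([0.9, 1.1, 1.3])  # test point, not exactly on M
+z_bar = e(x_bar)                   # z_bar = e(x_bar)
+x_par = g(z_bar)                   # projection x_par = g(z_bar)
+J = jacobian(g, z_bar)             # J_g(z_bar), shape (m, n)
+\end{verbatim}
+In a real system the Jacobian would more likely be obtained by
+automatic differentiation; the finite-difference version merely
+keeps the sketch free of framework dependencies.
+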
+In equation (\ref{eq:taylor-expanse}), \(\| \cdot \|\) is the
+\(\mathbf{L}_2\) norm, which calculates the length of a vector as
+the square root of the sum of its squared components. Lastly,
+\(\mathcal{O}\) is the Big-O notation, which is often used to
+specify the time complexity of an algorithm. Here it collects the
+remainder of the Taylor expansion: the omitted terms shrink at
+least quadratically in \(\| z - \overline{z} \|\), so this part of
+the term can be ignored for \(z\) close to \(\overline{z}\).
+
+Next the tangent space of \(g\) at \(\overline{x}^{\|}\), which
+is spanned by the \(n\) independent column vectors of the Jacobian
+matrix \(J_g(\overline{z})\), is defined as
+\(\mathcal{T} = \text{span}(J_g(\overline{z}))\). The tangent
+space at a point of the manifold contains all directions in which
+one can move along the manifold from that point, up to first
+order. The Jacobian matrix can be decomposed into three matrices
+using the singular value decomposition:
+\(J_g(\overline{z}) = U^{\|}SV^{*}\). \(\mathcal{T}\) is therefore
+also spanned by the column vectors of \(U^{\|}\):
+\(\mathcal{T} = \text{span}(U^{\|})\). \(U^{\|}\) contains the
+left-singular vectors, \(S\) the singular values, and \(V^{*}\) is
+the conjugate transpose of the matrix \(V\), which contains the
+right-singular vectors. \(U^{\bot}\) is defined in such a way that
+\(U = [U^{\|}\,U^{\bot}]\) is a unitary matrix, and
+\(\mathcal{T}^{\bot} = \text{span}(U^{\bot})\) is the orthogonal
+complement of \(\mathcal{T}\). With this preparation
+\(\overline{x}\) can be represented with respect to the local
+coordinates that define \(\mathcal{T}\) and \(\mathcal{T}^{\bot}\).
+This representation is achieved by computing
+\begin{equation} \label{eq:w-definition}
+  \overline{w} = U^{\top} \overline{x} = \left[\begin{matrix}
+    U^{\|^{\top}} \overline{x} \\
+    U^{\bot^{\top}} \overline{x}
+  \end{matrix}\right] = \left[\begin{matrix}
+    \overline{w}^{\|} \\
+    \overline{w}^{\bot}
+  \end{matrix}\right],
+\end{equation}
+\noindent
+where \(\overline{w}\) holds the coordinates of \(\overline{x}\)
+in the rotated basis given by \(U\). They are decomposed into
+\(\overline{w}^{\|}\), which is parallel to \(\mathcal{T}\), and
+\(\overline{w}^{\bot}\), which is orthogonal to \(\mathcal{T}\).
+
+The last step towards the novelty test involves probability
+density functions (PDFs), which are introduced now. The PDF
+\(p_X(x)\) describes the distribution of the random variable
+\(X\), from which the training and testing data points are drawn.
+In addition, \(p_W(w)\) is the probability density function of the
+random variable \(W\), which represents \(X\) after the change of
+coordinates \(w = U^{\top} x\). Because \(U\) is unitary, this
+change of coordinates preserves the density, so both values are
+identical. It is further assumed that the coordinates \(W^{\|}\),
+which are parallel to \(\mathcal{T}\), and the coordinates
+\(W^{\bot}\), which are orthogonal to \(\mathcal{T}\), are
+statistically independent. With this assumption the following
+holds:
+\begin{equation} \label{eq:pdf-x}
+  p_X(x) = p_W(w) = p_W(w^{\|}, w^{\bot}) = p_{W^{\|}}(w^{\|}) p_{W^{\bot}}(w^{\bot}).
+\end{equation}
+Here the previously introduced noise comes into play again. In
+equation (\ref{eq:train-set}) it is assumed that the noise \(\xi\)
+predominantly moves the point \(x\) away from the manifold
+\(\mathcal{M}\) in directions orthogonal to \(\mathcal{T}\). As a
+consequence \(W^{\bot}\) mainly captures the noise effects. Since
+the noise and the drawing from the manifold are statistically
+independent, \(W^{\|}\) and \(W^{\bot}\) can also be treated as
+independent.
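+
+Continuing the sketch from above (reusing \texttt{J} and
+\texttt{x\_bar}), the following lines show how the local
+coordinates and the factorized density of equation
+(\ref{eq:pdf-x}) could be evaluated. The standard normal
+densities are purely assumed placeholders for \(p_{W^{\|}}\) and
+\(p_{W^{\bot}}\); how these densities are actually estimated is
+not covered here.
+\begin{verbatim}
+def gauss_pdf(v):
+    # Standard normal density, evaluated per component.
+    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)
+
+# Full SVD of the Jacobian: U_par spans T, U_orth spans its
+# orthogonal complement.
+U, S, Vt = np.linalg.svd(J, full_matrices=True)
+n = J.shape[1]
+U_par, U_orth = U[:, :n], U[:, n:]
+
+w_par = U_par.T @ x_bar    # coordinates parallel to T
+w_orth = U_orth.T @ x_bar  # coordinates orthogonal to T
+
+# Factorized density as in eq. (pdf-x), with placeholder densities.
+p_x = np.prod(gauss_pdf(w_par)) * np.prod(gauss_pdf(w_orth))
+\end{verbatim}
+The value \texttt{p\_x} is the quantity that the novelty test
+defined next compares against a threshold.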
+
+Finally, referring back to the data point \(\overline{x}\), the
+novelty test is defined as follows:
+\begin{equation} \label{eq:novelty-test}
+  p_X(\overline{x}) = p_{W^{\|}}(\overline{w}^{\|}) p_{W^{\bot}}(\overline{w}^{\bot})
+  \begin{cases}
+    \geq \gamma & \Longrightarrow \text{Inlier} \\
+    < \gamma & \Longrightarrow \text{Outlier}
+  \end{cases}
+\end{equation}
+\noindent
+where \(\gamma\) is a suitable threshold.
+
+At this point it is clear that understanding the novelty test of
+GPND requires considerably more mathematical background than
+dropout sampling does. Nonetheless, it could be the better method.
+
 \section{Contribution}
 
 This section will outline what exactly the scientific as well as