Added missing tex files and skeleton chapters

Signed-off-by: Jim Martens <github@2martens.de>
2019-07-28 14:50:50 +02:00 · 2019-07-28 14:50:50 +02:00 · f6a6907076
parent d0733c068f
commit f6a6907076
4 changed files with 23 additions and 466 deletions
--- a/abstract.tex
+++ b/abstract.tex
@@ -0,0 +1,4 @@
+\clearpage
+\section*{Abstract}
+
+I am an abstract
--- a/acknowledge.tex
+++ b/acknowledge.tex
@@ -0,0 +1,4 @@
+\clearpage
+\section*{Acknowledgement}
+
+Fulsome praise
--- a/body.tex
+++ b/body.tex
@@ -7,10 +7,10 @@
 Famous examples like the automatic soap dispenser which does not
 recognize the hand of a black person but dispenses soap when presented
 with a paper towel raise the question of bias in computer
-systems\cite{Friedman1996}. Related to this ethical question regarding
+systems~\cite{Friedman1996}. Related to this ethical question regarding
 the design of so-called algorithms, a term often used in public
 discourse for applied neural networks, is the question of
-algorithmic accountability\cite{Diakopoulos2014}.
+algorithmic accountability~\cite{Diakopoulos2014}.

 The charm of supervised neural networks, that they can learn from
 input-output relations and figure out by themselves what connections
@@ -79,7 +79,7 @@ with this type of task: model uncertainty and novelty detection.

 Model uncertainty can be measured with dropout sampling.
 Dropout is usually used only during training, but
-Miller et al\cite{Miller2018} use them also during testing
+Miller et al.~\cite{Miller2018} also use it during testing
 to obtain different results for the same image by making use of
 multiple forward passes. The output scores for the forward passes
 of the same image are then averaged. If the averaged class
@@ -94,7 +94,7 @@ Novelty detection is the more direct approach to solve the task.
 In the realm of neural networks it is usually done with the help of
 auto-encoders that essentially solve a regression task of finding an
 identity function that reproduces, at its output, the given
-input\cite{Pimentel2014}. Auto-encoders have
+input~\cite{Pimentel2014}. Auto-encoders have
 internally at least two components: an encoder, and a decoder or
 generator. The job of the encoder is to find an encoding that
 compresses the input as well as possible while simultaneously
@@ -113,22 +113,22 @@ novelty score.
 Given these two approaches to solve the explanation task described above,
 it comes down to performance. Ultimately, the best
 theoretical idea does not help in solving the task if it cannot
-be implemented in a performant way. Miller et al have shown
+be implemented in a performant way. Miller et al. have shown
 some success in using dropout sampling. However, the many forward
 passes during testing for every image seem computationally expensive.
 In comparison, a single run through a trained auto-encoder
 intuitively seems to be faster. This leads to the hypothesis stated below.

 For the purpose of this thesis, I will
-use the work of Miller et al as baseline to compare against.
-They use the SSD\cite{Liu2016} network for object detection,
+use the work of Miller et al. as a baseline to compare against.
+They use the SSD~\cite{Liu2016} network for object detection,
 modified with added dropout layers, and the SceneNet
-RGB-D\cite{McCormac2017} data set using the MS COCO\cite{Lin2014}
+RGB-D~\cite{McCormac2017} data set using the MS COCO~\cite{Lin2014}
 classes. Instead of dropout sampling, my approach will use
 an auto-encoder for novelty detection, with everything else, such as
 using SSD for object detection and the SceneNet RGB-D data set,
 being equal. With respect to auto-encoders, a recent implementation
-of an adversarial auto-encoder\cite{Pidhorskyi2018} will be used.
+of an adversarial auto-encoder~\cite{Pidhorskyi2018} will be used.

 \paragraph{Hypothesis} Novelty detection using auto-encoders
 delivers similar or better object detection performance under open set
@@ -144,461 +144,10 @@ with MS COCO classes.

 \chapter{Background and Contribution}

-This chapter provides a more in-depth look at the two works
-this thesis is based upon. First, the dropout sampling introduced
-by Miller et al.~\cite{Miller2018} is presented. Afterwards,
-Generative Probabilistic Novelty Detection with Adversarial
-Autoencoders~\cite{Pidhorskyi2018} is explained. The chapter
-concludes with a more detailed explanation of the intended
-contribution of this thesis.
+\chapter{Methods}

-The explanation of dropout sampling closely follows the paper of
-Miller et al.~\cite{Miller2018}, including the formulae used
-in their paper.
+\chapter{Results}

-\section{Dropout Sampling}
+\chapter{Discussion}

-To understand dropout sampling, it is necessary to explain the
-idea of Bayesian neural networks. They place a prior distribution
-over the network weights, for example a Gaussian prior distribution:
-\(\mathbf{W} \sim \mathcal{N}(0, I)\). In this example
-\(\mathbf{W}\) are the weights and the identity covariance \(I\) means that
-every weight is drawn independently from a standard normal distribution. The
-training of the network determines a plausible set of weights by
-evaluating the posterior distribution over the weights given
-the training data \(\mathbf{T}\): \(p(\mathbf{W}|\mathbf{T})\). However, this
-posterior is intractable for neural networks and cannot be evaluated
-in any reasonable time. Therefore, approximation techniques are
-required. In those techniques, the posterior is fitted with a
-simple distribution \(q^{*}_{\theta}(\mathbf{W})\). The original
-and intractable problem of averaging over all weights in the network
-is replaced with an optimisation task, in which the parameters
-\(\theta\) of the simple distribution are optimised~\cite{Kendall2017}.
-
-\subsubsection*{Dropout Variational Inference}
-
-Kendall and Gal~\cite{Kendall2017} showed an approximation for
-classification and recognition tasks. Dropout variational inference
-is a practical approximation technique that adds dropout layers
-in front of every weight layer and keeps them active at test
-time to sample from the approximate posterior. Effectively, this
-results in the approximation of the class probability
-\(p(y|\mathcal{I}, \mathbf{T})\) by performing multiple forward
-passes through the network and averaging over the obtained softmax
-scores \(\mathbf{s}_i\), given an image \(\mathcal{I}\) and the
-training data \(\mathbf{T}\):
-\begin{equation} \label{eq:drop-sampling}
-p(y|\mathcal{I}, \mathbf{T}) = \int p(y|\mathcal{I}, \mathbf{W}) \cdot p(\mathbf{W}|\mathbf{T})d\mathbf{W} \approx \frac{1}{n} \sum_{i=1}^{n}\mathbf{s}_i
-\end{equation}
-
-With this dropout sampling technique, \(n\) sets of model weights
-\(\widetilde{\mathbf{W}}_i\) are approximately sampled from the posterior
-\(p(\mathbf{W}|\mathbf{T})\). The class probability
-\(p(y|\mathcal{I}, \mathbf{T})\) is a probability vector
-\(\mathbf{q}\) over all class labels. Finally, the uncertainty
-of the network with respect to the classification is given by
-the entropy \(H(\mathbf{q}) = - \sum_i q_i \cdot \log q_i\).
-
-\subsubsection*{Dropout Sampling for Object Detection}
-
-Miller et al.~\cite{Miller2018} apply dropout sampling to
-object detection. In that case \(\mathbf{W}\) represents the
-learned weights of a detection network like SSD~\cite{Liu2016}.
-Every forward pass uses a different set of weights
-\(\widetilde{\mathbf{W}}\), which is approximately sampled from
-\(p(\mathbf{W}|\mathbf{T})\). Each forward pass in object
-detection results in a set of detections, each consisting of bounding
-box coordinates \(\mathbf{b}\) and a softmax score vector \(\mathbf{s}\).
-Miller et al.\ denote a detection as \(D_i =
-\{\mathbf{s}_i,\mathbf{b}_i\}\). The detections of all passes are put
-into a large set \(\mathfrak{D} = \{D_1, \ldots, D_N\}\) of \(N\) detections.
-
-All detections with mutual intersection-over-union (IoU) scores
-of \(0.95\) or higher are grouped into an observation \(\mathcal{O}_i\).
-Subsequently, the corresponding vector of class probabilities
-\(\mathbf{q}_i\) for the observation is calculated by averaging all
-score vectors \(\mathbf{s}_j\) in a particular observation
-\(\mathcal{O}_i\): \(\mathbf{q}_i \approx \overline{\mathbf{s}}_i = \frac{1}{n} \sum_{j=1}^{n} \mathbf{s}_j\). The label uncertainty
-of the detector for a particular observation is measured by
-the entropy \(H(\mathbf{q}_i) = - \sum_j q_{ij} \cdot \log q_{ij}\).
-
-In the introduction, I used a simplified description to explain
-maximum and low uncertainty. A more complete explanation follows:
-if \(\mathbf{q}_i\), which I called averaged class probabilities,
-resembles a uniform distribution, the entropy will be high. A uniform
-distribution means that no class is more likely than another, which
-is a perfect example of maximum uncertainty. Conversely, if
-one class has a very high probability, the entropy will be low.
-
-Under open set conditions, it can be expected that falsely generated
-detections for unknown object classes have a higher label
-uncertainty. A threshold on the entropy \(H(\mathbf{q}_i)\) can then
-be used to identify and reject these false positive cases.
-
-\section{Adversarial Auto-encoder}
-
-This section explains the adversarial auto-encoder used by
-Pidhorskyi et al.~\cite{Pidhorskyi2018}, albeit in a slightly modified
-form to make it easier to understand.
-
-The training data points \(x_i \in \mathbb{R}^m \) are the input
-to the auto-encoder. An encoding function \(e: \mathbb{R}^m \rightarrow \mathbb{R}^n\) takes the data points
-and produces a representation \(\overline{z_i} \in \mathbb{R}^n\)
-in a latent space. This latent space is smaller (\(n < m\)) than the
-input space, which necessitates some form of compression.
-
-A second function \(g: \Omega \rightarrow \mathbb{R}^m\) is the
-generator function that takes the latent representation
-\(z_i \in \Omega \subset \mathbb{R}^n\) and generates an output
-\(\overline{x_i}\) as close as possible to the input data
-distribution.
-
-What then is the difference between \(\overline{z_i}\) and \(z_i\)?
-With a simple auto-encoder, both would be identical. In the case
-of an adversarial auto-encoder, it is slightly more complicated.
-There is a discriminator \(D_z\) that tries to distinguish between
-an encoded data point \(\overline{z_i}\) and a \(z_i \sim \mathcal{N}(0,1)\) drawn from a normal distribution with zero mean
-and a standard deviation of \(1\). During training, the encoding
-function \(e\) attempts to minimize any perceivable difference
-between \(z_i\) and \(\overline{z_i}\) while \(D_z\) has the
-aforementioned adversarial task to differentiate between them.
-
-Furthermore, there is a discriminator \(D_x\) that has the task
-to differentiate the generated output \(\overline{x_i}\) from the
-actual input \(x_i\). During training, the generator function \(g\)
-tries to minimize the perceivable difference between \(\overline{x_i}\) and \(x_i\) while \(D_x\) has the mentioned
-adversarial task to distinguish between them.
-
-With this, all components of the adversarial auto-encoder employed
-by Pidhorskyi et al.\ have been introduced. Finally, the losses are
-presented. The two adversarial objectives have been mentioned
-already. Specifically, there is the adversarial loss for the
-discriminator \(D_z\):
-\begin{equation} \label{eq:adv-loss-z}
-    \mathcal{L}_{adv-d_z}(x,e,D_z) = E[\log (D_z(\mathcal{N}(0,1)))] + E[\log (1 - D_z(e(x)))],
-\end{equation}
-\noindent
-where \(E\) stands for an expected
-value\footnote{The average of a random quantity, weighted by its probability distribution.},
-\(x\) stands for the input, and
-\(\mathcal{N}(0,1)\) represents an element drawn from the specified
-distribution. The encoder \(e\) attempts to minimize this loss while
-the discriminator \(D_z\) intends to maximize it.
-
-In the same way the adversarial loss for the discriminator \(D_x\)
-is specified:
-\begin{equation} \label{eq:adv-loss-x}
-    \mathcal{L}_{adv-d_x}(x,D_x,g) = E[\log(D_x(x))] + E[\log(1 - D_x(g(\mathcal{N}(0,1))))],
-\end{equation}
-\noindent
-where \(x\), \(E\), and \(\mathcal{N}(0,1)\) have the same meaning
-as before. In this case the generator \(g\) tries to minimize the loss
-while the discriminator \(D_x\) attempts to maximize it.
-
-Every auto-encoder requires a reconstruction error to work. This
-error measures the difference between the original input and
-the generated or decoded output. In this case, the reconstruction
-loss is defined as follows:
-\begin{equation} \label{eq:recon-loss}
-    \mathcal{L}_{error}(x, e, g) = - E[\log(p(g(e(x)) | x))],
-\end{equation}
-\noindent
-where \(\log(p)\) is the log-likelihood, so that the loss is the negative
-expected log-likelihood, and \(x\), \(E\), \(e\), and \(g\) have the same meaning as before.
-
-All losses combined result in the following formula:
-\begin{equation} \label{eq:full-loss}
-    \mathcal{L}(x,e,D_z,D_x,g) = \mathcal{L}_{adv-d_z}(x,e,D_z) + \mathcal{L}_{adv-d_x}(x,D_x,g) + \lambda \mathcal{L}_{error}(x,e,g),
-\end{equation}
-\noindent
-where \(\lambda\) is a parameter used to balance the adversarial
-losses with the reconstruction loss. Pidhorskyi et al.\ train the
-model with the Adam optimizer by performing alternating
-updates of each of the aforementioned components:
-
-\begin{itemize}
-    \item Maximize \(\mathcal{L}_{adv-d_x}\) by updating weights of \(D_x\);
-    \item Minimize \(\mathcal{L}_{adv-d_x}\) by updating weights of \(g\);
-    \item Maximize \(\mathcal{L}_{adv-d_z}\) by updating weights of \(D_z\);
-    \item Minimize \(\mathcal{L}_{error}\) and \(\mathcal{L}_{adv-d_z}\) by updating weights of \(e\) and \(g\).
-\end{itemize}
-
-In practice, the auto-encoder is trained separately for every
-object class that is considered ``known''. Pidhorskyi et al.\ trained
-it on the MNIST~\cite{Lecun1998} data set, once for every digit.
-
-For this thesis, it needs to be trained on the SceneNet RGB-D
-data set using MS COCO classes as known classes. Since all known
-classes are present in every test epoch, it is
-non-trivial to decide which of the trained auto-encoders should be used to
-calculate novelty. To phrase it differently, a true positive
-detection is possible for multiple classes in the same image.
-If, for example, one object is correctly classified by SSD as a chair,
-the novelty score should be low. But the auto-encoders of all
-known classes except the ``chair'' class should ideally give a high novelty
-score. Which of the values should be used? The only sensible solution
-is to run the detection only through the auto-encoder that was trained for
-the class the SSD model predicted. This leads to the following
-scenarios:
-\begin{itemize}
-    \item true positive classification: novelty score should be low
-    \item false positive classification and correct class is
-    among the known classes: novelty score should be high
-    \item false positive classification and correct class is unknown:
-    novelty score should be high
-\end{itemize}
-\noindent
-Negative classifications are not listed as these are not part
-of the output of the SSD and cannot be given to the auto-encoder
-as input. Furthermore, the second case should rarely happen because
-the trained SSD knows this other class and is very likely
-to give it a higher probability. Therefore, using only one
-auto-encoder fulfils the task of differentiating between
-known and unknown classes.
-
-\section{Generative Probabilistic Novelty Detection}
-
-So far, it is still unclear how the novelty score is calculated.
-This section explains it in terms that are as understandable as
-possible. The name ``Generative Probabilistic
-Novelty Detection''~\cite{Pidhorskyi2018} already signals that
-the approach is grounded in probability theory. This
-section makes use of some mathematical terms which cannot
-be explained in great detail here. Moreover, the previous section
-already introduced many of the required components, which are not
-explained here again.
-
-For the purpose of this explanation, a trained auto-encoder
-is assumed. In that case, the generator function describes
-the model that the auto-encoder actually uses for
-novelty detection. The task of training is to make sure this
-model comes as close as possible to the true model underlying the
-training or testing data. The model of the auto-encoder
-is in mathematical terms a parameterized manifold
-\(\mathcal{M} \equiv g(\Omega)\) of dimension \(n\).
-The set of training or testing data can then be described
-in the following way:
-\begin{equation} \label{eq:train-set}
-    x_i = g(z_i) + \xi_i \quad i \in \mathbb{N},
-\end{equation}
-\noindent
-where \(\xi_i\) represents noise. It may be confusing, but
-for the purpose of this novelty test the ``truth'' is what
-the generator function generates from a set of \(z_i \in \Omega\),
-not the ground truth from the data set. Furthermore,
-the previously introduced encoder function \(e\) is assumed
-to work as an exact inverse of \(g\) for every \(x \in \mathcal{M}\).
-For such \(x\) it follows that \(x = g(e(x))\).
-
-Let \(\overline{x} \in \mathbb{R}^m\) be a data point from the test
-data. The remainder of the section explains how the novelty
-test is performed for this \(\overline{x}\). It is important
-to note that this data point does not necessarily lie on the
-manifold \(\mathcal{M}\). Therefore, \(g(e(\overline{x})) = \overline{x}\) cannot
-be assumed. However, \(\overline{x}\)
-can be non-linearly projected onto
-\(\overline{x}^{\|} \in \mathcal{M}\)
-by using \(\overline{x}^{\|} = g(\overline{z})\) with \(\overline{z} = e(\overline{x})\).
-It is assumed that \(g\) is smooth enough to perform a linearization
-based on the first-order Taylor expansion:
-\begin{equation} \label{eq:taylor-expanse}
-    g(z) = g(\overline{z}) + J_g(\overline{z}) (z - \overline{z}) + \mathcal{O}(\| z - \overline{z} \|^2),
-\end{equation}
-\noindent
-where \(J_g(\overline{z})\) is the Jacobian matrix of \(g\) computed
-at \(\overline{z}\). It is assumed that the Jacobian matrix of \(g\)
-has full rank at every point of the manifold. A Jacobian matrix
-contains all first-order partial derivatives of a function.
-\(\| \cdot \|\) is the \(\mathbf{L}_2\) norm, which calculates the
-length of a vector as the square root of the sum of the
-squares of all its components. Lastly, \(\mathcal{O}\)
-is the Big-O notation, which here bounds the remainder of the
-expansion rather than the time complexity of an algorithm. The remainder is
-of order \(\| z - \overline{z} \|^2\), which means that it can be
-neglected for \(z\) close to \(\overline{z}\).
-
-Next, the tangent space of the manifold \(\mathcal{M}\) at \(\overline{x}^{\|}\), which
-is spanned by the \(n\) independent column vectors of the Jacobian
-matrix \(J_g(\overline{z})\), is defined as
-\(\mathcal{T} = \text{span}(J_g(\overline{z}))\). The tangent space
-at a point of the manifold contains all directions in which one can
-move along the manifold from this point, to first order. The Jacobian matrix can be decomposed into three
-matrices using singular value decomposition: \(J_g(\overline{z}) = U^{\|}SV^{*}\). Because \(J_g(\overline{z})\) has full rank, \(\mathcal{T}\) is also spanned
-by the column vectors of \(U^{\|}\): \(\mathcal{T} = \text{span}(U^{\|})\). \(U^{\|}\) contains the left-singular vectors
-and \(V^{*}\) is the conjugate transpose of the matrix
-\(V\), which contains the right-singular vectors. \(U^{\bot}\) is
-defined in such a way that \(U = [U^{\|}U^{\bot}]\) is a unitary
-matrix. \(\mathcal{T}^{\bot}\) is the orthogonal complement of
-\(\mathcal{T}\). With this preparation, \(\overline{x}\) can be
-represented with respect to the local coordinates that define
-\(\mathcal{T}\) and \(\mathcal{T}^{\bot}\). This representation
-can be achieved by computing
-\begin{equation} \label{eq:w-definition}
-    \overline{w} = U^{\top} \overline{x} = \left[\begin{matrix}
-        U^{\|^{\top}} \overline{x} \\
-        U^{\bot^{\top}} \overline{x}
-    \end{matrix}\right] = \left[\begin{matrix}
-        \overline{w}^{\|} \\
-        \overline{w}^{\bot}
-    \end{matrix}\right],
-\end{equation}
-\noindent
-where the rotated coordinates (the data point expressed in a basis
-aligned with the tangent space)
-\(\overline{w}\) are decomposed into \(\overline{w}^{\|}\), which
-are parallel to \(\mathcal{T}\), and \(\overline{w}^{\bot}\), which
-are orthogonal to \(\mathcal{T}\).
-
-The last step to define the novelty test involves probability
-density functions (PDFs), which are now introduced. The PDF \(p_X(x)\)
-describes the random variable \(X\), from which the training and
-testing data points are drawn. In addition, \(p_W(w)\) is the
-probability density function of the random variable \(W\),
-which represents \(X\) after changing the coordinates. Since \(U\) is unitary,
-this change of coordinates preserves volume, so both densities are identical. In addition, it is assumed that the coordinates
-\(W^{\|}\), which are parallel to \(\mathcal{T}\), and the coordinates
-\(W^{\bot}\), which are orthogonal to \(\mathcal{T}\), are
-statistically independent. With this assumption the following holds:
-\begin{equation} \label{eq:pdf-x}
-    p_X(x) = p_W(w) = p_W(w^{\|}, w^{\bot}) = p_{W^{\|}}(w^{\|}) p_{W^{\bot}}(w^{\bot})
-\end{equation}
-The previously introduced noise comes into play again.
-In Equation~(\ref{eq:train-set}), it is assumed that the noise \(\xi\)
-predominantly moves the point \(x\) away from the manifold
-\(\mathcal{M}\) in a direction orthogonal to \(\mathcal{T}\).
-As a consequence, \(W^{\bot}\) mainly captures the noise
-effects. Since the noise is statistically independent of the position
-on the manifold, \(W^{\|}\) and \(W^{\bot}\) are also independent.
-
-Finally, referring back to the data point \(\overline{x}\), the
-novelty test is defined as follows:
-\begin{equation} \label{eq:novelty-test}
-    p_X(\overline{x}) = p_{W^{\|}}(\overline{w}^{\|})p_{W^{\bot}}(\overline{w}^{\bot}) =
-    \begin{cases}
-        \geq \gamma & \Longrightarrow \text{Inlier} \\
-        < \gamma &  \Longrightarrow \text{Outlier}
-    \end{cases}
-\end{equation}
-\noindent
-where \(\gamma\) is a suitable threshold.
-
-At this point, it is clear that understanding the novelty test of GPND
-requires far more mathematical background than understanding dropout
-sampling. Nonetheless, it could be the better method.
-
-\section{Contribution}
-
-This section outlines the scientific as well as the
-technical contribution of this thesis.
-
-\subsection*{Scientific Contribution}
-
-Miller et al.~\cite{Miller2018} use the SSD~\cite{Liu2016} network
-extended with dropout layers and run multiple forward passes
-during the testing phase for every image. Considering the number
-of images in the SceneNet RGB-D~\cite{McCormac2017} data set, these
-forward passes take considerable time. It could be faster
-to run only one forward pass and then use the auto-encoder for
-novelty detection. However, the auto-encoder can only process
-one detection at a time and must be called separately for every
-detection of the object detector. Therefore,
-it is interesting to investigate whether the second approach
-is indeed faster than the first.
-
-Dropout sampling uses the entropy to identify false positive
-cases. The identified detections are discarded, which improves
-the object detection performance. The GPND approach uses
-the novelty test based on the trained auto-encoder to identify novel cases and
-therefore marks detections as false positives. Subsequently, these
-detections can be discarded as well. By comparing the object
-detection performance after discarding the identified false positive
-cases, the effectiveness of both approaches can be compared with each
-other. It is interesting to research whether the GPND approach results in
-better object detection performance than dropout sampling
-provides.
-
-The formulated hypothesis, which is repeated after this paragraph,
-combines both aspects and requires a similar or better result in
-both of them. As a consequence, it is falsified if
-the computational performance of the GPND approach is not better than
-that of dropout sampling or if its object detection performance
-is worse.
-
-\paragraph{Hypothesis} Novelty detection using auto-encoders
-delivers similar or better object detection performance under open set
-conditions while being less computationally expensive compared to
-dropout sampling.\\
-
-There are three possible scenarios that can be the result of
-the thesis:
-\begin{itemize}
-    \item the hypothesis is confirmed: Win-Win situation where
-    switching to GPND is straightforward.
-    \item one of the conditions fails: Win-Lose situation where
-    there is a trade-off between object detection performance and
-    computational performance, as each approach is better in
-    one of the two aspects.
-    \item both conditions fail: Lose-Lose situation where
-    dropout sampling is better in both aspects.
-\end{itemize}
-
-Summarising, the scientific contribution is a comparison between
-dropout sampling and GPND with respect to both object detection
-performance and computational performance under open set conditions
-using the SceneNet RGB-D data set with the MS COCO classes as
-``known'' object classes.
-
-The computational performance is measured by the time in milliseconds
-every test run takes. Of interest are not the absolute numbers,
-as these vary from machine to machine and are influenced by a
-plethora of uncontrollable factors, but the relative difference
-between both approaches and whether this difference is significant.
-Object detection performance is measured by precision, recall,
-F1-score, and an open set error. While the first three metrics are
-standard, the last is adapted from Miller et al. It is defined
-as the number of observations (for dropout sampling) or detections
-(for GPND) that pass the respective false positive test (entropy or
-novelty), fall on unknown objects (there are no overlapping ground
-truth objects with IoU \(\geq 0.5\) and a known true class label),
-and do not have a winning class label of ``unknown''.
-
-\subsection*{Technical Contribution}
-
-Technical contributions include all contributions
-that are not necessarily new in the scientific sense but are
-meaningful engineering contributions in themselves.
-
-No source code is available for the work of
-Miller et al.~\cite{Miller2018}, which requires me to re-implement
-their work. The contribution is the fine-tuning of
-an SSD model pre-trained on ImageNet~\cite{Deng2009}, extended by
-dropout layers, to the SceneNet RGB-D data set using MS COCO classes
-as the known classes for SSD.
-As MS COCO classes are more general than SceneNet RGB-D classes, this
-also requires a mapping from one set of classes to the other.
-This entire contribution is technical and only re-implements
-what Miller et al.\ have already done. It is expected that the
-evaluation of the results using this self-trained model
-reproduces the results of Miller et al.
-
-For GPND, source code is available, but only for MNIST and written in
-PyTorch. Therefore, the source code has to be ported from
-PyTorch to TensorFlow. Furthermore, it must be made compatible
-with the SceneNet RGB-D data set, as the architecture is tailored to MNIST.
-The mapping from SceneNet RGB-D to MS COCO applies here as well and
-can therefore be considered a separate contribution. A fine-tuned
-SSD is required as well, but this time without added dropout layers.
-Additionally, it is necessary to train the auto-encoder for every
-known class separately.
-
-To summarise it in a list, the following separate deliverables
-are contributed:
-
-\begin{itemize}
-    \item source code for dropout sampling compatible with TensorFlow
-    \item source code for GPND compatible with TensorFlow
-    \item mapping from SceneNet RGB-D classes to MS COCO classes
-    \item vanilla SSD model fine-tuned on SceneNet RGB-D
-    \item dropout SSD model fine-tuned on SceneNet RGB-D
-    \item auto-encoder model trained separately on every MS COCO class
-\end{itemize}
+\chapter{Closing}
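
The dropout sampling computation from the Dropout Sampling section removed from body.tex above (averaging the softmax score vectors \(\mathbf{s}_i\) of \(n\) forward passes as in equation eq:drop-sampling, then computing the entropy \(H(\mathbf{q})\) and thresholding it to reject likely false positives) can be illustrated with a minimal Python sketch. It is not Miller et al.'s implementation: the function names `average_scores`, `entropy`, and `is_rejected` and the example threshold of 1.0 are hypothetical, and each score vector is assumed to be a plain list of probabilities that sums to one.

```python
import math
from typing import List, Sequence


def average_scores(passes: Sequence[Sequence[float]]) -> List[float]:
    """Approximate q by averaging the softmax vectors s_i of n forward passes.

    Assumes at least one pass and that every vector has the same length.
    """
    n = len(passes)
    num_classes = len(passes[0])
    return [sum(s[c] for s in passes) / n for c in range(num_classes)]


def entropy(q: Sequence[float]) -> float:
    """H(q) = -sum_i q_i * log(q_i) with the natural log; 0 * log(0) is taken as 0."""
    return -sum(p * math.log(p) for p in q if p > 0.0)


def is_rejected(passes: Sequence[Sequence[float]], threshold: float) -> bool:
    """Hypothetical helper: flag a detection as a likely false positive
    when the entropy of its averaged class probabilities exceeds the threshold."""
    return entropy(average_scores(passes)) > threshold


if __name__ == "__main__":
    # Three passes that agree on class 0: the averaged vector is peaked.
    confident = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.95, 0.03, 0.02]]
    # Three passes that disagree: the averaged vector is uniform.
    uncertain = [[0.80, 0.10, 0.10], [0.10, 0.80, 0.10], [0.10, 0.10, 0.80]]
    for name, passes in (("confident", confident), ("uncertain", uncertain)):
        h = entropy(average_scores(passes))
        print(name, round(h, 3), is_rejected(passes, threshold=1.0))
```

For the disagreeing passes, the averaged vector is uniform, so its entropy reaches the maximum \(\log 3 \approx 1.10\) for three classes and the detection is rejected under the assumed threshold; the agreeing passes give an entropy of roughly 0.39. Since the entropy depends on the base of the logarithm, any threshold has to be chosen for the same base.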
--- a/thesis.tex
+++ b/thesis.tex
@@ -25,7 +25,7 @@
 }{}

 % use custom package to prevent spamming the preamble
-\usepackage[licence]{masterthesis}
+\usepackage[licence,library,acknowledge,abstract]{masterthesis}

 % specify image location
 \graphicspath{{./images/}{./private/images/}}
@@ -38,7 +38,7 @@
 % invoke start command(s) from masterthesis package
 \start

-\input{body_expose.tex}
+\input{body.tex}

 % invoke finish command(s) from masterthesis package
 \finish
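
The step in the Dropout Sampling section removed from body.tex that groups detections with a mutual IoU of \(0.95\) or higher into an observation \(\mathcal{O}_i\), and averages their score vectors into \(\mathbf{q}_i\), can be sketched as follows. This is a hedged Python illustration, not the procedure of Miller et al.: the box format `(x_min, y_min, x_max, y_max)`, the greedy and order-dependent grouping, and the names `iou`, `group_observations`, and `average_group_scores` are assumptions.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_observations(boxes: Sequence[Box], min_iou: float = 0.95) -> List[List[int]]:
    """Greedily group detection indices so that every pair within a group
    has an IoU of at least min_iou (a simplification, not Miller et al.'s code)."""
    groups: List[List[int]] = []
    for i, box in enumerate(boxes):
        for group in groups:
            if all(iou(box, boxes[j]) >= min_iou for j in group):
                group.append(i)
                break
        else:
            # No existing observation is compatible: start a new one.
            groups.append([i])
    return groups


def average_group_scores(scores: Sequence[Sequence[float]], group: Sequence[int]) -> List[float]:
    """Average the softmax vectors s_j of one observation to approximate q_i."""
    num_classes = len(scores[0])
    return [sum(scores[j][c] for j in group) / len(group) for c in range(num_classes)]


if __name__ == "__main__":
    # Detections from several passes: three nearly identical boxes and one distant box.
    boxes = [(0, 0, 10, 10), (0, 0, 10, 10.2), (50, 50, 60, 60), (0.1, 0, 10, 10)]
    scores = [[0.6, 0.4], [0.8, 0.2], [0.0, 1.0], [0.7, 0.3]]
    for group in group_observations(boxes):
        print(group, average_group_scores(scores, group))
```

In this example, the three overlapping boxes form one observation with averaged scores of about [0.7, 0.3], while the distant box forms an observation of its own. A greedy pass was chosen for brevity; because it assigns each detection to the first compatible observation, a different detection order can produce different groups.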