% body thesis file that contains the actual content

\chapter{Introduction}

\subsection*{Motivation}

Famous examples like the automatic soap dispenser that does not
recognize the hand of a black person but dispenses soap when presented
with a paper towel raise the question of bias in computer
systems\cite{Friedman1996}. Related to this ethical question regarding
the design of so-called algorithms, a term often used in public
discourse for applied neural networks, is the question of
algorithmic accountability\cite{Diakopoulos2014}.

The charm of supervised neural networks, that they can learn from
input-output relations and figure out by themselves what connections
are necessary for that, is also their Achilles heel. This feature
makes them effectively black boxes. It is possible to question the
training environment, such as potential biases inside the data sets,
or the engineers constructing the networks, but it is not really
possible to question the internal calculations made by a network. On
the one hand, one might argue, it is only math and nothing magical
that happens inside these networks. Clearly it is possible, albeit a
chore, to manually follow the calculations of any given trained
network. After all, it is executed on a computer and at the lowest
level only uses basic math that does not differ between humans and
computers. On the other hand, not everyone is capable of doing so,
and more importantly, it does not reveal any answers to questions of
causality.

However, these questions of causality are of enormous consequence
when neural networks are used, for example, in predictive policing.
Is a correlation, a coincidence, enough to bring forth negative
consequences for a particular person? And if so, what is the possible
defence against math? Similar questions can be raised when looking at
computer vision networks that might be used together with so-called
smart CCTV cameras, such as those tested at the train station Berlin
Südkreuz. What if a network implies you committed suspicious
behaviour?

This leads to the need for neural networks to explain their results.
Such an explanation must come from the network or an attached piece
of technology to allow mass adoption. Obviously, this setting poses
the question of how such an endeavour can be achieved.

For neural networks there are fundamentally two types of tasks:
regression and classification. Regression deals with any case
where the goal for the network is to come close to an ideal
function that connects all data points. Classification, however,
describes tasks where the network is supposed to identify the
class of any given input. In this thesis, I will focus on
classification.

\subsection*{Object detection in open-set conditions}

More specifically, I will look at object detection under open-set
conditions. In non-technical words this effectively describes
the kind of situation you encounter with CCTV cameras or robots
outside of a laboratory. Both use cameras that record
images. Subsequently, a neural network analyses the image
and returns a list of detected and classified objects that it
found in the image. The problem here is that networks can only
classify what they know. If presented with an object type that
the network was not trained with, as happens frequently in real
environments, it will still classify the object and might even
have a high confidence in doing so. Such an example would be
a false positive. Any ordinary person who uses the results of
such a network would falsely assume that a high confidence always
means the classification is very likely correct. If they use
a proprietary system they might not even be able to find out
that the network was never trained on a particular type of object.
Therefore, it would be impossible for them to identify the output
of the network as a false positive.

This goes back to the need for automatic explanation. Such a system
should by itself recognize that the given object is unknown and
hence mark any classification result of the network as meaningless.
Technically, there are two slightly different approaches that deal
with this type of task: model uncertainty and novelty detection.

Model uncertainty can be measured with dropout sampling.
Dropout is usually used only during training, but
Miller et al.\cite{Miller2018} use it also during testing
to achieve different results for the same image, making use of
multiple forward passes. The output scores for the forward passes
of the same image are then averaged. If the averaged class
probabilities resemble a uniform distribution (every class has
the same probability) this signals maximum uncertainty. Conversely,
if there is one very high probability with every other being very
low, this signifies low uncertainty. An unknown object is more
likely to cause high uncertainty, which allows for an identification
of false positive cases.

Novelty detection is the more direct approach to solve the task.
In the realm of neural networks it is usually done with the help of
auto-encoders that essentially solve a regression task of finding an
identity function that reconstructs the given input on the
output\cite{Pimentel2014}. Auto-encoders have
internally at least two components: an encoder, and a decoder or
generator. The job of the encoder is to find an encoding that
compresses the input as well as possible while simultaneously
being as loss-free as possible. The decoder takes this latent
representation of the input and has to find a decompression
that reconstructs the input as accurately as possible. During
training these auto-encoders learn to reproduce a certain group
of object classes. The actual novelty detection takes place
during testing. Given an image, and the output and loss of the
auto-encoder, a novelty score is calculated. A low novelty
score signals a known object; a high novelty score signals an
unknown one.

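To make the scoring step concrete, the following is a minimal sketch
of reconstruction-based novelty scoring. It assumes a trained Keras
auto-encoder \texttt{autoencoder} and a threshold \texttt{tau} chosen
on validation data; both names are placeholders, and the actual score
used later (based on the GPND) is more involved than a plain
reconstruction error.

\begin{verbatim}
# A minimal sketch, not the GPND score itself: novelty as the
# reconstruction error of a trained auto-encoder; `autoencoder`
# and `tau` are placeholders.
import numpy as np

def novelty_score(autoencoder, x):
    """Per-sample mean squared reconstruction error."""
    x_hat = autoencoder.predict(x)
    return np.mean(np.square(x - x_hat), axis=tuple(range(1, x.ndim)))

# samples whose score exceeds the threshold are flagged as novel
# is_novel = novelty_score(autoencoder, images) > tau
\end{verbatim}
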
\subsection*{Research question}

Given these two approaches to solve the explanation task described
above, it comes down to performance. At the end of the day the best
theoretical idea does not help in solving the task if it cannot
be implemented in a performant way. Miller et al.\ have shown
some success in using dropout sampling. However, the many forward
passes during testing for every image seem computationally expensive.
In comparison, a single run through a trained auto-encoder
intuitively seems faster. This leads to the hypothesis (see below).

For the purpose of this thesis, I will
use the work of Miller et al.\ as the baseline to compare against.
They use the SSD\cite{Liu2016} network for object detection,
modified by added dropout layers, and the SceneNet
RGB-D\cite{McCormac2017} data set using the MS COCO\cite{Lin2014}
classes. Instead of dropout sampling, my approach will use
an auto-encoder for novelty detection, with all else, like
using SSD for object detection and the SceneNet RGB-D data set,
being equal. With respect to auto-encoders, a recent implementation
of an adversarial auto-encoder\cite{Pidhorskyi2018} will be used.

\paragraph{Hypothesis} Novelty detection using auto-encoders
delivers similar or better object detection performance under open-set
conditions while being less computationally expensive compared to
dropout sampling.

\paragraph{Contribution}
The contribution of this thesis is a comparison between dropout
sampling and auto-encoding with respect to the overall performance
of both for object detection under open-set conditions, using
the SSD network for object detection and the SceneNet RGB-D data set
with MS COCO classes.

\chapter{Background and Research Plan}

This chapter will provide a more in-depth look at the two works
this thesis is based upon. First, the dropout sampling introduced
by Miller et al.\cite{Miller2018} will be showcased. Afterwards,
the Generative Probabilistic Novelty Detection with Adversarial
Autoencoders\cite{Pidhorskyi2018} will be presented. The chapter
will conclude with a more detailed explanation of the intended
contribution of this thesis.

The dropout sampling explanation will follow the paper of Miller et
al.\cite{Miller2018} rather closely, including the formulae used
in their paper.

\section{Dropout Sampling}

To understand dropout sampling, it is necessary to explain the
idea of Bayesian neural networks. They place a prior distribution
over the network weights, for example a Gaussian prior distribution:
\(\mathbf{W} \sim \mathcal{N}(0, I)\). In this example
\(\mathbf{W}\) are the weights and \(I\) symbolises that every
weight is drawn from an independent and identical distribution. The
training of the network determines a plausible set of weights by
evaluating the posterior (probability output) over the weights given
the training data: \(p(\mathbf{W}|\mathbf{T})\). However, this
evaluation cannot be performed in any reasonable
time. Therefore, approximation techniques are
required. In those techniques the posterior is fitted with a
simple distribution \(q^{*}_{\theta}(\mathbf{W})\). The original
and intractable problem of averaging over all weights in the network
is replaced with an optimisation task, where the parameters of the
simple distribution are optimised\cite{Kendall2017}.

\subsubsection*{Dropout variational inference}

Kendall and Gal\cite{Kendall2017} showed an approximation for
classification and recognition tasks. Dropout variational inference
is a practical approximation technique that adds dropout layers
in front of every weight layer and uses them also during test
time to sample from the approximate posterior. Effectively, this
results in the approximation of the class probability
\(p(y|\mathcal{I}, \mathbf{T})\) by performing multiple forward
passes through the network and averaging over the obtained Softmax
scores \(\mathbf{s}_i\), given an image \(\mathcal{I}\) and the
training data \(\mathbf{T}\):
\begin{equation} \label{eq:drop-sampling}
p(y|\mathcal{I}, \mathbf{T}) = \int p(y|\mathcal{I}, \mathbf{W}) \cdot p(\mathbf{W}|\mathbf{T})\,d\mathbf{W} \approx \frac{1}{n} \sum_{i=1}^{n}\mathbf{s}_i
\end{equation}

With this dropout sampling technique, \(n\) model weights
\(\widetilde{\mathbf{W}}_i\) are sampled from the posterior
\(p(\mathbf{W}|\mathbf{T})\). The class probability
\(p(y|\mathcal{I}, \mathbf{T})\) is a probability vector
\(\mathbf{q}\) over all class labels. Finally, the uncertainty
of the network with respect to the classification is given by
the entropy \(H(\mathbf{q}) = - \sum_i q_i \cdot \log q_i\).

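As an illustration, a minimal sketch of this sampling procedure is
given below. It assumes a Keras classification model \texttt{model}
whose dropout layers remain active at inference time (via
\texttt{training=True}); the model name and the number of passes are
placeholders.

\begin{verbatim}
# A minimal sketch of dropout sampling, assuming a Keras model whose
# dropout layers stay active at inference time (training=True).
import numpy as np

def predict_with_uncertainty(model, image, n=20):
    """Average softmax scores over n stochastic forward passes and
    return the class probabilities q together with the entropy H(q)."""
    scores = np.stack([model(image[None, ...], training=True).numpy()[0]
                       for _ in range(n)])
    q = scores.mean(axis=0)  # approximates p(y|I, T)
    h = -np.sum(q * np.log(np.clip(q, 1e-12, 1.0)))  # entropy H(q)
    return q, h
\end{verbatim}
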
\subsubsection*{Dropout sampling for object detection}

Miller et al.\cite{Miller2018} apply dropout sampling to
object detection. In that case \(\mathbf{W}\) represents the
learned weights of a detection network like SSD\cite{Liu2016}.
Every forward pass uses a different network
\(\widetilde{\mathbf{W}}\) which is approximately sampled from
\(p(\mathbf{W}|\mathbf{T})\). Each forward pass in object
detection results in a set of detections, each consisting of bounding
box coordinates \(\mathbf{b}\) and a softmax score \(\mathbf{s}\).
The detections are denoted by Miller et al.\ as \(D_i =
\{\mathbf{s}_i,\mathbf{b}_i\}\). The detections of all passes are put
into a large set \(\mathfrak{D} = \{D_1, ..., D_n\}\).

All detections with mutual intersection-over-union scores (IoU)
of \(0.95\) or higher are defined as an observation \(\mathcal{O}_i\).
Subsequently, the corresponding vector of class probabilities
\(\mathbf{q}_i\) for the observation is calculated by averaging all
score vectors \(\mathbf{s}_j\) in a particular observation
\(\mathcal{O}_i\): \(\mathbf{q}_i \approx \overline{\mathbf{s}}_i = \frac{1}{n} \sum_{j=1}^{n} \mathbf{s}_j\). The label uncertainty
of the detector for a particular observation is measured by
the entropy \(H(\mathbf{q}_i) = - \sum_j q_{ij} \cdot \log q_{ij}\).

In the introduction I used a very reduced version to describe
maximum and low uncertainty. A more complete explanation: if
\(\mathbf{q}_i\), which I called the averaged class probabilities,
resembles a uniform distribution, the entropy will be high. A uniform
distribution means that no class is more likely than another, which
is a perfect example of maximum uncertainty. For example, with \(K\)
classes the uniform distribution \(q_{ij} = 1/K\) yields the maximum
entropy \(H(\mathbf{q}_i) = \log K\). Conversely, if one class has a
very high probability, the entropy will be low.

In open-set conditions it can be expected that falsely generated
detections for unknown object classes have a higher label
uncertainty. A threshold on the entropy \(H(\mathbf{q}_i)\) can then
be used to identify and reject these false positive cases.

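The following sketch illustrates this grouping and rejection logic.
It is a simplified greedy variant under assumptions of mine:
\texttt{detections} is a list of (score vector, box) pairs collected
over all forward passes, and the entropy threshold is a free
parameter; Miller et al.\ do not prescribe this exact implementation.

\begin{verbatim}
# A hedged sketch of grouping sampled detections into observations
# and rejecting uncertain ones; inputs and thresholds are assumed.
import numpy as np

def entropy(q):
    """Shannon entropy of a class probability vector."""
    q = np.clip(q, 1e-12, 1.0)
    return -np.sum(q * np.log(q))

def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def group_observations(detections, iou_threshold=0.95):
    """Greedily group (score, box) pairs with mutual IoU >= threshold."""
    observations = []
    for score, box in detections:
        for obs in observations:
            if all(iou(box, b) >= iou_threshold for _, b in obs):
                obs.append((score, box))
                break
        else:
            observations.append([(score, box)])
    return observations

def filter_observations(detections, entropy_threshold):
    """Average score vectors per observation, reject high entropy."""
    kept = []
    for obs in group_observations(detections):
        q = np.mean([s for s, _ in obs], axis=0)
        if entropy(q) <= entropy_threshold:
            kept.append((q, obs))
    return kept
\end{verbatim}
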
\section{Generative Probabilistic Novelty Detection}

% TODO Write about GPND in understandable terms

\section{Adversarial Auto-encoder}

This section will explain the adversarial auto-encoder used by
Pidhorskyi et al.\cite{Pidhorskyi2018}, but in a slightly modified
form to make it more understandable.

The training data points \(x_i \in \mathbb{R}^m \) are the input
of the auto-encoder. An encoding function \(e: \mathbb{R}^m \rightarrow \mathbb{R}^n\) takes the data points
and produces a representation \(\overline{z_i} \in \mathbb{R}^n\)
in a latent space. This latent space is smaller (\(n < m\)) than the
input, which necessitates some form of compression.

A second function \(g: \Omega \rightarrow \mathbb{R}^m\) is the
generator function that takes the latent representation
\(z_i \in \Omega \subset \mathbb{R}^n\) and generates an output
\(\overline{x_i}\) as close as possible to the input data
distribution.

What then is the difference between \(\overline{z_i}\) and \(z_i\)?
With a simple auto-encoder both would be identical. In the case
of an adversarial auto-encoder it is slightly more complicated.
There is a discriminator \(D_z\) that tries to distinguish between
an encoded data point \(\overline{z_i}\) and a \(z_i \sim \mathcal{N}(0,1)\) drawn from a normal distribution with mean \(0\)
and standard deviation \(1\). During training, the encoding
function \(e\) attempts to minimize any perceivable difference
between \(z_i\) and \(\overline{z_i}\), while \(D_z\) has the
aforementioned adversarial task to differentiate between them.

Furthermore, there is a discriminator \(D_x\) that has the task
of differentiating the generated output \(\overline{x_i}\) from the
actual input \(x_i\). During training, the generator function \(g\)
tries to minimize the perceivable difference between \(\overline{x_i}\) and \(x_i\), while \(D_x\) has the mentioned
adversarial task to distinguish between them.

With this, all components of the adversarial auto-encoder employed
by Pidhorskyi et al.\ are introduced. Finally, the losses are
presented. The two adversarial objectives have been mentioned
already. Specifically, there is the adversarial loss for the
discriminator \(D_z\):
\begin{equation} \label{eq:adv-loss-z}
\mathcal{L}_{adv-d_z}(x,e,D_z) = E[\log (D_z(\mathcal{N}(0,1)))] + E[\log (1 - D_z(e(x)))],
\end{equation}
\noindent
where \(E\) stands for an expected
value\footnote{a term used in probability theory},
\(x\) stands for the input, and
\(\mathcal{N}(0,1)\) represents an element drawn from the specified
distribution. The encoder \(e\) attempts to minimize this loss while
the discriminator \(D_z\) intends to maximize it.

In the same way the adversarial loss for the discriminator \(D_x\)
is specified:
\begin{equation} \label{eq:adv-loss-x}
\mathcal{L}_{adv-d_x}(x,D_x,g) = E[\log(D_x(x))] + E[\log(1 - D_x(g(\mathcal{N}(0,1))))],
\end{equation}
\noindent
where \(x\), \(E\), and \(\mathcal{N}(0,1)\) have the same meaning
as before. In this case the generator \(g\) tries to minimize the loss
while the discriminator \(D_x\) attempts to maximize it.

Every auto-encoder requires a reconstruction error to work. This
error measures the difference between the original input and
the generated or decoded output. In this case, the reconstruction
loss is defined as:
\begin{equation} \label{eq:recon-loss}
\mathcal{L}_{error}(x, e, g) = - E[\log(p(g(e(x)) | x))],
\end{equation}
\noindent
where \(\log(p)\) is the expected log-likelihood and \(x\),
\(E\), \(e\), and \(g\) have the same meaning as before.

All losses combined result in the following formula:
\begin{equation} \label{eq:full-loss}
\mathcal{L}(x,e,D_z,D_x,g) = \mathcal{L}_{adv-d_z}(x,e,D_z) + \mathcal{L}_{adv-d_x}(x,D_x,g) + \lambda \mathcal{L}_{error}(x,e,g),
\end{equation}
\noindent
where \(\lambda\) is a parameter used to balance the adversarial
losses with the reconstruction loss. The model is trained by
Pidhorskyi et al.\ using the Adam optimizer by doing alternating
updates of each of the aforementioned components (a code sketch
follows after this list):

\begin{itemize}
\item Maximize \(\mathcal{L}_{adv-d_x}\) by updating the weights of \(D_x\);
\item Minimize \(\mathcal{L}_{adv-d_x}\) by updating the weights of \(g\);
\item Maximize \(\mathcal{L}_{adv-d_z}\) by updating the weights of \(D_z\);
\item Minimize \(\mathcal{L}_{error}\) and \(\mathcal{L}_{adv-d_z}\) by updating the weights of \(e\) and \(g\).
\end{itemize}

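The following is a condensed sketch of this alternating scheme. It
uses TensorFlow-2-style eager execution with \texttt{tf.GradientTape},
plain dense networks, a mean squared error as a stand-in for the
negative log-likelihood, and illustrative sizes; none of these
choices are prescribed by Pidhorskyi et al.

\begin{verbatim}
# A condensed sketch of the alternating updates; architectures,
# sizes, and the MSE reconstruction term are illustrative only.
import tensorflow as tf

m, n = 784, 32  # input and latent dimensionality (placeholders)

e = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(m,)),
    tf.keras.layers.Dense(n)])                       # encoder
g = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(n,)),
    tf.keras.layers.Dense(m)])                       # generator
d_z = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(n,)),
    tf.keras.layers.Dense(1, activation="sigmoid")])  # latent disc.
d_x = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(m,)),
    tf.keras.layers.Dense(1, activation="sigmoid")])  # data disc.

opt = tf.keras.optimizers.Adam(1e-3)
lam = 10.0  # balancing parameter lambda (assumed value)
eps = 1e-8  # numerical stability inside the logarithms

def train_step(x):
    batch = tf.shape(x)[0]
    # 1) maximize L_adv-d_x by updating D_x (minimize the negation)
    with tf.GradientTape() as tape:
        z_prior = tf.random.normal((batch, n))
        loss_dx = -tf.reduce_mean(tf.math.log(d_x(x) + eps)
                                  + tf.math.log(1.0 - d_x(g(z_prior)) + eps))
    opt.apply_gradients(zip(tape.gradient(loss_dx, d_x.trainable_variables),
                            d_x.trainable_variables))
    # 2) minimize L_adv-d_x by updating g
    with tf.GradientTape() as tape:
        z_prior = tf.random.normal((batch, n))
        loss_g = tf.reduce_mean(tf.math.log(1.0 - d_x(g(z_prior)) + eps))
    opt.apply_gradients(zip(tape.gradient(loss_g, g.trainable_variables),
                            g.trainable_variables))
    # 3) maximize L_adv-d_z by updating D_z
    with tf.GradientTape() as tape:
        z_prior = tf.random.normal((batch, n))
        loss_dz = -tf.reduce_mean(tf.math.log(d_z(z_prior) + eps)
                                  + tf.math.log(1.0 - d_z(e(x)) + eps))
    opt.apply_gradients(zip(tape.gradient(loss_dz, d_z.trainable_variables),
                            d_z.trainable_variables))
    # 4) minimize L_error and L_adv-d_z by updating e and g
    with tf.GradientTape() as tape:
        z = e(x)
        recon = tf.reduce_mean(tf.square(g(z) - x))  # MSE stand-in
        fool = tf.reduce_mean(tf.math.log(1.0 - d_z(z) + eps))
        loss_eg = lam * recon + fool
    variables = e.trainable_variables + g.trainable_variables
    opt.apply_gradients(zip(tape.gradient(loss_eg, variables), variables))
\end{verbatim}
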
\section{Contribution}

\chapter{Thesis as a project}

After introducing the topic and the general task ahead, this part of
the exposé will focus on how to get there. This includes a timetable
with SMART goals as well as an outline of the software development
practices used for implementing the code for this thesis.

\section{Software Development}

Most scientific implementations found on GitHub are not done with
distribution in mind. They usually require manual cloning of the
repository, have bad code documentation, and don't follow common
coding standards. This is bad enough by itself but becomes a real
nuisance if you want to use those implementations in your own
code. As they are not marked up as Python packages, using them
usually requires manual workarounds to make them usable as library
code, for example, in a Python package.

The code of this thesis will be developed from the start inside
a Python package structure, which will make it easy to include
it later on as a dependency for other work. After the thesis
has been graded, the package will be uploaded to the PyPI package
repository and the corresponding Git repository will be made
publicly available.

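As an illustration, a hypothetical minimal \texttt{setup.py} for such
a package structure could look like the following; the metadata and
dependency list are placeholders, not final decisions.

\begin{verbatim}
# A hypothetical minimal setup.py; name, version, and dependencies
# are placeholders rather than the final published metadata.
from setuptools import setup, find_packages

setup(
    name="masterthesis",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["tensorflow", "numpy"],
)
\end{verbatim}
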
Any required third-party implementations, like the SSD implementation
for Keras, which are not already available as Python packages, will
be included as library code according to their respective licences.

A large chunk of the code will be written as library-ready code
that can be used in other applications. Only a small part will
provide the interface to the library code. The specifics of the
interface cannot be predicted ahead of time, but it will certainly
include a properly documented CLI, as that will be necessary for
the work of the thesis itself.

Tensorflow will be used as the deep learning framework. To make
the code future-proof, the eager execution mode will be used as it
is the default for Tensorflow
2.0\footnote{\url{https://medium.com/tensorflow/whats-coming-in-tensorflow-2-0-d3663832e9b8}}.

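On TensorFlow 1.x, eager mode is not on by default but can be enabled
at program start; a minimal sketch, assuming TensorFlow 1.7 or later:

\begin{verbatim}
import tensorflow as tf

# on TensorFlow 2.0 eager execution is already the default,
# so the guard skips the explicit call
if not tf.executing_eagerly():
    tf.enable_eager_execution()
\end{verbatim}
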
\section{Stretch Goals}

There are a number of goals that are not tightly included in the
following timetable. Those are optional add-ons that are nice-to-have
but not critical for successful completion of the thesis.

\begin{itemize}
\item make my own approach work on the YCB-Video data
set\cite{Xiang2017}
\item test dropout sampling and my own approach on a data set
self-recorded with a robot arm and mounted Kinect
\item provide a GUI to freely select an image to be classified by
the trained model and see a visualization of the result
\end{itemize}

\section{Timetable}

This timetable is structured by milestones that I want to
achieve. Every milestone has the related tasks grouped beneath it.
The scheduling is done with respect to my full personal calendar
and will only account for Monday through Friday at most. Weekends
will not be scheduled work time for the thesis. This allows for
some additional, if unreliable, emergency buffer at the end if
things do not proceed as planned. Furthermore, I will only be able
to regularly plan the time between 11 am and 5 pm for working on the
thesis, as the evenings are mostly full and, regardless of that
fact, I do want to reserve free time.

\paragraph{Main tasks}
Everything but the stretch goals is non-optional, which makes
the term ``main task'' rather difficult to grasp. The term implies
that all other tasks are nice-to-have but not required. Therefore,
I have chosen to use milestones instead as the highest grouping
level.

\subsection*{Milestones}

The detailed timetable starts in the next subsection. A summary
of the timetable regarding the milestones is presented here.

\begin{enumerate}
\item Environment set up: Due date 20th March
\item Fine-tuned SSD on SceneNet RGB-D: Due date 5th April
\item Fine-tuned GPND on SceneNet RGB-D: Due date 12th April
\item Networks evaluated: Due date 10th May
\item Visualizations created: Due date 31st May
\item Stretch Goals/Buffer: Due date 27th June
\item Thesis writing: Due date 30th August
\item Finishing touches: Due date 13th September
\end{enumerate}

\subsection*{Environment set up}

\textbf{Due date:} 20th March

\begin{description}
\item[Download SceneNet RGB-D to cvpc\{7,8\} computer] \hfill \\
Requires external resource.
\end{description}

\subsection*{Fine-tuned SSD on SceneNet RGB-D}

\textbf{Due date:} 5th April

\begin{description}
\item[Download pre-trained weights of SSD for MS COCO] \hfill \\
This is trivial. Takes no more than two hours.
\item[Modify SSD Keras implementation to work inside masterthesis package] \hfill \\
Should be possible to achieve within one day.
\item[Implement integration of SSD into masterthesis package] \hfill \\
Implementing the glue code between the git submodule and
my own code. Should be doable within one day.
\item[Group SceneNet RGB-D classes to MS COCO classes] \hfill \\
SceneNet contains more classes than COCO. Miller et al.\ have
grouped, for example, various chair classes in SceneNet into
the one chair class of COCO. This grouping involves researching
the 80 classes of COCO, finding all related SceneNet
classes, and then writing a mapper between them (see the sketch
after this list).

All in all this could take up a full day and perhaps slip
into a second one.
\item[Implement variant of SSD with dropout layers (Bayesian SSD)] \hfill \\
This is a rather trivial task as it only involves adding two
Keras dropout layers into SSD. Can be done in one hour.
\item[Fine-tune vanilla SSD on SceneNet RGB-D] \hfill \\
Requires external resource, and the length of the required training
is unknown. Due to two unknown factors (availability of the
resource and length of training) this task can be considered a
project risk.
\item[Fine-tune Bayesian SSD on SceneNet RGB-D] \hfill \\
Similar remarks as for the previous task.
\end{description}

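A hypothetical sketch of such a mapper is shown below; the listed
class names are illustrative examples and not the final researched
mapping.

\begin{verbatim}
# Hypothetical SceneNet-to-COCO class mapper; the entries are
# illustrative examples, not the final researched mapping.
SCENENET_TO_COCO = {
    "office-chair": "chair",
    "armchair": "chair",
    "sofa": "couch",
}

def map_class(scenenet_label):
    """Return the corresponding COCO class or None if unmapped."""
    return SCENENET_TO_COCO.get(scenenet_label)
\end{verbatim}
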
The tasks prior to the training could be achievable by the 21st of
March if work starts on the 18th. Buffer time will go to the 25th
of March. Training is scheduled to commence as early as possible
but no later than the 26th of March.

Since the SSD network is a proven one, I am confident that this
milestone can be reached, and the time between the 26th of March and
the 5th of April should provide more than enough time for training.
Once training has started, I can work on tasks from other milestones
so that the training time is used as efficiently as possible.

\subsection*{Fine-tuned GPND on SceneNet RGB-D}

\textbf{Due date:} 12th April

\begin{description}
\item[Adapt GPND implementation for SceneNet RGB-D using COCO classes] \hfill \\
Requires research to figure out the exact architecture needed
for a different data set. The code is not well
documented and some logical variables like image size are
sometimes hard-coded, which makes this adaptation difficult
and error-prone.
Furthermore, some trial and error regarding training successes
is likely needed, which makes this task a project risk.
If the needed architecture were known, the time to implement
it would be at most one day. The uncertainty therefore
lies with the research part.
\item[Implement novelty score calculation for GPND] \hfill \\
There is an implementation for this in the original
author's implementation. It would have to be ported
to Tensorflow and integrated into the package structure.
Takes likely one or two days.
\item[Apply insights of GAN stability to GPND implementation] \hfill \\
The insights from the GAN stability\footnote{\url{https://avg.is.tuebingen.mpg.de/publications/meschedericml2018}} research should be applied to
my GPND implementation. Requires research into what insights,
if any, can be used for this thesis. The research is
doable within one day and the application of it
within another.
\item[Train GPND on SceneNet RGB-D] \hfill \\
Requires external resource.
In contrast to the SSD network, there are no pre-trained
weights available for the GPND. Therefore, it has to be
trained from scratch. Furthermore, it will have to be
trained for every class separately, which prolongs the
training even further. This task can thus be classified as a
project risk.
\end{description}

I will only be able to start working on these tasks on April 1st.
Assuming that the research in the first task goes well, I will
be able to finish the preparatory work on April 5th. Training
could start as early as April 5th. The seven days to the due date
of April 12th are tight, and maybe it takes longer, but this is
the aggressive date I will work towards.

\subsection*{Networks evaluated}

\textbf{Due date:} 10th May

\begin{description}
\item[Implement evaluation pipeline for vanilla SSD] \hfill \\
Involves the implementation of the evaluation steps
according to the chosen metrics. Takes likely two days.
\item[Implement evaluation pipeline for Bayesian SSD] \hfill \\
Involves the implementation of the evaluation steps
for the Bayesian variant. As more has to be done,
it will likely take three days.
\item[Implement evaluation pipeline for SSD with GPND for novelty score] \hfill \\
Implementation of the evaluation steps for my approach.
It will probably take two days.
\item[Run vanilla SSD on test data] \hfill \\
The trained network is run on the test data and the results
are stored. Requires external resource but should be
far quicker than the training and will probably be done
in two days at most.
\item[Run Bayesian SSD on test data] \hfill \\
Similar remarks as for the previous task.
\item[Run vanilla SSD detections through GPND] \hfill \\
For my approach the SSD detections need to be run through
the GPND to have all the relevant data for evaluation.
Requires external resource. Will likely take two days.
\item[Calculate evaluation metrics for vanilla SSD] \hfill \\
Takes one day.
\item[Calculate evaluation metrics for Bayesian SSD] \hfill \\
Takes one day.
\item[Calculate evaluation metrics for vanilla SSD with GPND] \hfill \\
Takes one day.
\end{description}

If I can start on April 15th with the preparatory work, it should be
done by April 23rd. The testing runs can begin as early as April 24th
and should finish around April 30th. This leaves the week from
May 6th up to the due date to finish the calculations, which can
happen on the CPU as all the data is already there by then.

\subsection*{Visualizations created}

\textbf{Due date:} 31st May\\

I won't be able to work on the thesis between May 13th and May 26th
due to the election campaign. I am involved in the campaign already
as of this writing, but I hope that up until May 10th both thesis
and campaign can somewhat co-exist.
The visualizations should be creatable within one week, from May 27th
to May 31st.

\subsection*{Stretch goals}

\textbf{Due date:} 27th June\\

As I mentioned earlier, there are no specific tasks for the
stretch goals. If the critical path is finished by the end
of May as planned, then the month of June is available for
stretch goals. If the critical path is not finished, then
June serves as a buffer zone to prevent spillover into the
writing period.

\subsection*{Thesis writing}

\textbf{Due date:} 30th August\\

A first complete draft of the thesis should be finished
at the latest by August 16th. The following week I am not
able to work on the thesis, but it can be used for feedback.
The last week of August should allow for polishing of the
thesis, with a submission-ready candidate by August 30th.

\subsection*{Finishing touches}

\textbf{Due date:} 13th September\\

The submission requires three printed copies of the thesis,
together with any digital components on a CD glued to the back
of the thesis. A non-editable CD ensures that the submitted code
cannot be modified and will be exactly as submitted when reviewed.
I will use these two weeks to print the copies and to take the
last publication steps for the code, like improving the code
documentation and adding usage examples.

\subsection*{Colloquium}

Last but not least is the colloquium, which will probably take
place within the second half of September. I will prepare
a presentation for the colloquium in the time leading up to it.

\section{Project Risks}

In this section, further project risks are listed in
addition to those indicated in the timetable section.

The workload for the election campaign, in which I have an
organizational responsibility in addition to being a candidate
myself, could come into conflict with the progress of the
thesis.

Availability of the external resource can hinder the progress
and delay steps of the thesis. In such a case, dependent
tasks cannot commence until the earlier task has been finished,
resulting in an overall delay of the thesis.

To deal with these risks, I have planned for one whole month
of buffer time that can account for many delays. Furthermore,
the writing time is intentionally long, as it is difficult
to predict how inspired I will be. I know from my bachelor
thesis that on some days you can write many pages, while on
others you might barely make one page of progress.

I would argue that the success of the thesis is largely dependent
on the first part of the work, as it can make or break it. Once
the technical part is done, the way forward should be downhill.