% body thesis file that contains the actual content
\chapter{Introduction}
\subsection*{Motivation}
Famous examples, such as the automatic soap dispenser that does not
recognize the hand of a black person but dispenses soap when presented
with a paper towel, raise the question of bias in computer
systems~\cite{Friedman1996}. Related to this ethical question regarding
the design of so-called algorithms is the question of
algorithmic accountability~\cite{Diakopoulos2014}.

Supervised neural networks learn from input-output relations and
figure out by themselves which connections are necessary to reproduce
them. This feature is also their Achilles heel: it makes them effectively
black boxes and prevents any answers to questions of causality.
However, these questions of causality are of enormous consequence when
results of neural networks are used to make life-changing decisions:
Is a correlation enough to bring forth negative consequences
for a particular person? And if so, what is the possible defence
against math? Similar questions can be raised when looking at computer
vision networks that might be used together with so-called smart
CCTV cameras to discover suspicious activity.

This leads to the need for neural networks to explain their results.
Such an explanation must come from the network or an attached piece
of technology to allow mass adoption. This raises the question of
how such an endeavour can be achieved.

For neural networks there are fundamentally two types of tasks:
regression and classification. Regression deals with any case
where the goal for the network is to approximate an ideal
function that connects all data points. Classification, however,
describes tasks where the network is supposed to identify the
class of any given input. In this thesis, I will focus on
classification.
\subsection*{Object Detection in Open Set Conditions}
More specifically, I will look at object detection under open set
conditions. In non-technical words, this effectively describes
the kind of situation you encounter with CCTV cameras or robots
outside of a laboratory. Both use cameras that record
images. Subsequently a neural network analyses the image
and returns a list of detected and classified objects that it
found in the image. The problem here is that networks can only
classify what they know. If presented with an object type that
the network was not trained with, as happens frequently in real
environments, it will still classify the object and might even
have a high confidence in doing so. Such a detection would be
a false positive. An ordinary person who uses the results of
such a network might falsely assume that a high confidence always
means the classification is very likely correct. If they use
a proprietary system, they might not even be able to find out
that the network was never trained on a particular type of object.
It would therefore be impossible for them to identify the output
of the network as a false positive.
This goes back to the need for automatic explanation. Such a system
should by itself recognize that the given object is unknown and
hence mark any classification result of the network as meaningless.

Technically, two slightly different approaches address
this type of task: model uncertainty and novelty detection.
Model uncertainty can be measured with dropout sampling.
Dropout is usually used only during training, but
Miller et al.~\cite{Miller2018} also use it during testing
to obtain different results for the same image across
multiple forward passes. The output scores for the forward passes
of the same image are then averaged. If the averaged class
probabilities resemble a uniform distribution (every class has
the same probability), this indicates maximum uncertainty. Conversely,
if there is one very high probability with every other being very
low, this signifies low uncertainty. An unknown object is more
likely to cause high uncertainty, which allows false positive
cases to be identified.
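
The averaging and the two extreme cases described above can be written
down compactly. The following is only a sketch in notation introduced
here for illustration, not necessarily the exact formulation of
Miller et al.: $T$ denotes the number of forward passes, $C$ the number
of classes, and $p_t(c \mid x)$ the score for class $c$ in forward pass
$t$ for image $x$. Using the entropy $H$ as the measure of uncertainty
is an assumption made for this illustration.
\begin{equation}
  \bar{p}(c \mid x) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid x),
  \qquad c = 1, \dots, C
\end{equation}
\begin{equation}
  H(\bar{p}) = - \sum_{c=1}^{C} \bar{p}(c \mid x) \log \bar{p}(c \mid x)
\end{equation}
Under this assumption, a uniform distribution $\bar{p}(c \mid x) = 1/C$
yields the maximum value $H = \log C$, whereas a distribution that puts
all probability on a single class yields $H = 0$.
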
Novelty detection is the more direct approach to solve the task.
In the realm of neural networks it is usually done with the help of
auto-encoders that essentially solve a regression task of finding an
identity function that reconstructs the given input at the
output~\cite{Pimentel2014}. Auto-encoders have
internally at least two components: an encoder, and a decoder or
generator. The job of the encoder is to find an encoding that
compresses the input as well as possible while simultaneously
being as loss-free as possible. The decoder takes this latent
representation of the input and has to find a decompression
that reconstructs the input as accurately as possible. During
training these auto-encoders learn to reproduce a certain group
of object classes. The actual novelty detection takes place
during testing: Given an image, and the output and loss of the
auto-encoder, a novelty score is calculated. A low novelty
score signals a known object. The opposite is true for a high
novelty score.
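
One simple way to make this concrete is sketched below. The notation is
introduced here for illustration only: $f$ denotes the encoder, $g$ the
decoder, $x$ the input image, $\hat{x}$ its reconstruction, and $\tau$ a
hypothetical decision threshold. Taking the squared reconstruction error
as the novelty score $s(x)$ is an assumption and merely the simplest
possible choice; other scores based on the output and loss of the
auto-encoder are conceivable.
\begin{equation}
  \hat{x} = g(f(x))
\end{equation}
\begin{equation}
  s(x) = \| x - \hat{x} \|_2^2
\end{equation}
Under this assumption, an object would be treated as unknown if
$s(x) > \tau$ and as known otherwise.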
\subsection*{Research Question}
Each of the presented approaches describes one way to address the
aforementioned problem of explanation. They can be differentiated by measuring
their performance: the best theoretical idea is useless if it does
not perform well. Miller et al. have shown
some success in using dropout sampling. However, the many forward
passes during testing for every image seem computationally expensive.
In comparison, a single pass through a trained auto-encoder
intuitively seems faster. This leads to the hypothesis stated below.

For the purpose of this thesis, I will
use the work of Miller et al. as the baseline to compare against.
They use the SSD~\cite{Liu2016} network for object detection,
modified by added dropout layers, and the SceneNet
RGB-D~\cite{McCormac2017} data set using the MS COCO~\cite{Lin2014}
classes. I will use a simple implementation of an auto-encoder and
novelty detection to compare with the work of Miller et al.
Both approaches use SSD for object detection and SceneNet RGB-D
as the data set.
\paragraph{Hypothesis} Novelty detection using auto-encoders
delivers similar or better object detection performance under open set
conditions while being less computationally expensive compared to
dropout sampling.
\paragraph{Contribution}
The contribution of this thesis is a comparison between dropout
sampling and auto-encoding with respect to the overall performance
of both for object detection under open set conditions using
the SSD network for object detection and the SceneNet RGB-D data set
with MS COCO classes.
\chapter{Background and Contribution}
\chapter{Methods}
\section{Design of Source Code}
\section{Preparation of Data Sets}
\section{Replication of Miller et al.}
\chapter{Results}
\chapter{Discussion}
\chapter{Closing}