\chapter{Introduction}

Famous examples like the automatic soap dispenser that does not
recognize the hand of a black person but dispenses soap when presented
with a paper towel raise the question of bias in computer
systems\cite{Friedman1996}. Related to this ethical question regarding
the design of so-called algorithms, a term often used in public
discourse for applied neural networks, is the question of
algorithmic accountability\cite{Diakopoulos2014}.

\chapter{Conclusion}

The charm of supervised neural networks, that they can learn from
input-output relations and figure out by themselves which connections
are necessary for that, is also their Achilles heel. This feature
makes them effectively black boxes. It is possible to question the
training environment, like potential biases inside the data sets, or
the engineers constructing the networks, but it is not really possible
to question the internal calculations made by a network. On the one
hand, one might argue, it is only math and nothing magical that
happens inside these networks. Clearly it is possible, albeit a chore,
to manually follow the calculations of any given trained network.
After all, it is executed on a computer and at the lowest level only
uses basic math that does not differ between humans and computers. On
the other hand, not everyone is capable of doing so and, more
importantly, it does not reveal any answers to questions of causality.

However, these questions of causality are of enormous consequence when
neural networks are used, for example, in predictive policing. Is a
correlation, a coincidence, enough to bring forth negative consequences
for a particular person? And if so, what is the possible defence
against math? Similar questions can be raised when looking at computer
vision networks that might be used together with so-called smart
CCTV cameras, for example, like those tested at the train station
Berlin Südkreuz. What if a network implies that you exhibited
suspicious behaviour?

This leads to the need for neural networks to explain their results.
Such an explanation must come from the network or an attached piece
of technology to allow mass adoption. Obviously, this setting
poses the question of how such an endeavour can be achieved.

For neural networks, there are fundamentally two types of tasks:
regression and classification. Regression deals with any case
where the goal for the network is to come close to an ideal
function that connects all data points. Classification, however,
describes tasks where the network is supposed to identify the
class of any given input. In this thesis, I will focus on
classification.

More specifically, I will look at object detection under open-set
conditions. In non-technical words, this effectively describes
the kind of situation you encounter with CCTV cameras or robots
outside of a laboratory. Both use cameras that record
images. Subsequently, a neural network analyses the image
and returns a list of detected and classified objects that it
found in the image. The problem here is that networks can only
classify what they know. If presented with an object type that
the network was not trained with, as happens frequently in real
environments, it will still classify the object and might even
have a high confidence in doing so. Such an example would be
a false positive. Any ordinary person who uses the results of
such a network would falsely assume that a high confidence always
means the classification is very likely correct. If they use
a proprietary system, they might not even be able to find out
that the network was never trained on a particular type of object.
Therefore, it would be impossible for them to identify the output
of the network as a false positive.
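
To illustrate why a high confidence can mislead, the following sketch
(plain NumPy, with hypothetical logit values, not output of any actual
detector) shows how a softmax turns one dominant logit into a
near-certain class probability, even when the input is an unknown object:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a vector of class logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for a detector trained on three classes.
# An unknown object can still push one logit well above the others,
# yielding a high confidence for a class that is simply wrong.
logits_unknown_object = np.array([8.0, 1.0, 0.5])
probs = softmax(logits_unknown_object)

print(probs.argmax(), probs.max())  # top class and its confidence
```

The confidence printed here exceeds 0.99 even though, by assumption, the
object belongs to none of the three classes; the softmax only ranks the
known classes against each other.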

This goes back to the need for automatic explanation. Such a system
should by itself recognize that the given object is unknown and
hence mark any classification result of the network as meaningless.
Technically, there are two slightly different approaches that deal
with this type of task: model uncertainty and novelty detection.

Model uncertainty can be measured with dropout sampling.
Dropout is usually used only during training, but
Miller et al.\cite{Miller2018} use it also during testing
to achieve different results for the same image, making use of
multiple forward passes. The output scores for the forward passes
of the same image are then averaged. If the averaged class
probabilities resemble a uniform distribution (every class has
the same probability), this signals maximum uncertainty. Conversely,
if there is one very high probability with every other being very
low, this signifies a low uncertainty. An unknown object is more
likely to cause high uncertainty, which allows for an identification
of false positive cases.
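
The averaging step can be sketched as follows. The dropout layer is
mocked here with a random mask over hypothetical logits; the actual
method of Miller et al.\ runs the full SSD network with its added
dropout layers, which is not needed to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_pass_with_dropout(logits, rate=0.5):
    """Stand-in for one stochastic forward pass: a random dropout
    mask perturbs the logits, so repeated passes give different scores."""
    mask = rng.random(logits.shape) >= rate
    return logits * mask / (1.0 - rate)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits for one detection and four classes.
logits = np.array([2.0, 1.5, 1.8, 0.2])

# Multiple forward passes for the same image, then average the scores.
passes = [softmax(forward_pass_with_dropout(logits)) for _ in range(30)]
mean_probs = np.mean(passes, axis=0)

# Entropy of the averaged distribution as an uncertainty measure:
# a uniform distribution gives maximum entropy (maximum uncertainty).
entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
max_entropy = np.log(len(mean_probs))
print(entropy / max_entropy)  # close to 1.0 means highly uncertain
```

The normalised entropy is one way to condense the averaged
probabilities into a single uncertainty value; thresholding it then
separates confident detections from likely false positives.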

Novelty detection is the more direct approach to solving the task.
In the realm of neural networks, it is usually done with the help of
auto-encoders that essentially solve a regression task of finding an
identity function that reconstructs the given input on the
output\cite{Pimentel2014}. Auto-encoders have
internally at least two components: an encoder, and a decoder or
generator. The job of the encoder is to find an encoding that
compresses the input as well as possible while simultaneously
being as loss-free as possible. The decoder takes this latent
representation of the input and has to find a decompression
that reconstructs the input as accurately as possible. During
training, these auto-encoders learn to reproduce a certain group
of object classes. The actual novelty detection takes place
during testing. Given an image, and the output and loss of the
auto-encoder, a novelty score is calculated. A low novelty
score signals a known object. The opposite is true for a high
novelty score.
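
A minimal sketch of such a score, assuming mean squared reconstruction
error as the novelty measure and hypothetical input and reconstruction
vectors (a real auto-encoder, such as the adversarial one cited below,
uses a more elaborate score):

```python
import numpy as np

def novelty_score(x, reconstruction):
    """Mean squared reconstruction error as a simple novelty score:
    low for inputs the auto-encoder has learned to reproduce,
    high for unfamiliar inputs."""
    return float(np.mean((x - reconstruction) ** 2))

# Hypothetical inputs and auto-encoder outputs; a trained auto-encoder
# reconstructs known classes well and unknown ones poorly.
known = np.array([0.2, 0.8, 0.5])
known_recon = np.array([0.21, 0.79, 0.52])    # near-perfect reconstruction
unknown = np.array([0.9, 0.1, 0.7])
unknown_recon = np.array([0.4, 0.6, 0.2])     # poor reconstruction

threshold = 0.01  # in practice tuned on validation data
print(novelty_score(known, known_recon) < threshold)      # known object
print(novelty_score(unknown, unknown_recon) > threshold)  # novel object
```

Everything below the threshold is treated as known; everything above it
marks the accompanying classification result as unreliable.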

Given these two approaches to solve the explanation task from above,
it comes down to performance. At the end of the day, the best
theoretical idea does not help in solving the task if it cannot
be implemented in a performant way. Miller et al.\ have shown
some success in using dropout sampling. However, the many forward
passes during testing for every image seem computationally expensive.
In comparison, a single run through a trained auto-encoder seems
intuitively to be faster. This leads to the following hypothesis:
\emph{Novelty detection using auto-encoders delivers similar or better
object detection performance under open-set conditions while
being less computationally expensive compared to dropout sampling}.

For the purpose of this thesis, I will
use the work of Miller et al.\ as the baseline to compare against.
They use the SSD\cite{Liu2016} network for object detection,
modified by added dropout layers, and the SceneNet
RGB-D\cite{McCormac2017} data set using the MS COCO\cite{Lin2014}
classes. Instead of dropout sampling, my approach will use
an auto-encoder for novelty detection with all else, like
using SSD for object detection and the SceneNet RGB-D data set,
being equal. With respect to auto-encoders, a recent implementation
of an adversarial auto-encoder\cite{Pidhorskyi2018} will be used.

The contribution of this thesis is a comparison between dropout
sampling and auto-encoding with respect to the overall performance
of both for object detection under open-set conditions, using
the SSD network for object detection and the SceneNet RGB-D data set
with MS COCO classes.

\chapter{Thesis as a project}

After introducing the topic and the general task ahead, this part of
the exposé will focus on how to get there. This includes a timetable
with SMART goals as well as an outline of the software development
practices used for implementing the code for this thesis.

\section{Software Development}

Most scientific implementations found on GitHub are not done with
distribution in mind. They usually require manual cloning of the
repository, have poor code documentation, and do not follow common
coding standards. This is bad enough by itself but becomes a real
nuisance if you want to use those implementations in your own
code. As they are not marked up as Python packages, using them
usually requires manual workarounds to make them usable as library
code, for example, in a Python package.

The code of this thesis will be developed from the start inside
a Python package structure, which will make it easy to include
it later on as a dependency for other work. After the thesis
has been graded, the package will be uploaded to the PyPI package
repository and the corresponding Git repository will be made
publicly available.
Any required third-party implementations, like the SSD implementation
for Keras, which are not already available as Python packages, will
be included as library code according to their respective licences.

A large chunk of the code will be written as library-ready code
that can be used in other applications. Only a small part will
provide the interface to the library code. The specifics of the
interface cannot be predicted ahead of time, but it will certainly
include a properly documented CLI, as that will be necessary for
the work of the thesis itself.
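
As a sketch of how such a package and its CLI could be wired together,
the following hypothetical \texttt{setup.py} shows the general shape;
every name, version, and path in it is a placeholder, not the actual
package of this thesis:

```python
# setup.py -- a minimal packaging sketch; all names and metadata
# are placeholders, not the actual package of this thesis.
from setuptools import setup, find_packages

setup(
    name="masterthesis",          # hypothetical package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["tensorflow", "numpy"],
    entry_points={
        "console_scripts": [
            # exposes the documented CLI as an installable command;
            # module path and function name are illustrative only
            "masterthesis=masterthesis.cli:main",
        ],
    },
)
```

A `console_scripts` entry point is what lets the same code serve both
as importable library and as command-line tool after `pip install`.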

TensorFlow will be used as the deep learning framework. To make
the code future-proof, the eager execution mode will be used, as it
is the default in TensorFlow
2.0\footnote{\url{https://medium.com/tensorflow/whats-coming-in-tensorflow-2-0-d3663832e9b8}}.

\section{Stretch Goals}

There are a number of goals that are not tightly included in the
following timetable. Those are optional add-ons that are nice-to-have
but not critical for the successful completion of the thesis.

\begin{itemize}
\item make my own approach work on the YCB-Video data
set\cite{Xiang2017}
\item test dropout sampling and my own approach on a data set
self-recorded with a robot arm and mounted Kinect
\item provide a GUI to freely select an image to be classified by
the trained model and see a visualization of the result
\end{itemize}

\section{Timetable}

% TODO