% Finished exposé except for timetable
% Signed-off-by: Jim Martens <github@2martens.de>
\chapter{Introduction}
Famous examples like the automatic soap dispenser that does not
recognize the hand of a black person but dispenses soap when presented
with a paper towel raise the question of bias in computer
systems~\cite{Friedman1996}. Related to this ethical question
regarding the design of so-called algorithms, a term often used in
public discourse for applied neural networks, is the question of
algorithmic accountability~\cite{Diakopoulos2014}.

\chapter{Conclusion}

The charm of supervised neural networks, namely that they can learn
from input-output relations and figure out by themselves which
connections are necessary for that, is also their Achilles heel. This
feature effectively makes them black boxes. It is possible to question
the training environment, such as potential biases inside the data
sets, or the engineers constructing the networks, but it is not really
possible to question the internal calculations made by a network. On
the one hand, one might argue, it is only math and nothing magical
that happens inside these networks. Clearly it is possible, albeit a
chore, to manually follow the calculations of any given trained
network. After all, it is executed on a computer and at the lowest
level only uses basic math that does not differ between humans and
computers. On the other hand, not everyone is capable of doing so and,
more importantly, doing so does not reveal any answers to questions of
causality.

However, these questions of causality are of enormous consequence when
neural networks are used, for example, in predictive policing. Is a
correlation, a coincidence, enough to bring forth negative
consequences for a particular person? And if so, what is the possible
defence against math? Similar questions can be raised when looking at
computer vision networks that might be used together with so-called
smart CCTV cameras, for example, like those tested at the Berlin
Südkreuz train station. What if a network implies you exhibited
suspicious behaviour?

This leads to the need for neural networks to explain their results.
Such an explanation must come from the network or an attached piece of
technology to allow mass adoption. Obviously, this setting poses the
question of how such an endeavour can be achieved.

For neural networks there are fundamentally two types of tasks:
regression and classification. Regression deals with any case where
the goal of the network is to come close to an ideal function that
connects all data points. Classification, however, describes tasks
where the network is supposed to identify the class of any given
input. In this thesis, I will focus on classification.

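The difference between the two task types can be sketched with a toy
example; the least-squares fit and the nearest-centroid classifier
below are illustrative stand-ins for this exposé, not the networks
used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Regression: approximate an ideal function connecting the data points.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.5 + 0.05 * rng.normal(size=50)   # noisy samples of 2x + 0.5
slope, intercept = np.polyfit(x, y, deg=1)       # least-squares fit

# Classification: assign any given input to one of a fixed set of classes.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])   # prototypes of class 0 and 1

def classify(point):
    """Return the index of the nearest class prototype."""
    return int(np.argmin(np.linalg.norm(centroids - point, axis=1)))
```

The regression result is a continuous function, while the classifier
always answers with one of the known classes, which is exactly the
property that becomes problematic under open-set conditions below.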
More specifically, I will look at object detection under open-set
conditions. In non-technical words, this effectively describes the
kind of situation you encounter with CCTV cameras or robots outside of
a laboratory. Both use cameras that record images. Subsequently, a
neural network analyses the image and returns a list of detected and
classified objects that it found in the image. The problem here is
that networks can only classify what they know. If presented with an
object type that the network was not trained with, as happens
frequently in real environments, it will still classify the object and
might even have a high confidence in doing so. Such a case would be a
false positive. Any ordinary person who uses the results of such a
network would falsely assume that a high confidence always means the
classification is very likely correct. If they use a proprietary
system, they might not even be able to find out that the network was
never trained on a particular type of object. Therefore, it would be
impossible for them to identify the output of the network as a false
positive.

This goes back to the need for automatic explanation. Such a system
should recognize by itself that the given object is unknown and hence
mark any classification result of the network as meaningless.
Technically, there are two slightly different approaches that deal
with this type of task: model uncertainty and novelty detection.

Model uncertainty can be measured with dropout sampling. Dropout is
usually used only during training, but Miller et
al.~\cite{Miller2018} use it during testing as well to achieve
different results for the same image, making use of multiple forward
passes. The output scores for the forward passes of the same image are
then averaged. If the averaged class probabilities resemble a uniform
distribution (every class has the same probability), this signals
maximum uncertainty. Conversely, if there is one very high probability
with every other being very low, this signifies low uncertainty. An
unknown object is more likely to cause high uncertainty, which allows
for an identification of false positive cases.

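The averaging step described above can be sketched as follows; the
entropy-based uncertainty measure and the toy score vectors are
assumptions for illustration, not the exact procedure of Miller et al.

```python
import numpy as np

def dropout_sampling_uncertainty(score_samples):
    """Average class scores over multiple stochastic forward passes and
    measure uncertainty as the entropy of the averaged distribution.

    score_samples: array of shape (num_passes, num_classes), each row
    the softmax output of one forward pass with dropout kept active.
    """
    mean_scores = score_samples.mean(axis=0)
    # Entropy is maximal for a uniform distribution (likely unknown
    # object) and minimal when a single class dominates.
    entropy = -np.sum(mean_scores * np.log(mean_scores + 1e-12))
    return mean_scores, entropy

# A confident detection: every pass strongly favours class 0.
confident = np.array([[0.9, 0.05, 0.05]] * 10)
# An uncertain detection: the passes disagree, so the average
# approaches a uniform distribution.
uncertain = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]] * 4)

mean_conf, h_conf = dropout_sampling_uncertainty(confident)
mean_unc, h_unc = dropout_sampling_uncertainty(uncertain)
# h_unc exceeds h_conf, flagging the second case as a likely false positive.
```

Thresholding such an entropy value is one plausible way to turn the
averaged scores into an accept/reject decision per detection.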
Novelty detection is the more direct approach to solving the task. In
the realm of neural networks it is usually done with the help of
auto-encoders, which essentially solve a regression task of finding an
identity function that reconstructs the given input on the
output~\cite{Pimentel2014}. Auto-encoders have internally at least two
components: an encoder, and a decoder or generator. The job of the
encoder is to find an encoding that compresses the input as well as
possible while simultaneously being as loss-free as possible. The
decoder takes this latent representation of the input and has to find
a decompression that reconstructs the input as accurately as
possible. During training these auto-encoders learn to reproduce a
certain group of object classes. The actual novelty detection takes
place during testing. Given an image, and the output and loss of the
auto-encoder, a novelty score is calculated. A low novelty score
signals a known object; the opposite is true for a high novelty score.

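The reconstruction-error idea can be sketched minimally with a linear
auto-encoder (principal components) instead of the adversarial
architecture mentioned later; all names and data here are illustrative
assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Known" training data lives close to a 2-D subspace of a 10-D space.
basis = rng.normal(size=(2, 10))
train = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 10))

# Linear auto-encoder: the top-2 principal components act as the
# encoder, their transpose as the decoder.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
encoder = vt[:2]      # compresses a 10-D input to a 2-D latent code
decoder = encoder.T   # reconstructs a 10-D output from the code

def novelty_score(x):
    """Reconstruction error of the auto-encoder: low for inputs that
    resemble the training data, high for inputs off the learned
    subspace."""
    code = (x - mean) @ encoder.T
    recon = mean + code @ decoder.T
    return float(np.linalg.norm(x - recon))

known = rng.normal(size=2) @ basis   # lies in the learned subspace
novel = rng.normal(size=10) * 3.0    # generic point far off the subspace
# novelty_score(novel) is much larger than novelty_score(known).
```

A threshold on this score then decides whether the object detector's
classification for that input should be trusted or discarded.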
Given these two approaches to solving the explanation task described
above, it comes down to performance. At the end of the day the best
theoretical idea does not help in solving the task if it cannot be
implemented in a performant way. Miller et al.\ have shown some
success in using dropout sampling. However, the many forward passes
during testing for every image seem computationally expensive. In
comparison, a single run through a trained auto-encoder intuitively
seems faster. This leads to the following hypothesis:
\emph{Novelty detection using auto-encoders delivers similar or better
object detection performance under open-set conditions while being
less computationally expensive compared to dropout sampling}.

For the purpose of this thesis, I will use the work of Miller et al.\
as the baseline to compare against. They use the SSD~\cite{Liu2016}
network for object detection, modified by added dropout layers, and
the SceneNet RGB-D~\cite{McCormac2017} data set using the MS
COCO~\cite{Lin2014} classes. Instead of dropout sampling, my approach
will use an auto-encoder for novelty detection with all else, like
using SSD for object detection and the SceneNet RGB-D data set, being
equal. With respect to auto-encoders, a recent implementation of an
adversarial auto-encoder~\cite{Pidhorskyi2018} will be used.

The contribution of this thesis is a comparison between dropout
sampling and auto-encoding with respect to the overall performance of
both for object detection under open-set conditions, using the SSD
network for object detection and the SceneNet RGB-D data set with MS
COCO classes.

\chapter{Thesis as a project}

After introducing the topic and the general task ahead, this part of
the exposé will focus on how to get there. This includes a timetable
with SMART goals as well as an outline of the software development
practices used for implementing the code for this thesis.

\section{Software Development}

Most scientific implementations found on GitHub are not built with
distribution in mind. They usually require manual cloning of the
repository, have poor code documentation, and do not follow common
coding standards. This is bad enough by itself but becomes a real
nuisance if you want to use those implementations in your own code. As
they are not packaged as Python packages, using them usually requires
manual workarounds to make them usable as library code, for example,
in a Python package.

The code of this thesis will be developed from the start inside a
Python package structure, which will make it easy to include it later
on as a dependency for other work. After the thesis has been graded,
the package will be uploaded to the PyPI package repository and the
corresponding Git repository will be made publicly available. Any
required third-party implementations, like the SSD implementation for
Keras, which are not already available as Python packages, will be
included as library code according to their respective licences.

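A minimal package skeleton along these lines might use a
\texttt{setup.py} such as the following; the package name, entry
point, and dependency list are placeholders for illustration, not the
final configuration of the thesis code.

```python
# setup.py -- hypothetical skeleton; the package name, CLI entry point,
# and dependencies below are placeholders, not the final configuration.
from setuptools import setup, find_packages

setup(
    name="thesis-novelty-detection",   # placeholder project name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["tensorflow", "numpy"],
    entry_points={
        "console_scripts": [
            # small CLI wrapper around the library-ready code
            "thesis-cli=thesis_novelty_detection.cli:main",
        ],
    },
)
```

Declaring the CLI as a console-script entry point keeps the interface
layer thin while the bulk of the code remains importable library code.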
A large chunk of the code will be written as library-ready code that
can be used in other applications. Only a small part will provide the
interface to the library code. The specifics of the interface cannot
be predicted ahead of time, but it will certainly include a properly
documented CLI, as that will be necessary for the work of the thesis
itself.

TensorFlow will be used as the deep learning framework. To make the
code future-proof, the eager execution mode will be used, as it is the
default in TensorFlow
2.0\footnote{\url{https://medium.com/tensorflow/whats-coming-in-tensorflow-2-0-d3663832e9b8}}.

\section{Stretch Goals}

There are a number of goals that are not included in the following
timetable. Those are optional add-ons that are nice to have but not
critical for the successful completion of the thesis.

\begin{itemize}
\item make my own approach work on the YCB-Video data
  set~\cite{Xiang2017}
\item test dropout sampling and my own approach on a data set
  self-recorded with a robot arm and a mounted Kinect
\item provide a GUI to freely select an image to be classified by the
  trained model and see a visualization of the result
\end{itemize}

\section{Timetable}

% TODO