\chapter{Introduction}

Famous examples like the automatic soap dispenser that does not
recognize the hand of a black person but dispenses soap when presented
with a paper towel raise the question of bias in computer
systems\cite{Friedman1996}. Related to this ethical question regarding
the design of so-called algorithms, a term often used in public
discourse for applied neural networks, is the question of
algorithmic accountability\cite{Diakopoulos2014}.

\chapter{Conclusion}

The charm of supervised neural networks, that they can learn from
input-output relations and figure out by themselves which connections
are necessary for that, is also their Achilles heel. This feature
makes them effectively black boxes. It is possible to question the
training environment, like potential biases inside the data sets, or
the engineers constructing the networks, but it is not really possible
to question the internal calculations made by a network. On the one
hand, one might argue, it is only math and nothing magical that
happens inside these networks. Clearly it is possible, albeit a chore,
to manually follow the calculations of any given trained network.
After all, it is executed on a computer and at the lowest level only
uses basic math that does not differ between humans and computers. On
the other hand, not everyone is capable of doing so and, more
importantly, it does not reveal any answers to questions of causality.

However, these questions of causality are of enormous consequence when
neural networks are used, for example, in predictive policing. Is a
correlation, a coincidence, enough to bring forth negative consequences
for a particular person? And if so, what is the possible defence
against math? Similar questions can be raised when looking at computer
vision networks that might be used together with so-called smart
CCTV cameras, for example, like those tested at the train station
Berlin Südkreuz. What if a network implies that you exhibited
suspicious behaviour?

This leads to the need for neural networks to explain their results.
Such an explanation must come from the network or an attached piece
of technology to allow mass adoption. Obviously, this setting
poses the question of how such an endeavour can be achieved.

For neural networks, there are fundamentally two types of tasks:
regression and classification. Regression deals with any case
where the goal for the network is to come close to an ideal
function that connects all data points. Classification, however,
describes tasks where the network is supposed to identify the
class of any given input. In this thesis, I will focus on
classification.

More specifically, I will look at object detection under open-set
conditions. In non-technical words, this effectively describes
the kind of situation you encounter with CCTV cameras or robots
outside of a laboratory. Both use cameras that record
images. Subsequently, a neural network analyses the image
and returns a list of detected and classified objects that it
found in the image. The problem here is that networks can only
classify what they know. If presented with an object type that
the network was not trained with, as happens frequently in real
environments, it will still classify the object and might even
have a high confidence in doing so. Such an example would be
a false positive. Any ordinary person who uses the results of
such a network would falsely assume that a high confidence always
means the classification is very likely correct. If they use
a proprietary system, they might not even be able to find out
that the network was never trained on a particular type of object.
Therefore, it would be impossible for them to identify the output
of the network as a false positive.
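
To illustrate why a high confidence can mislead, the following sketch
(plain NumPy, with hypothetical logit values, not output of any actual
detector) shows how a softmax turns one dominant logit into a
near-certain class probability, even when the input is an unknown object:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a vector of class logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for a detector trained on three classes.
# An unknown object can still push one logit well above the others,
# yielding a high confidence for a class that is simply wrong.
logits_unknown_object = np.array([8.0, 1.0, 0.5])
probs = softmax(logits_unknown_object)

print(probs.argmax(), probs.max())  # top class and its confidence
```

The confidence printed here exceeds 0.99 even though, by assumption, the
object belongs to none of the three classes; the softmax only ranks the
known classes against each other.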

This goes back to the need for automatic explanation. Such a system
should by itself recognize that the given object is unknown and
hence mark any classification result of the network as meaningless.
Technically, there are two slightly different approaches that deal
with this type of task: model uncertainty and novelty detection.

Model uncertainty can be measured with dropout sampling.
Dropout is usually used only during training, but
Miller et al.\cite{Miller2018} use it also during testing
to achieve different results for the same image, making use of
multiple forward passes. The output scores for the forward passes
of the same image are then averaged. If the averaged class
probabilities resemble a uniform distribution (every class has
the same probability), this signals maximum uncertainty. Conversely,
if there is one very high probability with every other being very
low, this signifies a low uncertainty. An unknown object is more
likely to cause high uncertainty, which allows for an identification
of false positive cases.
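
The averaging step can be sketched as follows. The dropout layer is
mocked here with a random mask over hypothetical logits; the actual
method of Miller et al.\ runs the full SSD network with its added
dropout layers, which is not needed to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_pass_with_dropout(logits, rate=0.5):
    """Stand-in for one stochastic forward pass: a random dropout
    mask perturbs the logits, so repeated passes give different scores."""
    mask = rng.random(logits.shape) >= rate
    return logits * mask / (1.0 - rate)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits for one detection and four classes.
logits = np.array([2.0, 1.5, 1.8, 0.2])

# Multiple forward passes for the same image, then average the scores.
passes = [softmax(forward_pass_with_dropout(logits)) for _ in range(30)]
mean_probs = np.mean(passes, axis=0)

# Entropy of the averaged distribution as an uncertainty measure:
# a uniform distribution gives maximum entropy (maximum uncertainty).
entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
max_entropy = np.log(len(mean_probs))
print(entropy / max_entropy)  # close to 1.0 means highly uncertain
```

The normalised entropy is one way to condense the averaged
probabilities into a single uncertainty value; thresholding it then
separates confident detections from likely false positives.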

Novelty detection is the more direct approach to solving the task.
In the realm of neural networks, it is usually done with the help of
auto-encoders that essentially solve a regression task of finding an
identity function that reconstructs the given input on the
output\cite{Pimentel2014}. Auto-encoders have
internally at least two components: an encoder, and a decoder or
generator. The job of the encoder is to find an encoding that
compresses the input as well as possible while simultaneously
being as loss-free as possible. The decoder takes this latent
representation of the input and has to find a decompression
that reconstructs the input as accurately as possible. During
training, these auto-encoders learn to reproduce a certain group
of object classes. The actual novelty detection takes place
during testing. Given an image, and the output and loss of the
auto-encoder, a novelty score is calculated. A low novelty
score signals a known object. The opposite is true for a high
novelty score.
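
A minimal sketch of such a score, assuming mean squared reconstruction
error as the novelty measure and hypothetical input and reconstruction
vectors (a real auto-encoder, such as the adversarial one cited below,
uses a more elaborate score):

```python
import numpy as np

def novelty_score(x, reconstruction):
    """Mean squared reconstruction error as a simple novelty score:
    low for inputs the auto-encoder has learned to reproduce,
    high for unfamiliar inputs."""
    return float(np.mean((x - reconstruction) ** 2))

# Hypothetical inputs and auto-encoder outputs; a trained auto-encoder
# reconstructs known classes well and unknown ones poorly.
known = np.array([0.2, 0.8, 0.5])
known_recon = np.array([0.21, 0.79, 0.52])    # near-perfect reconstruction
unknown = np.array([0.9, 0.1, 0.7])
unknown_recon = np.array([0.4, 0.6, 0.2])     # poor reconstruction

threshold = 0.01  # in practice tuned on validation data
print(novelty_score(known, known_recon) < threshold)      # known object
print(novelty_score(unknown, unknown_recon) > threshold)  # novel object
```

Everything below the threshold is treated as known; everything above it
marks the accompanying classification result as unreliable.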

Given these two approaches to solve the explanation task from above,
it comes down to performance. At the end of the day, the best
theoretical idea does not help in solving the task if it cannot
be implemented in a performant way. Miller et al.\ have shown
some success in using dropout sampling. However, the many forward
passes during testing for every image seem computationally expensive.
In comparison, a single run through a trained auto-encoder seems
intuitively to be faster. This leads to the following hypothesis:
\emph{Novelty detection using auto-encoders delivers similar or better
object detection performance under open-set conditions while
being less computationally expensive compared to dropout sampling}.

For the purpose of this thesis, I will
use the work of Miller et al.\ as the baseline to compare against.
They use the SSD\cite{Liu2016} network for object detection,
modified by added dropout layers, and the SceneNet
RGB-D\cite{McCormac2017} data set using the MS COCO\cite{Lin2014}
classes. Instead of dropout sampling, my approach will use
an auto-encoder for novelty detection with all else, like
using SSD for object detection and the SceneNet RGB-D data set,
being equal. With respect to auto-encoders, a recent implementation
of an adversarial auto-encoder\cite{Pidhorskyi2018} will be used.

The contribution of this thesis is a comparison between dropout
sampling and auto-encoding with respect to the overall performance
of both for object detection under open-set conditions, using
the SSD network for object detection and the SceneNet RGB-D data set
with MS COCO classes.

\chapter{Thesis as a project}

After introducing the topic and the general task ahead, this part of
the exposé will focus on how to get there. This includes a timetable
with SMART goals as well as an outline of the software development
practices used for implementing the code for this thesis.

\section{Software Development}

Most scientific implementations found on GitHub are not done with
distribution in mind. They usually require manual cloning of the
repository, have poor code documentation, and do not follow common
coding standards. This is bad enough by itself but becomes a real
nuisance if you want to use those implementations in your own
code. As they are not marked up as Python packages, using them
usually requires manual workarounds to make them usable as library
code, for example, in a Python package.

The code of this thesis will be developed from the start inside
a Python package structure, which will make it easy to include
it later on as a dependency for other work. After the thesis
has been graded, the package will be uploaded to the PyPI package
repository and the corresponding Git repository will be made
publicly available.
Any required third-party implementations, like the SSD implementation
for Keras, which are not already available as Python packages, will
be included as library code according to their respective licences.

A large chunk of the code will be written as library-ready code
that can be used in other applications. Only a small part will
provide the interface to the library code. The specifics of the
interface cannot be predicted ahead of time, but it will certainly
include a properly documented CLI, as that will be necessary for
the work of the thesis itself.
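
As a sketch of how such a package and its CLI could be wired together,
the following hypothetical \texttt{setup.py} shows the general shape;
every name, version, and path in it is a placeholder, not the actual
package of this thesis:

```python
# setup.py -- a minimal packaging sketch; all names and metadata
# are placeholders, not the actual package of this thesis.
from setuptools import setup, find_packages

setup(
    name="masterthesis",          # hypothetical package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["tensorflow", "numpy"],
    entry_points={
        "console_scripts": [
            # exposes the documented CLI as an installable command;
            # module path and function name are illustrative only
            "masterthesis=masterthesis.cli:main",
        ],
    },
)
```

A `console_scripts` entry point is what lets the same code serve both
as importable library and as command-line tool after `pip install`.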

TensorFlow will be used as the deep learning framework. To make
the code future-proof, the eager execution mode will be used, as it
is the default in TensorFlow
2.0\footnote{\url{https://medium.com/tensorflow/whats-coming-in-tensorflow-2-0-d3663832e9b8}}.

\section{Stretch Goals}

There are a number of goals that are not tightly included in the
following timetable. Those are optional add-ons that are nice-to-have
but not critical for the successful completion of the thesis.

\begin{itemize}
\item make my own approach work on the YCB-Video data
set\cite{Xiang2017}
\item test dropout sampling and my own approach on a data set
self-recorded with a robot arm and mounted Kinect
\item provide a GUI to freely select an image to be classified by
the trained model and see a visualization of the result
\end{itemize}

\section{Timetable}

% TODO