% body thesis file that contains the actual content

\chapter{Introduction}

\subsection*{Motivation}

Famous examples like the automatic soap dispenser that does not recognize the hand of a black person but dispenses soap when presented with a paper towel raise the question of bias in computer systems~\cite{Friedman1996}. Related to this ethical question regarding the design of so-called algorithms, a term often used in public discourse for applied neural networks, is the question of algorithmic accountability~\cite{Diakopoulos2014}.

The charm of supervised neural networks, namely that they learn from input-output relations and work out for themselves which connections are necessary for the task, is also their Achilles heel: it makes them effectively black boxes. It is possible to question the training environment, such as potential biases in the data sets, or the engineers who construct the networks, but it is not really possible to question the internal calculations of a network. On the one hand, one might argue that nothing magical happens inside these networks, only mathematics. It is clearly possible, albeit a chore, to follow the calculations of any given trained network manually: after all, the network is executed on a computer and, at the lowest level, uses only basic math that does not differ between humans and computers. On the other hand, not everyone is capable of doing so, and, more importantly, doing so does not reveal any answers to questions of causality.

These questions of causality, however, are of enormous consequence when neural networks are used, for example, in predictive policing. Is a correlation, a coincidence, enough to bring forth negative consequences for a particular person? And if so, what is the possible defence against math? Similar questions can be raised for computer vision networks that might be used together with so-called smart CCTV cameras, such as those tested at the Berlin Südkreuz train station. What if a network implies you exhibited suspicious behaviour?

This leads to the need for neural networks to explain their results. To allow adoption at scale, such an explanation must come from the network itself or from an attached piece of technology. This setting naturally poses the question of how such an endeavour can be achieved. For neural networks there are fundamentally two types of tasks: regression and classification. Regression deals with any case where the goal of the network is to come close to an ideal function that connects all data points. Classification, in contrast, describes tasks where the network is supposed to identify the class of any given input. In this thesis, I will focus on classification.

\subsection*{Object Detection in Open Set Conditions}

More specifically, I will look at object detection under open set conditions. In non-technical terms, this effectively describes the kind of situation encountered by CCTV cameras or robots outside a laboratory: both use cameras that record images, and a neural network subsequently analyses each image and returns a list of the detected and classified objects it found. The problem is that networks can only classify what they know. If presented with an object type it was not trained on, as happens frequently in real environments, a network will still classify the object and might even do so with high confidence. Such a detection is a false positive.

Any ordinary person who uses the results of such a network would falsely assume that a high confidence always means the classification is very likely correct. If they use a proprietary system, they might not even be able to find out that the network was never trained on a particular type of object; it would therefore be impossible for them to identify the output of the network as a false positive. This goes back to the need for automatic explanation: such a system should recognize by itself that the given object is unknown and hence mark any classification result of the network as meaningless. Technically, two slightly different concepts deal with this type of task: model uncertainty and novelty detection.

Model uncertainty can be measured with dropout sampling. Dropout is usually used only during training, but Miller et al.~\cite{Miller2018} keep it active during testing as well: multiple forward passes over the same image then yield different results, and the output scores of these passes are averaged. If the averaged class probabilities resemble a uniform distribution (every class has the same probability), this indicates maximum uncertainty; conversely, one very high probability with every other being very low signifies low uncertainty. An unknown object is more likely to cause high uncertainty, which allows false positive cases to be identified.
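As a minimal sketch of this procedure, and not Miller et al.'s actual implementation (which additionally groups detections across passes), the following Python snippet assumes a hypothetical \texttt{model(image, training=True)} interface that keeps dropout active and returns a softmax probability vector; the entropy measure is one common way to quantify the uniform-versus-peaked distinction described above:

\begin{verbatim}
import numpy as np

def dropout_uncertainty(model, image, num_passes=30):
    # One stochastic forward pass per iteration; because dropout
    # stays active, the probability vectors differ between passes.
    probs = np.stack([model(image, training=True)
                      for _ in range(num_passes)])
    mean_probs = probs.mean(axis=0)  # averaged class scores
    # Entropy of the averaged distribution: maximal for a uniform
    # distribution (high uncertainty), minimal when a single class
    # dominates (low uncertainty).
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    return mean_probs, entropy
\end{verbatim}

A threshold on such an uncertainty measure would then separate presumably known objects (low entropy) from presumably unknown ones (high entropy).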
Novelty detection is the more direct approach to the task. In the realm of neural networks, it is usually done with the help of auto-encoders, which essentially solve the regression task of finding an identity function that reconstructs the given input at the output~\cite{Pimentel2014}. Auto-encoders have internally at least two components: an encoder, and a decoder or generator. The job of the encoder is to find an encoding that compresses the input as much as possible while simultaneously losing as little information as possible. The decoder takes this latent representation of the input and has to find a decompression that reconstructs the input as accurately as possible. During training, these auto-encoders learn to reproduce a certain group of object classes. The actual novelty detection takes place during testing: given an image as well as the output and loss of the auto-encoder, a novelty score is calculated. A low novelty score signals a known object; a high novelty score signals a novel one.
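The basic principle can be illustrated with a plain reconstruction error as the novelty score. In this sketch, \texttt{encoder} and \texttt{decoder} are assumed to be callables that map an image to its latent code and back (a hypothetical interface); the more elaborate probabilistic score of the adversarial auto-encoder used later in this thesis~\cite{Pidhorskyi2018} is not reproduced here:

\begin{verbatim}
import numpy as np

def novelty_score(encoder, decoder, image):
    latent = encoder(image)           # compressed representation
    reconstruction = decoder(latent)  # attempted reconstruction
    # Mean squared reconstruction error: low for inputs that
    # resemble the training classes, high for novel objects.
    return float(np.mean((image - reconstruction) ** 2))
\end{verbatim}

Comparing this score against a threshold chosen on validation data would then flag an input as known or novel.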
\subsection*{Research Question}

Given these two approaches to the explanation task described above, the choice comes down to performance. At the end of the day, the best theoretical idea does not help in solving the task if it cannot be implemented in a performant way. Miller et al. have shown some success in using dropout sampling. However, the many forward passes per image during testing appear computationally expensive; in comparison, a single run through a trained auto-encoder intuitively seems faster. This leads to the hypothesis stated below.

For the purpose of this thesis, I will use the work of Miller et al. as the baseline to compare against. They use the SSD~\cite{Liu2016} network for object detection, modified by added dropout layers, and the SceneNet RGB-D~\cite{McCormac2017} data set with the MS COCO~\cite{Lin2014} classes. Instead of dropout sampling, my approach will use an auto-encoder for novelty detection, with everything else, such as using SSD for object detection and the SceneNet RGB-D data set, kept equal. With respect to auto-encoders, a recent implementation of an adversarial auto-encoder~\cite{Pidhorskyi2018} will be used.

\paragraph{Hypothesis}
Novelty detection using auto-encoders delivers similar or better object detection performance under open set conditions while being computationally less expensive than dropout sampling.

\paragraph{Contribution}
The contribution of this thesis is a comparison of dropout sampling and auto-encoding with respect to their overall object detection performance under open set conditions, using the SSD network for object detection and the SceneNet RGB-D data set with MS COCO classes.

\chapter{Background and Contribution}

\chapter{Methods}

\section{Design of Source Code}

\section{Preparation of Data Sets}

\section{Replication of Miller et al.}

\chapter{Results}

\chapter{Discussion}

\chapter{Closing}