% body thesis file that contains the actual content \chapter{Introduction} \subsection*{Motivation} Famous examples like the automatic soap dispenser that does not recognize the hand of a black person but dispenses soap when presented with a paper towel raise the question of bias in computer systems~\cite{Friedman1996}. Related to this ethical question regarding the design of so-called algorithms, a term often used in public discourse for applied neural networks, is the question of algorithmic accountability~\cite{Diakopoulos2014}. The charm of supervised neural networks, namely that they can learn from input-output relations and figure out by themselves which connections are necessary for that, is also their Achilles heel: it effectively makes them black boxes. It is possible to question the training environment, such as potential biases inside the data sets, or the engineers constructing the networks, but it is not really possible to question the internal calculations made by a network. On the one hand, one might argue, it is only mathematics and nothing magical that happens inside these networks. Clearly it is possible, albeit a chore, to manually follow the calculations of any given trained network. After all, it is executed on a computer and at the lowest level only uses basic arithmetic that does not differ between humans and computers. On the other hand, not everyone is capable of doing so and, more importantly, doing so does not answer any questions of causality. However, these questions of causality are of enormous consequence when neural networks are used, for example, in predictive policing. Is a correlation, a coincidence, enough to bring forth negative consequences for a particular person? And if so, what is the possible defence against mathematics? Similar questions can be raised when looking at computer vision networks that might be used together with so-called smart CCTV cameras, like those tested at the Berlin Südkreuz train station.
What if a network implies you exhibited suspicious behaviour? This leads to the need for neural networks to explain their results. Such an explanation must come from the network or an attached piece of technology to allow mass adoption. Obviously this setting poses the question of how such an endeavour can be achieved. For neural networks there are fundamentally two types of tasks: regression and classification. Regression deals with any case where the goal of the network is to come close to an ideal function that connects all data points. Classification, however, describes tasks where the network is supposed to identify the class of any given input. In this thesis, I will focus on classification. \subsection*{Object detection in open-set conditions} More specifically, I will look at object detection under open-set conditions. In non-technical terms this effectively describes the kind of situation you encounter with CCTV cameras or robots outside of a laboratory. Both use cameras that record images. Subsequently a neural network analyses each image and returns a list of detected and classified objects it found in the image. The problem is that networks can only classify what they know. If presented with an object type the network was not trained on, as happens frequently in real environments, it will still classify the object and might even do so with high confidence. This is an example of a false positive. Any ordinary person who uses the results of such a network would falsely assume that high confidence means the classification is very likely correct. If they use a proprietary system they might not even be able to find out that the network was never trained on a particular type of object. Therefore it would be impossible for them to identify the output of the network as a false positive. This goes back to the need for automatic explanation.
Such a system should by itself recognize that the given object is unknown and hence mark any classification result of the network as meaningless. Technically there are two slightly different approaches that deal with this type of task: model uncertainty and novelty detection. Model uncertainty can be measured with dropout sampling. Dropout is usually used only during training, but Miller et al.~\cite{Miller2018} also use it during testing to obtain different results for the same image across multiple forward passes. The output scores of the forward passes for the same image are then averaged. If the averaged class probabilities resemble a uniform distribution (every class has the same probability), this signals maximum uncertainty. Conversely, if there is one very high probability with every other being very low, this signals low uncertainty. An unknown object is more likely to cause high uncertainty, which allows for an identification of false positive cases. Novelty detection is the more direct approach to solve the task. In the realm of neural networks it is usually done with the help of auto-encoders, which essentially solve a regression task of finding an identity function that reconstructs the given input at the output~\cite{Pimentel2014}. Auto-encoders internally consist of at least two components: an encoder, and a decoder or generator. The job of the encoder is to find an encoding that compresses the input as much as possible while simultaneously being as loss-free as possible. The decoder takes this latent representation of the input and has to find a decompression that reconstructs the input as accurately as possible. During training these auto-encoders learn to reproduce a certain group of object classes. The actual novelty detection takes place during testing: given an image, and the output and loss of the auto-encoder, a novelty score is calculated. A low novelty score signals a known object; the opposite is true for a high novelty score.
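The averaging step of dropout sampling can be sketched in a few lines. This is a minimal NumPy illustration of the principle only, not the implementation of Miller et al.; it averages the per-class scores of several stochastic forward passes and uses the entropy of the mean distribution as the uncertainty measure (maximal for a uniform distribution, minimal for a one-hot distribution):

```python
import numpy as np

def dropout_sampling_uncertainty(forward_pass_scores):
    """Average class scores over multiple stochastic forward passes and
    return the mean distribution plus its entropy as an uncertainty
    measure. High entropy (near-uniform averaged probabilities) points
    towards an unknown object."""
    scores = np.asarray(forward_pass_scores)  # shape: (passes, classes)
    mean_probs = scores.mean(axis=0)          # average over forward passes
    # Entropy of the averaged distribution; small epsilon avoids log(0).
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    return mean_probs, entropy

# A confident prediction (low entropy) vs. a near-uniform one (high entropy):
confident = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02]]
uncertain = [[0.34, 0.33, 0.33], [0.32, 0.35, 0.33]]
_, h_low = dropout_sampling_uncertainty(confident)
_, h_high = dropout_sampling_uncertainty(uncertain)
```

In the actual method the scores come from the same image passed repeatedly through the network with dropout active at test time; here the two score lists merely stand in for such passes.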
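The novelty-score idea can likewise be illustrated in its simplest form. The GPND of Pidhorskyi et al.\ computes a more elaborate probabilistic score; the sketch below, an assumption-laden simplification, uses only the mean squared reconstruction error as a stand-in novelty score:

```python
import numpy as np

def reconstruction_novelty_score(original, reconstruction):
    """Simplified novelty score: mean squared reconstruction error.
    A trained auto-encoder reconstructs known classes well (low score)
    and unknown objects poorly (high score). This is only a minimal
    illustration of the principle, not the GPND score itself."""
    original = np.asarray(original, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    return float(np.mean((original - reconstruction) ** 2))

# A faithful reconstruction scores low, a poor one scores high:
image = [0.1, 0.8, 0.5, 0.2]
good = [0.12, 0.79, 0.52, 0.18]
bad = [0.9, 0.1, 0.0, 0.7]
score_known = reconstruction_novelty_score(image, good)
score_novel = reconstruction_novelty_score(image, bad)
```

A threshold on such a score then decides whether the classification result of the detector is kept or marked as meaningless.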
\subsection*{Research question} Given these two approaches to solve the explanation task described above, it comes down to performance. At the end of the day the best theoretical idea does not help in solving the task if it cannot be implemented in a performant way. Miller et al.\ have shown some success in using dropout sampling. However, the many forward passes during testing for every image seem computationally expensive. In comparison, a single run through a trained auto-encoder intuitively seems faster. This leads to the hypothesis (see below). For the purpose of this thesis, I will use the work of Miller et al.\ as the baseline to compare against. They use the SSD~\cite{Liu2016} network for object detection, modified by added dropout layers, and the SceneNet RGB-D~\cite{McCormac2017} data set using the MS COCO~\cite{Lin2014} classes. Instead of dropout sampling my approach will use an auto-encoder for novelty detection, with all else, like using SSD for object detection and the SceneNet RGB-D data set, being equal. With respect to auto-encoders, a recent implementation of an adversarial auto-encoder~\cite{Pidhorskyi2018} will be used. \paragraph{Hypothesis} Novelty detection using auto-encoders delivers similar or better object detection performance under open-set conditions while being less computationally expensive compared to dropout sampling. \paragraph{Contribution} The contribution of this thesis is a comparison between dropout sampling and auto-encoding with respect to the overall object detection performance of both under open-set conditions, using the SSD network for object detection and the SceneNet RGB-D data set with MS COCO classes. \chapter{Thesis as a project} After introducing the topic and the general task ahead, this part of the exposé will focus on how to get there. This includes a timetable with SMART goals as well as an outline of the software development practices used for implementing the code for this thesis.
\section{Software Development} Most scientific implementations found on GitHub are not done with distribution in mind. They usually require manual cloning of the repository, have poor code documentation and do not follow common coding standards. This is bad enough by itself but becomes a real nuisance if you want to use those implementations in your own code. As they are not set up as Python packages, using them usually requires manual workarounds to make them usable as library code, for example, in a Python package. The code of this thesis will be developed from the start inside a Python package structure, which will make it easy to include it later on as a dependency of other work. After the thesis has been graded the package will be uploaded to the PyPI package repository and the corresponding Git repository will be made publicly available. Any required third-party implementations, like the SSD implementation for Keras, which are not already available as Python packages will be included as library code according to their respective licences. A large chunk of the code will be written as library-ready code that can be used in other applications. Only a small part will provide the interface to the library code. The specifics of the interface cannot be predicted ahead of time but it will certainly include a properly documented CLI, as that will be necessary for the work of the thesis itself. TensorFlow will be used as the deep learning framework. To make the code future-proof, the eager execution mode will be used, as it is the default in TensorFlow 2.0\footnote{\url{https://medium.com/tensorflow/whats-coming-in-tensorflow-2-0-d3663832e9b8}}. \section{Stretch Goals} There are a number of goals that are not included in the following timetable. They are optional add-ons that are nice to have but not critical for successful completion of the thesis.
\begin{itemize} \item make my own approach work on the YCB-Video data set~\cite{Xiang2017} \item test dropout sampling and my own approach on a data set self-recorded with a robot arm and mounted Kinect \item provide a GUI to freely select an image to be classified by the trained model and see a visualization of the result \end{itemize} \section{Timetable} This timetable is structured by milestones that I want to achieve. Every milestone has the related tasks grouped beneath it. The scheduling is done with respect to my full personal calendar and will only cover Monday through Friday at most. Weekends will not be scheduled work time for the thesis. This provides some additional emergency buffer at the end if things do not proceed as planned. Furthermore, I will only be able to regularly plan the time between 11 am and 5 pm for working on the thesis, as the evenings are mostly full and, regardless of that fact, I do want to reserve free time. \paragraph{Main tasks} Everything but the stretch goals is non-optional, which makes the term ``main task'' rather difficult to grasp. The term implies that all other tasks are nice to have but not required. Therefore, I have chosen to use milestones instead as the highest grouping level. \subsection*{Milestones} The detailed timetable starts in the next subsection. A summary of the timetable regarding the milestones is presented here. \begin{enumerate} \item Environment set up: Due date 20th March \item Fine-tuned SSD on SceneNet RGB-D: Due date 5th April \item Fine-tuned GPND on SceneNet RGB-D: Due date 12th April \item Networks evaluated: Due date 10th May \item Visualizations created: Due date 31st May \item Stretch Goals/Buffer: Due date 27th June \item Thesis writing: Due date 30th August \item Finishing touches: Due date 13th September \end{enumerate} \subsection*{Environment set up} \textbf{Due date:} 20th March \begin{description} \item[Download SceneNet RGB-D to cvpc\{7,8\} computer] \hfill \\ Requires external resource.
\end{description} \subsection*{Fine-tuned SSD on SceneNet RGB-D} \textbf{Due date:} 5th April \begin{description} \item[Download pre-trained weights of SSD for MS COCO] \hfill \\ This is trivial. Takes no more than two hours. \item[Modify SSD Keras implementation to work inside masterthesis package] \hfill \\ Should be possible to achieve within one day. \item[Implement integration of SSD into masterthesis package] \hfill \\ Implementing the glue code between the Git submodule and my own code. Should be doable within one day. \item[Group SceneNet RGB-D classes to MS COCO classes] \hfill \\ SceneNet contains more classes than COCO. Miller et al.\ have grouped, for example, various chair classes in SceneNet into the one chair class of COCO. This grouping involves researching the 80 classes of COCO, finding all related SceneNet classes and then writing a mapper between them. All in all this could take up a full day and perhaps slip into a second one. \item[Implement variant of SSD with dropout layers (Bayesian SSD)] \hfill \\ This is a rather trivial task as it only involves adding two Keras dropout layers into SSD. Can be done in one hour. \item[Fine-tune vanilla SSD on SceneNet RGB-D] \hfill \\ Requires external resource, and the length of the required training is unknown. Due to two unknown factors (availability of the resource, and length of training) this task can be considered a project risk. \item[Fine-tune Bayesian SSD on SceneNet RGB-D] \hfill \\ Similar remarks as for the previous task. \end{description} The tasks prior to the training could be achievable by 21st March if work starts on 18th March. Buffer time extends to 25th March. Training is scheduled to commence as early as possible but no later than 26th March. Since the SSD network is a proven one, I am confident that this milestone can be reached and that the time between 26th March and 5th April provides more than enough time for training.
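The class-grouping task above amounts to a lookup table from SceneNet labels to COCO classes. A minimal sketch of such a mapper follows; the label names used here are illustrative assumptions, since researching the actual SceneNet and COCO label names is exactly the work the task describes:

```python
# Hypothetical excerpt of a SceneNet-RGB-D-to-COCO class mapper. The
# label names below are placeholder assumptions; the real mapping has
# to be compiled from the 80 COCO classes and the SceneNet class list.
SCENENET_TO_COCO = {
    "office_chair": "chair",
    "armchair": "chair",
    "swivel_chair": "chair",
    "desk": "dining table",
    "computer_monitor": "tv",
}

def map_scenenet_class(scenenet_label):
    """Return the grouped COCO class for a SceneNet label, or None if
    the label has no COCO counterpart and should be ignored."""
    return SCENENET_TO_COCO.get(scenenet_label)
```

Returning `None` for unmapped labels makes it explicit which SceneNet classes fall outside the COCO vocabulary and thus count as unknown for the open-set evaluation.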
Once training has started, I can work on tasks from other milestones so that the training time is used as efficiently as possible. \subsection*{Fine-tuned GPND on SceneNet RGB-D} \textbf{Due date:} 12th April \begin{description} \item[Adapt GPND implementation for SceneNet RGB-D using COCO classes] \hfill \\ Requires research to figure out the exact architecture needed for a different data set. The code is not well documented and some logical variables like the image size are sometimes hard-coded, which makes this adaptation difficult and error-prone. Furthermore, some trial and error regarding training success is likely needed, which makes this task a project risk. If the needed architecture were known, the time to implement it would be at most one day. The uncertainty therefore lies with the research part. \item[Implement novelty score calculation for GPND] \hfill \\ There is an implementation for this in the original author's code. It would have to be ported to TensorFlow and integrated into the package structure. This likely takes one or two days. \item[Apply insights of GAN stability to GPND implementation] \hfill \\ The insights from the GAN stability\footnote{\url{https://avg.is.tuebingen.mpg.de/publications/meschedericml2018}} research should be applied to my GPND implementation. This requires research into what insights, if any, can be used for this thesis. The research is doable within one day and the application of it within another. \item[Train GPND on SceneNet RGB-D] \hfill \\ Requires external resource. In contrast to the SSD network, there are no pre-trained weights available for the GPND. Therefore it has to be trained from scratch. Furthermore, it will have to be trained for every class separately, which prolongs the training even further. This task can thus be classified as a project risk. \end{description} I will only be able to start working on these tasks on April 1st.
Assuming that the research in the first task goes well, I will be able to finish the preparatory work by April 5th. Training could start as early as April 5th. The seven days to the due date of April 12th are tight and it may take longer, but this is the aggressive date I will work towards. \subsection*{Networks evaluated} \textbf{Due date:} 10th May \begin{description} \item[Implement evaluation pipeline for vanilla SSD] \hfill \\ Involves the implementation of the evaluation steps according to the chosen metrics. Likely takes two days. \item[Implement evaluation pipeline for Bayesian SSD] \hfill \\ Involves the implementation of the evaluation steps for the Bayesian variant. As more has to be done, it will likely take three days. \item[Implement evaluation pipeline for SSD with GPND for novelty score] \hfill \\ Implementation of the evaluation steps for my approach. It will probably take two days. \item[Run vanilla SSD on test data] \hfill \\ The trained network is run on the test data and the results are stored. Requires external resource but should be far quicker than the training and will probably be done in two days at most. \item[Run Bayesian SSD on test data] \hfill \\ Similar remarks as for the previous task. \item[Run vanilla SSD detections through GPND] \hfill \\ For my approach the SSD detections need to be run through the GPND to have all the relevant data for evaluation. Requires external resource. Will likely take two days. \item[Calculate evaluation metrics for vanilla SSD] \hfill \\ Takes one day. \item[Calculate evaluation metrics for Bayesian SSD] \hfill \\ Takes one day. \item[Calculate evaluation metrics for vanilla SSD with GPND] \hfill \\ Takes one day. \end{description} If I can start on April 15th with the preparatory work, it should be done by April 23rd. The testing runs can begin as early as April 24th and should finish around April 30th.
This leaves the week from May 6th up to the due date to finish the calculations, which can happen on the CPU as all the data is already there by then. \subsection*{Visualizations created} \textbf{Due date:} 31st May\\ I won't be able to work on the thesis between May 13th and May 26th due to the election campaign. I am involved in the campaign already as of this writing but I hope that up until May 10th both thesis and campaign can somewhat co-exist. The visualizations should be creatable within one week, from May 27th to May 31st. \subsection*{Stretch goals} \textbf{Due date:} 27th June\\ As I mentioned earlier, there are no specific tasks for the stretch goals. If the critical path is finished by the end of May as planned, then the month of June is available for stretch goals. If the critical path is not finished, then June serves as a buffer zone to prevent spillover into the writing period. \subsection*{Thesis writing} \textbf{Due date:} 30th August\\ A first complete draft of the thesis should be finished by August 16th at the latest. The following week I am not able to work on the thesis, but it can be used for feedback. The last week of August should allow for polishing of the thesis, with a submission-ready candidate by August 30th. \subsection*{Finishing touches} \textbf{Due date:} 13th September\\ The submission requires three printed copies of the thesis, together with any digital components on a CD glued to the back of the thesis. A non-editable CD ensures that the submitted code cannot be modified and will be exactly as submitted when reviewed. I will use these two weeks to print the copies and to take the last publication steps for the code, like improving the code documentation and adding usage examples. \subsection*{Colloquium} Last but not least is the colloquium, which will probably take place in the second half of September. I will prepare a presentation for the colloquium in the time beforehand.
\section{Project Risks} In this section, further project risks are listed in addition to those indicated in the timetable section. The workload for the election campaign, in which I have an organizational responsibility in addition to being a candidate myself, could come into conflict with the progress of the thesis. Limited availability of the external resource can hinder progress and delay steps of the thesis. In such a case, dependent tasks cannot commence until the earlier task has been finished, resulting in an overall delay of the thesis. To deal with these risks, I have planned for one whole month of buffer time that can absorb many delays. Furthermore, the writing time is intentionally long, as it is difficult to predict how inspired I will be. I know from my bachelor thesis that on some days I can write many pages, while on others I might barely make one page of progress. I would argue that the success of the thesis largely depends on the first part of the work, as it can make or break it. Once the technical part is done, the way forward should be downhill.