From 8f45155cfffe473b0737b099059bcb725e840c23 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Fri, 16 Aug 2019 12:45:51 +0200
Subject: [PATCH] Reworked data sets section and removed obsolete sections
 (raw version)

Signed-off-by: Jim Martens
---
 body.tex | 95 ++++++++++++++++++--------------------------------------
 1 file changed, 30 insertions(+), 65 deletions(-)

diff --git a/body.tex b/body.tex
index ad7ace0..8dfaa68 100644
--- a/body.tex
+++ b/body.tex
@@ -519,14 +519,21 @@ top \(k\) selection happen like in vanilla SSD.
 
 \label{chap:experiments-results}
 
+This chapter describes the data sets used, how the experiments were
+set up, and what results they produced.
+
 \section{Data sets}
 
-% TODO: reword
+This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
+80 classes, ranging from airplanes to toothbrushes. The images are
+real-world photographs, and ground truth is provided for all of
+them. The data set supports object detection, keypoint detection,
+and panoptic segmentation (scene segmentation).
 
-Usually, data sets are not perfect when it comes to neural
-networks: they contain outliers, invalid bounding boxes, and similar
-problematic things. Before a data set can be used, these problems
-need to be removed.
+The data of any data set has to be prepared for use in a neural
+network. Typical problems of data sets include, for example,
+outliers and invalid bounding boxes. Before a data set can be used,
+these problems need to be removed.
 
 For the MS COCO data set, all annotations were checked for
 impossible values: bounding box height or width lower than zero,
@@ -534,73 +541,31 @@ impossible values: bounding box height or width lower than zero,
 \(x_{max}\) and \(y_{max}\) coordinates lower than or equal to zero,
 \(x_{min}\) greater than \(x_{max}\), \(y_{min}\) greater than \(y_{max}\),
 image width lower than \(x_{max}\), and image height lower than \(y_{max}\).
 In the last two cases the
-bounding box width or height was set to (image with - \(x_{min}\)) or
+bounding box width or height was set to (image width - \(x_{min}\)) or
 (image height - \(y_{min}\)) respectively; in the other cases the
 annotation was skipped. If the bounding box width or height
 afterwards is lower than or equal to zero the annotation is
 skipped.
 
-In this thesis, SceneNet RGB-D is always used with COCO classes.
-Therefore, a mapping between COCO and SceneNet RGB-D and vice versa
-was necessary. It was created my manually going through each
-Wordnet ID and searching for a fitting COCO class.
+SSD accepts 300x300 input images; the MS COCO data set images were
+resized to this resolution, and the aspect ratio was not kept in
+the process. As all images of MS COCO have the same resolution,
+this led to a uniform distortion of the images. Furthermore,
+the colour channels were swapped from RGB to BGR in order to
+comply with the SSD implementation. The BGR requirement stems from
+the use of OpenCV in SSD: the internal channel order of
+OpenCV is BGR.
 
-The ground truth for SceneNet RGB-D is stored in protobuf files
-and had to be converted into Python format to use it in the
-codebase. The trajectories are not sorted inside the protobuf,
-therefore, the first action was to sort them. For each trajectory,
-all instances are stored independently of the views in the
-trajectory. Therefore, the trajectories and their respective
-instances were looped through and all
-background instances and those without corresponding COCO class were
-skipped. The rest was stored in a dictionary per trajectory.
-Subsequently, all views of the trajectory were traversed and
-for every view all stored instances were looped through.
-For every instance, the segmentation map was modified by
-setting all pixels not having the instance ID as value to zero
-and the rest to one. If no objects were found then that instance
-was skipped.
In the other case a copy of its data from the
-aforementioned dictionary plus the bounding box information was
-stored in a list of instances for that view. The list of instances
-per view was added to a list of such lists for the trajectory.
-Ultimately this list of lists was added to a global list across
-all trajectories: a list of lists of lists.
+For this thesis, the weights pre-trained on trainval35k of the
+COCO data set were used. These weights were created with closed set
+conditions in mind; therefore, they had to be sub-sampled to create
+an open set condition. To this end, the weights for the last
+20 classes were discarded, making those classes effectively unknown.
 
-\section{Replication of Miller et al.}
-
-% TODO rework
-
-Miller et al. use SSD for the object detection part. They compare
-vanilla SSD, vanilla SSD with entropy thresholding, and the
-Bayesian SSD with each other. The Bayesian SSD was created by
-adding two dropout layers to the vanilla SSD; no other changes
-were made. Miller et al. use weights that were trained on MS COCO
-to predict on SceneNet RGB-D.
-
-As the source code was not available, I had to implement Miller's
-work myself. For the SSD network, I used an implementation that
-is compatible with
-Tensorflow\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}; this implementation had to be
-changed to work with eager mode. Further changes were made to
-support entropy thresholding.
-
-For the Bayesian variant, observations have to be calculated:
-detections of multiple forward passes for the same image are averaged
-into an observation. This algorithm was implemented based on the
-information available in the paper. Beyond the observation
-calculation, the Bayesian variant can use the same code as the
-vanilla version with one exception: the model had to be duplicated
-and two dropout layers added to transform SSD into a Bayesian
-network.
-
-The vanilla SSD did not provide meaningful detections on SceneNet
-RGB-D with the pre-trained weights and fine-tuning it on SceneNet
-did not work either. Therefore, to better understand the SceneNet
-RGB-D data set, I counted the number of instances per COCO class and
-a huge class imbalance was visible; not just globally but also
-between trajectories: some classes are only present in some
-trajectories. This makes training with SSD on SceneNet practically
-impossible.
+All images of the minival2014 data set were used, but only the
+ground truth belonging to the first 60 classes was loaded. For the
+remaining 20 classes, considered ``unknown'', no ground truth
+bounding boxes were provided during the inference phase.
 
 \section{Experimental Setup}
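
The annotation checks and clamping rules described in the patched section amount to a small filtering routine. The sketch below illustrates them in Python, assuming COCO-style `[x_min, y_min, width, height]` boxes; the function name `clean_annotation` is hypothetical and not part of the thesis code.

```python
def clean_annotation(bbox, img_w, img_h):
    """Validate one COCO-style [x_min, y_min, width, height] box.

    Returns the (possibly clamped) box, or None if the annotation
    must be skipped. Illustrative sketch of the rules described in
    the patch; the actual thesis code may differ.
    """
    x_min, y_min, w, h = bbox
    x_max, y_max = x_min + w, y_min + h

    # Skip: bounding box width or height lower than zero.
    if w < 0 or h < 0:
        return None
    # Skip: x_max or y_max lower than or equal to zero.
    if x_max <= 0 or y_max <= 0:
        return None
    # Skip: min coordinate greater than max coordinate.
    if x_min > x_max or y_min > y_max:
        return None

    # Clamp ("the last two cases"): if the box extends past the
    # right or bottom image border, set width/height to the image
    # extent minus the min coordinate.
    if img_w < x_max:
        w = img_w - x_min
    if img_h < y_max:
        h = img_h - y_min

    # Skip if the clamped box degenerated to non-positive size.
    if w <= 0 or h <= 0:
        return None
    return [x_min, y_min, w, h]
```

For example, a box `[90, 90, 30, 30]` on a 100x100 image is clamped to `[90, 90, 10, 10]`, while a box starting entirely outside the image is skipped because its clamped width becomes non-positive.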