Reworked data sets section and removed obsolete sections (raw version)
Signed-off-by: Jim Martens <github@2martens.de>
body.tex
\label{chap:experiments-results}

This chapter describes the data sets used, how the experiments were
set up, and what the results were.

\section{Data sets}

This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
80 classes, ranging from airplanes to toothbrushes. The images are
real-world photographs, and ground truth is provided for all of
them. The data set supports object detection, keypoint detection,
and panoptic segmentation (scene segmentation).

The data of any data set has to be prepared for use in a neural
network. Typical problems include, for example, outliers and invalid
bounding boxes. Before a data set can be used, these problems need
to be removed.

For the MS COCO data set, all annotations were checked for
impossible values: bounding box height or width lower than zero,
\(x_{max}\) and \(y_{max}\) coordinates lower than or equal to zero,
\(x_{min}\) greater than \(x_{max}\), \(y_{min}\) greater than
\(y_{max}\), image width lower than \(x_{max}\), and image height
lower than \(y_{max}\). In the last two cases the bounding box width
or height was set to (image width - \(x_{min}\)) or (image height -
\(y_{min}\)) respectively; in the other cases the annotation was
skipped. If the bounding box width or height was afterwards lower
than or equal to zero, the annotation was skipped as well.
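These checks can be sketched as a small validation routine. The function name, the assumed \([x_{min}, y_{min}, width, height]\) box format, and the return convention are illustrative, not the code used in this thesis:

```python
def clean_annotation(ann, img_width, img_height):
    """Validate a COCO-style annotation with bbox [x_min, y_min, w, h].

    Returns the (possibly corrected) annotation, or None if it must
    be skipped. Checks mirror the rules described in the text;
    names and box format are assumptions.
    """
    x_min, y_min, width, height = ann["bbox"]
    x_max, y_max = x_min + width, y_min + height

    # Impossible values: skip the annotation outright.
    if width < 0 or height < 0:
        return None
    if x_max <= 0 or y_max <= 0:
        return None
    if x_min > x_max or y_min > y_max:
        return None

    # Box extends past the image border: clamp width/height.
    if img_width < x_max:
        width = img_width - x_min
    if img_height < y_max:
        height = img_height - y_min

    # If clamping degenerated the box, skip it as well.
    if width <= 0 or height <= 0:
        return None

    ann["bbox"] = [x_min, y_min, width, height]
    return ann
```

For example, a box of width 30 starting at \(x_{min} = 90\) in a 100-pixel-wide image is clamped to width 10, while a box starting entirely outside the image is skipped.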

SSD accepts \(300 \times 300\) input images, so the MS COCO images
were resized to this resolution; the aspect ratio was not kept in
the process. As all images of MS COCO have the same resolution, this
led to a uniform distortion of the images. Furthermore, the colour
channels were swapped from RGB to BGR in order to comply with the
SSD implementation. The BGR requirement stems from the usage of
OpenCV in SSD: the internal channel order of OpenCV is BGR.
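A minimal sketch of this preprocessing, assuming the image arrives as an RGB NumPy array and using nearest-neighbour resizing in place of whatever interpolation the actual pipeline uses:

```python
import numpy as np

def preprocess(image, size=300):
    """Resize an RGB image to size x size (nearest neighbour, aspect
    ratio not preserved) and swap channels to BGR as the SSD
    implementation expects. A sketch, not the thesis code."""
    h, w, _ = image.shape
    rows = np.arange(size) * h // size  # source row per target row
    cols = np.arange(size) * w // size  # source column per target column
    resized = image[rows][:, cols]
    return resized[..., ::-1]  # RGB -> BGR
```

Because every source image is mapped to the same fixed grid, all images are distorted in the same way, matching the uniform distortion described above.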

For this thesis, the weights pre-trained on trainval35k of the COCO
data set were used. These weights were created with closed set
conditions in mind; therefore, they had to be sub-sampled to create
an open set condition. To this end, the weights for the last 20
classes were thrown away, making them effectively unknown.
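The sub-sampling can be illustrated on a single classification layer. Real SSD confidence layers interleave anchor boxes and classes, so this flat one-column-per-class layout (background first, then 80 COCO classes) is a deliberate simplification:

```python
import numpy as np

def subsample_class_weights(weights, biases, n_keep=61):
    """Keep the output columns for background + the first 60 COCO
    classes and discard the last 20, making those classes effectively
    unknown. Assumes one output column per class; real SSD confidence
    layers interleave boxes and classes."""
    return weights[:, :n_keep], biases[:n_keep]

# 81 outputs: background + 80 COCO classes
w = np.zeros((256, 81))
b = np.zeros(81)
w_open, b_open = subsample_class_weights(w, b)
```

After sub-sampling, the layer only produces confidences for the background class and the 60 known classes.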

All images of the minival2014 data set were used, but only ground
truth belonging to the first 60 classes was loaded. The remaining 20
classes were considered ``unknown'' and were not presented with
bounding boxes during the inference phase.
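Loading only the known ground truth might look as follows; the contiguous category ids (1--60 known, 61--80 unknown) are an assumption about the remapped annotation format, and the remapping itself is not shown:

```python
def load_known_ground_truth(annotations, n_known=60):
    """Keep only annotations whose (remapped, contiguous) category id
    belongs to one of the first 60 classes. Objects of the remaining
    20 classes stay in the images but receive no ground truth boxes."""
    return [ann for ann in annotations if ann["category_id"] <= n_known]

anns = [{"category_id": 3}, {"category_id": 61}, {"category_id": 80}]
known = load_known_ground_truth(anns)  # only the first entry survives
```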

\section{Experimental Setup}