From 8f45155cfffe473b0737b099059bcb725e840c23 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Fri, 16 Aug 2019 12:45:51 +0200
Subject: [PATCH] Reworked data sets section and removed obsolete sections
 (raw version)

Signed-off-by: Jim Martens
---
 body.tex | 95 ++++++++++++++++++--------------------------------------
 1 file changed, 30 insertions(+), 65 deletions(-)

diff --git a/body.tex b/body.tex
index ad7ace0..8dfaa68 100644
--- a/body.tex
+++ b/body.tex
@@ -519,14 +519,21 @@ top \(k\) selection happen like in vanilla SSD.
 
 \label{chap:experiments-results}
 
+This chapter describes the data sets used, how the experiments were
+set up, and what results they produced.
+
 \section{Data sets}
 
-% TODO: reword
+This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
+80 classes, ranging from airplanes to toothbrushes. The images are
+real-world photographs, and ground truth is provided for all of
+them. The data set supports object detection, keypoint detection,
+and panoptic segmentation (scene segmentation).
 
-Usually, data sets are not perfect when it comes to neural
-networks: they contain outliers, invalid bounding boxes, and similar
-problematic things. Before a data set can be used, these problems
-need to be removed.
+The data of any data set has to be prepared for use in a neural
+network. Typical problems of data sets include, for example,
+outliers and invalid bounding boxes. Before a data set can be used,
+these problems need to be removed.
 
 For the MS COCO data set, all annotations were checked for
 impossible values: bounding box height or width lower than zero,
@@ -534,73 +541,31 @@ impossible values: bounding box height or width lower than zero,
 \(x_{max}\) and \(y_{max}\) coordinates lower than or equal to zero,
 \(x_{min}\) greater than \(x_{max}\), \(y_{min}\) greater than \(y_{max}\),
 image width lower than \(x_{max}\), and image height lower than \(y_{max}\).
 In the last two cases the
-bounding box width or height was set to (image with - \(x_{min}\)) or
+bounding box width or height was set to (image width - \(x_{min}\)) or
 (image height - \(y_{min}\)) respectively; in the other cases the
 annotation was skipped. If the bounding box width or height
 afterwards is lower than or equal to zero the annotation is
 skipped.
 
-In this thesis, SceneNet RGB-D is always used with COCO classes.
-Therefore, a mapping between COCO and SceneNet RGB-D and vice versa
-was necessary. It was created my manually going through each
-Wordnet ID and searching for a fitting COCO class.
+SSD accepts 300x300 input images; the MS COCO data set images were
+resized to this resolution, and the aspect ratio was not kept in
+the process. As all images of MS COCO have the same resolution,
+this led to a uniform distortion of the images. Furthermore,
+the colour channels were swapped from RGB to BGR in order to
+comply with the SSD implementation. The BGR requirement stems from
+the use of OpenCV in SSD: the internal channel order of
+OpenCV is BGR.
 
-The ground truth for SceneNet RGB-D is stored in protobuf files
-and had to be converted into Python format to use it in the
-codebase. The trajectories are not sorted inside the protobuf,
-therefore, the first action was to sort them. For each trajectory,
-all instances are stored independently of the views in the
-trajectory. Therefore, the trajectories and their respective
-instances were looped through and all
-background instances and those without corresponding COCO class were
-skipped. The rest was stored in a dictionary per trajectory.
-Subsequently, all views of the trajectory were traversed and
-for every view all stored instances were looped through.
-For every instance, the segmentation map was modified by
-setting all pixels not having the instance ID as value to zero
-and the rest to one. If no objects were found then that instance
-was skipped.
In the other case a copy of its data from the
-aforementioned dictionary plus the bounding box information was
-stored in a list of instances for that view. The list of instances
-per view was added to a list of such lists for the trajectory.
-Ultimately this list of lists was added to a global list across
-all trajectories: a list of lists of lists.
+For this thesis, the weights pre-trained on trainval35k of the
+COCO data set were used. These weights were created with closed set
+conditions in mind; therefore, they had to be sub-sampled to create
+an open set condition. To this end, the weights for the last
+20 classes were discarded, making those classes effectively unknown.
 
-\section{Replication of Miller et al.}
-
-% TODO rework
-
-Miller et al. use SSD for the object detection part. They compare
-vanilla SSD, vanilla SSD with entropy thresholding, and the
-Bayesian SSD with each other. The Bayesian SSD was created by
-adding two dropout layers to the vanilla SSD; no other changes
-were made. Miller et al. use weights that were trained on MS COCO
-to predict on SceneNet RGB-D.
-
-As the source code was not available, I had to implement Miller's
-work myself. For the SSD network, I used an implementation that
-is compatible with
-Tensorflow\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}; this implementation had to be
-changed to work with eager mode. Further changes were made to
-support entropy thresholding.
-
-For the Bayesian variant, observations have to be calculated:
-detections of multiple forward passes for the same image are averaged
-into an observation. This algorithm was implemented based on the
-information available in the paper. Beyond the observation
-calculation, the Bayesian variant can use the same code as the
-vanilla version with one exception: the model had to be duplicated
-and two dropout layers added to transform SSD into a Bayesian
-network.
-
-The vanilla SSD did not provide meaningful detections on SceneNet
-RGB-D with the pre-trained weights and fine-tuning it on SceneNet
-did not work either. Therefore, to better understand the SceneNet
-RGB-D data set, I counted the number of instances per COCO class and
-a huge class imbalance was visible; not just globally but also
-between trajectories: some classes are only present in some
-trajectories. This makes training with SSD on SceneNet practically
-impossible.
+All images of the minival2014 data set were used, but only the
+ground truth belonging to the first 60 classes was loaded. For the
+remaining 20 classes, considered ``unknown'', no ground truth
+bounding boxes were provided during the inference phase.
 
 \section{Experimental Setup}
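
The annotation checks and clamping rules described in the patched section amount to a small filtering routine. The sketch below illustrates them in Python, assuming COCO-style `[x_min, y_min, width, height]` boxes; the function name `clean_annotation` is hypothetical and not part of the thesis code.

```python
def clean_annotation(bbox, img_w, img_h):
    """Validate one COCO-style [x_min, y_min, width, height] box.

    Returns the (possibly clamped) box, or None if the annotation
    must be skipped. Illustrative sketch of the rules described in
    the patch; the actual thesis code may differ.
    """
    x_min, y_min, w, h = bbox
    x_max, y_max = x_min + w, y_min + h

    # Skip: bounding box width or height lower than zero.
    if w < 0 or h < 0:
        return None
    # Skip: x_max or y_max lower than or equal to zero.
    if x_max <= 0 or y_max <= 0:
        return None
    # Skip: min coordinate greater than max coordinate.
    if x_min > x_max or y_min > y_max:
        return None

    # Clamp ("the last two cases"): if the box extends past the
    # right or bottom image border, set width/height to the image
    # extent minus the min coordinate.
    if img_w < x_max:
        w = img_w - x_min
    if img_h < y_max:
        h = img_h - y_min

    # Skip if the clamped box degenerated to non-positive size.
    if w <= 0 or h <= 0:
        return None
    return [x_min, y_min, w, h]
```

For example, a box `[90, 90, 30, 30]` on a 100x100 image is clamped to `[90, 90, 10, 10]`, while a box starting entirely outside the image is skipped because its clamped width becomes non-positive.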