Reworked data sets section and removed obsolete sections (raw version)

Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
Jim Martens 2019-08-16 12:45:51 +02:00
parent 67879375b1
commit 8f45155cff
1 changed file with 30 additions and 65 deletions


@@ -519,14 +519,21 @@ top \(k\) selection happen like in vanilla SSD.
\label{chap:experiments-results}
This chapter describes the data sets used, how the experiments
were set up, and what results they produced.
\section{Data sets}
This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
80 classes, ranging from airplanes to toothbrushes.
The images are real-world photographs, and ground truth
is provided for all of them. The data set supports object detection,
keypoint detection, and panoptic segmentation (scene segmentation).
The data of any data set has to be prepared before it can be used
in a neural network. Typical problems of data sets include
outliers and invalid bounding boxes; these problems
need to be removed first.
For the MS COCO data set, all annotations were checked for
impossible values: bounding box height or width lower than zero,
@@ -534,73 +541,31 @@ impossible values: bounding box height or width lower than zero,
\(x_{max}\) and \(y_{max}\) coordinates lower than or equal to zero, \(x_{min}\) greater than \(x_{max}\),
\(y_{min}\) greater than \(y_{max}\), image width lower than \(x_{max}\),
and image height lower than \(y_{max}\). In the last two cases the
bounding box width or height was set to (image width - \(x_{min}\)) or
(image height - \(y_{min}\)) respectively;
in the other cases the annotation was skipped.
If the resulting bounding box width or height was
lower than or equal to zero, the annotation was skipped as well.
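The checks above can be sketched as follows; the dictionary layout with
corner coordinates (\texttt{x\_min}, \texttt{y\_min}, \texttt{x\_max},
\texttt{y\_max}) is an assumption for illustration, not the thesis code:

```python
def check_annotation(ann, img_width, img_height):
    """Validate one annotation as described above. Returns a corrected
    copy, or None if the annotation has to be skipped."""
    x_min, y_min = ann["x_min"], ann["y_min"]
    x_max, y_max = ann["x_max"], ann["y_max"]
    # impossible values: skip the annotation
    if x_max <= 0 or y_max <= 0 or x_min > x_max or y_min > y_max:
        return None
    # box extends past the image border: clamp it to the image extent,
    # i.e. width becomes (image width - x_min), height (image height - y_min)
    x_max = min(x_max, img_width)
    y_max = min(y_max, img_height)
    # skip if the clamped box has no positive extent left
    if x_max - x_min <= 0 or y_max - y_min <= 0:
        return None
    return {"x_min": x_min, "y_min": y_min, "x_max": x_max, "y_max": y_max}
```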
In this thesis, SceneNet RGB-D is always used with COCO classes.
Therefore, a mapping between COCO and SceneNet RGB-D classes (and
vice versa) was necessary. It was created by manually going through
each WordNet ID and searching for a fitting COCO class.
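Such a mapping boils down to a pair of dictionaries; the entries below
are an illustrative, hypothetical excerpt, not the actual table from the
thesis:

```python
# Hypothetical excerpt of the manual mapping; the real table covers
# every WordNet ID occurring in SceneNet RGB-D.
WNID_TO_COCO = {
    "03001627": "chair",
    "04256520": "couch",
    "04379243": "dining table",
}
# the reverse direction, as both lookup directions were needed
COCO_TO_WNID = {coco: wnid for wnid, coco in WNID_TO_COCO.items()}
```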
SSD accepts \(300 \times 300\) input images; the MS COCO images were
therefore resized to this resolution without keeping the aspect
ratio. As all images of MS COCO have the same resolution,
this led to a uniform distortion of the images. Furthermore,
the colour channels were swapped from RGB to BGR in order to
comply with the SSD implementation. The BGR requirement stems from
the usage of OpenCV in SSD: the internal channel order of
OpenCV is BGR.
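A minimal numpy-only sketch of this preprocessing; the thesis code
presumably uses a library resize, while the nearest-neighbour indexing
here merely keeps the example self-contained:

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 300) -> np.ndarray:
    """Resize an RGB image to size x size (aspect ratio not kept)
    and swap the channel order to BGR for the OpenCV-based SSD."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # nearest-neighbour row indices
    cols = np.arange(size) * w // size   # nearest-neighbour column indices
    resized = image[rows][:, cols]
    return resized[..., ::-1]            # RGB -> BGR
```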
The ground truth for SceneNet RGB-D is stored in protobuf files
and had to be converted into a Python-readable format for use in the
codebase. The trajectories are not sorted inside the protobuf;
therefore, the first step was to sort them. For each trajectory,
all instances are stored independently of the views in that
trajectory. Consequently, the trajectories and their respective
instances were looped through, and all
background instances as well as those without a corresponding COCO
class were skipped. The rest was stored in a dictionary per trajectory.
Subsequently, all views of the trajectory were traversed, and
for every view all stored instances were looped through.
For every instance, the segmentation map was binarised:
all pixels not carrying the instance ID as value were set to zero,
the rest to one. If no such pixels were found, the instance
was skipped for that view. Otherwise, a copy of its data from the
aforementioned dictionary, together with the bounding box information,
was stored in a list of instances for that view. The list of instances
per view was added to a list of such lists for the trajectory.
Ultimately, this list of lists was added to a global list across
all trajectories: a list of lists of lists.
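The conversion described above can be sketched as follows; the
dictionary-based trajectory layout and key names are assumptions for
illustration, as the actual protobuf schema differs:

```python
import numpy as np

def build_instance_lists(trajectories, coco_mapping):
    """Return a list (trajectories) of lists (views) of instance dicts,
    mirroring the list-of-lists-of-lists structure described above."""
    all_trajectories = []
    for trajectory in trajectories:
        # keep only instances that are not background and map to a COCO class
        instances = {
            inst["id"]: inst
            for inst in trajectory["instances"]
            if inst["class"] != "background" and inst["class"] in coco_mapping
        }
        views = []
        for view in trajectory["views"]:
            seg_map = view["segmentation"]
            view_instances = []
            for inst_id, inst in instances.items():
                # binarise: pixels with this instance ID become 1, rest 0
                mask = (seg_map == inst_id).astype(np.uint8)
                if not mask.any():
                    continue  # instance not visible in this view
                ys, xs = np.nonzero(mask)
                box = (xs.min(), ys.min(), xs.max(), ys.max())
                view_instances.append({**inst, "bbox": box})
            views.append(view_instances)
        all_trajectories.append(views)
    return all_trajectories
```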
For this thesis, the weights pre-trained on trainval35k of the
COCO data set were used. These weights were created with closed set
conditions in mind; therefore, they had to be sub-sampled to create
an open set condition. To this end, the weights for the last
20 classes were discarded, making those classes effectively unknown.
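For a single SSD classification layer, sub-sampling can be sketched as
below; the weight layout (the classes of each anchor box stored
contiguously along the last axis, with one background class) is an
assumption about the weight format, not a confirmed detail:

```python
import numpy as np

def subsample_classes(weights, biases, n_total=81, n_kept=61, n_boxes=4):
    """Keep background plus the first 60 COCO classes of a
    classification layer; the weights of the last 20 are dropped."""
    # expose the (box, class) structure of the last axis
    w = weights.reshape(weights.shape[:-1] + (n_boxes, n_total))
    b = biases.reshape(n_boxes, n_total)
    # slice off the last classes and flatten back
    w = w[..., :n_kept].reshape(weights.shape[:-1] + (n_boxes * n_kept,))
    b = b[:, :n_kept].reshape(n_boxes * n_kept)
    return w, b
```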
\section{Replication of Miller et al.}
Miller et al. use SSD for the object detection part. They compare
vanilla SSD, vanilla SSD with entropy thresholding, and the
Bayesian SSD. The Bayesian SSD was created by
adding two dropout layers to the vanilla SSD; no other changes
were made. Miller et al. use weights trained on MS COCO
to predict on SceneNet RGB-D.
As the source code was not available, I had to implement Miller's
work myself. For the SSD network, I used an implementation that
is compatible with
TensorFlow\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}; this implementation had to be
changed to work with TensorFlow's eager mode. Further changes were
made to support entropy thresholding.
For the Bayesian variant, observations have to be calculated:
detections from multiple forward passes on the same image are averaged
into an observation. This algorithm was implemented based on the
information available in the paper. Beyond the observation
calculation, the Bayesian variant can use the same code as the
vanilla version, with one exception: the model had to be duplicated
and two dropout layers added to transform SSD into a Bayesian
network.
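A greedy sketch of the observation calculation; since the exact
procedure had to be reconstructed from the paper, the IoU threshold
and the grouping order here are assumptions:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def to_observations(detections, iou_threshold=0.95):
    """Group detections from multiple forward passes by box overlap and
    average each group into one observation (box and class scores)."""
    groups = []
    for det in detections:  # det: (x_min, y_min, x_max, y_max, *scores)
        for group in groups:
            if iou(group[0][:4], det[:4]) >= iou_threshold:
                group.append(det)
                break
        else:
            groups.append([det])  # no overlapping group found: start one
    return [np.mean(group, axis=0) for group in groups]
```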
The vanilla SSD did not provide meaningful detections on SceneNet
RGB-D with the pre-trained weights, and fine-tuning it on SceneNet
RGB-D did not work either. Therefore, to better understand the
SceneNet RGB-D data set, I counted the number of instances per COCO
class, which revealed a huge class imbalance, not just globally but
also between trajectories: some classes are only present in some
trajectories. This makes training with SSD on SceneNet RGB-D
practically impossible.
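The per-class counting amounts to a histogram per trajectory plus a
global one; a minimal sketch, assuming the class names per trajectory
have already been extracted into plain lists:

```python
from collections import Counter

def class_histogram(trajectories):
    """Count instances per COCO class for each trajectory and overall."""
    per_trajectory = [Counter(classes) for classes in trajectories]
    overall = sum(per_trajectory, Counter())
    return per_trajectory, overall
```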
All images of the minival2014 data set were used, but only the ground
truth belonging to the first 60 classes was loaded. The remaining 20
classes were considered ``unknown'' and no bounding boxes for them
were provided during the inference phase.
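Filtering the ground truth down to the known classes is a one-liner;
the zero-based \texttt{class\_id} field is a hypothetical name for
illustration:

```python
def filter_known(annotations, n_known=60):
    """Keep only ground truth whose class falls within the first
    n_known COCO classes; the rest are treated as unknown."""
    return [ann for ann in annotations if ann["class_id"] < n_known]
```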
\section{Experimental Setup}