Reworked data sets section and removed obsolete sections (raw version)
Signed-off-by: Jim Martens <github@2martens.de>
body.tex
\label{chap:experiments-results}

This chapter describes the data sets used, how the experiments were
set up, and what the results were.

\section{Data sets}

This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
80 classes, ranging from airplanes to toothbrushes. The images are
real-world photographs, and ground truth is provided for all of
them. The data set supports object detection, keypoint detection,
and panoptic segmentation (scene segmentation).

The data of any data set has to be prepared for use in a neural
network. Typical problems include, for example, outliers and invalid
bounding boxes. Before a data set can be used, these problems need
to be removed.

For the MS COCO data set, all annotations were checked for
impossible values: bounding box height or width lower than zero,
\(x_{max}\) and \(y_{max}\) coordinates lower than or equal to zero,
\(x_{min}\) greater than \(x_{max}\), \(y_{min}\) greater than
\(y_{max}\), image width lower than \(x_{max}\), and image height
lower than \(y_{max}\). In the last two cases the bounding box width
or height was set to (image width - \(x_{min}\)) or (image height -
\(y_{min}\)) respectively; in the other cases the annotation was
skipped. If the bounding box width or height was afterwards lower
than or equal to zero, the annotation was skipped as well.
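These checks can be sketched as a small validation routine. The function name, the assumed \([x_{min}, y_{min}, width, height]\) box format, and the return convention are illustrative, not the code used in this thesis:

```python
def clean_annotation(ann, img_width, img_height):
    """Validate a COCO-style annotation with bbox [x_min, y_min, w, h].

    Returns the (possibly corrected) annotation, or None if it must
    be skipped. Checks mirror the rules described in the text;
    names and box format are assumptions.
    """
    x_min, y_min, width, height = ann["bbox"]
    x_max, y_max = x_min + width, y_min + height

    # Impossible values: skip the annotation outright.
    if width < 0 or height < 0:
        return None
    if x_max <= 0 or y_max <= 0:
        return None
    if x_min > x_max or y_min > y_max:
        return None

    # Box extends past the image border: clamp width/height.
    if img_width < x_max:
        width = img_width - x_min
    if img_height < y_max:
        height = img_height - y_min

    # If clamping degenerated the box, skip it as well.
    if width <= 0 or height <= 0:
        return None

    ann["bbox"] = [x_min, y_min, width, height]
    return ann
```

For example, a box of width 30 starting at \(x_{min} = 90\) in a 100-pixel-wide image is clamped to width 10, while a box starting entirely outside the image is skipped.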

SSD accepts \(300 \times 300\) input images, so the MS COCO images
were resized to this resolution; the aspect ratio was not kept in
the process. As all images of MS COCO have the same resolution, this
led to a uniform distortion of the images. Furthermore, the colour
channels were swapped from RGB to BGR in order to comply with the
SSD implementation. The BGR requirement stems from the usage of
OpenCV in SSD: the internal channel order of OpenCV is BGR.
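A minimal sketch of this preprocessing, assuming the image arrives as an RGB NumPy array and using nearest-neighbour resizing in place of whatever interpolation the actual pipeline uses:

```python
import numpy as np

def preprocess(image, size=300):
    """Resize an RGB image to size x size (nearest neighbour, aspect
    ratio not preserved) and swap channels to BGR as the SSD
    implementation expects. A sketch, not the thesis code."""
    h, w, _ = image.shape
    rows = np.arange(size) * h // size  # source row per target row
    cols = np.arange(size) * w // size  # source column per target column
    resized = image[rows][:, cols]
    return resized[..., ::-1]  # RGB -> BGR
```

Because every source image is mapped to the same fixed grid, all images are distorted in the same way, matching the uniform distortion described above.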

For this thesis, the weights pre-trained on trainval35k of the COCO
data set were used. These weights were created with closed set
conditions in mind; therefore, they had to be sub-sampled to create
an open set condition. To this end, the weights for the last 20
classes were thrown away, making them effectively unknown.
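The sub-sampling can be illustrated on a single classification layer. Real SSD confidence layers interleave anchor boxes and classes, so this flat one-column-per-class layout (background first, then 80 COCO classes) is a deliberate simplification:

```python
import numpy as np

def subsample_class_weights(weights, biases, n_keep=61):
    """Keep the output columns for background + the first 60 COCO
    classes and discard the last 20, making those classes effectively
    unknown. Assumes one output column per class; real SSD confidence
    layers interleave boxes and classes."""
    return weights[:, :n_keep], biases[:n_keep]

# 81 outputs: background + 80 COCO classes
w = np.zeros((256, 81))
b = np.zeros(81)
w_open, b_open = subsample_class_weights(w, b)
```

After sub-sampling, the layer only produces confidences for the background class and the 60 known classes.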

All images of the minival2014 data set were used, but only ground
truth belonging to the first 60 classes was loaded. The remaining 20
classes were considered ``unknown'' and were not presented with
bounding boxes during the inference phase.
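Loading only the known ground truth might look as follows; the contiguous category ids (1--60 known, 61--80 unknown) are an assumption about the remapped annotation format, and the remapping itself is not shown:

```python
def load_known_ground_truth(annotations, n_known=60):
    """Keep only annotations whose (remapped, contiguous) category id
    belongs to one of the first 60 classes. Objects of the remaining
    20 classes stay in the images but receive no ground truth boxes."""
    return [ann for ann in annotations if ann["category_id"] <= n_known]

anns = [{"category_id": 3}, {"category_id": 61}, {"category_id": 80}]
known = load_known_ground_truth(anns)  # only the first entry survives
```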

\section{Experimental Setup}