Reworked data sets section and removed obsolete sections (raw version)
Signed-off-by: Jim Martens <github@2martens.de>
body.tex
\label{chap:experiments-results}

This chapter explains the data sets used, how the experiments were
set up, and what the results were.

\section{Data sets}

This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
80 classes, ranging from airplanes to toothbrushes. The images are
real-world photographs, and ground truth is provided for all of
them. The data set supports object detection, keypoint detection,
and panoptic segmentation (scene segmentation).

The data of any data set has to be prepared for use in a neural
network. Typical problems of data sets include, for example,
outliers and invalid bounding boxes. Before a data set can be used,
these problems need to be removed.

For the MS COCO data set, all annotations were checked for
impossible values: bounding box height or width lower than zero,
\(x_{max}\) and \(y_{max}\) coordinates lower than or equal to
zero, \(x_{min}\) greater than \(x_{max}\), \(y_{min}\) greater
than \(y_{max}\), image width lower than \(x_{max}\), and image
height lower than \(y_{max}\). In the last two cases the bounding
box width or height was set to (image width - \(x_{min}\)) or
(image height - \(y_{min}\)) respectively; in the other cases the
annotation was skipped. If the resulting bounding box width or
height is lower than or equal to zero, the annotation is skipped
as well.
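As a minimal sketch of these checks (assuming boxes given as \((x_{min}, y_{min}, x_{max}, y_{max})\) together with the image size; the function name is hypothetical, not taken from the thesis codebase):

```python
# Hypothetical sketch of the annotation sanity check described above.
# Assumes boxes given as (x_min, y_min, x_max, y_max) plus the image size.

def clean_annotation(x_min, y_min, x_max, y_max, img_w, img_h):
    """Return a corrected (x_min, y_min, x_max, y_max) box, or None to skip."""
    # Impossible values: non-positive maxima or inverted coordinates -> skip.
    if x_max <= 0 or y_max <= 0 or x_min > x_max or y_min > y_max:
        return None
    # Box extends past the image border: clamp width/height to the image,
    # i.e. the width becomes (image width - x_min), analogously for height.
    if x_max > img_w:
        x_max = img_w
    if y_max > img_h:
        y_max = img_h
    # Skip if the clamped box degenerated to zero or negative size.
    if x_max - x_min <= 0 or y_max - y_min <= 0:
        return None
    return (x_min, y_min, x_max, y_max)
```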

In this thesis, SceneNet RGB-D is always used with COCO classes.
Therefore, a mapping between COCO and SceneNet RGB-D and vice versa
was necessary. It was created by manually going through each
WordNet ID and searching for a fitting COCO class.
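Such a mapping can be held in a pair of dictionaries; the WordNet IDs and class pairs below are illustrative examples, not entries from the thesis's actual table:

```python
# Illustrative fragment of a manually created WordNet-to-COCO mapping.
# The IDs and pairings shown here are examples, not the thesis's real table.
wnid_to_coco = {
    "03001627": "chair",
    "04256520": "couch",
    "04379243": "dining table",
}
# The reverse direction can be derived mechanically from the forward one.
coco_to_wnid = {coco: wnid for wnid, coco in wnid_to_coco.items()}
```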

SSD accepts 300x300 input images, so the MS COCO data set images
were resized to this resolution; the aspect ratio was not kept in
the process. As all images of MS COCO have the same resolution,
this led to a uniform distortion of the images. Furthermore, the
colour channels were swapped from RGB to BGR in order to comply
with the SSD implementation. The BGR requirement stems from the
usage of OpenCV in SSD: the internal channel order for OpenCV is
BGR.
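A minimal, dependency-free sketch of this preprocessing step (the actual code presumably uses OpenCV's resize; here a nearest-neighbour resize stands in for it, and all names are illustrative):

```python
# Hedged sketch of the preprocessing described above: resize to the fixed
# 300x300 SSD input (aspect ratio not preserved) and swap RGB -> BGR.
# A nearest-neighbour resize stands in for the library call the real code
# presumably uses; function and variable names are illustrative.
import numpy as np

def preprocess_for_ssd(image_rgb, size=300):
    h, w, _ = image_rgb.shape
    # Nearest-neighbour sampling grid for the target resolution.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image_rgb[rows][:, cols]
    # RGB -> BGR by reversing the channel axis (OpenCV's internal order).
    return resized[:, :, ::-1]
```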

The ground truth for SceneNet RGB-D is stored in protobuf files
and had to be converted into a Python format to use it in the
codebase. The trajectories are not sorted inside the protobuf;
therefore, the first step was to sort them. For each trajectory,
all instances are stored independently of the views in the
trajectory. Therefore, the trajectories and their respective
instances were looped through, and all background instances and
those without a corresponding COCO class were skipped. The rest
was stored in a dictionary per trajectory.
Subsequently, all views of the trajectory were traversed and, for
every view, all stored instances were looped through. For every
instance, the segmentation map was modified by setting all pixels
not having the instance ID as value to zero and the rest to one.
If no object pixels were found, that instance was skipped.
Otherwise, a copy of its data from the aforementioned dictionary,
plus the bounding box information, was stored in a list of
instances for that view. The list of instances per view was added
to a list of such lists for the trajectory. Ultimately, this list
of lists was added to a global list across all trajectories: a
list of lists of lists.
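The per-instance segmentation step can be sketched as follows (a minimal illustration of the idea; \texttt{seg\_map} and \texttt{instance\_id} are assumed inputs, and the function name is not taken from the actual codebase):

```python
# Hypothetical sketch of the per-instance segmentation handling described
# above; names are illustrative, not from the actual codebase.
import numpy as np

def instance_bounding_box(seg_map, instance_id):
    """Binarise the segmentation map for one instance and derive its
    bounding box, or return None if the instance is not visible."""
    # Set all pixels not having the instance ID as value to 0, the rest to 1.
    mask = (seg_map == instance_id).astype(np.uint8)
    if not mask.any():
        return None  # no object pixels found in this view -> skip instance
    # Bounding box from the extreme coordinates of the mask pixels.
    ys, xs = np.nonzero(mask)
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```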

For this thesis, the weights pre-trained on trainval35k of the
COCO data set were used. These weights were created with closed
set conditions in mind; therefore, they had to be sub-sampled to
create an open set condition. To this end, the weights for the
last 20 classes were thrown away, making them effectively unknown.
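The sub-sampling idea can be sketched like this; the tensor layout (one background class plus 80 COCO classes per anchor box, interleaved per box) and all names are assumptions for illustration, not the actual implementation:

```python
# Hedged sketch of sub-sampling a classification layer's kernel: keep only
# the first 61 outputs per anchor box (background + 60 known classes) and
# drop the trailing 20, making those classes effectively unknown.
# The (features, boxes * n_total) layout is an assumption for illustration.
import numpy as np

def subsample_classifier(weights, n_total=81, n_keep=61):
    features, out = weights.shape
    boxes = out // n_total
    # View the kernel as one class-score block per anchor box, then slice
    # away the last (n_total - n_keep) classes of each block.
    per_box = weights.reshape(features, boxes, n_total)
    return per_box[:, :, :n_keep].reshape(features, boxes * n_keep)
```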

\section{Replication of Miller et al.}

Miller et al. use SSD for the object detection part. They compare
vanilla SSD, vanilla SSD with entropy thresholding, and the
Bayesian SSD. The Bayesian SSD was created by adding two dropout
layers to the vanilla SSD; no other changes were made. Miller et
al. use weights that were trained on MS COCO to predict on
SceneNet RGB-D.

As the source code was not available, I had to implement Miller's
work myself. For the SSD network, I used an implementation that is
compatible with
TensorFlow\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}};
this implementation had to be changed to work with eager mode.
Further changes were made to support entropy thresholding.
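Entropy thresholding can be sketched in a few lines: a detection is kept only if the entropy of its softmax class distribution lies below a threshold. The threshold value below is illustrative, not the one used in the experiments:

```python
# Minimal sketch of entropy thresholding on a softmax class distribution.
# The threshold value is illustrative, not the one used in the experiments.
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def keep_detection(probs, threshold=1.0):
    """Accept a detection only if its class distribution is confident
    enough, i.e. its entropy lies below the threshold."""
    return entropy(probs) < threshold
```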

For the Bayesian variant, observations have to be calculated:
detections of multiple forward passes for the same image are
averaged into an observation. This algorithm was implemented based
on the information available in the paper. Beyond the observation
calculation, the Bayesian variant can use the same code as the
vanilla version, with one exception: the model had to be
duplicated and two dropout layers added to transform SSD into a
Bayesian network.
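The observation calculation can be sketched as follows, based only on the description above. Note that the grouping criterion is deliberately simplified to exact box identity here; the actual algorithm groups spatially overlapping detections, so this is an illustration of the averaging step, not of Miller et al.'s full method:

```python
# Hedged sketch of averaging detections from multiple dropout forward
# passes into observations. Grouping is simplified to identical boxes;
# the real algorithm uses a spatial criterion to associate detections.
import numpy as np

def average_observations(detections):
    """detections: list of (box, score_vector) pairs collected over several
    forward passes on one image. Returns one averaged score vector per
    distinct box (= one observation)."""
    groups = {}
    for box, scores in detections:
        groups.setdefault(tuple(box), []).append(np.asarray(scores))
    # Average the per-pass score vectors of each group element-wise.
    return {box: np.mean(samples, axis=0) for box, samples in groups.items()}
```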

The vanilla SSD did not provide meaningful detections on SceneNet
RGB-D with the pre-trained weights, and fine-tuning it on SceneNet
did not work either. Therefore, to better understand the SceneNet
RGB-D data set, I counted the number of instances per COCO class.
A huge class imbalance was visible, not just globally but also
between trajectories: some classes are only present in some
trajectories. This makes training SSD on SceneNet practically
impossible.

All images of the minival2014 data set were used, but only ground
truth belonging to the first 60 classes was loaded. The remaining
20 classes were considered ``unknown'' and were not presented with
bounding boxes during the inference phase.

\section{Experimental Setup}