Wrote the decoding pipelines (raw version)

Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
2019-08-14 15:36:46 +02:00
parent 5cda3089c9
commit e5662ed48d


@@ -366,11 +366,9 @@ be used to identify and reject these false positive cases.
 \label{chap:methods}
-This chapter explains the functionality of the Bayesian SSD and
+This chapter explains the functionality of the Bayesian SSD, the
+decoding pipelines, and
 provides some information on the software and source code design.
-In particular, the model architecture of both vanilla SSD and
-Bayesian SSD, the calculation of model uncertainty, and relevant
-implementation details will be presented.
 \section{Bayesian SSD for Model Uncertainty}
@@ -411,6 +409,47 @@ Entropy works to detect uncertainty because uncertain networks
 will produce different classifications for the same object in an
 image across multiple forward passes.
+
+\subsection{Implementation Details}
+
+For this thesis, an SSD implementation based on Tensorflow and
+Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
+was used. It was modified to support entropy thresholding,
+partitioning of observations, and dropout
+layers in the SSD model. %Entropy thresholding takes place before
+%the per-class confidence threshold is applied.
+
+\section{Decoding Pipelines}
+
+The raw output of SSD is not very useful: it contains thousands of
+boxes per image. Among them are many boxes with very low confidences
+or background classifications; these need to be filtered out to
+obtain any meaningful output from the network. This filtering
+process is called decoding and is presented for the three variants
+of SSD used in the thesis.
+
+\subsection{Vanilla SSD}
+
+Liu et al.~\cite{Liu2016} used Caffe for their original SSD
+implementation. The decoding process consists largely of two
+phases: decoding and filtering. Decoding transforms the relative
+coordinates predicted by SSD into absolute coordinates. At this point
+the shape of the output per batch is
+\((batch\_size, \#nr\_boxes, \#nr\_classes + 12)\). The last twelve
+elements are split into the four bounding box offsets, the four
+anchor box coordinates, and the four variances; there are 8732 boxes.
+
+Filtering of these boxes is first done per class:
+only the class id, the confidence of that class, and the bounding box
+coordinates are kept per box. The filtering consists of
+confidence thresholding and a subsequent non-maximum suppression.
+All boxes that pass non-maximum suppression are added to a
+per-image maxima list. One box could pass the confidence threshold
+for multiple classes and, hence, be present multiple times in the
+maxima list for the image. Lastly, a total of \(k\) boxes with the
+highest confidences is kept per image across all classes. The
+original implementation uses a confidence threshold of \(0.01\), an
+IOU threshold for non-maximum suppression of \(0.45\), and a top
+\(k\) value of 200.
 The vanilla SSD
 per-class confidence threshold and non-maximum suppression has one
 weakness: even if SSD correctly predicts all objects as the
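The per-class filtering described above (confidence threshold, greedy non-maximum suppression, top-\(k\) selection) can be sketched as follows. This is a minimal NumPy illustration, not the ssd_keras implementation; the function names, the assumption that class 0 is background, and the toy input are all illustrative:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def decode_image(preds, conf_thresh=0.01, iou_thresh=0.45, top_k=200):
    """preds: (nr_boxes, nr_classes + 4) array of softmax scores followed
    by absolute (xmin, ymin, xmax, ymax) coordinates, already decoded.
    Class 0 is assumed to be the background class."""
    nr_classes = preds.shape[1] - 4
    maxima = []  # per-image maxima list of (class_id, confidence, box)
    for class_id in range(1, nr_classes):  # skip background
        # confidence thresholding for this class
        keep = preds[preds[:, class_id] > conf_thresh]
        # sort by descending confidence, then greedy non-maximum suppression
        keep = keep[np.argsort(-keep[:, class_id])]
        selected = []
        for row in keep:
            if all(iou(row[-4:], s[-4:]) <= iou_thresh for s in selected):
                selected.append(row)
        # one box can appear under several classes in the maxima list
        maxima.extend((class_id, row[class_id], row[-4:]) for row in selected)
    # keep the top k boxes per image across all classes
    maxima.sort(key=lambda m: -m[1])
    return maxima[:top_k]
```

A box that clears the \(0.01\) threshold for two classes survives NMS independently in each class loop, which is why it can occur twice in the maxima list, exactly as described above.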
@@ -434,17 +473,48 @@ detections are the correct behaviour for unknown classes.
 In order to get useful detections out of the decoding, a higher
 confidence threshold is required.
-\subsection{Implementation Details}
-For this thesis, an SSD implementation based on Tensorflow and
-Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
-was used. It was modified to support entropy thresholding, and dropout
-layers in the SSD model. Entropy thresholding takes place before
-the per-class confidence threshold is applied.
+
+\subsection{Vanilla SSD with Entropy Thresholding}
+
+Vanilla SSD with entropy thresholding adds an additional component
+to the filtering already done for vanilla SSD. The entropy is
+calculated from all \(\#nr\_classes\) softmax scores in a prediction.
+Only predictions with a low enough entropy pass the entropy
+threshold and move on to the aforementioned per-class filtering.
+This excludes very uniform predictions but cannot identify
+false positive or false negative cases with high confidence values.
+
+\subsection{Bayesian SSD with Entropy Thresholding}
+
+Bayesian SSD is distinguished by its multiple forward passes. Based
+on the information in the paper, the detections of all forward passes
+are grouped per image but not by forward pass. This leads
+to the following shape of the network output after all
+forward passes:
+\((batch\_size, \#nr\_boxes \cdot \#nr\_forward\_passes, \#nr\_classes + 12)\).
+The size of the output increases linearly with more forward passes.
+
+These detections have to be decoded first. Afterwards, they are
+partitioned into observations to reduce the size of the output and
+to identify uncertainty. This is accomplished by calculating the
+mutual IOU of every detection with all other detections. Detections
+with a mutual IOU score of 0.95 or higher are partitioned into an
+observation. Next, the softmax scores and bounding box coordinates of
+all detections in an observation are averaged.
+
+There can be a different number of observations for every image, which
+destroys homogeneity and prevents batch-wise calculation of the
+results. The shape of the results per image is
+\((\#nr\_observations, \#nr\_classes + 4)\).
+
+Entropy is measured in the next step. All observations with too high
+an entropy are discarded. Entropy thresholding in combination with
+dropout sampling should improve the identification of false positives
+of unknown classes. This is due to multiple forward passes and
+the assumption that uncertainty in some objects will result
+in different classifications across multiple forward passes. These
+varying classifications are averaged into multiple lower confidence
+values, which should increase the entropy and, hence, flag an
+observation for removal.
+
+Per-class confidence thresholding, non-maximum suppression, and
+top \(k\) selection happen as in vanilla SSD.
 \section{Software and Source Code Design}
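The observation building and entropy test described in the Bayesian SSD pipeline can be sketched like this. It is an illustrative NumPy sketch, not the thesis code: grouping is done greedily from a seed detection rather than over the full mutual-IOU matrix, and the entropy threshold value is a placeholder, not a value used in the thesis:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def build_observations(detections, iou_thresh=0.95):
    """detections: (n, nr_classes + 4) decoded boxes pooled over all
    forward passes of one image. Returns (nr_observations, nr_classes + 4)
    with softmax scores and coordinates averaged per observation."""
    remaining = list(range(len(detections)))
    observations = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed]
        # greedily collect every detection overlapping the seed box
        for j in remaining[:]:
            if iou(detections[seed][-4:], detections[j][-4:]) >= iou_thresh:
                group.append(j)
                remaining.remove(j)
        # average softmax scores and box coordinates of the group
        observations.append(detections[group].mean(axis=0))
    return np.array(observations)

def entropy_filter(observations, max_entropy=1.0):
    """Discard observations whose softmax entropy is too high."""
    scores = observations[:, :-4]
    ent = -np.sum(scores * np.log(np.clip(scores, 1e-12, None)), axis=1)
    return observations[ent <= max_entropy]
```

The sketch shows the mechanism claimed above: if the forward passes disagree, averaging spreads probability mass over several classes, the entropy of the averaged scores rises, and the observation is removed by the filter. The number of rows returned varies per image, which is the loss of homogeneity mentioned in the text.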