Written the decoding pipelines (raw version)
Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
body.tex
@ -366,11 +366,9 @@ be used to identify and reject these false positive cases.
\label{chap:methods}

This chapter explains the functionality of the Bayesian SSD and the
decoding pipelines, and provides information on the software and
source code design. In particular, the model architecture of both
vanilla SSD and Bayesian SSD, the calculation of model uncertainty,
and relevant implementation details will be presented.

\section{Bayesian SSD for Model Uncertainty}

@ -411,6 +409,47 @@ Entropy works to detect uncertainty because uncertain networks
will produce different classifications for the same object in an
image across multiple forward passes.

\subsection{Implementation Details}

For this thesis, an SSD implementation based on TensorFlow and
Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
was used. It was modified to support entropy thresholding,
partitioning of observations, and dropout
layers in the SSD model. %Entropy thresholding takes place before
%the per-class confidence threshold is applied.

\section{Decoding Pipelines}

The raw output of SSD is not very useful: it contains thousands of
boxes per image. Among them are many boxes with very low confidences
or background classifications; these need to be filtered out to
obtain any meaningful output from the network. This filtering process
is called decoding and is presented for the three variants of SSD
used in this thesis.

\subsection{Vanilla SSD}

Liu et al.~\cite{Liu2016} used Caffe for their original SSD
implementation. The decoding process consists largely of two
phases: decoding and filtering. Decoding transforms the relative
coordinates predicted by SSD into absolute coordinates. At this
point the shape of the output per batch is
\((batch\_size, \#nr\_boxes, \#nr\_classes + 12)\). The last twelve
elements are split into the four bounding box offsets, the four
anchor box coordinates, and the four variances; there are 8732 boxes.

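The offset-plus-variance decoding step can be sketched as follows. This is a minimal NumPy sketch following the common SSD decoding scheme; the function and argument names are illustrative assumptions, and the ssd\_keras implementation may arrange the terms differently.

```python
import numpy as np

def decode_boxes(offsets, anchors, variances):
    """Turn SSD's relative (cx, cy, w, h) offsets into absolute corner
    coordinates. Names are illustrative, not taken from the thesis code."""
    # anchors and offsets: (n_boxes, 4) in (cx, cy, w, h) layout
    cx = offsets[:, 0] * variances[:, 0] * anchors[:, 2] + anchors[:, 0]
    cy = offsets[:, 1] * variances[:, 1] * anchors[:, 3] + anchors[:, 1]
    w = np.exp(offsets[:, 2] * variances[:, 2]) * anchors[:, 2]
    h = np.exp(offsets[:, 3] * variances[:, 3]) * anchors[:, 3]
    # convert center format to (xmin, ymin, xmax, ymax) corners
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
```

With zero offsets the decoded box coincides with its anchor, which is a quick sanity check for any decoding implementation.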
Filtering of these boxes is first done per class:
only the class id, the confidence of that class, and the bounding box
coordinates are kept per box. The filtering consists of
confidence thresholding and a subsequent non-maximum suppression.
All boxes that pass non-maximum suppression are added to a
per-image maxima list. One box could pass the confidence threshold
for multiple classes and, hence, be present multiple times in the
maxima list for the image. Lastly, a total of \(k\) boxes with the
highest confidences is kept per image across all classes. The
original implementation uses a confidence threshold of \(0.01\), an
IOU threshold for non-maximum suppression of \(0.45\), and a top
\(k\) value of 200.

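The per-class filtering described above can be sketched as follows. This is an illustrative NumPy sketch, not the thesis implementation; function names and the score layout (class 0 as background) are assumptions.

```python
import numpy as np

def iou(box, others):
    # IoU of one corner-format box against an array of boxes
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def nms(confidences, boxes, iou_thresh):
    # Greedy non-maximum suppression: keep the highest-scoring box,
    # drop every remaining box overlapping it by more than iou_thresh.
    order = np.argsort(confidences)[::-1]
    kept = []
    while len(order) > 0:
        best, rest = order[0], order[1:]
        kept.append(best)
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return kept

def filter_detections(scores, boxes, conf_thresh=0.01,
                      iou_thresh=0.45, top_k=200):
    """Per-class confidence thresholding, greedy NMS, and top-k selection.
    scores: (n_boxes, n_classes) softmax scores (class 0 = background),
    boxes: (n_boxes, 4) absolute corner coordinates."""
    maxima = []  # (class_id, confidence, box) per surviving detection
    for cls in range(1, scores.shape[1]):  # skip background class
        keep = scores[:, cls] > conf_thresh
        cls_scores, cls_boxes = scores[keep, cls], boxes[keep]
        for i in nms(cls_scores, cls_boxes, iou_thresh):
            maxima.append((cls, cls_scores[i], cls_boxes[i]))
    # keep the top k detections per image across all classes
    maxima.sort(key=lambda det: det[1], reverse=True)
    return maxima[:top_k]
```

Note that one box can appear under several classes in `maxima`, mirroring the behaviour described above.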
The vanilla SSD
per-class confidence threshold and non-maximum suppression have one
weakness: even if SSD correctly predicts all objects as the
@ -434,17 +473,48 @@ detections are the correct behaviour for unknown classes.
In order to get useful detections out of the decoding, a higher
confidence threshold is required.

\subsection{Vanilla SSD with Entropy Thresholding}

Vanilla SSD with entropy thresholding adds an additional component
to the filtering already done for vanilla SSD. The entropy is
calculated from all \(\#nr\_classes\) softmax scores in a prediction.
Only predictions with a low enough entropy pass the entropy
threshold and move on to the aforementioned per-class filtering.
This excludes very uniform predictions but cannot identify
false positive or false negative cases with high confidence values.

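The entropy gate can be sketched in a few lines of NumPy; the function names and the threshold value are illustrative assumptions, not taken from the thesis code.

```python
import numpy as np

def entropy(softmax_scores, eps=1e-12):
    # Shannon entropy over the class dimension; eps guards log(0)
    return -np.sum(softmax_scores * np.log(softmax_scores + eps), axis=-1)

def entropy_filter(scores, threshold):
    # Keep only predictions whose softmax distribution is peaked
    # enough, i.e. whose entropy falls below the threshold.
    return scores[entropy(scores) < threshold]
```

A uniform distribution over \(n\) classes attains the maximum entropy \(\ln n\), so near-uniform predictions are the first to be discarded, while a confidently wrong one-hot-like prediction passes the gate unchanged.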
\subsection{Bayesian SSD with Entropy Thresholding}

Bayesian SSD is special in that it uses multiple forward passes.
Based on the information in the paper, the detections of all forward
passes are grouped per image but not by forward pass. This leads
to the following shape of the network output after all forward
passes: \((batch\_size, \#nr\_boxes \cdot \#nr\_forward\_passes, \#nr\_classes + 12)\).
The size of the output increases linearly with the number of forward
passes.

These detections have to be decoded first. Afterwards they are
partitioned into observations to reduce the size of the output and
to identify uncertainty. This is accomplished by calculating the
mutual IOU of every detection with all other detections. Detections
with a mutual IOU score of 0.95 or higher are partitioned into an
observation. Next, the softmax scores and bounding box coordinates of
all detections in an observation are averaged.
There can be a different number of observations for every image,
which destroys homogeneity and prevents batch-wise calculation of the
results. The shape of the results per image is
\((\#nr\_observations, \#nr\_classes + 4)\).

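One way to realise the partitioning into observations is a greedy grouping by mutual IoU; the sketch below makes the averaging concrete but may differ in detail from the actual thesis implementation.

```python
import numpy as np

def iou(box_a, box_b):
    # IoU of two corner-format boxes (xmin, ymin, xmax, ymax)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def partition(boxes, scores, threshold=0.95):
    # Greedy grouping: each not-yet-assigned detection seeds an
    # observation; all detections overlapping it with IoU >= threshold
    # join it, and their boxes and softmax scores are averaged.
    used = np.zeros(len(boxes), dtype=bool)
    observations = []
    for i in range(len(boxes)):
        if used[i]:
            continue
        members = [j for j in range(len(boxes))
                   if not used[j] and iou(boxes[i], boxes[j]) >= threshold]
        used[members] = True
        observations.append((boxes[members].mean(axis=0),
                             scores[members].mean(axis=0)))
    return observations
```

Because the number of observations varies per image, the result is a Python list rather than a fixed-shape array, matching the loss of homogeneity noted above.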
Entropy is measured in the next step. All observations with too high
an entropy are discarded. Entropy thresholding in combination with
dropout sampling should improve the identification of false positives
of unknown classes. This is due to multiple forward passes and
the assumption that uncertainty in some objects will result
in different classifications across multiple forward passes. These
varying classifications are averaged into multiple lower confidence
values, which should increase the entropy and, hence, flag an
observation for removal.

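The claim that disagreement across passes raises the entropy of the averaged scores can be checked with a small worked example; the numbers are purely illustrative.

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=-1)

# Hypothetical softmax outputs of three forward passes that disagree
# on the class of the same object (values purely illustrative)
passes = np.array([[0.90, 0.05, 0.05],
                   [0.10, 0.85, 0.05],
                   [0.05, 0.10, 0.85]])

averaged = passes.mean(axis=0)  # observation-level softmax scores
# disagreement spreads the averaged probability mass, so the averaged
# distribution has higher entropy than any single confident pass
assert entropy(averaged) > entropy(passes).max()
```

Had the three passes agreed, the average would have stayed peaked and the observation would have passed the entropy threshold.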
Per-class confidence thresholding, non-maximum suppression, and
top \(k\) selection happen as in vanilla SSD.

\section{Software and Source Code Design}