Wrote the decoding pipelines (raw version)

Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
2019-08-14 15:36:46 +02:00
parent 5cda3089c9
commit e5662ed48d


@@ -366,11 +366,9 @@ be used to identify and reject these false positive cases.
 \label{chap:methods}
-This chapter explains the functionality of the Bayesian SSD and
+This chapter explains the functionality of the Bayesian SSD, the
+decoding pipelines, and
 provides some information on the software and source code design.
-In particular, the model architecture of both vanilla SSD and
-Bayesian SSD, the calculation of model uncertainty, and relevant
-implementation details will be presented.
 \section{Bayesian SSD for Model Uncertainty}
@@ -411,6 +409,47 @@ Entropy works to detect uncertainty because uncertain networks
 will produce different classifications for the same object in an
 image across multiple forward passes.
+
+\subsection{Implementation Details}
+
+For this thesis, an SSD implementation based on Tensorflow and
+Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
+was used. It was modified to support entropy thresholding,
+partitioning of observations, and dropout
+layers in the SSD model. %Entropy thresholding takes place before
+%the per-class confidence threshold is applied.
+
+\section{Decoding Pipelines}
+
+The raw output of SSD is not very useful: it contains thousands of
+boxes per image. Among them are many boxes with very low confidences
+or background classifications; these need to be filtered out to
+obtain any meaningful output from the network. This filtering
+process is called decoding and is presented for the three variants
+of SSD used in the thesis.
+
+\subsection{Vanilla SSD}
+
+Liu et al.~\cite{Liu2016} used Caffe for their original SSD
+implementation. The decoding process consists largely of two
+phases: decoding and filtering. Decoding transforms the relative
+coordinates predicted by SSD into absolute coordinates. At this point
+the shape of the output per batch is
+\((batch\_size, \#nr\_boxes, \#nr\_classes + 12)\). The last twelve
+elements are split into the four bounding box offsets, the four
+anchor box coordinates, and the four variances; there are 8732 boxes.
+
+Filtering of these boxes is first done per class:
+only the class id, the confidence of that class, and the bounding box
+coordinates are kept per box. The filtering consists of
+confidence thresholding and a subsequent non-maximum suppression.
+All boxes that pass non-maximum suppression are added to a
+per-image maxima list. One box could pass the confidence threshold
+for multiple classes and, hence, be present multiple times in the
+maxima list for the image. Lastly, a total of \(k\) boxes with the
+highest confidences is kept per image across all classes. The
+original implementation uses a confidence threshold of \(0.01\), an
+IOU threshold for non-maximum suppression of \(0.45\), and a top
+\(k\) value of 200.
 The vanilla SSD
 per-class confidence threshold and non-maximum suppression has one
 weakness: even if SSD correctly predicts all objects as the
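The per-class filtering described above (confidence threshold, greedy non-maximum suppression, top-\(k\) selection) can be sketched as follows. This is a minimal NumPy illustration, not the ssd_keras implementation; the function names, the assumption that class 0 is background, and the toy input are all illustrative:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def decode_image(preds, conf_thresh=0.01, iou_thresh=0.45, top_k=200):
    """preds: (nr_boxes, nr_classes + 4) array of softmax scores followed
    by absolute (xmin, ymin, xmax, ymax) coordinates, already decoded.
    Class 0 is assumed to be the background class."""
    nr_classes = preds.shape[1] - 4
    maxima = []  # per-image maxima list of (class_id, confidence, box)
    for class_id in range(1, nr_classes):  # skip background
        # confidence thresholding for this class
        keep = preds[preds[:, class_id] > conf_thresh]
        # sort by descending confidence, then greedy non-maximum suppression
        keep = keep[np.argsort(-keep[:, class_id])]
        selected = []
        for row in keep:
            if all(iou(row[-4:], s[-4:]) <= iou_thresh for s in selected):
                selected.append(row)
        # one box can appear under several classes in the maxima list
        maxima.extend((class_id, row[class_id], row[-4:]) for row in selected)
    # keep the top k boxes per image across all classes
    maxima.sort(key=lambda m: -m[1])
    return maxima[:top_k]
```

A box that clears the \(0.01\) threshold for two classes survives NMS independently in each class loop, which is why it can occur twice in the maxima list, exactly as described above.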
@@ -434,17 +473,48 @@ detections are the correct behaviour for unknown classes.
 In order to get useful detections out of the decoding, a higher
 confidence threshold is required.
-\subsection{Implementation Details}
-For this thesis, an SSD implementation based on Tensorflow and
-Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
-was used. It was modified to support entropy thresholding, and dropout
-layers in the SSD model. Entropy thresholding takes place before
-the per-class confidence threshold is applied.
+
+\subsection{Vanilla SSD with Entropy Thresholding}
+
+Vanilla SSD with entropy thresholding adds an additional component
+to the filtering already done for vanilla SSD. The entropy is
+calculated from all \(\#nr\_classes\) softmax scores in a prediction.
+Only predictions with a low enough entropy pass the entropy
+threshold and move on to the aforementioned per-class filtering.
+This excludes very uniform predictions but cannot identify
+false positive or false negative cases with high confidence values.
+
+\subsection{Bayesian SSD with Entropy Thresholding}
+
+Bayesian SSD is distinguished by its multiple forward passes. Based
+on the information in the paper, the detections of all forward passes
+are grouped per image but not by forward pass. This leads
+to the following shape of the network output after all
+forward passes:
+\((batch\_size, \#nr\_boxes \cdot \#nr\_forward\_passes, \#nr\_classes + 12)\).
+The size of the output increases linearly with more forward passes.
+
+These detections have to be decoded first. Afterwards, they are
+partitioned into observations to reduce the size of the output and
+to identify uncertainty. This is accomplished by calculating the
+mutual IOU of every detection with all other detections. Detections
+with a mutual IOU score of 0.95 or higher are partitioned into an
+observation. Next, the softmax scores and bounding box coordinates of
+all detections in an observation are averaged.
+
+There can be a different number of observations for every image, which
+destroys homogeneity and prevents batch-wise calculation of the
+results. The shape of the results per image is
+\((\#nr\_observations, \#nr\_classes + 4)\).
+
+Entropy is measured in the next step. All observations with too high
+an entropy are discarded. Entropy thresholding in combination with
+dropout sampling should improve the identification of false positives
+of unknown classes. This is due to multiple forward passes and
+the assumption that uncertainty in some objects will result
+in different classifications across multiple forward passes. These
+varying classifications are averaged into multiple lower confidence
+values, which should increase the entropy and, hence, flag an
+observation for removal.
+
+Per-class confidence thresholding, non-maximum suppression, and
+top \(k\) selection happen as in vanilla SSD.
 \section{Software and Source Code Design}
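The observation building and entropy test described in the Bayesian SSD pipeline can be sketched like this. It is an illustrative NumPy sketch, not the thesis code: grouping is done greedily from a seed detection rather than over the full mutual-IOU matrix, and the entropy threshold value is a placeholder, not a value used in the thesis:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def build_observations(detections, iou_thresh=0.95):
    """detections: (n, nr_classes + 4) decoded boxes pooled over all
    forward passes of one image. Returns (nr_observations, nr_classes + 4)
    with softmax scores and coordinates averaged per observation."""
    remaining = list(range(len(detections)))
    observations = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed]
        # greedily collect every detection overlapping the seed box
        for j in remaining[:]:
            if iou(detections[seed][-4:], detections[j][-4:]) >= iou_thresh:
                group.append(j)
                remaining.remove(j)
        # average softmax scores and box coordinates of the group
        observations.append(detections[group].mean(axis=0))
    return np.array(observations)

def entropy_filter(observations, max_entropy=1.0):
    """Discard observations whose softmax entropy is too high."""
    scores = observations[:, :-4]
    ent = -np.sum(scores * np.log(np.clip(scores, 1e-12, None)), axis=1)
    return observations[ent <= max_entropy]
```

The sketch shows the mechanism claimed above: if the forward passes disagree, averaging spreads probability mass over several classes, the entropy of the averaged scores rises, and the observation is removed by the filter. The number of rows returned varies per image, which is the loss of homogeneity mentioned in the text.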