From e5662ed48d74e303f6736a00abc6ee09d9045180 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Wed, 14 Aug 2019 15:36:46 +0200
Subject: [PATCH] Written the decoding pipelines (raw version)

Signed-off-by: Jim Martens
---
 body.tex | 90 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 10 deletions(-)

diff --git a/body.tex b/body.tex
index a1a04bc..f3d83b6 100644
--- a/body.tex
+++ b/body.tex
@@ -366,11 +366,9 @@ be used to identify and reject these false positive cases.
 \label{chap:methods}
 
-This chapter explains the functionality of the Bayesian SSD and
+This chapter explains the functionality of the Bayesian SSD and the
+decoding pipelines, and provides some information on the software
 and source code design.
-In particular, the model architecture of both vanilla SSD and
-Bayesian SSD, the calculation of model uncertainty, and relevant
-implementation details will be presented.
 
 \section{Bayesian SSD for Model Uncertainty}
@@ -411,6 +409,47 @@ Entropy works to detect uncertainty because uncertain networks
 will produce different classifications for the same object in an
 image across multiple forward passes.
 
+\subsection{Implementation Details}
+
+For this thesis, an SSD implementation based on Tensorflow and
+Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
+was used. It was modified to support entropy thresholding,
+partitioning of observations, and dropout
+layers in the SSD model. %Entropy thresholding takes place before
+%the per-class confidence threshold is applied.
+
+\section{Decoding Pipelines}
+
+The raw output of SSD is not very useful: it contains thousands of
+boxes per image. Among them are many boxes with very low confidences
+or background classifications; these need to be filtered out to
+obtain any meaningful output from the network. This filtering
+process is called decoding and is presented for the three variants
+of SSD used in the thesis.
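As a preview of the filtering described in the following subsections (per-class confidence thresholding, non-maximum suppression, and top-\(k\) selection), a minimal NumPy sketch is shown below. All function and variable names are hypothetical; this is an illustration of the decoding steps, not the actual ssd\_keras implementation.

```python
# Illustrative sketch of the per-class filtering steps: confidence
# thresholding, non-maximum suppression, and top-k selection.
# All names are hypothetical, not the actual ssd_keras API.
import numpy as np

def iou(box, boxes):
    """IOU of one box against an array of boxes, as (xmin, ymin, xmax, ymax)."""
    xmin = np.maximum(box[0], boxes[:, 0])
    ymin = np.maximum(box[1], boxes[:, 1])
    xmax = np.minimum(box[2], boxes[:, 2])
    ymax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(xmax - xmin, 0, None) * np.clip(ymax - ymin, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def decode_image(scores, boxes, conf_thresh=0.01, iou_thresh=0.45, top_k=200):
    """scores: (nr_boxes, nr_classes) softmax scores, class 0 = background;
    boxes: (nr_boxes, 4) absolute corner coordinates."""
    maxima = []  # per-image maxima list of (class_id, confidence, box)
    for c in range(1, scores.shape[1]):  # skip the background class
        keep = scores[:, c] > conf_thresh
        cls_scores, cls_boxes = scores[keep, c], boxes[keep]
        order = np.argsort(-cls_scores)  # highest confidence first
        while order.size > 0:
            best, rest = order[0], order[1:]
            maxima.append((c, cls_scores[best], cls_boxes[best]))
            # suppress remaining boxes that overlap the kept box too much
            order = rest[iou(cls_boxes[best], cls_boxes[rest]) < iou_thresh]
    # one box may pass the threshold for several classes and therefore
    # appear several times in the maxima list; keep the top k overall
    maxima.sort(key=lambda m: -m[1])
    return maxima[:top_k]
```

Note that the loop keeps the highest-scoring box of each class first and then discards all remaining boxes of that class whose IOU with it reaches the suppression threshold, which matches the greedy non-maximum suppression described below.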
+
+\subsection{Vanilla SSD}
+
+Liu et al.~\cite{Liu2016} used Caffe for their original SSD
+implementation. The decoding process largely consists of two
+phases: decoding and filtering. Decoding transforms the relative
+coordinates predicted by SSD into absolute coordinates. At this point
+the shape of the output per batch is \((batch\_size, \#nr\_boxes, \#nr\_classes + 12)\), with 8732 boxes per image. The last twelve
+elements are split into the four bounding box offsets, the four
+anchor box coordinates, and the four variances.
+
+Filtering of these boxes is first done per class:
+only the class ID, the confidence of that class, and the bounding box
+coordinates are kept per box. The filtering consists of
+confidence thresholding and a subsequent non-maximum suppression.
+All boxes that pass non-maximum suppression are added to a
+per-image maxima list. One box can pass the confidence threshold
+for multiple classes and, hence, be present multiple times in the
+maxima list for the image. Lastly, a total of \(k\) boxes with the
+highest confidences is kept per image across all classes. The
+original implementation uses a confidence threshold of \(0.01\), an
+IOU threshold for non-maximum suppression of \(0.45\), and a top
+\(k\) value of 200.
+
 The vanilla SSD per-class confidence threshold and non-maximum
 suppression has one weakness: even if SSD correctly predicts all
 objects as the
@@ -434,17 +473,48 @@ detections are the correct behaviour for unknown classes.
 In order to get useful detections out of the decoding, a higher
 confidence threshold is required.
 
-\subsection{Implementation Details}
+\subsection{Vanilla SSD with Entropy Thresholding}
 
-For this thesis, an SSD implementation based on Tensorflow and
-Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
-was used. It was modified to support entropy thresholding, and dropout
-layers in the SSD model. Entropy thresholding takes place before
-the per-class confidence threshold is applied.
+Vanilla SSD with entropy thresholding adds an additional component
+to the filtering already done for vanilla SSD. The entropy is
+calculated from all \(\#nr\_classes\) softmax scores in a prediction.
+Only predictions with a low enough entropy pass the entropy
+threshold and move on to the aforementioned per-class filtering.
+This excludes very uniform predictions but cannot identify
+false positive or false negative cases with high confidence values.
+\subsection{Bayesian SSD with Entropy Thresholding}
+Bayesian SSD is distinguished by its multiple forward passes. Based
+on the information in the original paper, the detections of all
+forward passes are grouped per image but not by forward pass. This
+leads to the following shape of the network output after all
+forward passes: \((batch\_size, \#nr\_boxes \cdot \#nr\_forward\_passes, \#nr\_classes + 12)\). The size of the output
+increases linearly with more forward passes.
+These detections have to be decoded first. Afterwards they are
+partitioned into observations to reduce the size of the output and
+to identify uncertainty. This is accomplished by calculating the
+mutual IOU of every detection with all other detections. Detections
+with a mutual IOU score of 0.95 or higher are grouped into an
+observation. Next, the softmax scores and bounding box coordinates of
+all detections in an observation are averaged.
+There can be a different number of observations for every image,
+which destroys homogeneity and prevents batch-wise calculation of
+the results. Per image, the shape of the results is \((\#nr\_observations,\#nr\_classes + 4)\).
+Entropy is measured in the next step. All observations with too
+high an entropy are discarded. Entropy thresholding in combination
+with dropout sampling should improve identification of false
+positives of unknown classes. This is due to multiple forward
+passes and the assumption that uncertainty in some objects will
+result in different classifications in multiple forward passes.
These
+varying classifications are averaged into multiple lower confidence
+values, which should increase the entropy and, hence, flag an
+observation for removal.
+
+Per-class confidence thresholding, non-maximum suppression, and
+top \(k\) selection happen as in vanilla SSD.
 
 \section{Software and Source Code Design}
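As context for the implementation discussed in this section, the observation partitioning and entropy filtering of the Bayesian SSD pipeline described above can be sketched as follows. This is a minimal NumPy sketch under the stated assumptions (corner-format boxes, softmax scores per detection); all names are hypothetical and do not reflect the actual source code.

```python
# Hypothetical sketch of partitioning detections into observations by
# mutual IOU and discarding high-entropy observations. Names are
# illustrative, not the actual implementation used in the thesis.
import numpy as np

def iou(a, b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda c: (c[2] - c[0]) * (c[3] - c[1])
    return inter / (area(a) + area(b) - inter)

def entropy(p):
    """Shannon entropy of a softmax distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def partition_into_observations(scores, boxes, iou_thresh=0.95):
    """Group detections whose mutual IOU reaches iou_thresh and average
    the softmax scores and box coordinates of each group.
    scores: (nr_detections, nr_classes); boxes: (nr_detections, 4)."""
    unassigned = list(range(len(boxes)))
    observations = []  # rows of (nr_classes + 4,): averaged scores and box
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed]
        for i in list(unassigned):
            if iou(boxes[seed], boxes[i]) >= iou_thresh:
                group.append(i)
                unassigned.remove(i)
        obs_scores = scores[group].mean(axis=0)
        obs_box = boxes[group].mean(axis=0)
        observations.append(np.concatenate([obs_scores, obs_box]))
    return np.array(observations)

def filter_by_entropy(observations, nr_classes, entropy_thresh):
    """Discard observations whose softmax entropy is too high."""
    keep = [o for o in observations if entropy(o[:nr_classes]) <= entropy_thresh]
    return np.array(keep)
```

Averaging the scores of detections that disagree across forward passes lowers the maximum confidence and raises the entropy, which is exactly what allows the entropy threshold to flag such observations for removal.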