From e5662ed48d74e303f6736a00abc6ee09d9045180 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Wed, 14 Aug 2019 15:36:46 +0200
Subject: [PATCH] Written the decoding pipelines (raw version)

Signed-off-by: Jim Martens
---
 body.tex | 90 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 10 deletions(-)

diff --git a/body.tex b/body.tex
index a1a04bc..f3d83b6 100644
--- a/body.tex
+++ b/body.tex
@@ -366,11 +366,9 @@ be used to identify and reject these false positive cases.
 \label{chap:methods}
 
-This chapter explains the functionality of the Bayesian SSD and
+This chapter explains the functionality of the Bayesian SSD and the
+decoding pipelines, and provides some information on the software
 and source code design.
-In particular, the model architecture of both vanilla SSD and
-Bayesian SSD, the calculation of model uncertainty, and relevant
-implementation details will be presented.
 
 \section{Bayesian SSD for Model Uncertainty}
@@ -411,6 +409,47 @@ Entropy works to detect uncertainty because uncertain networks
 will produce different classifications for the same object in an
 image across multiple forward passes.
 
+\subsection{Implementation Details}
+
+For this thesis, an SSD implementation based on Tensorflow and
+Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
+was used. It was modified to support entropy thresholding,
+partitioning of observations, and dropout
+layers in the SSD model. %Entropy thresholding takes place before
+%the per-class confidence threshold is applied.
+
+\section{Decoding Pipelines}
+
+The raw output of SSD is not very useful: it contains thousands of
+boxes per image. Among them are many boxes with very low confidences
+or background classifications; these need to be filtered out to
+obtain any meaningful output from the network. This filtering
+process is called decoding and is presented for the three variants
+of SSD used in the thesis.
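As a preview of the filtering described in the following subsections (per-class confidence thresholding, non-maximum suppression, and top-\(k\) selection), a minimal NumPy sketch is shown below. All function and variable names are hypothetical; this is an illustration of the decoding steps, not the actual ssd\_keras implementation.

```python
# Illustrative sketch of the per-class filtering steps: confidence
# thresholding, non-maximum suppression, and top-k selection.
# All names are hypothetical, not the actual ssd_keras API.
import numpy as np

def iou(box, boxes):
    """IOU of one box against an array of boxes, as (xmin, ymin, xmax, ymax)."""
    xmin = np.maximum(box[0], boxes[:, 0])
    ymin = np.maximum(box[1], boxes[:, 1])
    xmax = np.minimum(box[2], boxes[:, 2])
    ymax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(xmax - xmin, 0, None) * np.clip(ymax - ymin, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def decode_image(scores, boxes, conf_thresh=0.01, iou_thresh=0.45, top_k=200):
    """scores: (nr_boxes, nr_classes) softmax scores, class 0 = background;
    boxes: (nr_boxes, 4) absolute corner coordinates."""
    maxima = []  # per-image maxima list of (class_id, confidence, box)
    for c in range(1, scores.shape[1]):  # skip the background class
        keep = scores[:, c] > conf_thresh
        cls_scores, cls_boxes = scores[keep, c], boxes[keep]
        order = np.argsort(-cls_scores)  # highest confidence first
        while order.size > 0:
            best, rest = order[0], order[1:]
            maxima.append((c, cls_scores[best], cls_boxes[best]))
            # suppress remaining boxes that overlap the kept box too much
            order = rest[iou(cls_boxes[best], cls_boxes[rest]) < iou_thresh]
    # one box may pass the threshold for several classes and therefore
    # appear several times in the maxima list; keep the top k overall
    maxima.sort(key=lambda m: -m[1])
    return maxima[:top_k]
```

Note that the loop keeps the highest-scoring box of each class first and then discards all remaining boxes of that class whose IOU with it reaches the suppression threshold, which matches the greedy non-maximum suppression described below.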
+
+\subsection{Vanilla SSD}
+
+Liu et al.~\cite{Liu2016} used Caffe for their original SSD
+implementation. The decoding process largely consists of two
+phases: decoding and filtering. Decoding transforms the relative
+coordinates predicted by SSD into absolute coordinates. At this point
+the shape of the output per batch is \((batch\_size, \#nr\_boxes, \#nr\_classes + 12)\), with 8732 boxes per image. The last twelve
+elements are split into the four bounding box offsets, the four
+anchor box coordinates, and the four variances.
+
+Filtering of these boxes is first done per class:
+only the class ID, the confidence of that class, and the bounding box
+coordinates are kept per box. The filtering consists of
+confidence thresholding and a subsequent non-maximum suppression.
+All boxes that pass non-maximum suppression are added to a
+per-image maxima list. One box can pass the confidence threshold
+for multiple classes and, hence, be present multiple times in the
+maxima list for the image. Lastly, a total of \(k\) boxes with the
+highest confidences is kept per image across all classes. The
+original implementation uses a confidence threshold of \(0.01\), an
+IOU threshold for non-maximum suppression of \(0.45\), and a top
+\(k\) value of 200.
+
 The vanilla SSD per-class confidence threshold and non-maximum
 suppression has one weakness: even if SSD correctly predicts all
 objects as the
@@ -434,17 +473,48 @@ detections are the correct behaviour for unknown classes.
 In order to get useful detections out of the decoding, a higher
 confidence threshold is required.
 
-\subsection{Implementation Details}
+\subsection{Vanilla SSD with Entropy Thresholding}
 
-For this thesis, an SSD implementation based on Tensorflow and
-Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
-was used. It was modified to support entropy thresholding, and dropout
-layers in the SSD model. Entropy thresholding takes place before
-the per-class confidence threshold is applied.
+Vanilla SSD with entropy thresholding adds an additional component
+to the filtering already done for vanilla SSD. The entropy is
+calculated from all \(\#nr\_classes\) softmax scores in a prediction.
+Only predictions with a low enough entropy pass the entropy
+threshold and move on to the aforementioned per-class filtering.
+This excludes very uniform predictions but cannot identify
+false positive or false negative cases with high confidence values.
+\subsection{Bayesian SSD with Entropy Thresholding}
+Bayesian SSD is distinguished by its multiple forward passes. Based
+on the information in the original paper, the detections of all
+forward passes are grouped per image but not by forward pass. This
+leads to the following shape of the network output after all
+forward passes: \((batch\_size, \#nr\_boxes \cdot \#nr\_forward\_passes, \#nr\_classes + 12)\). The size of the output
+increases linearly with more forward passes.
+These detections have to be decoded first. Afterwards they are
+partitioned into observations to reduce the size of the output and
+to identify uncertainty. This is accomplished by calculating the
+mutual IOU of every detection with all other detections. Detections
+with a mutual IOU score of 0.95 or higher are grouped into an
+observation. Next, the softmax scores and bounding box coordinates of
+all detections in an observation are averaged.
+There can be a different number of observations for every image,
+which destroys homogeneity and prevents batch-wise calculation of
+the results. Per image, the shape of the results is \((\#nr\_observations,\#nr\_classes + 4)\).
+Entropy is measured in the next step. All observations with too
+high an entropy are discarded. Entropy thresholding in combination
+with dropout sampling should improve identification of false
+positives of unknown classes. This is due to multiple forward
+passes and the assumption that uncertainty in some objects will
+result in different classifications in multiple forward passes.
These
+varying classifications are averaged into multiple lower confidence
+values, which should increase the entropy and, hence, flag an
+observation for removal.
+
+Per-class confidence thresholding, non-maximum suppression, and
+top \(k\) selection happen as in vanilla SSD.
 
 \section{Software and Source Code Design}
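As context for the implementation discussed in this section, the observation partitioning and entropy filtering of the Bayesian SSD pipeline described above can be sketched as follows. This is a minimal NumPy sketch under the stated assumptions (corner-format boxes, softmax scores per detection); all names are hypothetical and do not reflect the actual source code.

```python
# Hypothetical sketch of partitioning detections into observations by
# mutual IOU and discarding high-entropy observations. Names are
# illustrative, not the actual implementation used in the thesis.
import numpy as np

def iou(a, b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda c: (c[2] - c[0]) * (c[3] - c[1])
    return inter / (area(a) + area(b) - inter)

def entropy(p):
    """Shannon entropy of a softmax distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def partition_into_observations(scores, boxes, iou_thresh=0.95):
    """Group detections whose mutual IOU reaches iou_thresh and average
    the softmax scores and box coordinates of each group.
    scores: (nr_detections, nr_classes); boxes: (nr_detections, 4)."""
    unassigned = list(range(len(boxes)))
    observations = []  # rows of (nr_classes + 4,): averaged scores and box
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed]
        for i in list(unassigned):
            if iou(boxes[seed], boxes[i]) >= iou_thresh:
                group.append(i)
                unassigned.remove(i)
        obs_scores = scores[group].mean(axis=0)
        obs_box = boxes[group].mean(axis=0)
        observations.append(np.concatenate([obs_scores, obs_box]))
    return np.array(observations)

def filter_by_entropy(observations, nr_classes, entropy_thresh):
    """Discard observations whose softmax entropy is too high."""
    keep = [o for o in observations if entropy(o[:nr_classes]) <= entropy_thresh]
    return np.array(keep)
```

Averaging the scores of detections that disagree across forward passes lowers the maximum confidence and raises the entropy, which is exactly what allows the entropy threshold to flag such observations for removal.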