Written discussion and outlook

Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
Jim Martens 2019-09-10 15:46:31 +02:00
parent 0af1033567
commit 29c2ebbda4
1 changed file with 100 additions and 3 deletions

body.tex

@ -799,10 +799,107 @@ recall values. However, all variants with multiple forward passes and
non-maximum suppression have lower open set errors than all vanilla SSD
variants.
\chapter{Discussion}
\chapter{Discussion and Outlook}
\label{chap:discussion}
First, the results will be discussed; then possible future research and open
questions will be addressed.
\chapter{Closing}
\label{chap:closing}
\section*{Discussion}
The results clearly do not support the hypothesis: \textit{Dropout sampling delivers better object detection performance under open set conditions compared to object detection without it}. With the exception of open set errors, there
is no area where dropout sampling performs better than vanilla SSD. The
remainder of this section interprets the individual results.
\subsection*{Impact of Entropy}
There is no visible impact of entropy thresholding on the object detection
performance for vanilla SSD. This indicates that the network produces almost no
uniform or near-uniform predictions; the vast majority of predictions
have a high confidence in one class - including the background class.
However, entropy plays a larger role for the Bayesian variants - as
expected: the best-performing thresholds are 1.3 and 1.4 for micro averaging,
and 1.5, 1.7, and 2.0 for macro averaging. In all of these cases the best
threshold is not the largest threshold tested. A lower threshold likely
eliminated some false positives from the result set. On the other hand, a
threshold that is too low likely eliminated true positives as well.
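The entropy in question is the Shannon entropy of the per-detection class
distribution. The following minimal Python sketch illustrates such a
thresholding step; the function name \texttt{entropy\_filter} and the array
layout are illustrative assumptions and do not refer to identifiers in the
actual implementation.
\begin{verbatim}
import numpy as np

def entropy_filter(class_probs, threshold):
    """Keep only detections whose class distribution has an entropy
    below the given threshold (illustrative sketch).

    class_probs: array of shape (num_detections, num_classes), each row
                 a softmax distribution including the background class.
    """
    # Shannon entropy per detection; the small epsilon avoids log(0).
    entropy = -np.sum(class_probs * np.log(class_probs + 1e-12), axis=1)
    return class_probs[entropy < threshold]
\end{verbatim}
A uniform distribution over \(C\) classes has the maximum entropy \(\ln C\),
while a confident prediction has an entropy close to zero; thresholds between
1.3 and 2.0 therefore discard only comparatively uncertain detections.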
\subsection*{Non-maximum suppression}
Miller et al.~\cite{Miller2018} apparently did not use non-maximum suppression
in their implementation of dropout sampling. Therefore, a variant with disabled
non-maximum suppression (NMS) was tested. The disastrous results strongly imply
that NMS is crucial and raise serious questions about the implementation of
Miller et al., who still have not released their source code.
Without NMS, all detections passing the per-class confidence threshold are
sorted in descending order by their confidence value. Afterwards, the
top \(k\) detections are kept. This enables the following scenario:
the top \(k\) detections all belong to the same class and potentially
even to the same object. Detections of other classes and objects could be discarded, reducing
recall in the process. Multiple detections of the same object also increase
the number of false positives, further reducing the \(F_1\) score.
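The selection without NMS described above can be sketched as follows; the
default values for the confidence threshold and \(k\), as well as the name
\texttt{top\_k\_selection}, are illustrative assumptions and not taken from
the actual implementation.
\begin{verbatim}
import numpy as np

def top_k_selection(boxes, scores, labels, conf_threshold=0.01, k=200):
    """Selection without non-maximum suppression (illustrative sketch):
    filter by confidence, sort descending, keep the top k detections.

    boxes:  (N, 4) array of bounding box coordinates
    scores: (N,)   confidence of the predicted class per detection
    labels: (N,)   predicted class index per detection
    """
    keep = scores >= conf_threshold
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    # Nothing below penalises near-duplicate boxes on the same object.
    order = np.argsort(scores)[::-1][:k]
    return boxes[order], scores[order], labels[order]
\end{verbatim}
Because near-duplicate boxes are not suppressed, the top \(k\) slots can be
filled entirely by overlapping detections of a single dominant object, which
crowds out detections of other objects and classes and matches the scenario
described above.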
\subsection*{Dropout}
The dropout variants perform largely worse than the Bayesian variants
without dropout. This is expected, as the network was not trained with
dropout and its weights are therefore not adapted to it.
Gal~\cite{Gal2017}
showed that networks \textbf{trained} with dropout are approximate Bayesian
models. Miller et al. never fine-tuned or properly trained SSD after
the dropout layers were inserted. Therefore, the Bayesian variant of SSD
implemented in this thesis is not guaranteed to be such an approximate
model.
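For clarity, dropout sampling itself amounts to keeping the dropout layers
active at inference time and running several stochastic forward passes over
the same input. The following sketch assumes a generic \texttt{model}
callable with active dropout and uses the ten forward passes referred to in
this chapter; it is not a description of the actual implementation.
\begin{verbatim}
def dropout_sampling(model, image, num_passes=10):
    """Monte Carlo dropout sketch: run the same image through the
    network several times with dropout kept active and collect the
    detections of every pass (illustrative, not the actual code).

    model: callable mapping an image to raw detections, assumed to
           keep its dropout layers enabled at inference time.
    """
    # With dropout enabled, every pass deactivates a different random
    # subset of units, so the detections vary between passes; with
    # dropout disabled, all passes are identical.
    return [model(image) for _ in range(num_passes)]
\end{verbatim}
Without training or at least fine-tuning the network with these dropout
layers in place, the variation between passes is not backed by the
theoretical argument of Gal, which is exactly the limitation discussed above.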
These results further call into question the findings of Miller et al., who
reported significantly better performance for dropout sampling compared to vanilla
SSD. Admittedly, they evaluated the network not on COCO but on SceneNet RGB-D~\cite{McCormac2017}. However, they also claim that no fine-tuning
for SceneNet took place. Applying SSD to an unseen data set should result
in overall worse performance. Attempts to replicate their work on SceneNet RGB-D
failed with very poor results even for vanilla SSD; no further attempts were made
for this thesis. However, Miller et al. used
a different implementation of SSD; it is therefore possible that their
implementation worked on SceneNet without fine-tuning.
\subsection*{Sampling and Observations}
It is remarkable that the Bayesian variant with dropout disabled and
non-maximum suppression enabled performed better than vanilla SSD with respect to
open set errors. This indicates a relevant impact of the multiple forward
passes and of the grouping of detections into observations on the result. With
dropout disabled, the ten forward passes should all produce the same results,
resulting in ten identical detections for every detection in vanilla SSD.
The variation in the result can therefore only originate from the grouping into
observations.
All detections that overlap by at least 95\% with each other
are grouped into an observation. Every ten identical detections should
therefore result in one observation. However, because the required overlap is
95\% rather than 100\%, more than ten detections could be grouped together. This would result
in fewer observations overall compared to the number of detections
in vanilla SSD. Such a lower number reduces the chances for the network
to make mistakes.
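The grouping step can be sketched as a greedy partitioning by mutual overlap;
the 95\% threshold is taken from the text, while the use of intersection over
union as the overlap measure, the greedy strategy, and the function names are
illustrative assumptions rather than a description of the actual
implementation.
\begin{verbatim}
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)

def group_observations(boxes, min_overlap=0.95):
    """Greedily group detections whose boxes overlap by at least
    min_overlap with the first detection of an existing group."""
    groups = []
    for index, box in enumerate(boxes):
        for group in groups:
            if iou(boxes[group[0]], box) >= min_overlap:
                group.append(index)
                break
        else:
            # No sufficiently overlapping group found: start a new one.
            groups.append([index])
    return groups
\end{verbatim}
With identical detections the overlap is exactly 100\% and every group
contains the ten copies of one detection; with a 95\% threshold,
near-duplicate detections of the same object, and occasionally detections of
closely overlapping objects, can fall into the same group, which is
consistent with the lower number of observations discussed above.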
\section*{Outlook}
The attempted replication of the work of Miller et al. raises a series of
questions that cannot be answered in this thesis. This thesis offers
one possible implementation of dropout sampling that technically works.
However, this thesis cannot answer why the results of this implementation
differ so significantly from those reported by Miller et al. The complete
source code or otherwise exhaustive implementation details would be required
to attempt an answer.
Future work could explore the performance of this implementation when used
on an SSD variant that was fine-tuned or trained with dropout. In this case, it
should also look into the impact of training with both dropout and batch
normalisation.
Other avenues include the application to other data sets or object detection
networks.
To facilitate future work based on this thesis, the source code will be
made available and an installable Python package will be uploaded to the
PyPI package index. More details about the source code implementation, as
well as further figures, can be found in the appendices.