diff --git a/body.tex b/body.tex
index 321d156..c8c6b76 100644
--- a/body.tex
+++ b/body.tex
@@ -799,10 +799,107 @@
 recall values. However, all variants with multiple forward passes and
 non-maximum suppression have lower open set errors than all vanilla SSD
 variants.
 
-\chapter{Discussion}
+\chapter{Discussion and Outlook}
 \label{chap:discussion}
 
+First, the results are discussed; afterwards, possible future research
+and open questions are addressed.
 
-\chapter{Closing}
-\label{chap:closing}
+\section*{Discussion}
+
+The results clearly do not support the hypothesis: \textit{Dropout
+sampling delivers better object detection performance under open set
+conditions compared to object detection without it}. With the exception
+of open set errors, there is no area where dropout sampling performs
+better than vanilla SSD. In the remainder of this section the individual
+results are interpreted.
+
+\subsection*{Impact of Entropy}
+
+Entropy thresholding has no visible impact on the object detection
+performance of vanilla SSD. This indicates that the network produces
+almost no uniform or near-uniform predictions: the vast majority of
+predictions have a high confidence in one class, including the
+background. As expected, entropy plays a larger role for the Bayesian
+variants: the best performing thresholds are 1.3 and 1.4 for micro
+averaging, and 1.5, 1.7, and 2.0 for macro averaging. In all of these
+cases, the best threshold is not the largest threshold tested. A lower
+threshold likely eliminated some false positives from the result set;
+too low a threshold, however, likely eliminated true positives as well.
+
+\subsection*{Non-maximum Suppression}
+
+Miller et al.~\cite{Miller2018} apparently did not use non-maximum
+suppression (NMS) in their implementation of dropout sampling.
+Therefore, a variant with disabled NMS was tested. The disastrous
+results strongly imply that NMS is crucial and raise serious questions
+about the implementation of Miller et al., who still have not released
+their source code.
+
+Without NMS, all detections passing the per-class confidence threshold
+are ordered by their confidence value in descending order, and the top
+\(k\) detections are kept. This enables the following scenario: the top
+\(k\) detections all belong to the same class and potentially even to
+the same object. Detections of other classes and objects could be
+discarded, reducing recall in the process. Multiple detections of the
+same object also increase the number of false positives, further
+reducing the \(F_1\) score.
+
+\subsection*{Dropout}
+
+The dropout variants perform largely worse than the Bayesian variants
+without dropout. This is expected, as the network was not trained with
+dropout and the weights are not prepared for it.
+
+Gal~\cite{Gal2017} showed that networks \textbf{trained} with dropout
+are approximate Bayesian models. Miller et al. never fine-tuned or
+properly trained SSD after the dropout layers were inserted. Therefore,
+the Bayesian variant of SSD implemented in this thesis is not guaranteed
+to be such an approximate model.
+
+These results further call into question the findings of Miller et al.,
+who reported significantly better performance for dropout sampling than
+for vanilla SSD. Admittedly, they evaluated the network not on COCO but
+on SceneNet RGB-D~\cite{McCormac2017}. However, they also claim that no
+fine-tuning for SceneNet took place. Applying SSD to an unknown data set
+without fine-tuning should result in overall worse performance. Attempts
+to replicate their work on SceneNet RGB-D failed with miserable results
+even for vanilla SSD; no further attempts were made for this thesis.
+However, Miller et al. used a different implementation of SSD, so it is
+possible that their implementation worked on SceneNet without
+fine-tuning.
+
+\subsection*{Sampling and Observations}
+
+It is remarkable that the Bayesian variant with disabled dropout but
+enabled non-maximum suppression performed better than vanilla SSD with
+respect to open set errors. This indicates a relevant impact of the
+multiple forward passes and of the grouping of detections into
+observations. With disabled dropout, the ten forward passes should all
+produce the same results, yielding ten identical detections for every
+detection in vanilla SSD. The variation in the result can therefore only
+originate from the grouping into observations.
+
+All detections that overlap with each other by at least 95\% are grouped
+into an observation. Every ten identical detections should therefore
+result in one observation. However, because the criterion is a 95\%
+rather than a 100\% overlap, more than ten detections could be grouped
+together. This would result in fewer observations overall than the
+number of detections in vanilla SSD. Such a lower number reduces the
+chance for the network to make mistakes.
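+
+To illustrate this grouping, the following minimal sketch (not the
+thesis implementation; the greedy grouping strategy, the \texttt{iou}
+helper, and the box values are illustrative assumptions) shows how ten
+identical detections collapse into a single observation under the 95\%
+overlap criterion:
+
+\begin{verbatim}
+import numpy as np
+
+def iou(a, b):
+    # Intersection over union of two boxes in (x1, y1, x2, y2) format.
+    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
+    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
+    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
+    area_a = (a[2] - a[0]) * (a[3] - a[1])
+    area_b = (b[2] - b[0]) * (b[3] - b[1])
+    return inter / (area_a + area_b - inter)
+
+def group_observations(boxes, threshold=0.95):
+    # Greedily group detections whose mutual IoU reaches the threshold.
+    groups = []
+    for box in boxes:
+        for group in groups:
+            if all(iou(box, member) >= threshold for member in group):
+                group.append(box)
+                break
+        else:
+            groups.append([box])
+    return groups
+
+# With disabled dropout, ten forward passes yield ten identical
+# detections, which the 95% criterion merges into one observation.
+detections = [np.array([10.0, 10.0, 50.0, 50.0])] * 10
+print(len(group_observations(detections)))  # -> 1
+\end{verbatim}
+
+A slightly shifted box would pass the same test and be merged into the
+same group, which is how more than ten detections can end up in a
+single observation.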
+
+\section*{Outlook}
+
+The attempted replication of the work of Miller et al. raises a series
+of questions that cannot be answered in this thesis. This thesis offers
+one possible implementation of dropout sampling that technically works.
+However, it cannot answer why the results of this implementation differ
+significantly from those of Miller et al. The complete source code or
+otherwise exhaustive implementation details would be required to attempt
+an answer.
+
+Future work could explore the performance of this implementation on an
+SSD variant that was fine-tuned or trained with dropout. In that case,
+it should also look into the impact of training with both dropout and
+batch normalisation. Other avenues include the application to other data
+sets or to other object detection networks.
+
+To facilitate future work based on this thesis, the source code will be
+made available and an installable Python package will be uploaded to the
+PyPI package index. More details about the source code implementation,
+as well as further figures, can be found in the appendices.
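+
+As a possible starting point for such future work, the following
+minimal sketch (assuming TensorFlow/Keras; the toy network merely
+stands in for an SSD head trained with dropout) shows how dropout can
+be kept active at inference time to draw the forward-pass samples used
+throughout this thesis:
+
+\begin{verbatim}
+import numpy as np
+import tensorflow as tf
+
+# Toy stand-in for an SSD classification head trained with dropout.
+model = tf.keras.Sequential([
+    tf.keras.Input(shape=(64,)),
+    tf.keras.layers.Dense(128, activation="relu"),
+    tf.keras.layers.Dropout(0.5),
+    tf.keras.layers.Dense(10, activation="softmax"),
+])
+
+x = np.random.rand(1, 64).astype("float32")
+
+# training=True keeps dropout active, so every forward pass samples a
+# different sub-network (Monte Carlo dropout).
+samples = np.stack([model(x, training=True).numpy() for _ in range(10)])
+mean_scores = samples.mean(axis=0)                    # averaged posterior
+entropy = -(mean_scores * np.log(mean_scores)).sum()  # uncertainty
+\end{verbatim}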