Written discussion and outlook

Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
Jim Martens 2019-09-10 15:46:31 +02:00
parent 0af1033567
commit 29c2ebbda4
1 changed file with 100 additions and 3 deletions

body.tex

@ -799,10 +799,107 @@ recall values. However, all variants with multiple forward passes and
non-maximum suppression have lower open set errors than all vanilla SSD
variants.
\chapter{Discussion}
\chapter{Discussion and Outlook}
\label{chap:discussion}
First, the results will be discussed; then possible future research and open
questions will be addressed.
\chapter{Closing}
\label{chap:closing}
\section*{Discussion}
The results clearly do not support the hypothesis: \textit{Dropout sampling delivers better object detection performance under open set conditions compared to object detection without it}. With the exception of open set errors, there
is no area where dropout sampling performs better than vanilla SSD. The
remainder of this section interprets the individual results.
\subsection*{Impact of Entropy}
There is no visible impact of entropy thresholding on the object detection
performance for vanilla SSD. This indicates that the network produces almost no
uniform or near-uniform predictions; the vast majority of predictions
have a high confidence in one class - including the background class.
However, entropy plays a larger role for the Bayesian variants - as
expected: the best-performing thresholds are 1.3 and 1.4 for micro averaging,
and 1.5, 1.7, and 2.0 for macro averaging. In all of these cases the best
threshold is not the largest threshold tested. A lower threshold likely
eliminated some false positives from the result set. On the other hand, a
threshold that is too low likely eliminated true positives as well.
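The entropy in question is the Shannon entropy of the per-detection class
distribution. The following minimal Python sketch illustrates such a
thresholding step; the function name \texttt{entropy\_filter} and the array
layout are illustrative assumptions and do not refer to identifiers in the
actual implementation.
\begin{verbatim}
import numpy as np

def entropy_filter(class_probs, threshold):
    """Keep only detections whose class distribution has an entropy
    below the given threshold (illustrative sketch).

    class_probs: array of shape (num_detections, num_classes), each row
                 a softmax distribution including the background class.
    """
    # Shannon entropy per detection; the small epsilon avoids log(0).
    entropy = -np.sum(class_probs * np.log(class_probs + 1e-12), axis=1)
    return class_probs[entropy < threshold]
\end{verbatim}
A uniform distribution over \(C\) classes has the maximum entropy \(\ln C\),
while a confident prediction has an entropy close to zero; thresholds between
1.3 and 2.0 therefore discard only comparatively uncertain detections.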
\subsection*{Non-maximum suppression}
Miller et al.~\cite{Miller2018} apparently did not use non-maximum suppression
in their implementation of dropout sampling. Therefore, a variant with disabled
non-maximum suppression (NMS) was tested. The disastrous results strongly imply
that NMS is crucial and raise serious questions about the implementation of
Miller et al., who still have not released their source code.
Without NMS, all detections passing the per-class confidence threshold are
sorted in descending order by their confidence value. Afterwards, the
top \(k\) detections are kept. This enables the following scenario:
the top \(k\) detections all belong to the same class and potentially
even to the same object. Detections of other classes and objects could be discarded, reducing
recall in the process. Multiple detections of the same object also increase
the number of false positives, further reducing the \(F_1\) score.
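The selection without NMS described above can be sketched as follows; the
default values for the confidence threshold and \(k\), as well as the name
\texttt{top\_k\_selection}, are illustrative assumptions and not taken from
the actual implementation.
\begin{verbatim}
import numpy as np

def top_k_selection(boxes, scores, labels, conf_threshold=0.01, k=200):
    """Selection without non-maximum suppression (illustrative sketch):
    filter by confidence, sort descending, keep the top k detections.

    boxes:  (N, 4) array of bounding box coordinates
    scores: (N,)   confidence of the predicted class per detection
    labels: (N,)   predicted class index per detection
    """
    keep = scores >= conf_threshold
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    # Nothing below penalises near-duplicate boxes on the same object.
    order = np.argsort(scores)[::-1][:k]
    return boxes[order], scores[order], labels[order]
\end{verbatim}
Because near-duplicate boxes are not suppressed, the top \(k\) slots can be
filled entirely by overlapping detections of a single dominant object, which
crowds out detections of other objects and classes and matches the scenario
described above.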
\subsection*{Dropout}
The dropout variants perform largely worse than the Bayesian variants
without dropout. This is expected, as the network was not trained with
dropout and its weights are therefore not adapted to it.
Gal~\cite{Gal2017}
showed that networks \textbf{trained} with dropout are approximate Bayesian
models. Miller et al. never fine-tuned or properly trained SSD after
the dropout layers were inserted. Therefore, the Bayesian variant of SSD
implemented in this thesis is not guaranteed to be such an approximate
model.
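For clarity, dropout sampling itself amounts to keeping the dropout layers
active at inference time and running several stochastic forward passes over
the same input. The following sketch assumes a generic \texttt{model}
callable with active dropout and uses the ten forward passes referred to in
this chapter; it is not a description of the actual implementation.
\begin{verbatim}
def dropout_sampling(model, image, num_passes=10):
    """Monte Carlo dropout sketch: run the same image through the
    network several times with dropout kept active and collect the
    detections of every pass (illustrative, not the actual code).

    model: callable mapping an image to raw detections, assumed to
           keep its dropout layers enabled at inference time.
    """
    # With dropout enabled, every pass deactivates a different random
    # subset of units, so the detections vary between passes; with
    # dropout disabled, all passes are identical.
    return [model(image) for _ in range(num_passes)]
\end{verbatim}
Without training or at least fine-tuning the network with these dropout
layers in place, the variation between passes is not backed by the
theoretical argument of Gal, which is exactly the limitation discussed above.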
These results further call into question the findings of Miller et al., who
reported significantly better performance for dropout sampling compared to vanilla
SSD. Admittedly, they evaluated the network not on COCO but on SceneNet RGB-D~\cite{McCormac2017}. However, they also claim that no fine-tuning
for SceneNet took place. Applying SSD to an unseen data set should result
in overall worse performance. Attempts to replicate their work on SceneNet RGB-D
failed with very poor results even for vanilla SSD; no further attempts were made
for this thesis. However, Miller et al. used
a different implementation of SSD; it is therefore possible that their
implementation worked on SceneNet without fine-tuning.
\subsection*{Sampling and Observations}
It is remarkable that the Bayesian variant with dropout disabled and
non-maximum suppression enabled performed better than vanilla SSD with respect to
open set errors. This indicates a relevant impact of the multiple forward
passes and of the grouping of detections into observations on the result. With
dropout disabled, the ten forward passes should all produce the same results,
resulting in ten identical detections for every detection in vanilla SSD.
The variation in the result can therefore only originate from the grouping into
observations.
All detections that overlap by at least 95\% with each other
are grouped into an observation. Every ten identical detections should
therefore result in one observation. However, because the required overlap is
95\% rather than 100\%, more than ten detections could be grouped together. This would result
in fewer observations overall compared to the number of detections
in vanilla SSD. Such a lower number reduces the chances for the network
to make mistakes.
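The grouping step can be sketched as a greedy partitioning by mutual overlap;
the 95\% threshold is taken from the text, while the use of intersection over
union as the overlap measure, the greedy strategy, and the function names are
illustrative assumptions rather than a description of the actual
implementation.
\begin{verbatim}
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)

def group_observations(boxes, min_overlap=0.95):
    """Greedily group detections whose boxes overlap by at least
    min_overlap with the first detection of an existing group."""
    groups = []
    for index, box in enumerate(boxes):
        for group in groups:
            if iou(boxes[group[0]], box) >= min_overlap:
                group.append(index)
                break
        else:
            # No sufficiently overlapping group found: start a new one.
            groups.append([index])
    return groups
\end{verbatim}
With identical detections the overlap is exactly 100\% and every group
contains the ten copies of one detection; with a 95\% threshold,
near-duplicate detections of the same object, and occasionally detections of
closely overlapping objects, can fall into the same group, which is
consistent with the lower number of observations discussed above.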
\section*{Outlook}
The attempted replication of the work of Miller et al. raises a series of
questions that cannot be answered in this thesis. This thesis offers
one possible implementation of dropout sampling that technically works.
However, this thesis cannot answer why the results of this implementation
differ so significantly from those reported by Miller et al. The complete
source code or otherwise exhaustive implementation details would be required
to attempt an answer.
Future work could explore the performance of this implementation when used
on an SSD variant that was fine-tuned or trained with dropout. In this case, it
should also look into the impact of training with both dropout and batch
normalisation.
Other avenues include the application to other data sets or object detection
networks.
To facilitate future work based on this thesis, the source code will be
made available and an installable Python package will be uploaded to the
PyPI package index. More details about the source code implementation, as
well as further figures, can be found in the appendices.