Written discussion and outlook
Signed-off-by: Jim Martens <github@2martens.de>
commit 29c2ebbda4 (parent 0af1033567)
103
body.tex
@@ -799,10 +799,107 @@ recall values. However, all variants with multiple forward passes and
non-maximum suppression have lower open set errors than all vanilla SSD
variants.
\chapter{Discussion}
\chapter{Discussion and Outlook}

\label{chap:discussion}

First, the results are discussed; afterwards, possible future research and
open questions are addressed.

\chapter{Closing}
\label{chap:closing}
\section*{Discussion}

The results clearly do not support the hypothesis: \textit{dropout sampling
delivers better object detection performance under open set conditions
compared to object detection without it}. With the exception of open set
errors, there is no area where dropout sampling performs better than vanilla
SSD. In the remainder of this section, the individual results are interpreted.

\subsection*{Impact of Entropy}

There is no visible impact of entropy thresholding on the object detection
performance of vanilla SSD. This indicates that the network produces almost no
uniform or close-to-uniform predictions; the vast majority of predictions
have a high confidence in one class, including the background.
However, entropy plays a larger role for the Bayesian variants, as
expected: the best performing thresholds are 1.3 and 1.4 for micro averaging,
and 1.5, 1.7, and 2.0 for macro averaging. In all of these cases, the best
threshold is not the largest threshold tested. A lower threshold likely
eliminated some false positives from the result set; a threshold that is too
low, on the other hand, likely eliminated true positives as well.

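The effect of the entropy threshold can be illustrated with a minimal sketch (the function names are hypothetical, not taken from the thesis code): a detection whose class distribution is close to uniform has high entropy and is discarded, while a confident prediction survives even a strict threshold.

```python
import numpy as np

def entropy(probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    probs = probs[probs > 0]  # avoid log(0)
    return float(-np.sum(probs * np.log(probs)))

def filter_by_entropy(detections, threshold):
    """Keep only detections whose class distribution is sufficiently peaked.

    `detections` is a list of (box, class_probs) pairs; a detection is
    discarded when the entropy of its class distribution exceeds `threshold`.
    """
    return [(box, p) for box, p in detections if entropy(p) <= threshold]

# A confident prediction has low entropy; a uniform one has the maximum
# entropy log(n), e.g. log(10) ~ 2.30 for ten classes.
confident = np.array([0.9, 0.05, 0.05])
uniform = np.ones(10) / 10
```

A threshold such as 1.4 therefore keeps the confident prediction (entropy about 0.39 nats) while discarding the uniform one.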
\subsection*{Non-maximum suppression}

Miller et al.~\cite{Miller2018} supposedly did not use non-maximum suppression
in their implementation of dropout sampling. Therefore, a variant with disabled
non-maximum suppression (NMS) was tested. The disastrous results heavily imply
that NMS is crucial and pose serious questions about the implementation of
Miller et al., who still have not released source code.

Without NMS, all detections passing the per-class confidence threshold are
ordered directly by descending confidence value. Afterwards, the
top \(k\) detections are kept. This enables the following scenario:
the top \(k\) detections all belong to the same class and potentially the same
object. Detections of other classes and objects could be discarded, reducing
recall in the process. Multiple detections of the same object also increase
the number of false positives, further reducing the \(F_1\) score.

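The difference between plain top-\(k\) selection and greedy NMS can be sketched as follows (a minimal illustration with hypothetical helper names, not the thesis implementation): without NMS, duplicates of a single object can crowd every other object out of the top \(k\), while NMS collapses the duplicates first.

```python
def top_k(detections, k):
    """Without NMS: simply keep the k most confident detections."""
    return sorted(detections, key=lambda d: d["score"], reverse=True)[:k]

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(detections, iou_threshold=0.45):
    """Greedy NMS: drop detections that overlap a higher-scoring one."""
    kept = []
    for d in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(iou(d["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(d)
    return kept

# Three near-identical boxes on one object, plus one other object:
detections = [
    {"box": (0, 0, 10, 10), "score": 0.95, "label": "cat"},
    {"box": (1, 0, 10, 10), "score": 0.93, "label": "cat"},
    {"box": (0, 1, 10, 10), "score": 0.91, "label": "cat"},
    {"box": (50, 50, 60, 60), "score": 0.80, "label": "dog"},
]
```

With \(k = 3\), `top_k` keeps only the three "cat" duplicates and loses the "dog" entirely, whereas `nms` collapses the duplicates and keeps both objects.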
\subsection*{Dropout}

The dropout variants perform largely worse than the Bayesian variants
without dropout. This is expected, as the network was not trained with
dropout and the weights are not prepared for it.

Gal~\cite{Gal2017} showed that networks \textbf{trained} with dropout are
approximate Bayesian models. Miller et al. never fine-tuned or properly
trained SSD after the dropout layers were inserted. Therefore, the Bayesian
variant of SSD implemented in this thesis is not guaranteed to be such an
approximate model.

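The sampling procedure can be sketched with a toy model (hypothetical names, not the actual SSD code): dropout stays active at test time, several stochastic forward passes are run, and the resulting class distributions are averaged. With the dropout rate at zero, every pass is identical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, active):
    """Inverted dropout; kept active at test time for MC sampling."""
    if not active or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, weights, rate, mc_dropout):
    """A toy one-layer 'network' with a dropout layer before the output."""
    return softmax(dropout(x, rate, mc_dropout) @ weights)

def mc_predict(x, weights, rate=0.5, passes=10):
    """Average the class distributions of several stochastic forward passes."""
    samples = [forward(x, weights, rate, mc_dropout=True)
               for _ in range(passes)]
    return np.mean(samples, axis=0)
```

The averaged distribution still sums to one; the spread between the individual samples can serve as an uncertainty signal, which is the point of the technique.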
These results further question the reported results of Miller et al., who
reported significantly better results for dropout sampling compared to vanilla
SSD. Admittedly, they used the network not on COCO but on
SceneNet RGB-D~\cite{McCormac2017}. However, they also claim that no
fine-tuning for SceneNet took place. Applying SSD to an unknown data set
should result in overall worse performance. Attempts to replicate their work
on SceneNet RGB-D failed with miserable results even for vanilla SSD;
further attempts were not made for this thesis. However, Miller et al. used
a different implementation of SSD; it is therefore possible that their
implementation worked on SceneNet without fine-tuning.

\subsection*{Sampling and Observations}

It is remarkable that the Bayesian variant with disabled dropout and
non-maximum suppression performed better than vanilla SSD with respect to
open set errors. This indicates a relevant impact of multiple forward
passes and the grouping of observations on the result. With disabled
dropout, the ten forward passes should all produce the same results,
resulting in ten identical detections for every detection of vanilla SSD.
The variation in the result can therefore only originate from the grouping
into observations.

All detections that overlap with each other by at least 95\%
are grouped into an observation. Every ten identical detections should
therefore result in exactly one observation. However, due to the 95\% overlap
requirement rather than 100\%, more than ten detections could be grouped
together. This would result in fewer overall observations compared to the
number of detections in vanilla SSD. Such a lower number reduces the chance
for the network to make mistakes.

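The grouping step can be sketched as follows (a greedy variant with hypothetical names, not the thesis code): a detection joins the first existing observation whose members it all overlaps by at least the threshold, otherwise it starts a new observation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def group_observations(boxes, threshold=0.95):
    """Greedily group detections whose boxes mutually overlap by >= threshold.

    Each box joins the first group all of whose members it overlaps
    sufficiently; otherwise it starts a new group (observation).
    """
    groups = []
    for box in boxes:
        for group in groups:
            if all(iou(box, member) >= threshold for member in group):
                group.append(box)
                break
        else:
            groups.append([box])
    return groups
```

Ten identical boxes collapse into a single observation, and a box whose IoU with them is just above 0.95 joins the same group, which is exactly how more than ten detections can end up in one observation.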
\section*{Outlook}

The attempted replication of the work of Miller et al. raises a series of
questions that cannot be answered in this thesis. This thesis offers
one possible implementation of dropout sampling that technically works.
However, this thesis cannot answer why this implementation differs
significantly from that of Miller et al. The complete source code or otherwise
exhaustive implementation details would be required to attempt an answer.

Future work could explore the performance of this implementation when used
on an SSD variant that was fine-tuned or trained with dropout. In this case, it
should also look into the impact of training with both dropout and batch
normalisation.
Other avenues include the application to other data sets or object detection
networks.

To facilitate future work based on this thesis, the source code will be
made available and an installable Python package will be uploaded to the
PyPI package index. More details about the source code implementation and
further figures can be found in the appendices.