Added explanation for NMS and top k

Signed-off-by: Jim Martens <github@2martens.de>
2019-09-18 15:19:56 +02:00
parent 075c35b7d8
commit 8ba241a8d7

@@ -931,7 +931,7 @@ threshold is not the largest threshold tested. A lower threshold likely
eliminated some false positives from the result set. On the other hand, a
threshold that is too low likely eliminated true positives as well.
\subsection*{Non-maximum suppression and top \(k\)}
Miller et al.~\cite{Miller2018} supposedly did not use non-maximum suppression
in their implementation of dropout sampling. Therefore, a variant with disabled
@@ -943,10 +943,39 @@ a lot more false positives remain and have a negative impact on precision.
In combination with top \(k\) selection, recall can be affected:
duplicate detections could stay while the actual maximum boxes are removed.
The number of observations was measured before and after the entropy
threshold/NMS filter: both Bayesian SSD without NMS and dropout, and
Bayesian SSD with NMS and disabled dropout, have the same number of
observations everywhere before the entropy threshold. After the entropy
threshold (a value of 1.5 was used for both) and NMS, the variant with NMS
retains roughly 23\% of its observations; without NMS, 79\% remain.
Irrespective of the absolute numbers, this discrepancy clearly shows the
impact of non-maximum suppression and also explains the higher count of
false positives: more than 50\% of the original observations were removed
with NMS but remained without it, and these surviving observations are
very likely false positives.
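The greedy, IoU-based non-maximum suppression that SSD-style detectors apply can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the corner-coordinate box format, and the IoU threshold of 0.45 are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.45):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box
    and drop every box that overlaps it by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < threshold]
    return keep
```

With NMS disabled, every duplicate box around a maximum survives into the later filtering stages, which matches the much larger fraction of observations left after the entropy threshold in the no-NMS variant.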
A clear distinction between micro and macro averaging can be observed:
recall is hardly affected with micro averaging (0.300) but drops markedly
with macro averaging (0.229). For micro averaging, it does
not matter which class the true positives belong to: every detection
counts the same way. This also means that top \(k\) has only
a marginal effect: some true positives might be removed without NMS,
but overall this does not have a big impact. With macro averaging, however,
the class of the true positives matters a lot: for example, if two
true positives are removed from a class with only few true positives
to begin with, their removal will have a drastic influence on
the class recall value and hence the overall result.
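The difference between the two averaging schemes can be illustrated with a small sketch. The helper names and the per-class counts are invented for illustration and are not taken from the thesis experiments:

```python
def micro_recall(tp, fn):
    """Micro averaging: pool true positives and false negatives
    over all classes before computing recall."""
    return sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))

def macro_recall(tp, fn):
    """Macro averaging: compute recall per class, then take the
    unweighted mean, so small classes count as much as large ones."""
    recalls = [tp[c] / (tp[c] + fn[c]) for c in tp]
    return sum(recalls) / len(recalls)

# Hypothetical counts: one large class, one class with only two TPs.
tp = {"person": 100, "dog": 2}
fn = {"person": 20, "dog": 0}

# Removing the two dog TPs barely moves micro recall but halves the
# dog class recall, dragging macro recall down drastically.
tp_after = {"person": 100, "dog": 0}
fn_after = {"person": 20, "dog": 2}
```

Evaluating the sketch, micro recall only falls from about 0.84 to 0.82, while macro recall collapses from about 0.92 to 0.42, which mirrors the asymmetry described above.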
The impact of top \(k\) was measured by counting the number of observations
after top \(k\) has been applied: the variant with NMS keeps about 94\%
of the observations left after NMS, whereas without NMS only about 59\%
of observations are kept. This shows that top \(k\) has a significant
impact on the result when non-maximum suppression is disabled. Furthermore,
some classes are hit harder by top \(k\) than others: for example,
dogs keep around 82\% of their observations but persons only 57\%.
This indicates that detected dogs mostly appear in images with few detections
overall and/or have a high enough prediction confidence to be
kept by top \(k\), whereas persons likely often appear in images
with many detections and/or have confidences that are too low.
In this example, the likelihood that true positives are removed is
therefore quite high for the person category and far lower for dogs.
This ties back to micro and macro averaging and their impact on recall.
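Per-image top \(k\) selection as discussed above can be sketched as follows. This is a minimal illustration under assumed conventions; the function name and the per-image score list are not the thesis implementation:

```python
def top_k(scores, k):
    """Indices of the k highest-confidence detections on one image.
    Everything below the k-th score is discarded, regardless of class."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical crowded image: many low-confidence person detections
# compete with a few high-confidence ones for the k slots.
scores = [0.9, 0.2, 0.8, 0.1]
kept = top_k(scores, k=2)
```

Because the cut is purely by confidence within an image, classes that tend to appear in crowded scenes or with lower scores lose disproportionately many detections, consistent with the person-versus-dog discrepancy described above.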
\subsection*{Dropout Sampling and Observations}