Added explanation for NMS and top k
Signed-off-by: Jim Martens <github@2martens.de>
body.tex
@@ -931,7 +931,7 @@

threshold is not the largest threshold tested. A lower threshold likely
eliminated some false positives from the result set. A threshold that is
too low, on the other hand, likely eliminated true positives as well.

\subsection*{Non-maximum suppression and top \(k\)}

Miller et al.~\cite{Miller2018} supposedly did not use non-maximum suppression
in their implementation of dropout sampling. Therefore, a variant with disabled
@@ -943,10 +943,39 @@

a lot more false positives remain and have a negative impact on precision.
In combination with top \(k\) selection, recall can be affected:
duplicate detections could stay and maxima boxes could be removed.

The number of observations was measured before and after the entropy
threshold/NMS filter: both Bayesian SSD without NMS and dropout, and
Bayesian SSD with NMS and disabled dropout, have the same number of
observations everywhere before the entropy threshold. After the entropy
threshold (a value of 1.5 was used for both) and NMS, the variant with
NMS has roughly 23\% of its observations left; without NMS, 79\% of the
observations remain. Irrespective of the absolute numbers, this
discrepancy clearly shows the impact of non-maximum suppression and also
explains the higher count of false positives: more than 50\% of the
original observations were removed with NMS but kept without it, and
these extra observations are very likely false positives.
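The suppression step discussed here can be illustrated with a minimal sketch. The greedy algorithm, the \((x_1, y_1, x_2, y_2)\) box format, and the IoU threshold of 0.45 are illustrative assumptions, not details taken from the implementation under discussion:

```python
# Minimal sketch of greedy non-maximum suppression (NMS).
# Assumptions (not from the thesis code): boxes are (x1, y1, x2, y2)
# tuples, scores are confidences, 0.45 is a hypothetical IoU threshold.

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.45):
    """Keep maxima boxes; drop detections overlapping a higher-scoring one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Disabling NMS amounts to skipping this step entirely, so duplicate detections of the same object remain in the observation set.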
A clear distinction between micro and macro averaging can be observed:
recall is hardly affected with micro averaging (0.300) but goes down
noticeably with macro averaging (0.229). For micro averaging, it does
not matter which class the true positives belong to: every detection
counts the same way. This also means that top \(k\) has only a marginal
effect: some true positives might be removed without NMS, but overall
that does not have a big impact. With macro averaging, however, the
class of the true positives matters a lot: for example, if two true
positives are removed from a class with only few true positives to
begin with, their removal will have a drastic influence on the class
recall value and hence on the overall result.
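This effect can be made concrete with a small numeric sketch; all per-class counts below are invented for illustration and are not taken from the evaluation:

```python
# Micro vs. macro averaged recall; all per-class counts are invented.
# tp = true positives, fn = false negatives (missed ground truths).

def micro_recall(classes):
    """Pool counts across classes, then compute recall once."""
    tp = sum(c["tp"] for c in classes.values())
    fn = sum(c["fn"] for c in classes.values())
    return tp / (tp + fn)

def macro_recall(classes):
    """Compute recall per class, then average the per-class values."""
    recalls = [c["tp"] / (c["tp"] + c["fn"]) for c in classes.values()]
    return sum(recalls) / len(recalls)

classes = {
    "person": {"tp": 90, "fn": 10},  # large class: recall 0.90
    "dog":    {"tp": 3,  "fn": 7},   # small class: recall 0.30
}

# Removing two true positives from the small "dog" class barely moves
# micro recall (93/110 -> 91/110) but drops macro recall from 0.60 to 0.50.
classes["dog"]["tp"] -= 2
classes["dog"]["fn"] += 2
```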
The impact of top \(k\) was measured by counting the number of
observations after top \(k\) has been applied: the variant with NMS
keeps about 94\% of the observations left after NMS, whereas without NMS
only about 59\% of the observations are kept. This shows a significant
impact of top \(k\) on the result in the case of disabled non-maximum
suppression. Furthermore, some classes are hit harder by top \(k\) than
others: for example, dogs keep around 82\% of the observations but
persons only 57\%. This indicates that detected dogs mostly appear on
images with few detections overall and/or have a high enough prediction
confidence to be kept by top \(k\). Persons, however, are likely often
on images with many detections and/or have confidences that are too
low. In this example, the likelihood of true positives being removed in
the person category is quite high; for dogs, the probability is far
lower. This goes back to micro and macro averaging and their impact on
recall.
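Top \(k\) selection itself can be sketched as follows; the list-of-tuples detection format and the value \(k = 3\) are illustrative placeholders, not the settings used in the evaluation:

```python
# Sketch of per-image top-k selection: keep only the k detections with
# the highest confidence score. k=3 is an illustrative placeholder.

def top_k(detections, k=3):
    """detections: list of (confidence, class_label) tuples."""
    return sorted(detections, key=lambda d: d[0], reverse=True)[:k]

# On a crowded image, low-confidence detections (often persons in the
# scenario above) fall out of the k slots; on a sparse image, even a
# single moderate-confidence dog survives.
crowded = [(0.9, "person"), (0.8, "person"), (0.6, "person"),
           (0.5, "person"), (0.3, "person")]
sparse = [(0.6, "dog")]
```

Without NMS, duplicate detections also compete for the same \(k\) slots, which is why crowded classes lose a larger share of their observations.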
\subsection*{Dropout Sampling and Observations}