Added explanation for averaging
Signed-off-by: Jim Martens <github@2martens.de>
This commit is contained in:
parent
79acbf78bf
commit
4a608bcef6
31
body.tex
@@ -920,9 +920,34 @@ averaging has a significant performance increase towards the end
of the list of predictions. This is signaled by the near-horizontal movement
of the plot in both the \(F_1\) versus absolute open set error graph (see
figure \ref{fig:ose-f1-micro}) and the precision-recall curve (see figure
\ref{fig:precision-recall-micro}). Some true positive detections of one class
may significantly improve recall when compared to all detections across the
classes, yet be insignificant when compared only to other detections of their
own class.
This behaviour is caused by a large imbalance of detections between
the classes. For vanilla SSD with a confidence threshold of 0.2, there are
a total of 36,863 detections after non-maximum suppression and top \(k\).
The persons class contributes 14,640 detections, or around 40\%, to that
number. Another strong class is cars, with 2,252 detections or around
6\%. This means that these two classes together have almost as many
detections as the remaining 58 classes combined.
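For reference, the arithmetic behind this claim, using only the figures
given above:
\[
14{,}640 + 2{,}252 = 16{,}892, \qquad 36{,}863 - 16{,}892 = 19{,}971,
\]
so the two strongest classes account for \(16{,}892\) of \(36{,}863\)
detections, or roughly 46\%.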
In macro averaging, the cumulative precision and recall values are
calculated per class and then averaged across all classes. Smaller
classes quickly reach high recall values because their total number of
ground truth instances is small as well. The last recall and precision
values of the smaller classes are repeated to achieve homogeneity with
the largest class. As a consequence, the average recall is quite high
early on. Later on, only the values of the largest class still change,
which has only a small impact on the overall result.
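This averaging can be written compactly. As an illustration (the symbols
\(C\), \(G_c\), and \(\mathit{TP}_c(k)\) are introduced here for exposition
and do not appear in the evaluation code), let \(C\) be the number of
classes, \(G_c\) the number of ground truth instances of class \(c\), and
\(\mathit{TP}_c(k)\) the cumulative true positives of class \(c\) after its
first \(k\) ranked detections. The macro-averaged cumulative recall is then
\[
\mathit{recall}_{\mathrm{macro}}(k)
= \frac{1}{C} \sum_{c=1}^{C} \frac{\mathit{TP}_c(k)}{G_c},
\]
so a class with only two ground truth instances contributes its full share
of \(1/C\) to the sum as soon as both are found, which is why the average
recall is high early on.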
Conversely, in micro averaging the cumulative true positives
are added up across classes and then divided by the total number of
ground truth instances. Here, the effect is the opposite: the total
number of ground truth instances is very large, which means the combined
true positives of 58 classes have only a small impact on the average
recall. As a result, the open set error rises more quickly than the
\(F_1\) score in micro averaging, creating a sharp rise in open set
error at a lower \(F_1\) score than in macro averaging. The open set
error reaches a high value early on and changes little afterwards. This
allows the \(F_1\) score to catch up and produces the almost horizontal
line in the graph. Eventually, the \(F_1\) score decreases again while
the open set error rises a bit further.
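With the same illustrative symbols as before, micro averaging pools the
true positives before dividing:
\[
\mathit{recall}_{\mathrm{micro}}(k)
= \frac{\sum_{c=1}^{C} \mathit{TP}_c(k)}{\sum_{c=1}^{C} G_c}.
\]
Here even a fully detected small class shifts the recall by only
\(G_c / \sum_{c} G_c\); with, say, two out of 100 ground truth instances
in total, that is a change of just \(0.02\), which is why the recall, and
with it the \(F_1\) score, lags behind the open set error at first.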
Furthermore, the plotted behaviour implies that Miller et al.~\cite{Miller2018}
use macro averaging in their paper as the unique behaviour of micro