Added graphs and interpretation for averaging
Signed-off-by: Jim Martens <github@2martens.de>
parent 29c2ebbda4
commit f083224b87
63
body.tex
@@ -702,6 +702,20 @@ with many detections.
\label{tab:results-micro}
\end{table}

\begin{figure}[ht]
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{ose-f1-all-micro}
\caption{Micro-averaged \(F_1\) score versus absolute open set error (OSE) for each variant. Perfect performance is an \(F_1\) score of 1 and an absolute OSE of 0.}
\label{fig:ose-f1-micro}
\end{minipage}%
\hfill
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{precision-recall-all-micro}
\caption{Micro-averaged precision-recall curves for each variant tested.}
\label{fig:precision-recall-micro}
\end{minipage}
\end{figure}

Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
table~\ref{tab:results-micro}) with respect to the maximum \(F_1\) score
(0.376) and recall at the maximum \(F_1\) point (0.382). In comparison, neither
@@ -737,6 +751,16 @@ ratio has worse recall (0.342) than the variant with disabled dropout.
However, all variants with multiple forward passes have lower open set errors
than all vanilla SSD variants.

The relation of \(F_1\) score to absolute open set error can be observed
in figure~\ref{fig:ose-f1-micro}. Precision-recall curves for all variants
can be seen in figure~\ref{fig:precision-recall-micro}. Both vanilla SSD
variants with a confidence threshold of 0.01 reach much higher open set errors
and a higher recall. This behaviour is expected because a lower threshold
admits more, and less confident, predictions. The Bayesian variant without
non-maximum suppression was not plotted.
All plotted variants show similar behaviour that is in line with previously
reported figures, such as the ones in Miller et al.~\cite{Miller2018}.

\subsection{Macro Averaging}

\begin{table}[t]
@@ -769,6 +793,20 @@ than all vanilla SSD variants.
\label{tab:results-macro}
\end{table}

\begin{figure}[ht]
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{ose-f1-all-macro}
\caption{Macro-averaged \(F_1\) score versus absolute open set error (OSE) for each variant. Perfect performance is an \(F_1\) score of 1 and an absolute OSE of 0.}
\label{fig:ose-f1-macro}
\end{minipage}%
\hfill
\begin{minipage}[t]{0.48\textwidth}
\includegraphics[width=\textwidth]{precision-recall-all-macro}
\caption{Macro-averaged precision-recall curves for each variant tested.}
\label{fig:precision-recall-macro}
\end{minipage}
\end{figure}

Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
table~\ref{tab:results-macro}) with respect to the maximum \(F_1\) score
(0.375) and recall at the maximum \(F_1\) point (0.338). In comparison, the SSD
@@ -799,6 +837,16 @@ recall values. However, all variants with multiple forward passes and
non-maximum suppression have lower open set errors than all vanilla SSD
variants.

The relation of \(F_1\) score to absolute open set error can be observed
in figure~\ref{fig:ose-f1-macro}. Precision-recall curves for all variants
can be seen in figure~\ref{fig:precision-recall-macro}. Both vanilla SSD
variants with a confidence threshold of 0.01 reach much higher open set errors
and a higher recall. This behaviour is expected because a lower threshold
admits more, and less confident, predictions. The Bayesian variant without
non-maximum suppression was not plotted.
All plotted variants show similar behaviour that is in line with previously
reported figures, such as the ones in Miller et al.~\cite{Miller2018}.

\chapter{Discussion and Outlook}

\label{chap:discussion}
@@ -812,6 +860,21 @@ The results clearly do not support the hypothesis: \textit{Dropout sampling deli
is no area where dropout sampling performs better than vanilla SSD. In the
remainder of the section the individual results will be interpreted.

\subsection*{Impact of Averaging}

Micro and macro averaging produce largely similar results. Notably, micro
averaging shows a marked performance increase towards the end
of the list of predictions. This is indicated by the near-horizontal movement
of the plot in both the \(F_1\) versus absolute open set error graph (see figure~\ref{fig:ose-f1-micro}) and
the precision-recall curve (see figure~\ref{fig:precision-recall-micro}).
A possible explanation is that some true positive detections of one class noticeably
improve recall when measured against all detections across the classes but are
insignificant when compared only to other detections of their own class.

Furthermore, the plotted behaviour suggests that Miller et al.~\cite{Miller2018}
use macro averaging in their paper, as the distinctive behaviour of micro
averaging was not reported there.
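
To make the difference between the two averaging schemes concrete, the
following sketch assumes the standard definitions of micro- and macro-averaged
recall. The symbols \(C\) (number of classes), \(TP_c\) and \(FN_c\) (true
positives and false negatives of class \(c\)) are introduced here for
illustration only and may differ from the notation used elsewhere in this thesis.
\[
R_{\mathrm{micro}} = \frac{\sum_{c=1}^{C} TP_c}{\sum_{c=1}^{C} \left(TP_c + FN_c\right)},
\qquad
R_{\mathrm{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FN_c}
\]
As a purely hypothetical example (the numbers are not taken from the
experiments), consider \(C = 2\) classes with 90 and 10 ground truth objects.
Eighteen additional true positives of the first class raise
\(R_{\mathrm{micro}}\) by \(18/100 = 0.18\) but raise \(R_{\mathrm{macro}}\) by
only \(\frac{1}{2} \cdot 18/90 = 0.10\). Under these assumptions, late
detections of a frequent class move the micro-averaged curve more strongly than
the macro-averaged one, which is one way the two plots can diverge.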

\subsection*{Impact of Entropy}

There is no visible impact of entropy thresholding on the object detection
Binary file not shown. Size after: 60 KiB
Binary file not shown. Size after: 58 KiB
Binary file not shown. Size after: 57 KiB
Binary file not shown. Size after: 54 KiB