However, in case of a class imbalance, macro averaging favours classes with few detections,
whereas micro averaging benefits classes with many detections.
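The difference can be made explicit with the usual definitions. Here \(\mathit{TP}_c\) and \(\mathit{FP}_c\) denote the true and false positive detections of class \(c\), and \(C\) is the set of classes. This notation is introduced only for illustration:
\begin{align*}
  P_{\text{micro}} &= \frac{\sum_{c \in C} \mathit{TP}_c}{\sum_{c \in C} (\mathit{TP}_c + \mathit{FP}_c)}, &
  P_{\text{macro}} &= \frac{1}{|C|} \sum_{c \in C} \frac{\mathit{TP}_c}{\mathit{TP}_c + \mathit{FP}_c}.
\end{align*}
Recall is averaged in the same way, with the false negatives \(\mathit{FN}_c\) taking the place of \(\mathit{FP}_c\). In the micro average, every detection carries the same weight, so classes with many detections dominate. In the macro average, every class carries the same weight, so a class with few detections influences the result as much as a class with many.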
\subsection{Micro Averaging}
\begin{table}[ht]
\begin{tabular}{rcccc}
\hline
% 0.5 for Bayesian - 6, 1.4 for 7, 1.4 for 8, 1.3 for 9
\hline
\end{tabular}
\caption{Rounded results for micro averaging. SSD with entropy test and Bayesian SSD are represented with
their best-performing entropy threshold. Vanilla SSD with entropy test performed best with an
entropy threshold of 2.4, Bayesian SSD without non-maximum suppression performed best for 0.5,
and Bayesian SSD with non-maximum suppression performed best for 1.4 as entropy
\label{tab:results-micro}
\end{table}
In both cases, vanilla SSD with a per-class confidence threshold of 0.2
performs best (see tables \ref{tab:results-micro} and \ref{tab:results-macro}),
with a maximum \(F_1\) score of 0.376/0.375 (values are always given as micro/macro).
This compares to 0.255/0.370 for vanilla SSD with a per-class confidence threshold of 0.01
and 0.255/0.370 for vanilla SSD with entropy thresholding.
It has the fewest open set errors (2939/1218 compared to 3176/1426 and 3168/1373) and the best recall
(0.382/0.338 compared to 0.214/0.328 and 0.214/0.329). With micro averaging, it also has the best
precision (0.372 compared to 0.318 and 0.318). With macro averaging, vanilla SSD with an entropy test
has the best precision (0.425 compared to 0.424 and 0.424).
This shows that a higher per-class confidence threshold removes many bad detections and hence
considerably improves the end result. These comparisons also show that the network is
not very uncertain. The entropy test at its best-performing threshold does not improve on
the corresponding vanilla SSD without an entropy test. Therefore, in this
case, the per-class confidence score is far more important for the result than the entropy.

Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
table~\ref{tab:results-micro}) with respect to the maximum \(F_1\) score
(0.376) and recall at the maximum \(F_1\) point (0.382). Neither
the vanilla SSD variant with a confidence threshold of 0.01 nor the SSD with
an entropy test can outperform the 0.2 variant. Among the vanilla SSD variants,
the 0.2 variant also has the lowest number of open set errors (2939) and the
highest precision (0.372).
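As a consistency check, assume that \(F_1\) is the harmonic mean of precision \(P\) and recall \(R\). The rounded micro values of the 0.2 variant then give
\begin{equation*}
  F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 0.372 \cdot 0.382}{0.372 + 0.382} = \frac{0.284208}{0.754} \approx 0.377.
\end{equation*}
This agrees with the reported 0.376 up to the rounding of \(P\) and \(R\) to three decimal places.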

The comparison of the vanilla SSD variants with a confidence threshold of 0.01
shows no significant impact of an entropy test. The open set errors are lower,
but only insignificantly so. The remaining performance metrics are
identical after rounding.
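The entropy test can be written as follows. This form assumes that the test uses the Shannon entropy of a detection's predicted class distribution \(p = (p_1, \dots, p_K)\) over \(K\) classes. It also assumes that every detection whose entropy exceeds the threshold \(\tau\) is discarded. The symbols \(K\) and \(\tau\) are introduced only for illustration, and the base of the logarithm merely rescales both \(H\) and \(\tau\):
\begin{equation*}
  H(p) = -\sum_{k=1}^{K} p_k \log p_k, \qquad 0 \le H(p) \le \log K, \qquad \text{keep the detection iff } H(p) \le \tau.
\end{equation*}
For Bayesian SSD, \(p\) would presumably be the class distribution averaged over the forward passes. If the network is rarely uncertain, few detections have a high entropy, so the threshold removes few detections. This would be consistent with the small effect of the entropy test observed here.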

The results for Bayesian SSD show a massive impact of non-maximum suppression:
the maximum \(F_1\) score drops from 0.371 with NMS to 0.006 without NMS.
Dropout was disabled in both cases, which makes both variants effectively a
vanilla SSD run with multiple forward passes. Given the near-zero \(F_1\) score
without NMS, the low number of open set errors of this variant with
micro averaging (164) does not qualify as a good result and is not
marked bold, although it is the lowest number.

Apart from this variant, the Bayesian SSD variant with disabled dropout and
enabled non-maximum suppression offers the best performance with respect
to open set errors (2335). It also has the best precision (0.378) of all tested
variants. Furthermore, it performs best among all variants
with multiple forward passes in every reported metric except recall.

Dropout decreases the performance of the network, as can be seen
in the lower \(F_1\) scores, the higher numbers of open set errors, and the lower precision
values. The variant with a keep ratio of 0.9 outperforms all other Bayesian
variants with respect to recall (0.367). The variant with a keep
ratio of 0.5, in contrast, has a worse recall (0.342) than the variant with disabled dropout.
However, all variants with multiple forward passes have fewer open set errors
than all vanilla SSD variants.
\subsection{Macro Averaging}
\begin{table}[ht]
\begin{tabular}{rcccc}
\hline
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
\caption{Rounded results for macro averaging. SSD with entropy test and Bayesian SSD are represented with
their best-performing entropy threshold. Vanilla SSD with entropy test performed best with an
entropy threshold of 1.7, Bayesian SSD without non-maximum suppression performed best for 0.7,
and Bayesian SSD with non-maximum suppression performed best for 1.5 as entropy
\label{tab:results-macro}
\end{table}
Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
table~\ref{tab:results-macro}) with respect to the maximum \(F_1\) score
(0.375) and recall at the maximum \(F_1\) point (0.338). The SSD
with an entropy test, in contrast, slightly outperforms the 0.2 variant with respect to
precision (0.425), which is also the best precision overall. Among
the vanilla SSD variants, the 0.2 variant also has the lowest
number of open set errors (1218).

The comparison of the vanilla SSD variants with a confidence threshold of 0.01
shows no significant impact of an entropy test. The open set errors are lower,
but only insignificantly so. The remaining performance metrics are
almost identical after rounding.

The results for Bayesian SSD show a massive impact of non-maximum suppression:
the maximum \(F_1\) score drops from 0.363 with NMS to 0.006 without NMS.
Dropout was disabled in both cases, which makes both variants effectively a
vanilla SSD run with multiple forward passes.

With 1057 open set errors, the Bayesian SSD variant with disabled dropout and
enabled non-maximum suppression offers the best performance with respect
to open set errors. It also has the best \(F_1\) score (0.363) and the best
precision (0.420) of all Bayesian variants. On recall, it ties with the variant
with a keep ratio of 0.9 (0.321).

Dropout decreases the performance of the network, as can be seen
in the lower \(F_1\) scores, the higher numbers of open set errors, and the lower precision and
recall values. However, all variants with multiple forward passes and
non-maximum suppression have fewer open set errors than all vanilla SSD
variants.
\chapter{Discussion}
\label{chap:discussion}