Finished results section

Signed-off-by: Jim Martens <github@2martens.de>
2019-09-10 11:58:26 +02:00
parent 0e7f735517
commit c848b09ac2


@ -671,6 +671,7 @@ However, in case of a class imbalance the macro averaging
favours classes with few detections whereas micro averaging benefits classes
with many detections.
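
As a minimal sketch of this difference, assuming the standard definitions
(the symbols \(TP_c\), \(FP_c\), and \(C\) are introduced here for illustration
only and denote the true positives and false positives of class \(c\) and the
number of classes with at least one detection), the two precision averages can
be written as
\[
P_{\mathrm{micro}} = \frac{\sum_{c=1}^{C} TP_c}{\sum_{c=1}^{C} \left(TP_c + FP_c\right)},
\qquad
P_{\mathrm{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c}.
\]
Under these definitions every detection carries the same weight in the micro
average, so classes with many detections dominate it. In the macro average
every class carries the same weight, so a class with few detections influences
the result as much as a class with many. Recall follows analogously with false
negatives in place of false positives.
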
\subsection{Micro Averaging}
\begin{table}[ht]
\begin{tabular}{rcccc}
\hline
@ -690,7 +691,7 @@ with many detections.
% 0.5 for Bayesian - 6, 1.4 for 7, 1.4 for 8, 1.3 for 9
\hline
\end{tabular}
\caption{Results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
\caption{Rounded results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
their best performing entropy threshold. Vanilla SSD with Entropy test performed best with an
entropy threshold of 2.4, Bayesian SSD without non-maximum suppression performed best for 0.5,
and Bayesian SSD with non-maximum suppression performed best for 1.4 as entropy
@ -701,26 +702,42 @@ with many detections.
\label{tab:results-micro}
\end{table}
In both cases, vanilla SSD with a per-class confidence threshold of 0.2
performs best (see tables \ref{tab:results-micro} and \ref{tab:results-macro})
with a maximum \(F_1\) score of 0.376/0.375 (always micro/macro) compared to both vanilla SSD with a per-class
threshold of 0.01 (0.255/0.370) and vanilla SSD with entropy thresholding (0.255/0.370).
It has the fewest open set errors (2939/1218 to 3176/1426 and 3168/1373), and the best recall
(0.382/0.338 to 0.214/0.328 and 0.214/0.329). For micro averaging it has the best precision (0.372 to 0.318 and 0.318); macro averaging is won by vanilla SSD with entropy test (0.425 to 0.424 and 0.424).
This shows: a higher per-class confidence threshold removes many bad detections and hence the end
result is that much better. These comparisons also show that the network is
not very uncertain. The best performing entropy threshold is not any better than
the corresponding vanilla SSD without entropy threshold. Therefore, in this
case the per-class confidence score is far more important for the result.
Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
table \ref{tab:results-micro}) with respect to the maximum \(F_1\) score
(0.376) and recall at the maximum \(F_1\) point (0.382). In comparison, neither
the vanilla SSD variant with a confidence threshold of 0.01 nor the SSD with
an entropy test can outperform the 0.2 variant. Among the vanilla SSD variants,
the 0.2 variant also has the lowest number of open set errors (2939) and the
highest precision (0.372).
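
As a plausibility check, and assuming that the \(F_1\) score is the harmonic
mean of the micro-averaged precision \(P\) and recall \(R\) (an assumption made
here for illustration), the rounded values of the 0.2 variant give
\[
F_1 = \frac{2 P R}{P + R}
    = \frac{2 \cdot 0.372 \cdot 0.382}{0.372 + 0.382}
    \approx 0.377,
\]
which agrees with the reported 0.376 up to the rounding of the inputs.
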
The comparison of the vanilla SSD variants with a confidence threshold of 0.01
shows no significant impact of an entropy test. Only the number of open set
errors is lower, and only marginally. The remaining performance metrics are
identical after rounding.
The results for Bayesian SSD show a massive impact of the existence of
non-maximum suppression: maximum \(F_1\) score of 0.371 (with NMS) to 0.006 (without NMS)
with micro averaging and 0.363 (with NMS) to 0.006 (without NMS) with macro averaging.
Dropout was disabled in both cases, making them effectively a vanilla SSD run
with multiple forward passes. Therefore, the low number of open set errors with
non-maximum suppression: maximum \(F_1\) score of 0.371 (with NMS) to 0.006
(without NMS). Dropout was disabled in both cases, making them effectively a
vanilla SSD run with multiple forward passes.
Therefore, the low number of open set errors with
micro averaging (164 without NMS) does not qualify as a good result and is not
marked bold, although it is the lowest number.
With 2335 open set errors, the Bayesian SSD variant with disabled dropout and
enabled non-maximum suppression offers the best performance with respect
to open set errors. It also has the best precision (0.378) of all tested
variants. Furthermore, it provides the best performance among all variants
with multiple forward passes except for recall.
Dropout decreases the performance of the network, as can be seen
in the lower \(F_1\) scores, higher open set errors, and lower precision
values. The variant with 0.9 keep ratio outperforms all other Bayesian
variants with respect to recall (0.367). The variant with 0.5 keep
ratio has worse recall (0.342) than the variant with disabled dropout.
However, all variants with multiple forward passes have fewer open set errors
than all vanilla SSD variants.
\subsection{Macro Averaging}
\begin{table}[ht]
\begin{tabular}{rcccc}
@ -742,7 +759,7 @@ marked bold, although it is the lowest number.
% 1.7 for 8, 2.0 for 9
\hline
\end{tabular}
\caption{Results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
\caption{Rounded results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
their best performing entropy threshold. Vanilla SSD with Entropy test performed best with an
entropy threshold of 1.7, Bayesian SSD without non-maximum suppression performed best for 0.7,
and Bayesian SSD with non-maximum suppression performed best for 1.5 as entropy
@ -752,6 +769,36 @@ marked bold, although it is the lowest number.
\label{tab:results-macro}
\end{table}
Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
table \ref{tab:results-macro}) with respect to the maximum \(F_1\) score
(0.375) and recall at the maximum \(F_1\) point (0.338). In comparison, the SSD
with an entropy test slightly outperforms the 0.2 variant with respect to
precision (0.425). Additionally, this is the best precision overall. Among
the vanilla SSD variants, the 0.2 variant also has the lowest
number of open set errors (1218).
The comparison of the vanilla SSD variants with a confidence threshold of 0.01
shows no significant impact of an entropy test. Only the number of open set
errors is lower, and only marginally. The remaining performance metrics are
almost identical after rounding.
The results for Bayesian SSD show a massive impact of the existence of
non-maximum suppression: maximum \(F_1\) score of 0.363 (with NMS) to 0.006
(without NMS). Dropout was disabled in both cases, making them effectively a
vanilla SSD run with multiple forward passes.
With 1057 open set errors, the Bayesian SSD variant with disabled dropout and
enabled non-maximum suppression offers the best performance with respect
to open set errors. It also has the best \(F_1\) score (0.363) and best
precision (0.420) of all Bayesian variants, and ties with the 0.9 keep ratio
variant on recall (0.321).
Dropout decreases the performance of the network, as can be seen
in the lower \(F_1\) scores, higher open set errors, and lower precision and
recall values. However, all variants with multiple forward passes and
non-maximum suppression have fewer open set errors than all vanilla SSD
variants.
\chapter{Discussion}
\label{chap:discussion}