Finished results section
Signed-off-by: Jim Martens <github@2martens.de>
body.tex (81 changed lines)
@ -671,6 +671,7 @@ However, in case of a class imbalance the macro averaging
 favours classes with few detections whereas micro averaging benefits classes
 with many detections.

+\subsection{Micro Averaging}
 \begin{table}[ht]
 \begin{tabular}{rcccc}
 \hline
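The context lines of this hunk state that, under class imbalance, macro averaging favours classes with few detections while micro averaging benefits classes with many. Below is a minimal Python sketch of the two schemes. It assumes per-class counts of true positives, false positives and false negatives are already available. The `Counts` type, the class names and all numbers are invented for illustration and are not results from the thesis. F1 is computed from the averaged precision and recall. That is one of two common conventions for macro F1 and is an assumption here, not necessarily the convention used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Detection counts for a single class (hypothetical input format)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def micro_average(per_class: dict[str, Counts]) -> tuple[float, float, float]:
    """Pool the counts of all classes first, then compute precision, recall, F1."""
    tp = sum(c.tp for c in per_class.values())
    fp = sum(c.fp for c in per_class.values())
    fn = sum(c.fn for c in per_class.values())
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return precision, recall, f1(precision, recall)


def macro_average(per_class: dict[str, Counts]) -> tuple[float, float, float]:
    """Compute precision and recall per class, then take the unweighted mean."""
    precisions = [safe_div(c.tp, c.tp + c.fp) for c in per_class.values()]
    recalls = [safe_div(c.tp, c.tp + c.fn) for c in per_class.values()]
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    return precision, recall, f1(precision, recall)


# Invented counts: a frequent class detected poorly, a rare class detected well.
counts = {
    "frequent": Counts(tp=100, fp=300, fn=300),
    "rare": Counts(tp=9, fp=1, fn=1),
}
print("micro:", micro_average(counts))
print("macro:", macro_average(counts))
```

With these invented counts, micro averaging yields an F1 of about 0.27, dominated by the frequent class, while macro averaging yields 0.575, because the small, well-detected class weighs as much as the large, poorly detected one.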
@ -690,7 +691,7 @@ with many detections.
 % 0.5 for Bayesian - 6, 1.4 for 7, 1.4 for 8, 1.3 for 9
 \hline
 \end{tabular}
-\caption{Results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
+\caption{Rounded results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
 their best performing entropy threshold. Vanilla SSD with Entropy test performed best with an
 entropy threshold of 2.4, Bayesian SSD without non-maximum suppression performed best for 0.5,
 and Bayesian SSD with non-maximum suppression performed best for 1.4 as entropy
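The captions select each SSD variant by its best performing entropy threshold (for example 2.4 for vanilla SSD with an entropy test under micro averaging). Below is a minimal sketch of such an entropy test. It assumes each detection carries a softmax class distribution, that entropy is measured in nats (natural logarithm), and that detections whose entropy exceeds the threshold are discarded. The exact formulation used in the thesis is not visible in this diff, and `entropy_filter` and the detection format are hypothetical.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats of a discrete distribution; zero entries contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_filter(detections: list[dict], threshold: float) -> list[dict]:
    """Keep only detections whose class distribution has entropy <= threshold.

    Assumption: each detection is a dict with a "probs" key holding its
    softmax class distribution (hypothetical format).
    """
    return [d for d in detections if entropy(d["probs"]) <= threshold]


uniform = [0.25] * 4                # maximally uncertain over 4 classes: ln(4) ~ 1.386
peaked = [0.97, 0.01, 0.01, 0.01]   # confident prediction: entropy ~ 0.168
detections = [{"id": "a", "probs": uniform}, {"id": "b", "probs": peaked}]

kept = entropy_filter(detections, threshold=1.0)
print([d["id"] for d in kept])  # → ['b']
```

For K classes the entropy is bounded by ln K (reached by the uniform distribution), so the useful range of thresholds depends on the number of classes the detector predicts.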
@ -701,26 +702,42 @@ with many detections.
 \label{tab:results-micro}
 \end{table}

-In both cases, vanilla SSD with a per-class confidence threshold of 0.2
-performs best (see tables \ref{tab:results-micro} and \ref{tab:results-macro})
-with a maximum \(F_1\) score of 0.376/0.375 (always micro/macro) compared to both vanilla SSD with a per-class
-threshold of 0.01 (0.255/0.370) and vanilla SSD with entropy thresholding (0.255/0.370).
-It has the fewest open set errors (2939/1218 to 3176/1426 and 3168/1373), and the best recall
-(0.382/0.338 to 0.214/0.328 and 0.214/0.329). For micro averaging it has the best precision (0.372 to 0.318 and 0.318), macro averaging is won by vanilla SSD with entropy test (0.425 to 0.424 and 0.424).
-This shows: a higher per-class confidence threshold removes many bad detections and hence the end
-result is that much better. These comparisons also show that the network is
-not very uncertain. The best performing entropy threshold is not any better than
-the corresponding vanilla SSD without entropy threshold. Therefore, in this
-case the per-class confidence score is far more important for the result.
+Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+table \ref{tab:results-micro}) with respect to the maximum \(F_1\) score
+(0.376) and recall at the maximum \(F_1\) point (0.382). In comparison, neither
+the vanilla SSD variant with a confidence threshold of 0.01 nor the SSD with
+an entropy test can outperform the 0.2 variant. Among the vanilla SSD variants,
+the 0.2 variant also has the lowest number of open set errors (2939) and the
+highest precision (0.372).
+
+The comparison of the vanilla SSD variants with a confidence threshold of 0.01
+shows no significant impact of an entropy test. Only the open set errors
+are lower, and only slightly. The rest of the performance metrics are
+identical after rounding.

 The results for Bayesian SSD show a massive impact of the existence of
-non-maximum suppression: maximum \(F_1\) score of 0.371 (with NMS) to 0.006 (without NMS)
-with micro averaging and 0.363 (with NMS) to 0.006 (without NMS) with macro averaging.
-Dropout was disabled in both cases, making them effectively a vanilla SSD run
-with multiple forward passes. Therefore, the low number of open set errors with
+non-maximum suppression: maximum \(F_1\) score of 0.371 (with NMS) to 0.006
+(without NMS). Dropout was disabled in both cases, making them effectively a
+vanilla SSD run with multiple forward passes.
+Therefore, the low number of open set errors with
 micro averaging (164 without NMS) does not qualify as a good result and is not
 marked bold, although it is the lowest number.

+With 2335 open set errors, the Bayesian SSD variant with disabled dropout and
+enabled non-maximum suppression offers the best performance with respect
+to open set errors. It also has the best precision (0.378) of all tested
+variants. Furthermore, it provides the best performance among all variants
+with multiple forward passes except for recall.
+
+Dropout decreases the performance of the network, as can be seen
+in the lower \(F_1\) scores, higher open set errors, and lower precision
+values. The variant with 0.9 keep ratio outperforms all other Bayesian
+variants with respect to recall (0.367). The variant with 0.5 keep
+ratio has worse recall (0.342) than the variant with disabled dropout.
+However, all variants with multiple forward passes have lower open set errors
+than all vanilla SSD variants.
+
+\subsection{Macro Averaging}

 \begin{table}[ht]
 \begin{tabular}{rcccc}
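The results are reported as the maximum F1 score together with the recall at the maximum F1 point. One way to read these numbers is as the best operating point on a precision-recall curve traced by lowering the confidence cutoff. Below is a minimal sketch. It assumes every detection has already been matched against the ground truth (the `is_tp` flag) and that the cutoff is lowered one detection at a time. The matching step, the helper name `max_f1_point` and the sample data are hypothetical, not taken from the thesis.

```python
def max_f1_point(
    detections: list[tuple[float, bool]], num_ground_truth: int
) -> tuple[float, float, float, float]:
    """Return (max F1, precision, recall, confidence cutoff) at the best operating point.

    detections: (confidence, is_true_positive) pairs, already matched to ground truth.
    num_ground_truth: total number of ground-truth objects.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    best = (0.0, 0.0, 0.0, float("inf"))
    for confidence, is_tp in ranked:
        # Lowering the cutoff to this confidence admits one more detection.
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth if num_ground_truth else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best[0]:
            best = (f1, precision, recall, confidence)
    return best


dets = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.3, False), (0.2, False)]
print(max_f1_point(dets, num_ground_truth=4))  # → (0.75, 0.75, 0.75, 0.6)
```

The sketch does not group detections with identical confidence. A fuller implementation would evaluate the curve only at distinct confidence values.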
@ -742,7 +759,7 @@ marked bold, although it is the lowest number.
 % 1.7 for 8, 2.0 for 9
 \hline
 \end{tabular}
-\caption{Results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
+\caption{Rounded results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
 their best performing entropy threshold. Vanilla SSD with Entropy test performed best with an
 entropy threshold of 1.7, Bayesian SSD without non-maximum suppression performed best for 0.7,
 and Bayesian SSD with non-maximum suppression performed best for 1.5 as entropy
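The Bayesian SSD rows in these captions come from several forward passes per image, and the results text notes that with dropout disabled such a run is effectively vanilla SSD with multiple forward passes. Below is a minimal sketch of why. It assumes the class distributions of one detection are averaged over the passes before the entropy test. The per-pass distributions are hypothetical values, not network outputs, and the averaging scheme is an assumption about, not a copy of, the thesis implementation.

```python
import math


def mean_distribution(samples: list[list[float]]) -> list[float]:
    """Average the class distributions of one detection over several forward passes."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical softmax outputs for one detection over three classes.
# Dropout disabled: every forward pass returns exactly the same distribution.
passes_without_dropout = [[0.7, 0.2, 0.1]] * 5
# Dropout enabled: the passes disagree on the top class.
passes_with_dropout = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.3, 0.6, 0.1]]

no_dropout = mean_distribution(passes_without_dropout)
with_dropout = mean_distribution(passes_with_dropout)
print(no_dropout, entropy(no_dropout))
print(with_dropout, entropy(with_dropout))
```

Identical passes leave the averaged distribution, and therefore its entropy, unchanged. Only disagreement between passes, as introduced by dropout, raises the entropy of the mean.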
@ -752,6 +769,36 @@ marked bold, although it is the lowest number.
 \label{tab:results-macro}
 \end{table}

+Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+table \ref{tab:results-macro}) with respect to the maximum \(F_1\) score
+(0.375) and recall at the maximum \(F_1\) point (0.338). In comparison, the SSD
+with an entropy test slightly outperforms the 0.2 variant with respect to
+precision (0.425). Additionally, this is the best precision overall. Among
+the vanilla SSD variants, the 0.2 variant also has the lowest
+number of open set errors (1218).
+
+The comparison of the vanilla SSD variants with a confidence threshold of 0.01
+shows no significant impact of an entropy test. Only the open set errors
+are lower, and only slightly. The rest of the performance metrics are
+almost identical after rounding.
+
+The results for Bayesian SSD show a massive impact of the existence of
+non-maximum suppression: maximum \(F_1\) score of 0.363 (with NMS) to 0.006
+(without NMS). Dropout was disabled in both cases, making them effectively a
+vanilla SSD run with multiple forward passes.
+
+With 1057 open set errors, the Bayesian SSD variant with disabled dropout and
+enabled non-maximum suppression offers the best performance with respect
+to open set errors. It also has the best \(F_1\) score (0.363) and best
+precision (0.420) of all Bayesian variants, and ties with the 0.9 keep ratio
+variant on recall (0.321).
+
+Dropout decreases the performance of the network, as can be seen
+in the lower \(F_1\) scores, higher open set errors, and lower precision and
+recall values. However, all variants with multiple forward passes and
+non-maximum suppression have lower open set errors than all vanilla SSD
+variants.
+
 \chapter{Discussion}

 \label{chap:discussion}
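Both results paragraphs report a drop in maximum F1 from about 0.37 to 0.006 when non-maximum suppression is removed from Bayesian SSD. Below is a minimal sketch of greedy, per-class NMS on axis-aligned boxes to show the mechanism. Boxes are assumed to be (x_min, y_min, x_max, y_max). The IoU threshold of 0.45 is chosen purely for illustration. The NMS configuration actually used in the thesis is not visible in this diff. Without such a step, overlapping duplicates of one object would typically count as additional false positive detections under the usual one-to-one matching.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: list[Box], scores: list[float], iou_threshold: float = 0.45) -> list[int]:
    """Greedy NMS for one class: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: list[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any already kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68) and is suppressed, while the distant third box survives.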