Finished results section

Signed-off-by: Jim Martens <github@2martens.de>
2019-09-10 11:58:26 +02:00
parent 0e7f735517
commit c848b09ac2
1 changed files with 64 additions and 17 deletions
--- a/body.tex
+++ b/body.tex
@@ -671,6 +671,7 @@ However, in case of a class imbalance the macro averaging
 favours classes with few detections whereas micro averaging benefits classes
 with many detections.

+\subsection{Micro Averaging}
 \begin{table}[ht]
    \begin{tabular}{rcccc}
        \hline
@@ -690,7 +691,7 @@ with many detections.
            % 0.5 for Bayesian - 6, 1.4 for 7, 1.4 for 8, 1.3 for 9
        \hline
    \end{tabular}
-    \caption{Results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
+    \caption{Rounded results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
    their best performing entropy threshold. Vanilla SSD with Entropy test performed best with an
    entropy threshold of 2.4, Bayesian SSD without non-maximum suppression performed best for 0.5,
    and Bayesian SSD with non-maximum suppression performed best for 1.4 as entropy
@@ -701,26 +702,42 @@ with many detections.
    \label{tab:results-micro}
 \end{table}

-In both cases, vanilla SSD with a per-class confidence threshold of 0.2
-performs best (see tables \ref{tab:results-micro} and \ref{tab:results-macro})
-with a maximum \(F_1\) score of 0.376/0.375 (always micro/macro) compared to both vanilla SSD with a per-class
-threshold of 0.01 (0.255/0.370) and vanilla SSD with entropy thresholding (0.255/0.370).
-It has the fewest open set errors (2939/1218 to 3176/1426 and 3168/1373), and the best recall
-(0.382/0.338 to 0.214/0.328 and 0.214/0.329). For micro averaging it has the best precision (0.372 to 0.318 and 0.318), macro averaging is won by vanilla SSD with entropy test (0.425 to 0.424 and 0.424).
-This shows: a higher per-class confidence threshold removes many bad detections and hence the end
-result is that much better. These comparisons also show that the network is
-not very uncertain. The best performing entropy threshold is not any better than
-the corresponding vanilla SSD without entropy threshold. Therefore, in this
-case the per-class confidence score is far more important for the result.
+Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+table \ref{tab:results-micro}) with respect to the maximum \(F_1\) score
+(0.376) and the recall at the maximum \(F_1\) point (0.382). Neither the
+vanilla SSD variant with a confidence threshold of 0.01 nor the vanilla SSD
+with an entropy test outperforms the 0.2 variant. Among the vanilla SSD
+variants, the 0.2 variant also has the fewest open set errors (2939) and the
+highest precision (0.372).
+
+Comparing the two vanilla SSD variants with a confidence threshold of 0.01
+shows no significant impact of the entropy test: only the number of open set
+errors is lower, and only marginally so. All other performance metrics are
+identical after rounding.

 The results for Bayesian SSD show a massive impact of the existance of
-non-maximum suppression: maximum \(F_1\) score of 0.371 (with NMS) to 0.006 (without NMS)
-with micro averaging and 0.363 (with NMS) to 0.006 (without NMS) with macro averaging.
-Dropout was disabled in both cases, making them effectively a vanilla SSD run
-with multiple forward passes. Therefore, the low number of open set errors with
+non-maximum suppression: the maximum \(F_1\) score drops from 0.371 (with NMS)
+to 0.006 (without NMS). Dropout was disabled in both cases, which makes them
+effectively vanilla SSD runs with multiple forward passes.
+Given the near-zero \(F_1\) score, the low number of open set errors with
 micro averaging (164 without NMS) does not qualify as a good result and is not
 marked bold, although it is the lowest number.

+The Bayesian SSD variant with disabled dropout and enabled non-maximum
+suppression has the fewest open set errors (2335) once the variant without
+non-maximum suppression is disregarded. It also has the highest precision
+(0.378) of all tested variants and outperforms all other variants with
+multiple forward passes on every metric except recall.
+
+Dropout decreases the performance of the network, as the lower \(F_1\)
+scores, higher open set errors, and lower precision values show. With
+respect to recall, however, the variant with 0.9 keep ratio outperforms all
+other Bayesian variants (0.367), whereas the variant with 0.5 keep ratio
+has a lower recall (0.342) than the variant with disabled dropout.
+Nevertheless, all variants with multiple forward passes have fewer open set
+errors than all vanilla SSD variants.
+
+\subsection{Macro Averaging}

 \begin{table}[ht]
    \begin{tabular}{rcccc}
@@ -742,7 +759,7 @@ marked bold, although it is the lowest number.
            % 1.7 for 8, 2.0 for 9
        \hline
    \end{tabular}
-    \caption{Results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
+    \caption{Rounded results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
    their best performing entropy threshold. Vanilla SSD with Entropy test performed best with an
    entropy threshold of 1.7, Bayesian SSD without non-maximum suppression performed best for 0.7,
    and Bayesian SSD with non-maximum suppression performed best for 1.5 as entropy
@@ -752,6 +769,36 @@ marked bold, although it is the lowest number.
    \label{tab:results-macro}
 \end{table}

+Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+table \ref{tab:results-macro}) with respect to the maximum \(F_1\) score
+(0.375) and the recall at the maximum \(F_1\) point (0.338). The vanilla SSD
+with an entropy test, however, slightly outperforms the 0.2 variant with
+respect to precision (0.425), which is also the highest precision overall.
+Among the vanilla SSD variants, the 0.2 variant also has the fewest open set
+errors (1218).
+
+Comparing the two vanilla SSD variants with a confidence threshold of 0.01
+again shows no significant impact of the entropy test: only the number of
+open set errors is lower, and only slightly so. All other performance
+metrics are nearly identical after rounding.
+
+As with micro averaging, non-maximum suppression has a massive impact on the
+results of Bayesian SSD: the maximum \(F_1\) score drops from 0.363 (with NMS)
+to 0.006 (without NMS). Dropout was disabled in both cases, which makes them
+effectively vanilla SSD runs with multiple forward passes.
+
+The Bayesian SSD variant with disabled dropout and enabled non-maximum
+suppression has the fewest open set errors (1057). It also has the highest
+\(F_1\) score (0.363) and the highest precision (0.420) of all Bayesian
+variants, and it ties with the 0.9 keep ratio variant on recall
+(0.321).
+
+Dropout decreases the performance of the network, as the lower \(F_1\)
+scores, higher open set errors, and lower precision and recall values
+show. Nevertheless, all variants with multiple forward passes and
+non-maximum suppression have fewer open set errors than all vanilla SSD
+variants.
+
 \chapter{Discussion}

 \label{chap:discussion}
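Micro and macro averaging differ in when the per-class counts are combined. Micro averaging pools the counts first, so classes with many detections dominate. Macro averaging averages per-class metrics, so every class weighs the same. A minimal Python sketch of the difference follows. It assumes per-class counts of true positives, false positives, and false negatives are already available. The function names and example counts are hypothetical. Taking the macro F1 score as the harmonic mean of macro precision and macro recall is also an assumption; averaging the per-class F1 scores is an equally common convention.

```python
from typing import Dict, Tuple

# Hypothetical per-class counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 score for one set of counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def micro_average(per_class: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Pool the counts of all classes first, then compute the metrics once."""
    tp = sum(counts[0] for counts in per_class.values())
    fp = sum(counts[1] for counts in per_class.values())
    fn = sum(counts[2] for counts in per_class.values())
    return precision_recall_f1(tp, fp, fn)


def macro_average(per_class: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Compute precision and recall per class, then take the unweighted mean.

    Assumption: the macro F1 score is the harmonic mean of macro precision
    and macro recall, not the mean of the per-class F1 scores.
    """
    metrics = [precision_recall_f1(*counts) for counts in per_class.values()]
    precision = sum(m[0] for m in metrics) / len(metrics)
    recall = sum(m[1] for m in metrics) / len(metrics)
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical class imbalance: one frequent class and one rare class.
example = {"frequent": (900, 100, 100), "rare": (1, 9, 9)}
print("micro:", micro_average(example))
print("macro:", macro_average(example))
```

With these counts the micro averages (about 0.892) are dominated by the frequent class. The macro averages (0.5) weight the rare class as much as the frequent one.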
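The results report the maximum F1 score together with the precision and recall at that point. The Python sketch below shows one way to find such a point: sort the detections by confidence, lower the threshold step by step, and keep the threshold with the highest F1 score. It assumes every detection has already been matched against the ground truth, for example by an IoU criterion, and labelled as a true or false positive. The function name and input format are hypothetical, and the evaluation code behind the tables may work differently.

```python
from typing import List, Tuple


def max_f1_point(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> Tuple[float, float, float, float]:
    """Return (f1, precision, recall, threshold) at the maximum F1 point.

    detections: hypothetical (confidence, is_true_positive) pairs that are
    assumed to be matched against the ground truth already.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    best = (0.0, 0.0, 0.0, float("inf"))
    tp = fp = 0
    for i, (confidence, is_true_positive) in enumerate(ranked):
        # Lowering the threshold to this confidence admits this detection.
        if is_true_positive:
            tp += 1
        else:
            fp += 1
        # Detections with equal confidence pass or fail a threshold together,
        # so only evaluate after the last one of a tie has been admitted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == confidence:
            continue
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth if num_ground_truth else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        if f1 > best[0]:
            best = (f1, precision, recall, confidence)
    return best


# Hypothetical detections for an image set with four ground-truth objects.
example = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
print(max_f1_point(example, num_ground_truth=4))
```

For this example the best F1 score is 0.75, reached at a threshold of 0.4, where precision and recall are both 0.75.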
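The entropy test discards detections whose predicted class distribution is too uncertain. The minimal Python sketch below assumes each detection carries a softmax probability vector over all classes; for Bayesian SSD this is presumably the mean over the forward passes. It also assumes the entropy uses the natural logarithm. The text confirms neither the log base nor the names used here, so thresholds such as 2.4 or 1.7 from the table captions only carry over if the base matches.

```python
import math
from typing import List, Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy of a class probability vector (natural logarithm, assumed)."""
    # Terms with p == 0 contribute nothing (p * log p tends to 0 as p -> 0).
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def entropy_test(
    detections: List[Sequence[float]], threshold: float
) -> List[Sequence[float]]:
    """Keep only detections whose class-distribution entropy is <= threshold."""
    return [probs for probs in detections if entropy(probs) <= threshold]


# Hypothetical class distributions over four classes.
confident = [0.97, 0.01, 0.01, 0.01]  # entropy of about 0.17
uncertain = [0.25, 0.25, 0.25, 0.25]  # maximal entropy: ln(4), about 1.39
print(len(entropy_test([confident, uncertain], threshold=1.0)))
```

A lower threshold removes more uncertain detections. This can reduce open set errors, but it may also remove correct detections and lower recall.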
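Without non-maximum suppression, the multiple forward passes of Bayesian SSD presumably yield many near-duplicate boxes for the same object. Under the usual one-to-one matching, at most one of them can match a ground-truth object, so the rest count as false positives. This is one plausible explanation for the collapse of the F1 score. The Python sketch below shows greedy non-maximum suppression for a single class. The box format, the function names, and the IoU threshold of 0.45 are assumptions for illustration, not values taken from the text.

```python
from typing import List, Tuple

# Axis-aligned box given as (x_min, y_min, x_max, y_max); format assumed.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_maximum_suppression(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.45
) -> List[int]:
    """Greedy NMS for a single class; returns indices of the kept boxes."""
    # Visit boxes from the highest to the lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections after several forward passes: two near-duplicate
# boxes for one object plus a separate box for a second object.
boxes = [(0.0, 0.0, 10.0, 10.0), (0.5, 0.5, 10.5, 10.5), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.7]
print(non_maximum_suppression(boxes, scores))
```

Here the second box overlaps the first with an IoU of about 0.82 and is suppressed. Only one detection per object survives.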