Finished results section

Signed-off-by: Jim Martens <github@2martens.de>
2019-09-10 11:58:26 +02:00
parent 0e7f735517
commit c848b09ac2
1 changed files with 64 additions and 17 deletions
--- a/body.tex
+++ b/body.tex
@@ -671,6 +671,7 @@ However, in the case of a class imbalance, macro averaging
 favours classes with few detections, whereas micro averaging benefits classes
 with many detections.
 \subsection{Micro Averaging}
 \begin{table}[ht]
    \begin{tabular}{rcccc}
        \hline
@@ -690,7 +691,7 @@ with many detections.
            % 0.5 for Bayesian - 6, 1.4 for 7, 1.4 for 8, 1.3 for 9
        \hline
    \end{tabular}
-    \caption{Results for micro averaging. SSD with Entropy test and Bayesian SSD are represented with
+    \caption{Rounded results for micro averaging. SSD with entropy test and Bayesian SSD are represented with
    their best-performing entropy threshold. Vanilla SSD with entropy test performed best with an
    entropy threshold of 2.4, Bayesian SSD without non-maximum suppression performed best for 0.5,
    and Bayesian SSD with non-maximum suppression performed best for 1.4 as entropy
@@ -701,26 +702,42 @@ with many detections.
    \label{tab:results-micro}
 \end{table}
-In both cases, vanilla SSD with a per-class confidence threshold of 0.2
-performs best (see tables \ref{tab:results-micro} and \ref{tab:results-macro})
-with a maximum \(F_1\) score of 0.376/0.375 (always micro/macro) compared to both vanilla SSD with a per-class
-threshold of 0.01 (0.255/0.370) and vanilla SSD with entropy thresholding (0.255/0.370).
-It has the fewest open set errors (2939/1218 to 3176/1426 and 3168/1373), and the best recall
-(0.382/0.338 to 0.214/0.328 and 0.214/0.329). For micro averaging it has the best precision (0.372 to 0.318 and 0.318), macro averaging is won by vanilla SSD with entropy test (0.425 to 0.424 and 0.424).
-This shows: a higher per-class confidence threshold removes many bad detections and hence the end
-result is that much better. These comparisons also show that the network is
-not very uncertain. The best performing entropy threshold is not any better than
-the corresponding vanilla SSD without entropy threshold. Therefore, in this
-case the per-class confidence score is far more important for the result.
+Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
+table \ref{tab:results-micro}) with respect to the maximum \(F_1\) score
+(0.376) and recall at the maximum \(F_1\) point (0.382). In comparison, neither
+the vanilla SSD variant with a confidence threshold of 0.01 nor the SSD with
+an entropy test outperforms the 0.2 variant: both reach only a maximum
+\(F_1\) score of 0.255 and a recall of 0.214. Among the vanilla SSD variants,
+the 0.2 variant also has the lowest number of open set errors (2939) and the
+highest precision (0.372).
+
+The comparison of the vanilla SSD variants with a confidence threshold of 0.01
+shows that the entropy test has hardly any impact. Only the number of open set
+errors is marginally lower (3168 instead of 3176); all other metrics are
+identical after rounding.
 The results for Bayesian SSD show a massive impact of the use of
-non-maximum suppression: maximum \(F_1\) score of 0.371 (with NMS) to 0.006 (without NMS)
-with micro averaging and 0.363 (with NMS) to 0.006 (without NMS) with macro averaging.
-Dropout was disabled in both cases, making them effectively a vanilla SSD run
-with multiple forward passes. Therefore, the low number of open set errors with
+non-maximum suppression: the maximum \(F_1\) score drops from 0.371 with NMS
+to 0.006 without NMS. Dropout was disabled in both cases, making both variants
+effectively a vanilla SSD run with multiple forward passes. Given the very low
+\(F_1\) score without NMS, the low number of open set errors with
 micro averaging (164 without NMS) does not qualify as a good result and is not
 marked bold, although it is the lowest number.
 With 2335 open set errors, the Bayesian SSD variant with disabled dropout and
 enabled non-maximum suppression offers the best performance with respect
 to open set errors. It also has the best precision (0.378) of all tested
 variants. Furthermore, it provides the best performance among all variants
 with multiple forward passes except for recall.
 Dropout decreases the performance of the network, as can be seen
 in the lower \(F_1\) scores, higher numbers of open set errors, and lower
 precision values. The variant with a keep ratio of 0.9 outperforms all other
 Bayesian variants with respect to recall (0.367), whereas the variant with a
 keep ratio of 0.5 has a lower recall (0.342) than the variant with disabled
 dropout. However, all variants with multiple forward passes have fewer open
 set errors than all vanilla SSD variants.
 \subsection{Macro Averaging}
 \begin{table}[ht]
    \begin{tabular}{rcccc}
@@ -742,7 +759,7 @@ marked bold, although it is the lowest number.
            % 1.7 for 8, 2.0 for 9
        \hline
    \end{tabular}
-    \caption{Results for macro averaging. SSD with Entropy test and Bayesian SSD are represented with
+    \caption{Rounded results for macro averaging. SSD with entropy test and Bayesian SSD are represented with
    their best-performing entropy threshold. Vanilla SSD with entropy test performed best with an
    entropy threshold of 1.7, Bayesian SSD without non-maximum suppression performed best for 0.7,
    and Bayesian SSD with non-maximum suppression performed best for 1.5 as entropy
@@ -752,6 +769,36 @@ marked bold, although it is the lowest number.
    \label{tab:results-macro}
 \end{table}
 Vanilla SSD with a per-class confidence threshold of 0.2 performs best (see
 table \ref{tab:results-macro}) with respect to the maximum \(F_1\) score
 (0.375) and recall at the maximum \(F_1\) point (0.338). In comparison, the SSD
 with an entropy test slightly outperforms the 0.2 variant with respect to
 precision (0.425 compared to 0.424), which is also the best precision overall.
 Among the vanilla SSD variants, the 0.2 variant also has the lowest number of
 open set errors (1218).
 The comparison of the vanilla SSD variants with a confidence threshold of 0.01
 shows that the entropy test has hardly any impact. Only the number of open set
 errors is slightly lower (1373 instead of 1426); all other metrics are
 almost identical after rounding.
 The results for Bayesian SSD show a massive impact of the use of
 non-maximum suppression: the maximum \(F_1\) score drops from 0.363 with NMS
 to 0.006 without NMS. Dropout was disabled in both cases, making both variants
 effectively a vanilla SSD run with multiple forward passes.
 With 1057 open set errors, the Bayesian SSD variant with disabled dropout and
 enabled non-maximum suppression offers the best performance with respect
 to open set errors. It also has the best \(F_1\) score (0.363) and best
 precision (0.420) of all Bayesian variants, and ties with the 0.9 keep ratio
 variant on recall (0.321).
 Dropout decreases the performance of the network, as can be seen in the
 lower \(F_1\) scores, higher numbers of open set errors, and lower precision
 and recall values. However, all variants with multiple forward passes and
 non-maximum suppression have fewer open set errors than all vanilla SSD
 variants.
 \chapter{Discussion}
 \label{chap:discussion}
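The statement at the start of this diff, that macro averaging favours classes with few detections while micro averaging benefits classes with many detections, can be illustrated with a small worked example for precision. This is a sketch under assumptions: the two classes \(a\) and \(b\), their counts, and the notation \(TP_c\), \(FP_c\) and \(C\) (the set of classes) are hypothetical and chosen only for illustration, the block assumes amsmath is loaded, and the exact averaging used for the thesis results may differ in detail.

```latex
\begin{align*}
  P_{\text{micro}} &= \frac{\sum_{c \in C} TP_c}{\sum_{c \in C} (TP_c + FP_c)}, &
  P_{\text{macro}} &= \frac{1}{|C|} \sum_{c \in C} \frac{TP_c}{TP_c + FP_c} \\
  % hypothetical counts: class a has many detections, class b only a few
  \text{class } a\colon\ TP_a = 90,\ FP_a = 10 &\Rightarrow P_a = 0.9, &
  \text{class } b\colon\ TP_b = 1,\ FP_b = 9 &\Rightarrow P_b = 0.1 \\
  P_{\text{micro}} &= \frac{90 + 1}{(90 + 10) + (1 + 9)} = \frac{91}{110} \approx 0.827, &
  P_{\text{macro}} &= \frac{0.9 + 0.1}{2} = 0.5
\end{align*}
```

Class \(a\) contributes 100 of the 110 detections and therefore dominates the micro average, whereas the macro average gives the poorly performing class \(b\), with only 10 detections, the same weight as class \(a\). Recall behaves analogously with \(FN_c\) in place of \(FP_c\).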