Added class-specific results

Signed-off-by: Jim Martens <github@2martens.de>
2019-09-26 13:39:17 +02:00
parent fd7908a064
commit e2a7fcb16c
1 changed files with 153 additions and 0 deletions
--- a/body.tex
+++ b/body.tex
@@ -861,6 +861,159 @@ are included.
 All plotted variants show a similar behaviour that is in line with previously
 reported figures, such as the ones in Miller et al.~\cite{Miller2018}.

+\subsection{Class-specific results}
+
+As mentioned before, the data set is imbalanced with respect to its
+classes: four classes make up roughly 50\% of all ground truth
+detections. Therefore, it is worth examining the performance
+of the tested variants on these four classes: persons, cars,
+chairs, and bottles. Additionally, the results for the giraffe class are
+presented because they are exceptionally good, although the class makes up
+only 0.7\% of the ground truth.
+
+\begin{table}[htbp]
+    \begin{tabular}{rccc}
+        \hline
+        Forward & max & Recall & Precision\\
+        Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
+            \hline
+            vanilla SSD - 0.01 conf & 0.460 & \textbf{0.405} & 0.532 \\
+            vanilla SSD - 0.2 conf & \textbf{0.460} & \textbf{0.405} & \textbf{0.533} \\
+            SSD with Entropy test - 0.01 conf & 0.460 & 0.405 & 0.532 \\
+            % entropy thresh: 1.7 for vanilla SSD is best
+            \hline
+            Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.272 & 0.292 & 0.256 \\
+            no dropout - 0.2 conf - NMS \; 10 & 0.451 & 0.403 & 0.514 \\
+            0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.447 & 0.401 & 0.505 \\
+            0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.410 & 0.368 & 0.465 \\
+            % entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
+            % entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
+            % 1.7 for 8, 2.0 for 9
+        \hline
+    \end{tabular}
+    \caption{Rounded results for the persons class. SSD with Entropy test and Bayesian SSD are represented with
+    their best-performing macro-averaging entropy threshold with respect to \(F_1\) score.}
+    \label{tab:results-persons}
+\end{table}
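+
+Assuming the standard definition of the \(F_1\) score as the harmonic mean of
+precision \(P\) and recall \(R\), each row of the class-specific tables can be
+checked against its recall and precision columns. For the vanilla SSD with
+0.01 confidence threshold in the persons class, for example:
+
+% Illustrative consistency check using the rounded values of
+% tab:results-persons; assumes F1 = 2PR / (P + R).
+\begin{equation}
+    F_1 = \frac{2PR}{P + R}
+        = \frac{2 \cdot 0.532 \cdot 0.405}{0.532 + 0.405}
+        = \frac{0.43092}{0.937}
+        \approx 0.460
+\end{equation}
+
+Since all table values are rounded to three decimal places, such a check can
+deviate in the last digit.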
+
+The overall trend clearly continues in the individual
+classes (see tables~\ref{tab:results-persons} through~\ref{tab:results-giraffes}). However, the two vanilla SSD variants with only 0.01 confidence
+threshold perform comparatively better than in the averaged results presented earlier.
+Only in the chairs class does a Bayesian SSD variant perform better (in
+precision) than all of the vanilla SSD variants. Moreover, in
+multiple classes two or all three of the vanilla SSD variants perform
+equally well. Compared with the macro averaged results,
+giraffes and persons perform better across the board. Cars have a higher
+precision than average but a lower recall for all variants except the Bayesian
+SSD variant without NMS and dropout. Chairs and bottles perform
+worse than average.
+
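+The giraffe class illustrates the difference between macro and micro
+averaging very well: in macro averaging, every class has the same weight,
+so the giraffe class has as much impact as the persons class. In micro
+averaging, the detections of all classes are pooled, so the exceptional
+results of the giraffe class, which makes up only 0.7\% of the ground truth,
+have a negligible impact.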
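+
+A minimal formulation of the two schemes, assuming that macro averaging takes
+the unweighted mean of the per-class precision and recall values while micro
+averaging pools the per-class counts before computing them, is given below.
+The symbols \(C\) (set of classes) and \(TP_c\), \(FP_c\), \(FN_c\) (true
+positives, false positives, and false negatives of class \(c\)) are introduced
+here for illustration only:
+
+% Assumed formulation of macro vs. micro averaging; notation is illustrative.
+\begin{align}
+    P_{\mathrm{macro}} &= \frac{1}{|C|} \sum_{c \in C} \frac{TP_c}{TP_c + FP_c}, &
+    R_{\mathrm{macro}} &= \frac{1}{|C|} \sum_{c \in C} \frac{TP_c}{TP_c + FN_c}, \\
+    P_{\mathrm{micro}} &= \frac{\sum_{c \in C} TP_c}{\sum_{c \in C} \left(TP_c + FP_c\right)}, &
+    R_{\mathrm{micro}} &= \frac{\sum_{c \in C} TP_c}{\sum_{c \in C} \left(TP_c + FN_c\right)}.
+\end{align}
+
+In the macro formulation every class enters with the fixed weight \(1/|C|\),
+whereas in the micro formulation the influence of a class grows with its
+number of detections and ground truth objects.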
+
+\begin{table}[htbp]
+    \begin{tabular}{rccc}
+        \hline
+        Forward & max & Recall & Precision\\
+        Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
+            \hline
+            vanilla SSD - 0.01 conf & 0.364 & \textbf{0.305} & 0.452 \\
+            vanilla SSD - 0.2 conf & 0.363 & 0.294 & \textbf{0.476} \\
+            SSD with Entropy test - 0.01 conf & \textbf{0.364} & \textbf{0.305} & 0.453 \\
+            % entropy thresh: 1.7 for vanilla SSD is best
+            \hline
+            Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.236 & 0.244 & 0.229 \\
+            no dropout - 0.2 conf - NMS \; 10 & 0.336 & 0.266 & 0.460 \\
+            0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.332 & 0.262 & 0.454 \\
+            0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.309 & 0.264 & 0.374 \\
+            % entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
+            % entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
+            % 1.7 for 8, 2.0 for 9
+        \hline
+    \end{tabular}
+    \caption{Rounded results for the cars class. SSD with Entropy test and Bayesian SSD are represented with
+    their best-performing macro-averaging entropy threshold with respect to \(F_1\) score.}
+    \label{tab:results-cars}
+\end{table}
+
+\begin{table}[htbp]
+    \begin{tabular}{rccc}
+        \hline
+        Forward & max & Recall & Precision\\
+        Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
+            \hline
+            vanilla SSD - 0.01 conf & 0.287 & \textbf{0.251} & 0.335 \\
+            vanilla SSD - 0.2 conf & 0.283 & 0.242 & 0.341 \\
+            SSD with Entropy test - 0.01 conf & \textbf{0.288} & \textbf{0.251} & 0.338 \\
+            % entropy thresh: 1.7 for vanilla SSD is best
+            \hline
+            Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.172 & 0.168 & 0.178 \\
+            no dropout - 0.2 conf - NMS \; 10 & 0.280 & 0.229 & \textbf{0.360} \\
+            0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.274 & 0.228 & 0.343 \\
+            0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.240 & 0.220 & 0.265 \\
+            % entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
+            % entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
+            % 1.7 for 8, 2.0 for 9
+        \hline
+    \end{tabular}
+    \caption{Rounded results for the chairs class. SSD with Entropy test and Bayesian SSD are represented with
+    their best-performing macro-averaging entropy threshold with respect to \(F_1\) score.}
+    \label{tab:results-chairs}
+\end{table}
+
+
+\begin{table}[htbp]
+    \begin{tabular}{rccc}
+        \hline
+        Forward & max & Recall & Precision\\
+        Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
+            \hline
+            vanilla SSD - 0.01 conf & 0.233 & \textbf{0.175} & 0.348 \\
+            vanilla SSD - 0.2 conf & 0.231 & 0.173 & \textbf{0.350} \\
+            SSD with Entropy test - 0.01 conf & \textbf{0.233} & \textbf{0.175} & 0.350 \\
+            % entropy thresh: 1.7 for vanilla SSD is best
+            \hline
+            Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.160 & 0.140 & 0.188 \\
+            no dropout - 0.2 conf - NMS \; 10 & 0.224 & 0.170 & 0.328 \\
+            0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.220 & 0.170 & 0.311 \\
+            0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.202 & 0.172 & 0.245 \\
+            % entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
+            % entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
+            % 1.7 for 8, 2.0 for 9
+        \hline
+    \end{tabular}
+    \caption{Rounded results for the bottles class. SSD with Entropy test and Bayesian SSD are represented with
+    their best-performing macro-averaging entropy threshold with respect to \(F_1\) score.}
+    \label{tab:results-bottles}
+\end{table}
+
+\begin{table}[htbp]
+    \begin{tabular}{rccc}
+        \hline
+        Forward & max & Recall & Precision\\
+        Passes & \(F_1\) Score & \multicolumn{2}{c}{at max \(F_1\) point} \\
+            \hline
+            vanilla SSD - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+            vanilla SSD - 0.2 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+            SSD with Entropy test - 0.01 conf & \textbf{0.650} & \textbf{0.647} & \textbf{0.655} \\
+            % entropy thresh: 1.7 for vanilla SSD is best
+            \hline
+            Bay. SSD - no DO - 0.2 conf - no NMS \; 10 & 0.415 & 0.414 & 0.417 \\
+            no dropout - 0.2 conf - NMS \; 10 & 0.647 & 0.642 & 0.654 \\
+            0.9 keep ratio - 0.2 conf - NMS \; 10 & 0.637 & 0.634 & 0.642 \\
+            0.5 keep ratio - 0.2 conf - NMS \; 10 & 0.586 & 0.578 & 0.596 \\
+            % entropy thresh: 1.2 for Bayesian - 2 is best, 0.4 for 3
+            % entropy thresh: 0.7 for Bayesian - 6 is best, 1.5 for 7
+            % 1.7 for 8, 2.0 for 9
+        \hline
+    \end{tabular}
+    \caption{Rounded results for the giraffe class. SSD with Entropy test and Bayesian SSD are represented with
+    their best-performing macro-averaging entropy threshold with respect to \(F_1\) score.}
+    \label{tab:results-giraffes}
+\end{table}
+
 \subsection*{Qualitative Analysis}

 % TODO: expand