From dbfd527f3041a4710a40506007b3c9ec21ad0ca4 Mon Sep 17 00:00:00 2001
From: Jim Martens <github@2martens.de>
Date: Tue, 1 Oct 2019 12:36:44 +0200
Subject: [PATCH] Added more detail to class-specific results

Signed-off-by: Jim Martens <github@2martens.de>
---
 body.tex | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/body.tex b/body.tex
index cd738b6..59c3289 100644
--- a/body.tex
+++ b/body.tex
@@ -904,17 +904,14 @@ they had the exact same performance before rounding.
     \label{tab:results-persons}
 \end{table}
 
-It is clearly visible that the overall trend continues in the individual
-classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two \gls{vanilla} \gls{SSD} variants with only 0.01 confidence
-threshold perform better than in the averaged results presented earlier.
-Only in the chairs class, a Bayesian \gls{SSD} variant performs better (in
-precision) than any of the \gls{vanilla} \gls{SSD} variants. Moreover, there are
-multiple classes where two or all of the \gls{vanilla} \gls{SSD} variants perform
-equally well. When compared with the macro averaged results,
-giraffes and persons perform better across the board. Cars have a higher
-precision than average but lower recall values for all but the Bayesian
-SSD variant without \gls{NMS} and dropout. Chairs and bottles perform
-worse than average.
+The \gls{vanilla} \gls{SSD} variant with 0.2 per-class confidence threshold
+performs best in the persons class with a maximum \(F_1\) score of 0.460, as
+well as recall of 0.405 and precision of 0.533 at the maximum \(F_1\) score.
+It shares the first place in recall with the \gls{vanilla} \gls{SSD}
+variant using 0.01 confidence threshold. All Bayesian \gls{SSD} variants
+perform worse than the \gls{vanilla} \gls{SSD} variants (see table
+\ref{tab:results-persons}). With respect to the macro averaged result,
+all variants perform better than the average of all classes.
 
 \begin{table}[tbp]
     \begin{tabular}{rccc}
@@ -941,6 +938,18 @@ worse than average.
     \label{tab:results-cars}
 \end{table}
 
+The performance for cars is slightly different (see table
+\ref{tab:results-cars}): the \gls{vanilla} \gls{SSD}
+variant with entropy threshold and 0.01 confidence threshold has
+the best \(F_1\) score and recall. \Gls{vanilla} \gls{SSD} with 0.2 confidence
+threshold, however, has the best precision. Both the Bayesian \gls{SSD}
+variant with \gls{NMS} and disabled dropout, and the one with 0.9 keep
+ratio have better precision (0.460 and 0.454, respectively) than the
+\gls{vanilla} \gls{SSD} variants with 0.01 confidence threshold (0.452 and
+0.453). With respect to the macro averaged result, all variants have
+better precision than the average, and the Bayesian variant without
+\gls{NMS} and dropout also has a better recall and \(F_1\) score.
+
 \begin{table}[tbp]
     \begin{tabular}{rccc}
         \hline
@@ -966,6 +975,13 @@ worse than average.
     \label{tab:results-chairs}
 \end{table}
 
+The best \(F_1\) score (0.288) and recall (0.251) for the chairs class belong
+to \gls{vanilla} \gls{SSD} with entropy threshold (see table
+\ref{tab:results-chairs}). The best precision (0.360) is achieved by Bayesian
+\gls{SSD} with \gls{NMS} and disabled dropout; the variant with 0.9 keep ratio
+has the second-highest precision (0.343) of all variants. In both \(F_1\) score
+and recall, all Bayesian variants are worse than the \gls{vanilla} variants.
+Compared with the macro averaged results, all variants perform worse than the average.
 
 \begin{table}[tbp]
     \begin{tabular}{rccc}
@@ -992,6 +1008,14 @@ worse than average.
     \label{tab:results-bottles}
 \end{table}
 
+Bottles show similar performance to cars with overall lower numbers
+(see table \ref{tab:results-bottles}).
+Again, all Bayesian variants are worse than all \gls{vanilla} variants.
+The Bayesian \gls{SSD} variant with \gls{NMS} and disabled dropout has the
+best \(F_1\) score (0.224) and precision (0.328) among the Bayesian variants; the
+variant with 0.5 keep ratio has the best recall (0.172). All variants
+perform worse than in the averaged results.
+
 \begin{table}[tbp]
     \begin{tabular}{rccc}
         \hline
@@ -1017,6 +1041,14 @@ worse than average.
     \label{tab:results-giraffes}
 \end{table}
 
+Finally, the giraffe class is analysed (see table
+\ref{tab:results-giraffes}). Remarkably, all three \gls{vanilla}
+\gls{SSD} variants have identical performance, even before rounding.
+The Bayesian variant with \gls{NMS} and disabled dropout outperforms
+all the other Bayesian variants with an \(F_1\) score of 0.647,
+recall of 0.642, and precision of 0.654. All variants perform
+better than in the macro averaged result.
+
 \subsection{Qualitative Analysis}
 
 This subsection compares \gls{vanilla} \gls{SSD}