Added more detail to class-specific results

Signed-off-by: Jim Martens <github@2martens.de>
2019-10-01 12:36:44 +02:00
parent ffbfd48d9c
commit dbfd527f30
1 changed files with 43 additions and 11 deletions
--- a/body.tex
+++ b/body.tex
@ -904,17 +904,14 @@ they had the exact same performance before rounding.
    \label{tab:results-persons}
 \end{table}
-It is clearly visible that the overall trend continues in the individual
-classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two \gls{vanilla} \gls{SSD} variants with only 0.01 confidence
-threshold perform better than in the averaged results presented earlier.
-Only in the chairs class, a Bayesian \gls{SSD} variant performs better (in
-precision) than any of the \gls{vanilla} \gls{SSD} variants. Moreover, there are
-multiple classes where two or all of the \gls{vanilla} \gls{SSD} variants perform
-equally well. When compared with the macro averaged results,
-giraffes and persons perform better across the board. Cars have a higher
+The vanilla \gls{SSD} variant with 0.2 per class confidence threshold performs
+best in the persons class with a max \(F_1\) score of 0.460, as well as
+recall of 0.405 and precision of 0.533 at the max \(F_1\) score.
+It shares the first place in recall with the \gls{vanilla} \gls{SSD}
+variant using 0.01 confidence threshold. All Bayesian \gls{SSD} variants
+perform worse than the \gls{vanilla} \gls{SSD} variants (see table
+\ref{tab:results-persons}). With respect to the macro averaged result,
+all variants perform better than the average of all classes.
 precision than average but lower recall values for all but the Bayesian
 SSD variant without \gls{NMS} and dropout. Chairs and bottles perform
 worse than average.
 \begin{table}[tbp]
    \begin{tabular}{rccc}
@ -941,6 +938,18 @@ worse than average.
    \label{tab:results-cars}
 \end{table}
 The performance for cars is slightly different (see table
 \ref{tab:results-cars}): the \gls{vanilla} \gls{SSD}
 variant with entropy threshold and 0.01 confidence threshold has
 the best \(F_1\) score and recall. Vanilla SSD with 0.2 confidence
 threshold, however, has the best precision. Both the Bayesian SSD
 variant with \gls{NMS} and disabled dropout, and the one with 0.9 keep
 ratio have a better precision (0.460 and 0.454 respectively) than the
 \gls{vanilla} \gls{SSD} variants with 0.01 confidence threshold (0.452 and
 0.453). With respect to the macro averaged result, all variants have
 a better precision than the average and the Bayesian variant without
 \gls{NMS} and dropout also has a better recall and \(F_1\) score.
 \begin{table}[tbp]
    \begin{tabular}{rccc}
        \hline
@ -966,6 +975,13 @@ worse than average.
    \label{tab:results-chairs}
 \end{table}
 The best \(F_1\) score (0.288) and recall (0.251) for the chairs class
 belong to \gls{vanilla} \gls{SSD} with entropy threshold. The best precision
 is achieved by Bayesian SSD with \gls{NMS} and disabled dropout (0.360).
 The variant with 0.9 keep ratio has the second-highest precision (0.343)
 of all variants. Both in \(F_1\) score and recall all Bayesian variants
 are worse than the \gls{vanilla} variants. Compared with the macro averaged
 results, all variants perform worse than the average.
 \begin{table}[tbp]
    \begin{tabular}{rccc}
@ -992,6 +1008,14 @@ worse than average.
    \label{tab:results-bottles}
 \end{table}
 Bottles show similar performance to cars with overall lower numbers
 (see table \ref{tab:results-bottles}).
 Again, all Bayesian variants are worse than all vanilla variants.
 The Bayesian SSD variant with \gls{NMS} and disabled dropout has the
 best \(F_1\) score (0.224) and precision (0.328) among the Bayesian variants; the
 variant with 0.5 keep ratio has the best recall (0.172). All variants
 perform worse than in the averaged results.
 \begin{table}[tbp]
    \begin{tabular}{rccc}
        \hline
@ -1017,6 +1041,14 @@ worse than average.
    \label{tab:results-giraffes}
 \end{table}
 Finally, consider the giraffe class (see table
 \ref{tab:results-giraffes}). Remarkably, all three
 vanilla SSD variants have identical performance, even before rounding.
 The Bayesian variant with \gls{NMS} and disabled dropout outperforms
 all the other Bayesian variants with an \(F_1\) score of 0.647,
 recall of 0.642, and precision of 0.654. All variants perform
 better than in the macro averaged result.
 \subsection{Qualitative Analysis}
 This subsection compares \gls{vanilla} \gls{SSD}