From dbfd527f3041a4710a40506007b3c9ec21ad0ca4 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Tue, 1 Oct 2019 12:36:44 +0200
Subject: [PATCH] Added more detail to class-specific results

Signed-off-by: Jim Martens
---
 body.tex | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/body.tex b/body.tex
index cd738b6..59c3289 100644
--- a/body.tex
+++ b/body.tex
@@ -904,17 +904,14 @@ they had the exact same performance before rounding.
 \label{tab:results-persons}
 \end{table}

-It is clearly visible that the overall trend continues in the individual
-classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two \gls{vanilla} \gls{SSD} variants with only 0.01 confidence
-threshold perform better than in the averaged results presented earlier.
-Only in the chairs class, a Bayesian \gls{SSD} variant performs better (in
-precision) than any of the \gls{vanilla} \gls{SSD} variants. Moreover, there are
-multiple classes where two or all of the \gls{vanilla} \gls{SSD} variants perform
-equally well. When compared with the macro averaged results,
-giraffes and persons perform better across the board. Cars have a higher
-precision than average but lower recall values for all but the Bayesian
-SSD variant without \gls{NMS} and dropout. Chairs and bottles perform
-worse than average.
+The vanilla \gls{SSD} variant with 0.2 per class confidence threshold performs
+best in the persons class with a max \(F_1\) score of 0.460, as well as
+recall of 0.405 and precision of 0.533 at the max \(F_1\) score.
+It shares the first place in recall with the \gls{vanilla} \gls{SSD}
+variant using 0.01 confidence threshold. All Bayesian \gls{SSD} variants
+perform worse than the \gls{vanilla} \gls{SSD} variants (see table
+\ref{tab:results-persons}).
+With respect to the macro averaged result,
+all variants perform better than the average of all classes.

 \begin{table}[tbp]
 \begin{tabular}{rccc}
@@ -941,6 +938,18 @@ worse than average.
 \label{tab:results-cars}
 \end{table}

+The performance for cars is slightly different (see table
+\ref{tab:results-cars}): the \gls{vanilla} \gls{SSD}
+variant with entropy threshold and 0.01 confidence threshold has
+the best \(F_1\) score and recall. Vanilla SSD with 0.2 confidence
+threshold, however, has the best precision. Both the Bayesian SSD
+variant with \gls{NMS} and disabled dropout, and the one with 0.9 keep
+ratio have a better precision (0.460 and 0.454 respectively) than the
+\gls{vanilla} \gls{SSD} variants with 0.01 confidence threshold (0.452 and
+0.453). With respect to the macro averaged result, all variants have
+a better precision than the average and the Bayesian variant without
+\gls{NMS} and dropout also has a better recall and \(F_1\) score.
+
 \begin{table}[tbp]
 \begin{tabular}{rccc}
 \hline
@@ -966,6 +975,13 @@ worse than average.
 \label{tab:results-chairs}
 \end{table}

+The best \(F_1\) score (0.288) and recall (0.251) for the chairs class
+belongs to \gls{vanilla} \gls{SSD} with entropy threshold. Precision
+is mastered by Bayesian SSD with \gls{NMS} and disabled dropout (0.360).
+The variant with 0.9 keep ratio has the second-highest precision (0.343)
+of all variants. Both in \(F_1\) score and recall all Bayesian variants
+are worse than the \gls{vanilla} variants. Compared with the macro averaged
+results, all variants perform worse than the average.

 \begin{table}[tbp]
 \begin{tabular}{rccc}
@@ -992,6 +1008,14 @@ worse than average.
 \label{tab:results-bottles}
 \end{table}

+Bottles show similar performance to cars with overall lower numbers
+(see table \ref{tab:results-bottles}).
+Again, all Bayesian variants are worse than all vanilla variants.
+The Bayesian SSD variant with \gls{NMS} and disabled dropout has the
+best \(F_1\) score (0.224) and precision (0.328) among the Bayesian variants; the
+variant with 0.5 keep ratio has the best recall (0.172). All variants
+perform worse than in the averaged results.
+
 \begin{table}[tbp]
 \begin{tabular}{rccc}
 \hline
@@ -1017,6 +1041,14 @@ worse than average.
 \label{tab:results-giraffes}
 \end{table}

+Last but not least the giraffe class (see table
+\ref{tab:results-giraffes}) is analysed. Remarkably, all three
+vanilla SSD variants have the identical performance, even before rounding.
+The Bayesian variant with \gls{NMS} and disabled dropout outperforms
+all the other Bayesian variants with an \(F_1\) score of 0.647,
+recall of 0.642, and 0.654 as precision. All variants perform
+better than in the macro averaged result.
+
 \subsection{Qualitative Analysis}

 This subsection compares \gls{vanilla} \gls{SSD}