Added more detail to class-specific results

Signed-off-by: Jim Martens <github@2martens.de>
2019-10-01 12:36:44 +02:00
parent ffbfd48d9c
commit dbfd527f30


@ -904,17 +904,14 @@ they had the exact same performance before rounding.
\label{tab:results-persons}
\end{table}
-It is clearly visible that the overall trend continues in the individual
-classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two \gls{vanilla} \gls{SSD} variants with only 0.01 confidence
-threshold perform better than in the averaged results presented earlier.
-Only in the chairs class, a Bayesian \gls{SSD} variant performs better (in
-precision) than any of the \gls{vanilla} \gls{SSD} variants. Moreover, there are
-multiple classes where two or all of the \gls{vanilla} \gls{SSD} variants perform
-equally well. When compared with the macro averaged results,
-giraffes and persons perform better across the board. Cars have a higher
-precision than average but lower recall values for all but the Bayesian
-SSD variant without \gls{NMS} and dropout. Chairs and bottles perform
-worse than average.
+The \gls{vanilla} \gls{SSD} variant with a per-class confidence threshold of 0.2 performs
+best in the persons class, with a maximum \(F_1\) score of 0.460 and,
+at that maximum, a recall of 0.405 and a precision of 0.533.
+It shares first place in recall with the \gls{vanilla} \gls{SSD}
+variant using a 0.01 confidence threshold. All Bayesian \gls{SSD} variants
+perform worse than the \gls{vanilla} \gls{SSD} variants (see table
+\ref{tab:results-persons}). With respect to the macro averaged result,
+all variants perform better than the average of all classes.
\begin{table}[tbp]
\begin{tabular}{rccc}
@ -941,6 +938,18 @@ worse than average.
\label{tab:results-cars}
\end{table}
+The performance for cars is slightly different (see table
+\ref{tab:results-cars}): the \gls{vanilla} \gls{SSD}
+variant with entropy threshold and 0.01 confidence threshold has
+the best \(F_1\) score and recall. The \gls{vanilla} \gls{SSD} variant with
+0.2 confidence threshold, however, has the best precision. Both the Bayesian \gls{SSD}
+variant with \gls{NMS} and disabled dropout and the one with 0.9 keep
+ratio have a better precision (0.460 and 0.454, respectively) than the
+\gls{vanilla} \gls{SSD} variants with 0.01 confidence threshold (0.452 and
+0.453). With respect to the macro averaged result, all variants have
+a better precision than the average, and the Bayesian variant without
+\gls{NMS} and dropout also has a better recall and \(F_1\) score.
\begin{table}[tbp]
\begin{tabular}{rccc}
\hline
@ -966,6 +975,13 @@ worse than average.
\label{tab:results-chairs}
\end{table}
+The best \(F_1\) score (0.288) and recall (0.251) for the chairs class
+belong to the \gls{vanilla} \gls{SSD} variant with entropy threshold. The highest precision
+(0.360) is achieved by Bayesian \gls{SSD} with \gls{NMS} and disabled dropout.
+The variant with 0.9 keep ratio has the second-highest precision (0.343)
+of all variants. In both \(F_1\) score and recall, all Bayesian variants
+are worse than the \gls{vanilla} variants. Compared with the macro averaged
+results, all variants perform worse than the average.
\begin{table}[tbp]
\begin{tabular}{rccc}
@ -992,6 +1008,14 @@ worse than average.
\label{tab:results-bottles}
\end{table}
+Bottles show similar performance to cars, with overall lower numbers
+(see table \ref{tab:results-bottles}).
+All Bayesian variants are worse than all \gls{vanilla} variants.
+The Bayesian \gls{SSD} variant with \gls{NMS} and disabled dropout has the
+best \(F_1\) score (0.224) and precision (0.328) among the Bayesian variants; the
+variant with 0.5 keep ratio has the best recall among them (0.172). All variants
+perform worse than the macro averaged result.
\begin{table}[tbp]
\begin{tabular}{rccc}
\hline
@ -1017,6 +1041,14 @@ worse than average.
\label{tab:results-giraffes}
\end{table}
+Finally, the giraffe class (see table
+\ref{tab:results-giraffes}) is analysed. Remarkably, all three
+\gls{vanilla} \gls{SSD} variants have identical performance, even before rounding.
+The Bayesian variant with \gls{NMS} and disabled dropout outperforms
+all the other Bayesian variants, with an \(F_1\) score of 0.647,
+a recall of 0.642, and a precision of 0.654. All variants perform
+better than the macro averaged result.
\subsection{Qualitative Analysis}
This subsection compares \gls{vanilla} \gls{SSD}
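The per-class paragraphs report precision and recall at the point of maximum \(F_1\) score. The Python below is a minimal sketch of two checks: how such an operating point can be picked from paired precision-recall values, and how a reported \(F_1\) can be checked against precision and recall that were rounded to three decimals. It is not the thesis's evaluation code. The function names `f1`, `max_f1_operating_point`, and `f1_consistent` are hypothetical. The sketch assumes that \(F_1\) is the usual harmonic mean of precision and recall, and that the i-th precision and recall belong to the same confidence threshold.

```python
from typing import Sequence, Tuple


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f1_operating_point(
    precisions: Sequence[float], recalls: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (max F1, precision, recall) at the point where F1 peaks.

    Assumption: precisions[i] and recalls[i] were measured at the same
    confidence threshold, i.e. they are paired points of one PR curve.
    """
    if len(precisions) != len(recalls) or not precisions:
        raise ValueError("need equally long, non-empty sequences")
    best = max(zip(precisions, recalls), key=lambda pr: f1(*pr))
    return f1(*best), best[0], best[1]


def f1_consistent(precision: float, recall: float, reported_f1: float,
                  decimals: int = 3) -> bool:
    """Check whether a reported F1 fits precision/recall rounded to `decimals`.

    F1 increases in both arguments, so the smallest and largest F1 that
    are compatible with the rounded inputs lie at the corners of the
    rounding box around (precision, recall).
    """
    half = 0.5 * 10 ** -decimals
    lowest = f1(precision - half, recall - half)
    highest = f1(precision + half, recall + half)
    # The reported F1 is itself rounded, so widen it by the same margin.
    return lowest <= reported_f1 + half and highest >= reported_f1 - half


if __name__ == "__main__":
    # Persons, vanilla SSD with 0.2 confidence threshold (values from the text).
    print(round(f1(0.533, 0.405), 3), f1_consistent(0.533, 0.405, 0.460))
    # Giraffes, Bayesian SSD with NMS and disabled dropout (values from the text).
    print(round(f1(0.654, 0.642), 3), f1_consistent(0.654, 0.642, 0.647))
```

Recomputing the giraffe value from the rounded inputs gives about 0.648 instead of the reported 0.647. The interval check shows that rounding alone explains this gap, because the lowest compatible value (about 0.6474) still rounds to 0.647.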