Added more detail to class-specific results
Signed-off-by: Jim Martens <github@2martens.de>
54
body.tex
@@ -904,17 +904,14 @@ they had the exact same performance before rounding.
 \label{tab:results-persons}
 \end{table}
 
-It is clearly visible that the overall trend continues in the individual
-classes (see tables \ref{tab:results-persons}, \ref{tab:results-cars}, \ref{tab:results-chairs}, \ref{tab:results-bottles}, and \ref{tab:results-giraffes}). However, the two \gls{vanilla} \gls{SSD} variants with only 0.01 confidence
-threshold perform better than in the averaged results presented earlier.
-Only in the chairs class, a Bayesian \gls{SSD} variant performs better (in
-precision) than any of the \gls{vanilla} \gls{SSD} variants. Moreover, there are
-multiple classes where two or all of the \gls{vanilla} \gls{SSD} variants perform
-equally well. When compared with the macro averaged results,
-giraffes and persons perform better across the board. Cars have a higher
-precision than average but lower recall values for all but the Bayesian
-SSD variant without \gls{NMS} and dropout. Chairs and bottles perform
-worse than average.
+The vanilla \gls{SSD} variant with a 0.2 per-class confidence threshold performs
+best in the persons class with a max \(F_1\) score of 0.460, as well as
+recall of 0.405 and precision of 0.533 at the max \(F_1\) score.
+It shares the first place in recall with the \gls{vanilla} \gls{SSD}
+variant using 0.01 confidence threshold. All Bayesian \gls{SSD} variants
+perform worse than the \gls{vanilla} \gls{SSD} variants (see table
+\ref{tab:results-persons}). With respect to the macro averaged result,
+all variants perform better than the average of all classes.
 
 \begin{table}[tbp]
 \begin{tabular}{rccc}
@@ -941,6 +938,18 @@ worse than average.
 \label{tab:results-cars}
 \end{table}
 
+The performance for cars is slightly different (see table
+\ref{tab:results-cars}): the \gls{vanilla} \gls{SSD}
+variant with entropy threshold and 0.01 confidence threshold has
+the best \(F_1\) score and recall. Vanilla SSD with 0.2 confidence
+threshold, however, has the best precision. Both the Bayesian SSD
+variant with \gls{NMS} and disabled dropout, and the one with 0.9 keep
+ratio have a better precision (0.460 and 0.454 respectively) than the
+\gls{vanilla} \gls{SSD} variants with 0.01 confidence threshold (0.452 and
+0.453). With respect to the macro averaged result, all variants have
+a better precision than the average and the Bayesian variant without
+\gls{NMS} and dropout also has a better recall and \(F_1\) score.
+
 \begin{table}[tbp]
 \begin{tabular}{rccc}
 \hline
@@ -966,6 +975,13 @@ worse than average.
 \label{tab:results-chairs}
 \end{table}
 
+The best \(F_1\) score (0.288) and recall (0.251) for the chairs class
+belong to \gls{vanilla} \gls{SSD} with entropy threshold. Precision
+is highest for Bayesian SSD with \gls{NMS} and disabled dropout (0.360).
+The variant with 0.9 keep ratio has the second-highest precision (0.343)
+of all variants. Both in \(F_1\) score and recall all Bayesian variants
+are worse than the \gls{vanilla} variants. Compared with the macro averaged
+results, all variants perform worse than the average.
 
 \begin{table}[tbp]
 \begin{tabular}{rccc}
@@ -992,6 +1008,14 @@ worse than average.
 \label{tab:results-bottles}
 \end{table}
 
+Bottles show similar performance to cars with overall lower numbers
+(see table \ref{tab:results-bottles}).
+Again, all Bayesian variants are worse than all vanilla variants.
+The Bayesian SSD variant with \gls{NMS} and disabled dropout has the
+best \(F_1\) score (0.224) and precision (0.328) among the Bayesian variants; the
+variant with 0.5 keep ratio has the best recall (0.172). All variants
+perform worse than in the averaged results.
+
 \begin{table}[tbp]
 \begin{tabular}{rccc}
 \hline
@@ -1017,6 +1041,14 @@ worse than average.
 \label{tab:results-giraffes}
 \end{table}
 
+Last but not least, the giraffe class (see table
+\ref{tab:results-giraffes}) is analysed. Remarkably, all three
+vanilla SSD variants have identical performance, even before rounding.
+The Bayesian variant with \gls{NMS} and disabled dropout outperforms
+all the other Bayesian variants with an \(F_1\) score of 0.647,
+recall of 0.642, and precision of 0.654. All variants perform
+better than in the macro averaged result.
+
 \subsection{Qualitative Analysis}
 
 This subsection compares \gls{vanilla} \gls{SSD}
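
A quick consistency check on the persons figures quoted in the first hunk, assuming the \(F_1\) score is the usual harmonic mean of precision \(P\) and recall \(R\) (the diff does not state the formula, and the symbols \(P\) and \(R\) are introduced here only for illustration):

\[
F_1 = \frac{2 P R}{P + R}
    = \frac{2 \cdot 0.533 \cdot 0.405}{0.533 + 0.405}
    = \frac{0.43173}{0.938}
    \approx 0.460
\]

Under that assumption, the reported precision (0.533) and recall (0.405) reproduce the stated max \(F_1\) score of 0.460 to three decimals.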