From fdcf1af62e77902f425166f1bcc3af2d76a7c837 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Wed, 25 Sep 2019 12:34:20 +0200
Subject: [PATCH] Added explanation for entropy threshold differences

Signed-off-by: Jim Martens
---
 body.tex | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/body.tex b/body.tex
index 09c6cab..c28b9b5 100644
--- a/body.tex
+++ b/body.tex
@@ -964,9 +964,15 @@ has a high confidence in one class---including the background.
 However, the entropy plays a larger role for the Bayesian variants---as
 expected: the best performing thresholds are 1.0, 1.3, and 1.4 for micro
 averaging, and 1.5, 1.7, and 2.0 for macro averaging. In all of these cases the best
-threshold is not the largest threshold tested. A lower threshold likely
-eliminated some false positives from the result set. On the other hand a
-too low threshold likely eliminated true positives as well.
+threshold is not the largest threshold tested.
+
+This is caused by a simple phenomenon: at some point most or all true
+positives are already included and a higher entropy threshold only adds
+more false positives. This behaviour is indicated by a stagnating recall
+at the higher entropy levels. For the low entropy thresholds, the low
+recall dominates the \(F_1\) score; the sweet spot lies somewhere in
+between. For macro averaging, a higher optimal entropy threshold
+indicates worse performance.
 
 \subsection*{Non-Maximum Suppression and Top \(k\)}
 
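
As a quick illustration of the sweet-spot reasoning in the added paragraph,
the usual \(F_1\) definition can be evaluated for a few hypothetical
precision/recall pairs; the numbers below are purely illustrative and are
not taken from the experiments:

\[
F_1 = 2\,\frac{\mathrm{precision}\cdot\mathrm{recall}}
              {\mathrm{precision}+\mathrm{recall}}
\]

\[
\begin{aligned}
\text{low threshold:}  &\quad P = 0.9,\; R = 0.20 &&\Rightarrow\; F_1 \approx 0.33\\
\text{mid threshold:}  &\quad P = 0.7,\; R = 0.60 &&\Rightarrow\; F_1 \approx 0.65\\
\text{high threshold:} &\quad P = 0.4,\; R = 0.65 &&\Rightarrow\; F_1 \approx 0.50
\end{aligned}
\]

Past the middle row, recall barely improves (it stagnates once most true
positives are included) while precision keeps dropping, so \(F_1\) falls
again; the best threshold therefore sits between the extremes.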