Written background for dropout sampling
Signed-off-by: Jim Martens <github@2martens.de>
parent
7d59b862c1
commit
1e66a6f874
@ -144,8 +144,98 @@ with MS COCO classes.
\chapter{Background and Research Plan}
This chapter takes a more in-depth look at the two works this thesis
is based upon. First, the dropout sampling introduced by Miller et
al.~\cite{Miller2018} is presented. Afterwards, Generative
Probabilistic Novelty Detection with Adversarial
Autoencoders~\cite{Pidhorskyi2018} is described. The chapter
concludes with a more detailed explanation of the intended
contribution of this thesis.

The explanation of dropout sampling follows the paper of Miller et
al.~\cite{Miller2018} closely, including the formulae used there.
\section{Dropout Sampling}
To understand dropout sampling, it is necessary to explain the
idea of Bayesian neural networks. They place a prior distribution
over the network weights, for example a Gaussian prior:
\(\mathbf{W} \sim \mathcal{N}(0, I)\). Here \(\mathbf{W}\) denotes
the weights, and the identity matrix \(I\) as covariance expresses
that the weights are independent and identically distributed with
unit variance. Training the network determines a plausible set of
weights by evaluating the posterior distribution over the weights
given the training data: \(p(\mathbf{W}|\mathbf{T})\). However,
evaluating this posterior exactly is not possible in any reasonable
time. Therefore, approximation techniques are required. In those
techniques the posterior is fitted with a simple distribution
\(q^{*}_{\theta}(\mathbf{W})\). The original and intractable
problem of averaging over all weights in the network is replaced
with an optimisation task, in which the parameters \(\theta\) of
the simple distribution are optimised~\cite{Kendall2017}.
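
One common way to state this optimisation task, given here as a
sketch and not necessarily in the exact form used by the cited works,
is to choose the simple distribution such that its Kullback-Leibler
divergence to the true posterior becomes minimal. The family
\(q_{\theta}\) and the symbol \(\mathrm{KL}\) are notation assumed
for this illustration:
\[
  q^{*}_{\theta}(\mathbf{W}) = \arg\min_{q_{\theta}}
  \mathrm{KL}\bigl(q_{\theta}(\mathbf{W}) \,\|\, p(\mathbf{W}|\mathbf{T})\bigr)
\]
The divergence is zero exactly when both distributions agree, so a
small value suggests that sampling from
\(q^{*}_{\theta}(\mathbf{W})\) is a reasonable substitute for
sampling from the posterior.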
\subsubsection*{Dropout variational inference}
Kendall and Gal~\cite{Kendall2017} describe such an approximation
for classification and recognition tasks. Dropout variational
inference is a practical approximation technique: dropout layers
are added in front of every weight layer and are kept active at
test time in order to sample from the approximate posterior.
Effectively, the class probability
\(p(y|\mathcal{I}, \mathbf{T})\), given an image \(\mathcal{I}\)
and the training data \(\mathbf{T}\), is approximated by performing
multiple forward passes through the network and averaging over the
obtained softmax scores \(\mathbf{s}_i\):
\begin{equation} \label{eq:drop-sampling}
p(y|\mathcal{I}, \mathbf{T}) = \int p(y|\mathcal{I}, \mathbf{W}) \cdot p(\mathbf{W}|\mathbf{T}) \, d\mathbf{W} \approx \frac{1}{n} \sum_{i=1}^{n}\mathbf{s}_i
\end{equation}
With this dropout sampling technique, \(n\) sets of model weights
\(\widetilde{\mathbf{W}}_i\) are approximately sampled from the
posterior \(p(\mathbf{W}|\mathbf{T})\), one for each forward pass.
The class probability \(p(y|\mathcal{I}, \mathbf{T})\) is a
probability vector \(\mathbf{q}\) over all class labels. Finally,
the uncertainty of the network with respect to the classification
is given by the entropy
\(H(\mathbf{q}) = - \sum_i q_i \cdot \log q_i\), where the sum
runs over the class labels.
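
As a small worked example of Equation~\ref{eq:drop-sampling} and the
entropy, assume a hypothetical network with three classes and
\(n = 2\) forward passes. The score values are made up for
illustration and the natural logarithm is assumed:
\[
  \mathbf{s}_1 = (0.8, 0.1, 0.1), \quad
  \mathbf{s}_2 = (0.6, 0.3, 0.1), \quad
  \mathbf{q} \approx \frac{1}{2}(\mathbf{s}_1 + \mathbf{s}_2) = (0.7, 0.2, 0.1)
\]
\[
  H(\mathbf{q}) = -(0.7 \log 0.7 + 0.2 \log 0.2 + 0.1 \log 0.1) \approx 0.802
\]
Averaging first and computing the entropy afterwards means that
disagreement between the forward passes flattens \(\mathbf{q}\) and
therefore raises the entropy.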
\subsubsection*{Dropout sampling for object detection}
Miller et al.~\cite{Miller2018} apply dropout sampling to object
detection. In that case \(\mathbf{W}\) represents the learned
weights of a detection network like SSD~\cite{Liu2016}. Every
forward pass uses a different set of weights
\(\widetilde{\mathbf{W}}\), which is approximately sampled from
\(p(\mathbf{W}|\mathbf{T})\). Each forward pass in object
detection results in a set of detections, each consisting of
bounding box coordinates \(\mathbf{b}\) and a softmax score vector
\(\mathbf{s}\). Miller et al.\ denote a detection as
\(D_i = \{\mathbf{s}_i,\mathbf{b}_i\}\). The detections of all
passes are collected in one large set
\(\mathfrak{D} = \{D_1, \ldots, D_m\}\), where \(m\) is the total
number of detections over all passes.

All detections with mutual intersection-over-union (IoU) scores
of \(0.95\) or higher are grouped into an observation
\(\mathcal{O}_i\). Subsequently, the corresponding vector of class
probabilities \(\mathbf{q}_i\) for the observation is calculated by
averaging all score vectors \(\mathbf{s}_j\) in that observation:
\(\mathbf{q}_i \approx \overline{\mathbf{s}}_i = \frac{1}{n} \sum_{j=1}^{n} \mathbf{s}_j\),
where \(n\) here denotes the number of detections in
\(\mathcal{O}_i\). The label uncertainty of the detector for a
particular observation is measured by the entropy
\(H(\mathbf{q}_i) = - \sum_j q_{ij} \cdot \log q_{ij}\), with
\(j\) running over the class labels.
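
The grouping criterion can be illustrated with a short calculation.
The IoU of two boxes is the area of their intersection divided by
the area of their union. The box sizes below are hypothetical and
only chosen for illustration. For two axis-aligned boxes
\(\mathbf{b}_1\) and \(\mathbf{b}_2\) of \(100 \times 100\) pixels
that are shifted against each other by \(2\) pixels along one axis:
\[
  \mathrm{IoU}(\mathbf{b}_1, \mathbf{b}_2)
  = \frac{|\mathbf{b}_1 \cap \mathbf{b}_2|}{|\mathbf{b}_1 \cup \mathbf{b}_2|}
  = \frac{98 \cdot 100}{2 \cdot 100 \cdot 100 - 98 \cdot 100}
  = \frac{9800}{10200} \approx 0.961
\]
These two detections would fall into the same observation. With a
shift of \(3\) pixels the IoU drops to
\(9700 / 10300 \approx 0.942\), which is below \(0.95\). The
criterion therefore only groups detections that overlap almost
perfectly.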
In the introduction I used a strongly simplified description of
maximum and low uncertainty. A more complete explanation follows:
if \(\mathbf{q}_i\), which I called the averaged class
probabilities, resembles a uniform distribution, the entropy will be
high. A uniform distribution means that no class is more likely than
another, which is the case of maximum uncertainty. Conversely, if
one class has a very high probability, the entropy will be low.

In open-set conditions it can be expected that falsely generated
detections for unknown object classes have a higher label
uncertainty. A threshold on the entropy \(H(\mathbf{q}_i)\) can
then be used to identify and reject these false positive cases.
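
The two extremes and the effect of a threshold can be made concrete
with hypothetical numbers. The number of classes \(K = 3\), the
probability vectors, and the threshold \(\tau\) are assumptions for
illustration, and the natural logarithm is used. For a uniform
vector the entropy reaches its maximum \(\log K\):
\[
  H\bigl(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\bigr) = \log 3 \approx 1.099
\]
A vector dominated by one class gives a value close to zero:
\[
  H(0.98, 0.01, 0.01) = -(0.98 \log 0.98 + 2 \cdot 0.01 \log 0.01) \approx 0.112
\]
With a threshold of, for example, \(\tau = 0.5\), the first
observation would be rejected and the second one kept. The value
\(0.5\) is arbitrary here and does not come from the cited works.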
\section{GPND}
\section{Contribution}