Written adversarial auto-encoder part
Signed-off-by: Jim Martens <github@2martens.de>
@@ -187,7 +187,6 @@ results in the approximation of the class probability

passes through the network and averaging over the obtained Softmax
scores \(\mathbf{s}_i\), given an image \(\mathcal{I}\) and the
training data \(\mathbf{T}\):

\begin{equation} \label{eq:drop-sampling}
p(y|\mathcal{I}, \mathbf{T}) = \int p(y|\mathcal{I}, \mathbf{W}) \cdot p(\mathbf{W}|\mathbf{T})\,d\mathbf{W} \approx \frac{1}{n} \sum_{i=1}^{n}\mathbf{s}_i
\end{equation}
@@ -238,6 +237,98 @@ be used to identify and reject these false positive cases.
\section{GPND}

For the theoretical underpinning of Generative Probabilistic
Novelty Detection (GPND) the reader is referred to the paper of
Pidhorskyi et al.~\cite{Pidhorskyi2018}. This section only
covers the key aspects of an adversarial auto-encoder required
to understand their method.
\subsection{Adversarial Auto-encoder}

The training data points \(x_i \in \mathbb{R}^m\) are the input
of the auto-encoder. An encoding function \(e: \mathbb{R}^m \rightarrow \mathbb{R}^n\)
takes the data points and produces a representation
\(\overline{z_i} \in \mathbb{R}^n\) in a latent space. The latent
space has a lower dimension than the input (\(n < m\)), which
forces the encoder to compress the data.
A second function \(g: \Omega \rightarrow \mathbb{R}^m\), the
generator, takes a latent representation
\(z_i \in \Omega \subset \mathbb{R}^n\) and generates an output
\(\overline{x_i}\) that is as close as possible to the input data
distribution.
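As a minimal illustration (not the authors' architecture), the encoder and generator can be sketched as linear maps between \(\mathbb{R}^m\) and \(\mathbb{R}^n\); the dimensions and random weights below are purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 8, 2                      # input dimension m, latent dimension n (n < m)
W_e = rng.normal(size=(n, m))    # hypothetical encoder weights
W_g = rng.normal(size=(m, n))    # hypothetical generator weights

def e(x):
    """Encoder e: R^m -> R^n, produces the latent representation z_bar."""
    return W_e @ x

def g(z):
    """Generator g: R^n -> R^m, maps a latent point back to input space."""
    return W_g @ z

x = rng.normal(size=m)    # a training point x_i in R^m
z_bar = e(x)              # compressed latent code (n < m)
x_bar = g(z_bar)          # reconstruction in input space

print(z_bar.shape, x_bar.shape)   # (2,) (8,)
```

The sketch only demonstrates the dimensionality bottleneck \(n < m\); the actual encoder and generator are neural networks.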
What, then, is the difference between \(\overline{z_i}\) and \(z_i\)?
In a plain auto-encoder both would be identical. In an
adversarial auto-encoder it is slightly more complicated:
a discriminator \(D_z\) tries to distinguish between
an encoded data point \(\overline{z_i}\) and a sample
\(z_i \sim \mathcal{N}(0,1)\) drawn from a normal distribution with
mean \(0\) and standard deviation \(1\). During training, the encoding
function \(e\) attempts to make \(\overline{z_i}\) indistinguishable
from \(z_i\), while \(D_z\) has the adversarial task of telling
them apart.
Furthermore, there is a discriminator \(D_x\) whose task is
to differentiate the generated output \(\overline{x_i}\) from the
actual input \(x_i\). During training, the generator function \(g\)
tries to minimize the perceivable difference between
\(\overline{x_i}\) and \(x_i\), while \(D_x\) has the adversarial
task of distinguishing between them.
With this, all components of the adversarial auto-encoder employed
by Pidhorskyi et al.\ are introduced. Finally, the losses are
presented. The two adversarial objectives have already been
mentioned. Specifically, there is the adversarial loss for the
discriminator \(D_z\):
\begin{equation} \label{eq:adv-loss-z}
\mathcal{L}_{adv-d_z}(x,e,D_z) = E[\log (D_z(\mathcal{N}(0,1)))] + E[\log (1 - D_z(e(x)))],
\end{equation}
\noindent
where \(E\) denotes the expected
value\footnote{the expectation operator from probability theory},
\(x\) stands for the input, and
\(\mathcal{N}(0,1)\) represents a sample drawn from the specified
normal distribution. The encoder \(e\) attempts to minimize this
loss while the discriminator \(D_z\) tries to maximize it.
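The two expectations in \eqref{eq:adv-loss-z} can be estimated by Monte-Carlo averaging over a batch. The sketch below uses a hypothetical one-dimensional logistic discriminator and a toy encoder (neither is the authors' network) purely to show how the loss is evaluated from samples:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical stand-ins: a logistic discriminator D_z and a linear encoder e.
w = rng.normal(size=2)
def D_z(z):                 # estimated probability that z was drawn from N(0, 1)
    return sigmoid(w[0] * z + w[1])

def e(x):                   # toy encoder whose outputs differ from the prior
    return 0.1 * x

x_batch = rng.normal(size=256)    # batch of (1-D) training inputs x
z_prior = rng.normal(size=256)    # samples z ~ N(0, 1)

# L_adv-d_z = E[log D_z(N(0,1))] + E[log(1 - D_z(e(x)))], via sample means
loss = np.mean(np.log(D_z(z_prior))) + np.mean(np.log(1.0 - D_z(e(x_batch))))
print(loss)   # a finite negative number; D_z maximizes it, e minimizes it
```

Both logarithm arguments lie strictly in \((0,1)\), so each term is negative; training drives the estimate up for \(D_z\) and down for \(e\).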
In the same way, the adversarial loss for the discriminator \(D_x\)
is specified:
\begin{equation} \label{eq:adv-loss-x}
\mathcal{L}_{adv-d_x}(x,D_x,g) = E[\log(D_x(x))] + E[\log(1 - D_x(g(\mathcal{N}(0,1))))],
\end{equation}
\noindent
where \(x\), \(E\), and \(\mathcal{N}(0,1)\) have the same meaning
as before. In this case the generator \(g\) tries to minimize the loss
while the discriminator \(D_x\) attempts to maximize it.
Every auto-encoder requires a reconstruction error, which
measures the difference between the original input and
the decoded output. In this case, the reconstruction
loss is defined as follows:
\begin{equation} \label{eq:recon-loss}
\mathcal{L}_{error}(x, e, g) = - E[\log(p(g(e(x)) | x))],
\end{equation}
\noindent
where \(p(g(e(x))\,|\,x)\) is the likelihood of the reconstruction
given the input, and \(x\), \(E\), \(e\), and \(g\) have the same
meaning as before.
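Under the common assumption of a Gaussian likelihood with fixed variance (an assumption made here for illustration, not stated in the text), the negative expected log-likelihood in \eqref{eq:recon-loss} reduces, up to constants, to a mean squared error between input and reconstruction:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy linear auto-encoder (hypothetical weights, m = 4, n = 2).
W_e = rng.normal(size=(2, 4))
W_g = rng.normal(size=(4, 2))
e = lambda x: W_e @ x
g = lambda z: W_g @ z

x = rng.normal(size=4)
x_bar = g(e(x))                     # reconstruction g(e(x))

# With p(g(e(x)) | x) Gaussian around x, -E[log p] is proportional to the MSE.
recon_loss = np.mean((x - x_bar) ** 2)
print(recon_loss)   # non-negative scalar
```

Other likelihood models yield other reconstruction terms; the Gaussian case is only the most familiar instance.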
All losses combined result in the following objective:
\begin{equation} \label{eq:full-loss}
\mathcal{L}(x,e,D_z,D_x,g) = \mathcal{L}_{adv-d_z}(x,e,D_z) + \mathcal{L}_{adv-d_x}(x,D_x,g) + \lambda \mathcal{L}_{error}(x,e,g),
\end{equation}
\noindent
where \(\lambda\) is a parameter that balances the adversarial
losses against the reconstruction loss. Pidhorskyi et al.\ train
the model with the Adam optimizer, performing alternating
updates of the individual components:
\begin{itemize}
\item Maximize \(\mathcal{L}_{adv-d_x}\) by updating the weights of \(D_x\);
\item Minimize \(\mathcal{L}_{adv-d_x}\) by updating the weights of \(g\);
\item Maximize \(\mathcal{L}_{adv-d_z}\) by updating the weights of \(D_z\);
\item Minimize \(\mathcal{L}_{error}\) and \(\mathcal{L}_{adv-d_z}\) by updating the weights of \(e\) and \(g\).
\end{itemize}
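The alternating schedule above can be sketched as a training skeleton. The update functions are deliberate stubs (a real implementation would backpropagate through the networks and apply Adam steps), so only the order of the four updates per batch is meaningful here:

```python
# Skeleton of one training epoch with alternating updates (stub update steps;
# the actual model updates e, g, D_z, D_x via backpropagation and Adam).
def train_epoch(batches, maximize_adv_dx, minimize_adv_dx,
                maximize_adv_dz, minimize_recon_and_adv_dz):
    for batch in batches:
        maximize_adv_dx(batch)            # update the weights of D_x
        minimize_adv_dx(batch)            # update the weights of g
        maximize_adv_dz(batch)            # update the weights of D_z
        minimize_recon_and_adv_dz(batch)  # update the weights of e and g

# Usage with recording stubs that expose the call order per batch:
calls = []
steps = [lambda b, n=name: calls.append(n)
         for name in ("max_adv_dx", "min_adv_dx", "max_adv_dz", "min_recon_adv_dz")]
train_epoch([0, 1], *steps)
print(calls[:4])   # ['max_adv_dx', 'min_adv_dx', 'max_adv_dz', 'min_recon_adv_dz']
```

Keeping the discriminator and generator/encoder updates in separate steps is what makes the minimax objectives in \eqref{eq:adv-loss-z} and \eqref{eq:adv-loss-x} trainable in practice.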
\section{Contribution}

\chapter{Thesis as a project}