The class probability is approximated by performing multiple forward
passes through the network and averaging over the obtained Softmax
scores \(\mathbf{s}_i\), given an image \(\mathcal{I}\) and the
training data \(\mathbf{T}\):
\begin{equation} \label{eq:drop-sampling}
p(y|\mathcal{I}, \mathbf{T}) = \int p(y|\mathcal{I}, \mathbf{W}) \cdot p(\mathbf{W}|\mathbf{T})d\mathbf{W} \approx \frac{1}{n} \sum_{i=1}^{n}\mathbf{s}_i
\end{equation}
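This sampling procedure can be sketched as follows; the classifier
\texttt{forward\_pass} is assumed to return Softmax scores with
dropout kept active at inference time, and \texttt{toy\_network}
below is a hypothetical stand-in, not the actual detection network:

```python
import numpy as np

def dropout_sampling(forward_pass, image, n=10):
    """Approximate p(y | image, T) by averaging Softmax scores
    over n stochastic forward passes (dropout active at test time)."""
    scores = np.stack([forward_pass(image) for _ in range(n)])
    return scores.mean(axis=0)

# hypothetical stand-in network: fixed logits plus dropout-like noise
rng = np.random.default_rng(0)
def toy_network(image):
    logits = image + rng.normal(scale=0.1, size=image.shape)
    exp = np.exp(logits - logits.max())   # numerically stable Softmax
    return exp / exp.sum()

probs = dropout_sampling(toy_network, np.array([2.0, 1.0, 0.1]), n=50)
print(probs)  # averaged class probabilities, summing to 1
```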
be used to identify and reject these false positive cases.
\section{GPND}
For the theoretical underpinning of the Generative Probabilistic
Novelty Detection the reader is advised to refer to the paper of
Pidhorskyi et al.~\cite{Pidhorskyi2018}. This section will only
cover the key aspects of an adversarial auto-encoder required
to understand their method.
\subsection{Adversarial Auto-encoder}
The training data points \(x_i \in \mathbb{R}^m \) are the input
of the auto-encoder. An encoding function \(e: \mathbb{R}^m \rightarrow \mathbb{R}^n\) takes the data points
and produces a representation \(\overline{z_i} \in \mathbb{R}^n\)
in a latent space. This latent space is smaller (\(n < m\)) than the
input space, which necessitates some form of compression.
A second function \(g: \Omega \rightarrow \mathbb{R}^m\) is the
generator function that takes the latent representation
\(z_i \in \Omega \subset \mathbb{R}^n\) and generates an output
\(\overline{x_i}\) as close as possible to the input data
distribution.
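The dimensions involved can be illustrated with a minimal sketch;
the linear maps \texttt{W\_e} and \texttt{W\_g} below are hypothetical
stand-ins for the learned encoder and generator networks:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 2   # input and latent dimensions, with n < m

# hypothetical linear encoder e: R^m -> R^n and generator g: R^n -> R^m
W_e = rng.normal(size=(n, m))
W_g = rng.normal(size=(m, n))

def e(x):   # encoding function, compresses x_i to z_bar_i
    return W_e @ x

def g(z):   # generator, maps latent z_i back to the input space
    return W_g @ z

x_i = rng.normal(size=m)
z_bar = e(x_i)     # latent representation in R^2
x_bar = g(z_bar)   # reconstruction in R^8
print(z_bar.shape, x_bar.shape)  # (2,) (8,)
```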
What, then, is the difference between \(\overline{z_i}\) and \(z_i\)?
In a simple auto-encoder both would be identical. In the case
of an adversarial auto-encoder it is slightly more complicated.
There is a discriminator \(D_z\) that tries to distinguish between
an encoded data point \(\overline{z_i}\) and a \(z_i \sim \mathcal{N}(0,1)\) drawn from a normal distribution with \(0\) mean
and a standard deviation of \(1\). During training, the encoding
function \(e\) attempts to minimize any perceivable difference
between \(z_i\) and \(\overline{z_i}\) while \(D_z\) has the
aforementioned adversarial task to differentiate between them.
Furthermore, there is a discriminator \(D_x\) whose task is
to differentiate the generated output \(\overline{x_i}\) from the
actual input \(x_i\). During training, the generator function \(g\)
tries to minimize the perceivable difference between
\(\overline{x_i}\) and \(x_i\), while \(D_x\) has the aforementioned
adversarial task to distinguish between them.
With this, all components of the adversarial auto-encoder employed
by Pidhorskyi et al.\ are introduced. Finally, the losses are
presented. The two adversarial objectives have been mentioned
already. Specifically, there is the adversarial loss for the
discriminator \(D_z\):
\begin{equation} \label{eq:adv-loss-z}
\mathcal{L}_{adv-d_z}(x,e,D_z) = E[\log (D_z(\mathcal{N}(0,1)))] + E[\log (1 - D_z(e(x)))],
\end{equation}
\noindent
where \(E\) stands for the expected
value\footnote{the mean of a random variable under its distribution},
\(x\) stands for the input, and
\(\mathcal{N}(0,1)\) represents an element drawn from the specified
distribution. The encoder \(e\) attempts to minimize this loss while
the discriminator \(D_z\) intends to maximize it.
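A Monte Carlo estimate of this loss can be sketched as follows,
assuming a one-dimensional latent space; the sigmoid discriminator
and the linear encoder below are hypothetical stand-ins for the
trained networks:

```python
import numpy as np

def adv_loss_dz(d_z, e, x_batch, rng):
    """Monte Carlo estimate of
    E[log D_z(N(0,1))] + E[log(1 - D_z(e(x)))]."""
    z_prior = rng.normal(size=len(x_batch))          # samples from N(0, 1)
    term_prior = np.log(d_z(z_prior)).mean()         # D_z on prior samples
    term_enc = np.log(1.0 - d_z(e(x_batch))).mean()  # D_z on encoded points
    return term_prior + term_enc

# hypothetical stand-ins: sigmoid discriminator and linear encoder
rng = np.random.default_rng(2)
d_z = lambda z: 1.0 / (1.0 + np.exp(-z))
e = lambda x: 0.5 * x
loss = adv_loss_dz(d_z, e, rng.normal(size=64), rng)
print(loss)  # negative, since both log terms lie below zero
```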
In the same way the adversarial loss for the discriminator \(D_x\)
is specified:
\begin{equation} \label{eq:adv-loss-x}
\mathcal{L}_{adv-d_x}(x,D_x,g) = E[\log(D_x(x))] + E[\log(1 - D_x(g(\mathcal{N}(0,1))))],
\end{equation}
\noindent
where \(x\), \(E\), and \(\mathcal{N}(0,1)\) have the same meaning
as before. In this case the generator \(g\) tries to minimize the loss
while the discriminator \(D_x\) attempts to maximize it.
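The analogous Monte Carlo estimate for this second loss can be
sketched as follows; again, the generator and discriminator below
are hypothetical stand-ins, not the networks from the paper:

```python
import numpy as np

def adv_loss_dx(d_x, g, x_batch, rng, n_latent):
    """Monte Carlo estimate of
    E[log D_x(x)] + E[log(1 - D_x(g(N(0,1))))]."""
    z = rng.normal(size=(len(x_batch), n_latent))  # latent prior samples
    term_real = np.log(d_x(x_batch)).mean()        # D_x on real inputs
    term_gen = np.log(1.0 - d_x(g(z))).mean()      # D_x on generated samples
    return term_real + term_gen

# hypothetical stand-ins for the generator and discriminator networks
rng = np.random.default_rng(3)
m, n = 4, 2
W_g = rng.normal(size=(m, n))
g = lambda z: z @ W_g.T
d_x = lambda x: 1.0 / (1.0 + np.exp(-x.sum(axis=-1)))
loss = adv_loss_dx(d_x, g, rng.normal(size=(64, m)), rng, n)
print(loss)  # negative, since both log terms lie below zero
```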
Every auto-encoder requires a reconstruction error to work. This
error measures the difference between the original input and
the generated, or decoded, output. In this case, the reconstruction
loss is defined as follows:
\begin{equation} \label{eq:recon-loss}
\mathcal{L}_{error}(x, e, g) = - E[\log(p(g(e(x)) | x))],
\end{equation}
\noindent
where \(E[\log(p)]\) is the expected log-likelihood of the
reconstruction given the input, and \(x\), \(e\), and \(g\) have
the same meaning as before.
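The likelihood \(p\) is not pinned down here; under the common
assumption of a unit-variance Gaussian likelihood (an assumption of
this sketch, not a statement from the paper), the negative expected
log-likelihood reduces, up to additive constants, to half the mean
squared reconstruction error:

```python
import numpy as np

def recon_loss(e, g, x_batch):
    """Reconstruction loss -E[log p(g(e(x)) | x)], assuming a
    unit-variance Gaussian likelihood: 0.5 * MSE up to constants."""
    x_rec = g(e(x_batch))
    return 0.5 * np.mean(np.sum((x_batch - x_rec) ** 2, axis=-1))

# with hypothetical identity maps the reconstruction is perfect
x = np.arange(6.0).reshape(2, 3)
identity = lambda v: v
print(recon_loss(identity, identity, x))  # 0.0
```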
All losses combined result in the following formula:
\begin{equation} \label{eq:full-loss}
\mathcal{L}(x,e,D_z,D_x,g) = \mathcal{L}_{adv-d_z}(x,e,D_z) + \mathcal{L}_{adv-d_x}(x,D_x,g) + \lambda \mathcal{L}_{error}(x,e,g),
\end{equation}
\noindent
where \(\lambda\) is a parameter used to balance the adversarial
losses with the reconstruction loss. The model is trained by
Pidhorskyi et al.\ using the Adam optimizer, performing alternating
updates of each of the aforementioned components:
\begin{itemize}
\item Maximize \(\mathcal{L}_{adv-d_x}\) by updating weights of \(D_x\);
\item Minimize \(\mathcal{L}_{adv-d_x}\) by updating weights of \(g\);
\item Maximize \(\mathcal{L}_{adv-d_z}\) by updating weights of \(D_z\);
\item Minimize \(\mathcal{L}_{error}\) and \(\mathcal{L}_{adv-d_z}\) by updating weights of \(e\) and \(g\).
\end{itemize}
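The alternating schedule above can be sketched as a single training
iteration; the step functions are hypothetical placeholders for the
actual Adam-based weight updates of each component:

```python
# one training iteration's alternating updates, in the order listed above
def train_step(batch, steps):
    steps["d_x"](batch)    # 1. maximize L_adv-d_x w.r.t. D_x
    steps["g_adv"](batch)  # 2. minimize L_adv-d_x w.r.t. g
    steps["d_z"](batch)    # 3. maximize L_adv-d_z w.r.t. D_z
    steps["e_g"](batch)    # 4. minimize L_error + L_adv-d_z w.r.t. e and g

# record the call order with placeholder step functions
order = []
steps = {name: (lambda batch, n=name: order.append(n))
         for name in ["d_x", "g_adv", "d_z", "e_g"]}
train_step(None, steps)
print(order)  # ['d_x', 'g_adv', 'd_z', 'e_g']
```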
\section{Contribution}
\chapter{Thesis as a project}