diff --git a/body_expose.tex b/body_expose.tex
index c3e8270..2a112d5 100644
--- a/body_expose.tex
+++ b/body_expose.tex
@@ -187,7 +187,6 @@ results in the approximation of the class probability
 passes through the network and averaging over the obtained
 Softmax scores \(\mathbf{s}_i\), given an image \(\mathcal{I}\)
 and the training data \(\mathbf{T}\):
-
 \begin{equation} \label{eq:drop-sampling}
   p(y|\mathcal{I}, \mathbf{T}) = \int p(y|\mathcal{I}, \mathbf{W}) \cdot p(\mathbf{W}|\mathbf{T})d\mathbf{W} \approx \frac{1}{n} \sum_{i=1}^{n}\mathbf{s}_i
 \end{equation}
@@ -238,6 +237,98 @@ be used to identify and reject these false positive cases.
 
 \section{GPND}
 
+For the theoretical underpinning of Generative Probabilistic
+Novelty Detection the reader is referred to the paper by
+Pidhorskyi et al.~\cite{Pidhorskyi2018}. This section covers only
+the key aspects of an adversarial auto-encoder required to
+understand their method.
+
+\subsection{Adversarial Auto-encoder}
+
+The training data points \(x_i \in \mathbb{R}^m\) are the input
+of the auto-encoder. An encoding function
+\(e: \mathbb{R}^m \rightarrow \mathbb{R}^n\) takes the data points
+and produces a representation \(\overline{z_i} \in \mathbb{R}^n\)
+in a latent space. This latent space has a lower dimension than
+the input space (\(n < m\)), which necessitates some form of
+compression.
+
+A second function \(g: \Omega \rightarrow \mathbb{R}^m\), the
+generator, takes a latent representation
+\(z_i \in \Omega \subset \mathbb{R}^n\) and generates an output
+\(\overline{x_i}\) that is as close as possible to the input data
+distribution.
+
+What, then, is the difference between \(\overline{z_i}\) and
+\(z_i\)? In a plain auto-encoder both would be identical. In the
+case of an adversarial auto-encoder it is slightly more involved.
+A discriminator \(D_z\) tries to distinguish between an encoded
+data point \(\overline{z_i}\) and a sample
+\(z_i \sim \mathcal{N}(0,1)\) drawn from a normal distribution
+with mean \(0\) and standard deviation \(1\). During training,
+the encoding function \(e\) attempts to minimize any perceivable
+difference between \(z_i\) and \(\overline{z_i}\), while \(D_z\)
+has the aforementioned adversarial task of differentiating
+between them.
+
+Furthermore, there is a discriminator \(D_x\) whose task is to
+differentiate the generated output \(\overline{x_i}\) from the
+actual input \(x_i\). During training, the generator function
+\(g\) tries to minimize the perceivable difference between
+\(\overline{x_i}\) and \(x_i\), while \(D_x\) has the
+corresponding adversarial task of distinguishing between them.
+
+With this, all components of the adversarial auto-encoder
+employed by Pidhorskyi et al.\ are introduced. Finally, the
+losses are presented. The two adversarial objectives have already
+been mentioned. Specifically, there is the adversarial loss for
+the discriminator \(D_z\):
+\begin{equation} \label{eq:adv-loss-z}
+  \mathcal{L}_{adv-d_z}(x,e,D_z) = E[\log (D_z(\mathcal{N}(0,1)))] + E[\log (1 - D_z(e(x)))],
+\end{equation}
+\noindent
+where \(E\) denotes the expected
+value\footnote{a term used in probability theory},
+\(x\) is the input, and \(\mathcal{N}(0,1)\) represents a sample
+drawn from the specified distribution. The encoder \(e\) attempts
+to minimize this loss, while the discriminator \(D_z\) intends to
+maximize it.
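+
+To make this adversarial game more concrete, the following
+PyTorch-style sketch shows one possible way to compute the loss of
+Equation~\ref{eq:adv-loss-z} for a single batch. It is an
+illustration only, not the implementation of Pidhorskyi et al.:
+the fully connected layers, the dimensions, and names such as
+\texttt{disc\_z} are placeholder assumptions.
+
+\begin{verbatim}
+import torch
+import torch.nn as nn
+
+# Illustrative sizes only, not the ones used by Pidhorskyi et al.
+latent_dim, input_dim = 32, 784
+
+# Encoder e and latent-space discriminator D_z as simple MLPs.
+encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
+                        nn.Linear(256, latent_dim))
+disc_z = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
+                       nn.Linear(256, 1), nn.Sigmoid())
+
+x = torch.rand(64, input_dim)       # dummy batch standing in for x_i
+z_bar = encoder(x)                  # encoded representation e(x_i)
+z = torch.randn(64, latent_dim)     # z_i drawn from N(0, 1)
+
+# L_adv-dz = E[log D_z(z)] + E[log(1 - D_z(e(x)))]
+loss_adv_dz = (torch.log(disc_z(z)).mean()
+               + torch.log(1 - disc_z(z_bar)).mean())
+
+# D_z takes a gradient ascent step on this objective, while the
+# encoder e takes a descent step; the two updates alternate.
+\end{verbatim}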
+
+The adversarial loss for the discriminator \(D_x\) is specified in
+the same way:
+\begin{equation} \label{eq:adv-loss-x}
+  \mathcal{L}_{adv-d_x}(x,D_x,g) = E[\log(D_x(x))] + E[\log(1 - D_x(g(\mathcal{N}(0,1))))],
+\end{equation}
+\noindent
+where \(x\), \(E\), and \(\mathcal{N}(0,1)\) have the same meaning
+as before. In this case the generator \(g\) tries to minimize the
+loss while the discriminator \(D_x\) attempts to maximize it.
+
+Every auto-encoder requires a reconstruction error, which measures
+the difference between the original input and the generated
+(decoded) output. In this case, the reconstruction loss is defined
+as:
+\begin{equation} \label{eq:recon-loss}
+  \mathcal{L}_{error}(x, e, g) = - E[\log(p(g(e(x)) | x))],
+\end{equation}
+\noindent
+where \(\log(p)\) is the log-likelihood of the reconstruction
+\(g(e(x))\) given the input \(x\), and \(E\), \(e\), and \(g\) have
+the same meaning as before.
+
+Combining all losses yields the full objective:
+\begin{equation} \label{eq:full-loss}
+  \mathcal{L}(x,e,D_z,D_x,g) = \mathcal{L}_{adv-d_z}(x,e,D_z) + \mathcal{L}_{adv-d_x}(x,D_x,g) + \lambda \mathcal{L}_{error}(x,e,g),
+\end{equation}
+\noindent
+where \(\lambda\) is a parameter that balances the adversarial
+losses against the reconstruction loss. Pidhorskyi et al.\ train
+the model with the Adam optimizer by performing alternating
+updates of the aforementioned components:
+
+\begin{itemize}
+  \item Maximize \(\mathcal{L}_{adv-d_x}\) by updating the weights of \(D_x\);
+  \item Minimize \(\mathcal{L}_{adv-d_x}\) by updating the weights of \(g\);
+  \item Maximize \(\mathcal{L}_{adv-d_z}\) by updating the weights of \(D_z\);
+  \item Minimize \(\mathcal{L}_{error}\) and \(\mathcal{L}_{adv-d_z}\) by updating the weights of \(e\) and \(g\).
+\end{itemize}
+
+
 \section{Contribution}
 
 \chapter{Thesis as a project}