From f52edb95bb9b074ca1f7682c4313c2093fb8e0e1 Mon Sep 17 00:00:00 2001
From: Jim Martens
Date: Tue, 6 Aug 2019 10:54:24 +0200
Subject: [PATCH] Written section about implementing auto-encoder

Signed-off-by: Jim Martens
---
 body.tex | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/body.tex b/body.tex
index 7ece5f9..d509960 100644
--- a/body.tex
+++ b/body.tex
@@ -683,7 +683,7 @@ Miller et al. use SSD for the object detection part. They
 compare vanilla SSD, vanilla SSD with entropy thresholding, and
 the Bayesian SSD with each other. The Bayesian SSD was created
 by adding two dropout layers to the vanilla SSD; no other changes
-were made. Miller et al. used weights that were trained on MS COCO
+were made. Miller et al. use weights that were trained on MS COCO
 to predict on SceneNet RGB-D.
 
 As the source code was not available, I had to implement Miller's
@@ -709,10 +709,28 @@ RGB-D data set, I counted the number of instances per COCO
 class and a huge class imbalance was visible; not just globally
 but also between trajectories: some classes are only present in
 some trajectories. This makes training with SSD on SceneNet practically
-impossible.
+impossible. 
 
 \section{Implementing an auto-encoder}
 
+Pidhorskyi et al.~\cite{Pidhorskyi2018} released their source
+code, but it is written for PyTorch; I had to adapt it to
+TensorFlow. For the proof of concept, a simpler model of
+encoder and generator was used; the adversarial parts were
+disabled. The encoder starts with a sigmoid-activated
+convolutional layer, followed by two convolutional layers with
+ReLU as activation function. It ends with a Flatten and a
+Dense layer.
+Decoding starts with a Dense layer, followed by three transposed
+convolutional layers with ReLU as activation function; the last
+layer is a transposed convolutional layer with sigmoid as
+activation function.
+
+The auto-encoder works on the MNIST data set, as expected. It
+works very well on COCO too, with one caveat: it reconstructs
+all classes equally well, even when trained on only one. Novelty
+detection is out of the question under these circumstances.
+
 \chapter{Results}
 
 \chapter{Discussion}
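For reference, the encoder and decoder described in the added section can be sketched in TensorFlow/Keras roughly as below. Filter counts, kernel sizes, strides, the latent size, and the `Reshape` layer are assumptions — the patch text fixes only the layer types and activations. The reconstruction-error score at the end is a plain stand-in for a novelty signal, not the full probabilistic score of Pidhorskyi et al.

```python
# Sketch of the auto-encoder described in the patch (TensorFlow/Keras).
# Filter counts, kernel sizes, strides, and latent_dim are assumed;
# the text only specifies layer types and activation functions.
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 32  # assumed latent size

# Encoder: sigmoid conv, two ReLU convs, Flatten, Dense.
encoder = models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="sigmoid"),
    layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(latent_dim),
], name="encoder")

# Decoder: Dense, three ReLU transposed convs, sigmoid transposed conv.
# The Reshape layer is an implementation detail needed to go from the
# Dense output back to a feature map; it is not mentioned in the text.
decoder = models.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    layers.Dense(7 * 7 * 64, activation="relu"),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 3, padding="same", activation="sigmoid"),
], name="decoder")

autoencoder = models.Sequential([encoder, decoder], name="autoencoder")
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

def novelty_score(x):
    """Per-image reconstruction error as a crude novelty signal --
    a simplification of the likelihood-based score in the paper."""
    recon = autoencoder(x)
    return tf.reduce_mean(tf.square(x - recon), axis=[1, 2, 3])
```

With this setup, the caveat from the text can be checked directly: if `novelty_score` is similarly low for images of classes the model was never trained on, reconstruction error carries no class-specific signal and cannot separate known from novel classes.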