diff --git a/body.tex b/body.tex
index e09ac69..4bab418 100644
--- a/body.tex
+++ b/body.tex
@@ -374,12 +374,78 @@ implementation details will be presented.
 \section{Bayesian SSD for Model Uncertainty}
 
+Bayesian SSD extends vanilla SSD with dropout sampling. First,
+the model architecture is explained, followed by details of the
+uncertainty calculation and the implementation.
+
 \subsection{Model Architecture}
 
+\begin{figure}
+  \centering
+  \includegraphics[scale=1.2]{vanilla-ssd}
+  \caption{The vanilla SSD network as defined by Liu et
+  al.~\cite{Liu2016}. VGG-16 is the base network, extended with
+  extra feature layers. These predict offsets to anchor boxes of
+  different sizes and aspect ratios, as well as the corresponding
+  confidences.}
+  \label{fig:vanilla-ssd}
+\end{figure}
+
+Vanilla SSD is based on the VGG-16 network (see figure
+\ref{fig:vanilla-ssd}) and adds extra feature layers. These layers
+predict the offsets to the anchor boxes, which have different sizes
+and aspect ratios, as well as the corresponding confidences. By
+comparison, Bayesian SSD only adds two dropout layers after the
+fc6 and fc7 layers (see figure \ref{fig:bayesian-ssd}).
+
+\begin{figure}
+  \centering
+  \includegraphics[scale=1.2]{bayesian-ssd}
+  \caption{The Bayesian SSD network as defined by Miller et
+  al.~\cite{Miller2018}. It adds dropout layers after the fc6
+  and fc7 layers.}
+  \label{fig:bayesian-ssd}
+\end{figure}
+
 \subsection{Model Uncertainty}
 
+Dropout sampling measures model uncertainty with the help of
+entropy: every forward pass creates predictions, which are
+partitioned into observations; the entropy of each observation is
+then calculated. Entropy can detect uncertainty because an
+uncertain network produces different classifications for the same
+object in an image across multiple forward passes.
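The entropy-based filtering described above can be sketched as follows. This is an illustrative NumPy snippet, not the thesis implementation; the array layout, the threshold values, and the assumption that class 0 is the background class are my own:

```python
import numpy as np

def observation_entropy(scores):
    """Shannon entropy of averaged softmax scores, one value per observation."""
    eps = 1e-12  # guards against log(0)
    return -np.sum(scores * np.log(scores + eps), axis=-1)

def filter_detections(forward_passes, entropy_thresh=0.8, conf_thresh=0.5):
    """forward_passes: array of shape (num_passes, num_observations,
    num_classes) holding softmax scores from repeated forward passes
    with dropout active. Returns indices of surviving observations."""
    # Average the per-pass softmax scores within each observation.
    mean_scores = forward_passes.mean(axis=0)
    # 1) Entropy thresholding: discard uncertain observations.
    certain = observation_entropy(mean_scores) < entropy_thresh
    # 2) Per-class confidence threshold on the remaining observations
    #    (class 0 is assumed to be the background class and is skipped).
    confident = mean_scores[:, 1:].max(axis=-1) > conf_thresh
    return np.nonzero(certain & confident)[0]
```

An observation whose averaged scores concentrate on one class has low entropy and passes; an observation whose scores are spread across classes has high entropy and is rejected, mirroring the idea that an uncertain network disagrees with itself across forward passes.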
+
+The combination of a per-class confidence threshold and
+non-maximum suppression in vanilla SSD has one weakness: even if
+SSD correctly predicts all objects as the background class with
+high confidence, the per-class confidence threshold of 0.01 still
+admits predictions with very low confidences; since background
+boxes are excluded from the maxima collection, many low-confidence
+boxes of other classes can enter it. Furthermore, the same
+detection can be present in the maxima collection for multiple
+classes. In this case, the entropy threshold lets the detection
+pass because the background class has high confidence, and the low
+per-class confidence threshold does not filter it out either.
+The decoded output is therefore worse than the actual predictions
+of the network. Bayesian SSD cannot help in this situation because
+the network is not actually uncertain.
+
+SSD was developed with closed-set conditions in mind. In such a
+setting, a well-trained network does not produce many
+high-confidence background detections. In an open-set environment,
+however, background detections are the correct behaviour for
+unknown classes. To obtain useful detections from the decoding,
+a higher confidence threshold is required.
+
 \subsection{Implementation Details}
 
+For this thesis, an SSD implementation based on TensorFlow and
+Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
+was used. It was modified to support entropy thresholding and to
+add dropout layers to the SSD model. Entropy thresholding takes
+place before the per-class confidence threshold is applied.
+
 \section{Software and Source Code Design}
 
 The source code of many published papers is either not available