
Improved thesis based upon feedback

Signed-off-by: Jim Martens <github@2martens.de>
master
Jim Martens 2 years ago
parent
commit
dc976932f8
Changed files:
  1. acknowledge.tex (8 changes)
  2. appendix.tex (22 changes)
  3. body.tex (159 changes)
  4. ma.bib (9 changes)

8
acknowledge.tex

@ -1,12 +1,12 @@
\clearpage
\section*{Acknowledgement}
I would like to thank for the continued support, suggestions, and advise
from my super-visor Prof. Dr. Simone Frintrop and co-supervisor Dr.
I would like to thank for the continued support, suggestions, and advice
from my supervisor Prof. Dr. Simone Frintrop and co-supervisor Dr.
Mikko Lauri.
Additionally, I would like to thank my friends and family for the continued
support and sometimes helpful questions. Especially in some hard times
Additionally, I would like to thank my friends and family for their continued
support and helpful questions. Especially during some hard times
their support was invaluable.
Furthermore, I am grateful for the Fridays for Future movement

22
appendix.tex

@ -1,23 +1,23 @@
\chapter{Software and Source Code Design}
The source code of many published papers is either not available
or seems like an afterthought: it is poorly documented, difficult
or is of bad quality: it is poorly documented, difficult
to integrate into your own work, and often does not follow common
software development best practices. Moreover, with Tensorflow,
PyTorch, and Caffe there are at least three machine learning
frameworks. Every research team seems to prefer another framework
and sometimes even develops their own; this makes it difficult
frameworks. Every research team seems to prefer another framework,
and, occasionally, even develops their own; this makes it difficult
to combine the work of different authors.
In addition to all this, most papers do not contain proper information
regarding the implementation details, making it difficult to
accurately replicate them if their source code is not available.
In addition to this, most papers do not contain proper information
regarding implementation details, making it difficult to
accurately replicate their results, if their source code is not available.
Therefore, it was clear to me: I will release my source code and
make it available as Python package on the PyPi package index.
Therefore, I will release my source code and
make it available as a Python package on the PyPi package index.
This makes it possible for other researchers to simply install
a package and use the API to interact with my code. Additionally,
the code has been designed to be future proof and work with
the announced Tensorflow 2.0 by supporting eager mode.
the code has been designed to be future proof, and work with
the announced Tensorflow 2.0, by supporting eager mode.
Furthermore, it is configurable, well documented, and conforms largely
to the clean code guidelines: evolvability and extendability among
@ -38,7 +38,7 @@ can be found in plotting.py, and the ssd.py module contains
code to train the SSD and later predict with it.
Lastly, the SSD implementation from a third party repository
has been modified to work inside a Python package architecture and
has been modified to work inside a Python package architecture, and
with eager mode. It is stored as a Git submodule inside the package
repository.

159
body.tex

@ -21,7 +21,7 @@ black boxes and prevents any answers to questions of causality.
However, these questions of causality are of enormous consequence when
results of neural networks are used to make life changing decisions:
Is a correlation enough to bring forth negative consequences
is a correlation enough to bring forth negative consequences
for a particular person? And if so, what is the possible defence
against math? Similar questions can be raised when looking at computer
vision networks that might be used together with so called smart
@ -29,14 +29,14 @@ vision networks that might be used together with so called smart
This leads to the need for neural networks to explain their results.
Such an explanation must come from the network or an attached piece
of technology to allow adoption in mass. Obviously, this setting
poses the question, how such an endeavour can be achieved.
of technology to allow mass adoption. Obviously, this setting
poses the question of how such an endeavour can be achieved.
For neural networks there are fundamentally two types of tasks:
For neural networks there are fundamentally two types of problems:
regression and classification. Regression deals with any case
where the goal for the network is to come close to an ideal
function that connects all data points. Classification, however,
describes tasks where the network is supposed to identify the
describes problems where the network is supposed to identify the
class of any given input. In this thesis, I will work with both.
\subsection*{Object Detection in Open Set Conditions}
@ -54,53 +54,51 @@ class of any given input. In this thesis, I will work with both.
More specifically, I will look at object detection in the open set
conditions (see figure \ref{fig:open-set}).
In non-technical words this effectively describes
the kind of situation you encounter with \gls{CCTV} or robots
outside of a laboratory. Both use cameras that record
images. Subsequently, a neural network analyses the image
and returns a list of detected and classified objects that it
found in the image. The problem here is that networks can only
In non-technical terms this effectively describes
the conditions \gls{CCTV} and robots outside of a laboratory operate in. In both cases images are recorded with cameras. In order to detect objects, a neural network has to analyse the images
and return a list of detected and classified objects that it
finds in the images. The problem here is that networks can only
classify what they know. If presented with an object type that
the network was not trained with, as happens frequently in real
environments, it will still classify the object and might even
have a high confidence in doing so. This is an example for a
false positive. Anyone who uses the results of
such a network could falsely assume that a high confidence always
such a network could falsely assume that a high confidence
means the classification is very likely correct. If one uses
a proprietary system one might not even be able to find out
that the network was never trained on a particular type of object.
Therefore, it would be impossible for one to identify the output
of the network as false positive.
of the network as a false positive.
This reaffirms the need for automatic explanation. Such a system
should recognise by itself that the given object is unknown and
hence mark any classification result of the network as meaningless.
mark any classification result of the network as meaningless.
Technically there are two slightly different approaches that deal
with this type of task: model uncertainty and novelty detection.
Model uncertainty can be measured, for example, with dropout sampling.
Dropout layers are usually used only during training but
Miller et al.~\cite{Miller2018} use them also during testing
to achieve different results for the same image making use of
Dropout layers are usually used only during training, but
Miller et al.~\cite{Miller2018} also use them during testing
to achieve different results for the same image---making use of
multiple forward passes. The output scores for the forward passes
of the same image are then averaged. If the averaged class
probabilities resemble a uniform distribution (every class has
the same probability) this symbolises maximum uncertainty. Conversely,
if there is one very high probability with every other being very
low this signifies a low uncertainty. An unknown object is more
likely to cause high uncertainty which allows for an identification
low, this signifies a low uncertainty. An unknown object is more
likely to cause high uncertainty, which allows for an identification
of false positive cases.
Novelty detection is another approach to solve the task.
Novelty detection is another approach to solve the problem.
In the realm of neural networks it is usually done with the help of
auto-encoders that solve a regression task of finding an
identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have
internally at least two components: an encoder, and a decoder or
auto-encoders that try to solve a regression problem of finding an
identity function that reconstructs the given input~\cite{Pimentel2014}. Auto-encoders have,
internally, at least two components: an encoder, and a decoder or
generator. The job of the encoder is to find an encoding that
compresses the input as good as possible while simultaneously
compresses the input as well as possible, while simultaneously
being as loss-free as possible. The decoder takes this latent
representation of the input and has to find a decompression
that reconstructs the input as accurate as possible. During
representation of the input, and has to find a decompression
that reconstructs the input as accurately as possible. During
training these auto-encoders learn to reproduce a certain group
of object classes. The actual novelty detection takes place
during testing: given an image, and the output and loss of the
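The dropout sampling idea described in this hunk (several forward passes for the same image, then averaging the output scores) can be illustrated with a minimal sketch. The function name `average_scores` and the example scores are hypothetical and not taken from the thesis code; the sketch only shows the averaging step, not the network itself.

```python
def average_scores(passes):
    """Average the per-class softmax scores of several forward passes."""
    n_passes = len(passes)
    n_classes = len(passes[0])
    return [sum(p[c] for p in passes) / n_passes for c in range(n_classes)]


# Hypothetical softmax outputs of three forward passes for one detection.
known_object = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.85, 0.10, 0.05]]
unknown_object = [[0.60, 0.30, 0.10], [0.10, 0.60, 0.30], [0.30, 0.10, 0.60]]

# One dominant class after averaging: low uncertainty.
avg_known = average_scores(known_object)
# Passes disagree, the average is close to uniform: high uncertainty.
avg_unknown = average_scores(unknown_object)
```

In the second case every forward pass is individually fairly confident, yet the averaged probabilities are uniform, which is the situation the text associates with maximum uncertainty.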
@ -148,25 +146,24 @@ conditions compared to object detection without it.
\subsection*{Reader's Guide}
First, chapter \ref{chap:background} presents related works and
First, chapter \ref{chap:background} presents related works, and
provides the background for dropout sampling.
Afterwards, chapter \ref{chap:methods} explains how \gls{vanilla} \gls{SSD} works, how
Thereafter, chapter \ref{chap:methods} explains how \gls{vanilla} \gls{SSD} works, how
Bayesian \gls{SSD} extends \gls{vanilla} \gls{SSD}, and how the decoding pipelines are
structured.
Chapter \ref{chap:experiments-results} presents the data sets,
the experimental setup, and the results. This is followed by
chapter \ref{chap:discussion}, focusing on the discussion and closing.
Therefore, the contribution is found in chapters \ref{chap:methods},
The contribution of this thesis is found in chapters \ref{chap:methods},
\ref{chap:experiments-results}, and \ref{chap:discussion}.
\chapter{Background}
\label{chap:background}
This chapter begins with an overview over previous works
in the field of this thesis. Afterwards the theoretical foundations
of dropout sampling are explained.
This chapter begins with an overview of previous works, followed by an explanation of the theoretical
foundations of dropout sampling.
\section{Related Works}
@ -176,7 +173,7 @@ methods published over the previous decade. They showcase probabilistic,
distance-based, reconstruction-based, domain-based, and information-theoretic
novelty detection. Based on their categorisation, this thesis falls under
reconstruction-based novelty detection as it deals only with neural network
approaches. Therefore, the other types of novelty detection will only be
approaches. The other types of novelty detection will, therefore, only be
introduced briefly.
\subsection{Overview over types of Novelty Detection}
@ -197,16 +194,16 @@ Both methods are similar to estimating the
\gls{pdf} of data, they use well-defined distance metrics to compute the distance
between two data points.
Domain-based novelty detection describes the boundary of the known data, rather
than the data itself. Unknown data is identified by its position relative to
the boundary. A common implementation for this are support vector machines
(e.g. implemented by Song et al. \cite{Song2002}).
Domain-based novelty detection describes the boundary of the known data,
rather than the data itself. Unknown data is identified by its position
relative to the boundary. Support vector machines (e.g. implemented by
Song et al. \cite{Song2002}) are a common implementation of this.
Information-theoretic novelty detection computes the information content
of a data set, for example, with metrics like \gls{entropy}. Such metrics assume
that novel data inside the data set significantly alters the information
content of an otherwise normal data set. First, the metrics are calculated over the
whole data set. Afterwards, a subset is identified that causes the biggest
whole data set. Second, a subset is identified that causes the biggest
difference in the metric when removed from the data set. This subset is considered
to consist of novel data. For example, Filippone and Sanguinetti \cite{Filippone2011} provide
a recent approach.
@ -214,7 +211,7 @@ a recent approach.
\subsection{Reconstruction-based Novelty Detection}
Reconstruction-based approaches use the reconstruction error in one form
or another to calculate the novelty score. This can be auto-encoders that
or another to calculate the novelty score. These can be auto-encoders that
literally reconstruct the input but it also includes \gls{MLP} networks which try
to reconstruct the ground truth. Pimentel et al.~\cite{Pimentel2014} differentiate
between neural network-based approaches and subspace methods. The first are
@ -242,7 +239,7 @@ Gal and Ghahramani~\cite{Gal2016} show that dropout training is a
Bayesian approximation of a Gaussian process. Subsequently, Gal~\cite{Gal2017}
shows that dropout training actually corresponds to a general approximate
Bayesian model. This means every network trained with dropout is an
approximate Bayesian model. During inference the dropout remains active,
approximate Bayesian model. During inference the dropout remains active:
this form of inference is called \gls{MCDO}.
Miller et al.~\cite{Miller2018} build upon the work of Gal and Ghahramani: they
use \gls{MCDO} under open-set conditions for object detection.
@ -261,14 +258,13 @@ Consequently, this technique can be applied to any network that utilises
standard batch normalisation.
Li et al.~\cite{Li2019} investigate the problem of poor performance
when combining dropout and batch normalisation: dropout shifts the variance
of a neural unit when switching from train to test, batch normalisation
of a neural unit when switching from train to test; batch normalisation
does not change the variance. This inconsistency leads to a variance shift which
can have a larger or smaller impact based on the network used.
Non-Bayesian approaches have been developed as well. Usually, they compare with
\gls{MCDO} and show better performance.
Non-Bayesian approaches have also been developed. Usually they are compared with \gls{MCDO} and show better performance.
Postels et al.~\cite{Postels2019} provide a sampling-free approach for
uncertainty estimation that does not affect training and approximates the
uncertainty estimation that does not affect training, and approximates the
sampling at test time. They compare it to \gls{MCDO} and find less computational
overhead with better results.
Lakshminarayanan et al.~\cite{Lakshminarayanan2017}
@ -279,7 +275,7 @@ introduce an uncertainty estimation algorithm for non-Bayesian deep
neural classification that estimates the uncertainty of highly
confident points using earlier snapshots of the trained model and improves,
among others, the approach introduced by Lakshminarayanan et al.
Sensoy et al.~\cite{Sensoy2018} explicitely model prediction uncertainty:
Sensoy et al.~\cite{Sensoy2018} explicitly model prediction uncertainty:
a \gls{Dirichlet distribution} is placed over the class probabilities. Consequently,
the predictions of a neural network are treated as subjective opinions.
@ -348,21 +344,21 @@ training of the network determines a plausible set of weights by
evaluating the probability output (\gls{posterior}) over the weights given
the training data \(\mathbf{T}\): \(p(\mathbf{W}|\mathbf{T})\).
However, this
evaluation cannot be performed in any reasonable
evaluation cannot be performed in any reasonable amount of
time. Therefore approximation techniques are
required. In those techniques the \gls{posterior} is fitted with a
simple distribution \(q^{*}_{\theta}(\mathbf{W})\). The original
and intractable problem of averaging over all weights in the network
is replaced with an optimisation task, where the parameters of the
simple distribution are optimised over~\cite{Kendall2017}.
is replaced with an optimisation task: the parameters of the
simple distribution are optimised~\cite{Kendall2017}.
\subsubsection*{Dropout Variational Inference}
Kendall and Gal~\cite{Kendall2017} show an approximation for
classification and recognition tasks. Dropout variational inference
is a practical approximation technique by adding dropout layers
in front of every weight layer and using them also during test
time to sample from the approximate \gls{posterior}. Effectively, this
in front of every weight layer and also using them during test
time to sample from the approximate \gls{posterior}. In effect, this
results in the approximation of the class probability
\(p(y|\mathcal{I}, \mathbf{T})\) by performing \(n\) forward
passes through the network and averaging the so obtained softmax
@ -479,7 +475,7 @@ and very low confidences in other classes.
\subsection{Implementation Details}
For this thesis, an \gls{SSD} implementation based on Tensorflow~\cite{Abadi2015} and
Keras\footnote{\url{https://github.com/pierluigiferrari/ssd\_keras}}
Keras~\cite{Chollet2015}
is used. It has been modified to support \gls{entropy} thresholding,
partitioning of observations, and dropout
layers in the \gls{SSD} model. Entropy thresholding takes place before
@ -517,7 +513,7 @@ confidence thresholding and a subsequent \gls{NMS}.
All boxes that pass \gls{NMS} are added to a
per image maxima list. One box could make the confidence threshold
for multiple classes and, hence, be present multiple times in the
maxima list for the image. Lastly, a total of \(k\) boxes with the
maxima list for the image. In the end, a total of \(k\) boxes with the
highest confidences is kept per image across all classes. The
original implementation uses a confidence threshold of \(0.01\), an
IOU threshold for \gls{NMS} of \(0.45\) and a top \(k\)
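The decoding steps named in this hunk (per-class confidence thresholding, per-class NMS, then keeping the top k boxes per image across all classes) can be sketched as follows. This is an illustrative pure-Python sketch, not the implementation used in the thesis: the function names, the box format `(xmin, ymin, xmax, ymax)`, and the convention that class index 0 is the background are assumptions. The top k value is cut off in the excerpt above, so it is left as a required parameter.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def decode(detections, top_k, conf_thresh=0.01, iou_thresh=0.45):
    """detections: list of (box, scores); scores[0] is assumed to be background."""
    if not detections:
        return []
    maxima = []  # per-image maxima list of (class, confidence, box)
    for c in range(1, len(detections[0][1])):
        # Confidence thresholding for class c, highest confidence first.
        candidates = sorted(
            ((s[c], box) for box, s in detections if s[c] > conf_thresh),
            reverse=True,
        )
        kept = []
        for conf, box in candidates:
            # Greedy NMS: drop boxes that overlap an already kept box too much.
            if all(iou(box, k_box) <= iou_thresh for _, k_box in kept):
                kept.append((conf, box))
        maxima.extend((c, conf, box) for conf, box in kept)
    # Keep the top k boxes across all classes.
    maxima.sort(key=lambda m: m[1], reverse=True)
    return maxima[:top_k]


detections = [
    ((0, 0, 10, 10), [0.05, 0.90, 0.05]),
    ((1, 1, 11, 11), [0.10, 0.80, 0.10]),
    ((20, 20, 30, 30), [0.20, 0.005, 0.795]),
]
result = decode(detections, top_k=2)
```

Because thresholding and NMS run separately for every class, the same box can enter the maxima list more than once, as the text notes.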
@ -548,7 +544,7 @@ confidence threshold is required.
\subsection{Vanilla SSD with Entropy Thresholding}
Vanilla \gls{SSD} with \gls{entropy} tresholding adds an additional component
Vanilla \gls{SSD} with \gls{entropy} thresholding adds an additional component
to the filtering already done for \gls{vanilla} \gls{SSD}. The \gls{entropy} is
calculated from all \(\#nr\_classes\) softmax scores in a prediction.
Only predictions with a low enough \gls{entropy} pass the \gls{entropy}
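The entropy test described in this hunk can be sketched in a few lines. The sketch assumes the natural logarithm and a strict "less than" comparison; neither detail is stated in the excerpt, and the function names and example scores are hypothetical.

```python
import math


def entropy(scores):
    """Shannon entropy (natural log, an assumption) of one softmax score vector."""
    return -sum(p * math.log(p) for p in scores if p > 0.0)


def entropy_threshold(predictions, threshold):
    """Keep only predictions whose entropy is below the threshold."""
    return [p for p in predictions if entropy(p) < threshold]


predictions = [
    [0.97, 0.01, 0.01, 0.01],  # confident: low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform: maximum entropy, ln(4)
    [0.70, 0.20, 0.05, 0.05],  # in between
]
kept = entropy_threshold(predictions, threshold=1.0)
```

With this threshold the uniform prediction is filtered out while the other two pass.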
@ -558,8 +554,8 @@ false positive or false negative cases with high confidence values.
\subsection{Bayesian SSD with Entropy Thresholding}
Bayesian \gls{SSD} has the speciality of multiple forward passes. Based
on the information in the paper, the detections of all forward passes
Bayesian \gls{SSD} uses multiple forward passes. Based
on the information from Miller et al.~\cite{Miller2018}, the detections of all forward passes
are grouped per image but not by forward pass. This leads
to the following shape of the network output after all
forward passes: \((batch\_size, \#nr\_boxes \, \cdot \, \#nr\_forward\_passes, \#nr\_classes + 12)\). The size of the output
@ -576,7 +572,7 @@ mutual IOU score of every detection with all other detections. Detections
with a mutual IOU score of 0.95 or higher are partitioned into an
observation. Next, the softmax scores and bounding box coordinates of
all detections in an observation are averaged.
There can be a different number of observations for every image which
There can be a different number of observations for every image, which
destroys homogeneity and prevents batch-wise calculation of the
results. The shape of the results is per image: \((\#nr\_observations,\#nr\_classes + 4)\).
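The partitioning step in this hunk (group detections with a mutual IOU of 0.95 or higher into an observation, then average the softmax scores and box coordinates) can be sketched as below. The excerpt does not say how the groups are formed, so the greedy seed-based grouping is an assumption, as are the function names and the data layout.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def partition(detections, iou_thresh=0.95):
    """Greedily group detection indices whose box overlaps a seed box enough."""
    unassigned = list(range(len(detections)))
    observations = []
    while unassigned:
        seed = unassigned[0]
        # The seed always belongs to its own group, even for degenerate boxes.
        group = [seed] + [
            i for i in unassigned[1:]
            if iou(detections[seed][0], detections[i][0]) >= iou_thresh
        ]
        unassigned = [i for i in unassigned if i not in group]
        observations.append(group)
    return observations


def average_observation(detections, group):
    """Average box coordinates and softmax scores of one observation."""
    n = len(group)
    box = [sum(detections[i][0][k] for i in group) / n for k in range(4)]
    n_classes = len(detections[group[0]][1])
    scores = [sum(detections[i][1][c] for i in group) / n for c in range(n_classes)]
    return box, scores


# Hypothetical detections of one image, collected over several forward passes.
detections = [
    ((0, 0, 10, 10), [0.1, 0.9]),
    ((0, 0, 10, 10), [0.2, 0.8]),
    ((50, 50, 60, 60), [0.7, 0.3]),
    ((0, 0, 10, 10.1), [0.3, 0.7]),
]
groups = partition(detections)
box, scores = average_observation(detections, groups[0])
```

The number of groups depends on the image content, which is why the per-image results have a varying first dimension.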
@ -598,14 +594,14 @@ at the end.
\label{chap:experiments-results}
This chapter explains the used data sets, how the experiments have been
set up, and what the results are.
This chapter explains the data sets used, and how the experiments have been
set up. Furthermore, it presents the results.
\section{Data Sets}
This thesis uses the MS COCO~\cite{Lin2014} data set. It contains
80 classes, their range is illustrated by two classes: airplanes and toothbrushes.
The images are taken by camera from the real world, ground truth
The images are real world images, ground truth
is provided for all images. The data set supports object detection,
keypoint detection, and panoptic segmentation (scene segmentation).
@ -779,7 +775,7 @@ The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-micro}. Precision-recall curves for all variants
can be seen in figure \ref{fig:precision-recall-micro}. Both \gls{vanilla} \gls{SSD}
variants with 0.01 confidence threshold reach a much higher open set error
and a higher recall. This behaviour is expected as more and worse predictions
and a higher recall. This behaviour is to be expected as more and worse predictions
are included.
All plotted variants show a similar behaviour that is in line with previously
reported figures, such as the ones in Miller et al.~\cite{Miller2018}
@ -861,7 +857,7 @@ The relation of \(F_1\) score to absolute open set error can be observed
in figure \ref{fig:ose-f1-macro}. Precision-recall curves for all variants
can be seen in figure \ref{fig:precision-recall-macro}. Both \gls{vanilla} \gls{SSD}
variants with 0.01 confidence threshold reach a much higher open set error
and a higher recall. This behaviour is expected as more and worse predictions
and a higher recall. This behaviour is to be expected as more and worse predictions
are included.
All plotted variants show a similar behaviour that is in line with previously
reported figures, such as the ones in Miller et al.~\cite{Miller2018}
@ -878,9 +874,9 @@ only 0.7\% of the ground truth. With this share, it is below
the average of roughly 0.9\% for each of the 56 classes that make up the
second half of the ground truth.
In some cases, multiple variants have seemingly the same performance
but only one or some of them are marked bold. This is informed by
differences prior to rounding. If two or more variants are marked bold
In some cases, multiple variants have apparently the same performance
but only one or some of them are marked bold. This is caused by
differences prior to rounding: if two or more variants are marked bold
they had the exact same performance before rounding.
\begin{table}[tbp]
@ -909,11 +905,9 @@ they had the exact same performance before rounding.
\end{table}
The vanilla \gls{SSD} variant with 0.2 per class confidence threshold performs
best in the persons class with a max \(F_1\) score of 0.460, as well as
recall of 0.405 and precision of 0.533 at the max \(F_1\) score.
It shares the first place in recall with the \gls{vanilla} \gls{SSD}
variant using 0.01 confidence threshold. All Bayesian \gls{SSD} variants
perform worse than the \gls{vanilla} \gls{SSD} variants (see table
best in the persons class: it has a max \(F_1\) score of 0.460, consisting of a recall of 0.405 and a precision of 0.533.
The variant shares the first place in recall with the \gls{vanilla} \gls{SSD}
variant that uses a 0.01 confidence threshold. All Bayesian \gls{SSD} variants perform worse than the \gls{vanilla} \gls{SSD} variants (see table
\ref{tab:results-persons}). With respect to the macro averaged result,
all variants perform better than the average of all classes.
@ -951,7 +945,7 @@ variant with \gls{NMS} and disabled dropout, and the one with 0.9 keep
ratio have a better precision (0.460 and 0.454 respectively) than the
\gls{vanilla} \gls{SSD} variants with 0.01 confidence threshold (0.452 and
0.453). With respect to the macro averaged result, all variants have
a better precision than the average and the Bayesian variant without
a better precision than the average. The Bayesian variant without
\gls{NMS} and dropout also has a better recall and \(F_1\) score.
\begin{table}[tbp]
@ -983,7 +977,7 @@ The best \(F_1\) score (0.288) and recall (0.251) for the chairs class
belongs to \gls{vanilla} \gls{SSD} with \gls{entropy} threshold. Precision
is mastered by Bayesian \gls{SSD} with \gls{NMS} and disabled dropout (0.360).
The variant with 0.9 keep ratio has the second-highest precision (0.343)
of all variants. Both in \(F_1\) score and recall all Bayesian variants
of all variants. Both in \(F_1\) score and recall, all Bayesian variants
are worse than the \gls{vanilla} variants. Compared with the macro averaged
results, all variants perform worse than the average.
@ -1077,7 +1071,7 @@ ratio.
\end{figure}
The ground truth only contains a stop sign and a truck. The differences between \gls{vanilla} \gls{SSD} and Bayesian \gls{SSD} are almost not visible
(see figures \ref{fig:stop-sign-truck-vanilla} and \ref{fig:stop-sign-truck-bayesian}): the truck is neither detected by \gls{vanilla} nor Bayesian \gls{SSD}, instead both detected a pottet plant and a traffic light. The stop sign is detected by both variants.
(see figures \ref{fig:stop-sign-truck-vanilla} and \ref{fig:stop-sign-truck-bayesian}): the truck is neither detected by \gls{vanilla} nor Bayesian \gls{SSD}, instead both detected a "potted plant" and a traffic light. The stop sign is detected by both variants.
This behaviour implies problems with detecting objects at the edge
that overwhelmingly lie outside the image frame. Furthermore, the predictions are usually identical.
@ -1095,9 +1089,11 @@ that overwhelmingly lie outside the image frame. Furthermore, the predictions ar
\end{minipage}
\end{figure}
Another example (see figures \ref{fig:cat-laptop-vanilla} and \ref{fig:cat-laptop-bayesian}) is a cat with a laptop/TV in the background on the right
side. Both variants detect a cat but the \gls{vanilla} variant detects a dog as well. The laptop and TV are not detected but this is expected since
these classes have not been trained.
Another example (see figures \ref{fig:cat-laptop-vanilla} and
\ref{fig:cat-laptop-bayesian}) is a cat with a laptop/TV in the background
on the right side. Both variants detect a cat but the \gls{vanilla}
variant detects a dog as well. The laptop and TV are not detected but this
is to be expected since these classes have not been trained.
\chapter{Discussion and Outlook}
@ -1153,7 +1149,7 @@ open set error continues to rise a bit.
There is no visible impact of \gls{entropy} thresholding on the object detection
performance for \gls{vanilla} \gls{SSD}. This indicates that the network has almost no
uniform or close to uniform predictions, the vast majority of predictions
has a high confidence in one class---including the background.
have a high confidence in one class---including the background.
However, the \gls{entropy} plays a larger role for the Bayesian variants---as
expected: the best performing thresholds are 1.0, 1.3, and 1.4 for micro averaging,
and 1.5, 1.7, and 2.0 for macro averaging. In all of these cases the best
@ -1190,7 +1186,7 @@ threshold indicates a worse performance.
Miller et al.~\cite{Miller2018} supposedly do not use \gls{NMS}
in their implementation of dropout sampling. Therefore, a variant with disabled \glslocalreset{NMS}
\gls{NMS} has been tested. The results are somewhat expected:
\gls{NMS} has been tested. The results are somewhat as expected:
\gls{NMS} removes all non-maximum detections that overlap
with a maximum one. This reduces the number of multiple detections per
ground truth bounding box and therefore the false positives. Without it,
@ -1208,7 +1204,7 @@ more than 50\% of the original observations are removed with \gls{NMS} and
stay without---all of these are very likely to be false positives.
A clear distinction between micro and macro averaging can be observed:
recall is hardly effected with micro averaging (0.300) but goes down equally with macro averaging (0.229). For micro averaging, it does
recall is hardly affected with micro averaging (0.300) but goes down noticeably with macro averaging (0.229). For micro averaging, it does
not matter which class the true positives belong to: every detection
counts the same way. This also means that top \(k\) will have only
a marginal effect: some true positives might be removed without \gls{NMS} but overall that does not have a big impact. With macro averaging, however,
@ -1256,7 +1252,7 @@ recall.
\end{table}
The dropout variants have largely worse performance than the Bayesian variants
without dropout. This is expected as the network was not trained with
without dropout. This is to be expected as the network was not trained with
dropout and the weights are not prepared for it.
Gal~\cite{Gal2017}
@ -1282,7 +1278,7 @@ more than 430 million detections remain (see table \ref{tab:effect-dropout} for
has slightly fewer predictions left compared to the one without dropout.
After the grouping, the variant without dropout has on average between
10 and 11 detections grouped into an observation. This is expected as every
10 and 11 detections grouped into an observation. This is to be expected as every
forward pass creates the exact same result and these ten identical detections
per \gls{vanilla} \gls{SSD} detection perfectly overlap. The fact that slightly more than
ten detections are grouped together could explain the marginally better precision
@ -1316,5 +1312,4 @@ networks.
To facilitate future work based on this thesis, the source code will be
made available and an installable Python package will be uploaded to the
PyPi package index. In the appendices can be found more details about the
source code implementation as well as more figures.
PyPi package index. More details about the source code implementation and additional figures can be found in the appendices.

9
ma.bib

@ -909,4 +909,13 @@ to construct explicit models for non-normal classes. Application includes infere
timestamp = {2019.09.09},
}
@Misc{Chollet2015,
author = {Chollet, Fran\c{c}ois and others},
title = {Keras},
year = {2015},
howpublished = {\url{https://keras.io}},
owner = {jim},
timestamp = {2019.10.04},
}
@Comment{jabref-meta: databaseType:biblatex;}
