\documentclass[12pt,twoside]{scrartcl} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Meta informations: \newcommand{\trauthor}{Jim Martens} \newcommand{\trtype}{Seminar Paper} %{Seminararbeit} %{Proseminararbeit} \newcommand{\trcourse}{Neural Networks} \newcommand{\trtitle}{Catastrophic Forgetting and Neuromodulation} \newcommand{\trmatrikelnummer}{6420323} \newcommand{\tremail}{2martens@informatik.uni-hamburg.de} \newcommand{\trarbeitsbereich}{Knowledge Technology, WTM} \newcommand{\trdate}{09.07.2018} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Languages: % Falls die Ausarbeitung in Deutsch erfolgt: % \usepackage[german]{babel} % \usepackage[T1]{fontenc} % \usepackage[latin1]{inputenc} % \usepackage[latin9]{inputenc} % \selectlanguage{german} % If the thesis is written in English: \usepackage[spanish,english]{babel} \selectlanguage{english} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Bind packages: \usepackage[utf8]{inputenc} % Unicode funktioniert unter Windows, Linux und Mac \usepackage[T1]{fontenc} \usepackage{acronym} % Acronyms \usepackage{algorithmic} % Algorithms and Pseudocode \usepackage{algorithm} % Algorithms and Pseudocode \usepackage{amsfonts} % AMS Math Packet (Fonts) \usepackage{amsmath} % AMS Math Packet \usepackage{amssymb} % Additional mathematical symbols \usepackage{amsthm} \usepackage{booktabs} % Nicer tables %\usepackage[font=small,labelfont=bf]{caption} % Numbered captions for figures \usepackage{color} % Enables defining of colors via \definecolor \definecolor{uhhRed}{RGB}{254,0,0} % Official Uni Hamburg Red \definecolor{uhhGrey}{RGB}{122,122,120} % Official Uni Hamburg Grey \usepackage{fancybox} % Gleichungen einrahmen %\usepackage{fancyhdr} % Packet for nicer headers \usepackage[automark]{scrlayer-scrpage} \usepackage[hidelinks]{hyperref}\urlstyle{rm} %\usepackage{fancyheadings} % Nicer numbering of headlines %\usepackage[outer=3.35cm]{geometry} % Type area (size, margins...) 
!!!Release version %\usepackage[outer=2.5cm]{geometry} % Type area (size, margins...) !!!Print version %\usepackage{geometry} % Type area (size, margins...) !!!Proofread version \usepackage[outer=3.15cm]{geometry} % Type area (size, margins...) !!!Draft version \geometry{a4paper,body={5.8in,9in}} \usepackage{graphicx} % Inclusion of graphics %\usepackage{latexsym} % Special symbols \usepackage{longtable} % Allow tables over several parges \usepackage{listings} % Nicer source code listings \usepackage{multicol} % Content of a table over several columns \usepackage{multirow} % Content of a table over several rows \usepackage{rotating} % Alows to rotate text and objects \usepackage[hang]{subfigure} % Allows to use multiple (partial) figures in a fig %\usepackage[font=footnotesize,labelfont=rm]{subfig} % Pictures in a floating environment \usepackage{tabularx} % Tables with fixed width but variable rows \usepackage{url,xspace,boxedminipage} % Accurate display of URLs \usepackage{csquotes} \usepackage[ backend=biber, bibstyle=ieee, citestyle=ieee, minnames=1, maxnames=2 ]{biblatex} \addbibresource{bib.bib} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Configurationen: \hyphenation{whe-ther} % Manually use: "\-" in a word: Staats\-ver\-trag %\lstloadlanguages{C} % Set the default language for listings \DeclareGraphicsExtensions{.pdf,.svg,.jpg,.png,.eps} % first try pdf, then eps, png and jpg \graphicspath{{./src/}} % Path to a folder where all pictures are located %\pagestyle{fancy} % Use nicer header and footer \pagestyle{scrheadings} % Redefine the environments for floating objects: \setcounter{topnumber}{3} \setcounter{bottomnumber}{2} \setcounter{totalnumber}{4} \renewcommand{\topfraction}{0.9} %Standard: 0.7 \renewcommand{\bottomfraction}{0.5} %Standard: 0.3 \renewcommand{\textfraction}{0.1} %Standard: 0.2 \renewcommand{\floatpagefraction}{0.8} %Standard: 0.5 % Tables with a nicer padding: \renewcommand{\arraystretch}{1.2} \MakeOuterQuote{"} 
%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Additional 'theorem' and 'definition' blocks: \theoremstyle{plain} \newtheorem{theorem}{Theorem}[section] %\newtheorem{theorem}{Satz}[section] % Wenn in Deutsch geschrieben wird. \newtheorem{axiom}{Axiom}[section] %\newtheorem{axiom}{Fakt}[chapter] % Wenn in Deutsch geschrieben wird. %Usage:%\begin{axiom}[optional description]%Main part%\end{fakt} \theoremstyle{definition} \newtheorem{definition}{Definition}[section] %Additional types of axioms: \newtheorem{lemma}[axiom]{Lemma} \newtheorem{observation}[axiom]{Observation} %Additional types of definitions: \theoremstyle{remark} %\newtheorem{remark}[definition]{Bemerkung} % Wenn in Deutsch geschrieben wird. \newtheorem{remark}[definition]{Remark} %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Provides TODOs within the margin: \newcommand{\TODO}[1]{\marginpar{\emph{\small{{\bf TODO: } #1}}}} %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Abbreviations and mathematical symbols \newcommand{\modd}{\text{ mod }} \newcommand{\RS}{\mathbb{R}} \newcommand{\NS}{\mathbb{N}} \newcommand{\ZS}{\mathbb{Z}} \newcommand{\dnormal}{\mathit{N}} \newcommand{\duniform}{\mathit{U}} \newcommand{\erdos}{Erd\H{o}s} \newcommand{\renyi}{-R\'{e}nyi} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Document: \begin{document} \renewcommand{\headheight}{14.5pt} %\fancyhead{} %\fancyhead[LE]{ \slshape \trauthor} %\fancyhead[LO]{} %\fancyhead[RE]{} %\fancyhead[RO]{ \slshape \trtitle} \lehead{\slshape \trauthor} \rohead{\slshape \trtitle} %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Cover Header: \begin{titlepage} \begin{flushleft} Universit\"at Hamburg\\ Department Informatik\\ \trarbeitsbereich\\ \end{flushleft} \vspace{3.5cm} \begin{center} \huge \trtitle\\ \end{center} \vspace{3.5cm} \begin{center} \normalsize\trtype\\ [0.2cm] \Large\trcourse\\ [1.5cm] \Large \trauthor\\ [0.2cm] \normalsize Matr.Nr. \trmatrikelnummer\\ [0.2cm] \normalsize\tremail\\ [1.5cm] \Large \trdate \end{center} \vfill \end{titlepage} %backsite of cover sheet is empty! 
\thispagestyle{empty}
\hspace{1cm}
\newpage

%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Abstract:
% Abstract gives a brief summary of the main points of a paper:
\section*{Abstract}
Catastrophic forgetting is a major problem for neural networks, in particular for autonomous systems. This paper showcases three approaches using diffusion-based neuromodulation and compares them with respect to catastrophic forgetting. The comparison shows that modulated random search is not useful for combating catastrophic forgetting, that the modulated Gaussian walk is significantly better and likely useful in single-task or combined-feedback situations, and that localized learning overcomes catastrophic forgetting in a very bespoke setup and, more generally, could be useful in situations with combined tasks and distinct feedback for each task.

% Lists:
\setcounter{tocdepth}{2}        % depth of the table of contents (for seminars, 2 is recommended)
\tableofcontents
\pagenumbering{arabic}
\clearpage

%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Content:
% the actual content, usually separated over a number of sections
% each section is assigned a label, in order to be able to put a
% crossreference to it
\section{Introduction}
\label{sec:introduction}

Autonomous robots need to adapt to new situations; they need to learn throughout their entire lifetime. In order to do this they need a second environmental feedback loop that tells them when to learn\cite{Toutounji2016}. The learning itself is also described as plasticity. In the context of this paper the definition of synaptic plasticity given by Citri\cite{Citri2008} will be used. In short, the process of learning itself, i.e. changing the weights, is already considered plasticity. This can occur throughout the lifetime of a network or during a training phase using, for example, supervised learning and backpropagation. When a network has to adapt to new situations, it has to learn new tasks. Usually the previously learned weights are then largely forgotten.
This phenomenon is called catastrophic forgetting\cite{French1999,McCloskey1989}. It is highly problematic because the weights encode what a network has learned: if they are forgotten, or rather overwritten, the previously learned tasks can no longer be fulfilled. Since catastrophic forgetting is a key problem for autonomous learning, it is crucial to overcome it. In this paper I will present several approaches for learning in an autonomous setup and analyse which of them, if any, can overcome catastrophic forgetting.

\section{Catastrophic Forgetting}
\label{sec:catastrophicforgetting}

French\cite{French1999} reviewed the existing research on catastrophic forgetting. The following paragraphs follow this review and highlight the major developments in research on catastrophic forgetting.

McCloskey and Cohen\cite{McCloskey1989} originally discovered the problem of catastrophic forgetting, which they referred to as catastrophic interference. This discovery of a fundamental limitation of classic neural networks was as important as the work of Minsky and Papert\cite{Minsky1969}, who had described the limitations of the perceptron twenty years earlier. The key discovery of McCloskey and Cohen was that previously learned patterns were completely forgotten after only a few training cycles on a new pattern. More important than the behaviour itself was its cause: they identified the single set of shared weights as responsible for it. On reflection this makes sense. The classic backpropagation algorithm works by modifying the weights that contributed most to a bad outcome. When the set of targets changes, the network performs badly on the new pattern, and to rectify this the backpropagation algorithm changes many of the weights until the network delivers a good result for the new pattern. This, in turn, results in increasingly worse performance on previously learned patterns.
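The shared-weights mechanism described above can be illustrated with a short, self-contained sketch. This is a hypothetical toy example, not code from any of the cited works: a tiny network with one shared set of weights is trained with gradient descent on a task A, then on a task B whose targets conflict with A over the same inputs, and its error on A collapses.

```python
# Toy illustration of catastrophic forgetting (hypothetical example,
# not from the cited works): one hidden layer, one shared set of
# weights, trained first on task A, then on a conflicting task B.
import numpy as np

rng = np.random.default_rng(0)

def forward(W1, W2, X):
    h = np.tanh(X @ W1)   # hidden activations
    return h, h @ W2      # linear output

def loss(W1, W2, X, y):
    return float(np.mean((forward(W1, W2, X)[1] - y) ** 2))

def train(W1, W2, X, y, epochs=3000, lr=0.1):
    n = len(X)
    for _ in range(epochs):
        h, p = forward(W1, W2, X)
        err = p - y                                      # output error
        grad_W2 = h.T @ err / n                          # backpropagation
        grad_W1 = X.T @ ((err @ W2.T) * (1 - h ** 2)) / n
        W2 -= lr * grad_W2                               # gradient descent
        W1 -= lr * grad_W1
    return W1, W2

X = rng.normal(size=(20, 4))                 # shared inputs for both tasks
y_a = rng.choice([-1.0, 1.0], size=(20, 1))  # task A targets
y_b = -y_a                                   # task B: conflicting targets

W1 = rng.normal(scale=0.5, size=(4, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))

W1, W2 = train(W1, W2, X, y_a)               # learn task A
loss_a_before = loss(W1, W2, X, y_a)
loss_b_before = loss(W1, W2, X, y_b)
W1, W2 = train(W1, W2, X, y_b)               # learn task B: same weights
loss_a_after = loss(W1, W2, X, y_a)
loss_b_after = loss(W1, W2, X, y_b)

# Because both tasks share one set of weights, fitting B overwrites A.
print(f"task A loss: {loss_a_before:.3f} -> {loss_a_after:.3f}")
```

The error on task A rises sharply after training on task B, because the single set of shared weights is pushed toward the new targets, exactly the mechanism McCloskey and Cohen identified.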
If this worsening were gradual, it would be unfortunate but understandable. The forgetting is called catastrophic because the performance change is not gradual but abrupt: even small changes in the weights can have a huge impact on the output. In fact, catastrophic forgetting is only a very radical instance of a more general problem for all models of memory, the so-called "stability-plasticity" problem\cite{Grossberg1982}. This problem, sometimes called a dilemma, asks how to design a system that is both sensitive to new input and not radically disrupted by it; in other words, a system that can learn new things without largely or completely forgetting what it has already learned.

Early attempts to alleviate or overcome catastrophic forgetting relied on sparser representations, meaning that not every weight is responsible for all possible inputs. The downside is a reduced ability to generalize to new input and, overall, a reduced ability to discriminate. In an extreme form this can lead to catastrophic remembering\cite{Sharkey1995}: a network learns the function describing the inputs too well and thereby loses its ability to differentiate between new and already learned input. This can be understood with the example given by French\cite{French1999}, where a network has the task of reproducing its input at the output. It can detect a new input if the output diverges by a large margin. If the network has learned too well, namely the identity function, it reproduces any input perfectly at the output and hence loses the ability to detect new input.

Significant improvements were made by rehearsing previously learned input. Robins\cite{Robins1995} found a way to rehearse prior input even when it is no longer available and called the generated items "pseudo-patterns". The idea is that the weights of the trained network represent a function.
A random input together with the network's predicted output approximately describes this function and constitutes such a pattern. Robins interleaved many of them with new input, and the results were promising: the forgetting became more gradual. This insight, together with the findings of McClelland et al.\cite{McClelland1995}, resulted in the development of dual-network models. In short, one network models the hippocampus and is able to quickly learn new information without disrupting previously learned regularities. This network then serves as a teacher for a second network, which models the neocortex and is responsible for generalizing.

Between 1999 and 2018 further work was done on catastrophic forgetting. Most recently, the work of Kirkpatrick et al.\cite{Kirkpatrick2017}, who slow down learning for weights important to older tasks, Velez and Clune\cite{Velez2017}, whose work will be showcased later, and Shmelkov et al.\cite{Shmelkov2017}, who introduce a loss function intended to keep catastrophic forgetting at bay, deserves mention.

\section{Plasticity}
\label{sec:plasticity}

Catastrophic forgetting requires learned weights that can be forgotten. Every neural network learns and therefore exhibits plasticity, given our definition of it. In this section three approaches to plasticity using diffusion-based neuromodulation are presented in more detail. Modulated random search and modulated Gaussian walk use linearly-modulated neural networks and are taken from Toutounji and Pasemann\cite{Toutounji2016}. The third approach was introduced by Velez and Clune\cite{Velez2017} and uses diffusion-based neuromodulation for localized learning, hence the name of the corresponding subsection.

\subsection{Modulated Random Search}
\label{subsec:mrs}

\subsubsection*{Modulated Neural Network}

Since both approaches from Toutounji and Pasemann use linearly-modulated neural networks, the structure of these networks is described first.
Linearly-modulated neural networks (LMNN) are a specific variant of modulated neural networks (MNN). Any artificial neural network (ANN), or simply neural network in the context of computer science, can become a modulated neural network by adding a neuromodulator layer. This neuromodulator layer is the second environmental feedback loop mentioned earlier. Toutounji and Pasemann describe a variant of this layer that uses neuromodulator cells (NMCs). Each NMC produces a specific type of neuromodulator (NM) and stores its own concentration level of it. The network-wide concentration level at a certain point in space and time can be obtained by summing up all concentration levels stored in the NMCs at that point in space. Produced neuromodulators usually affect nearby network parts. This type of spatial impact requires a spatial representation of the network in which every network element has a location. An NMC operates in one of two modes: production and reduction. During production mode the concentration of the neuromodulator can be increased, and during reduction mode it can be decreased. A cell enters production mode if it has been stimulated for some time, while it falls back to reduction mode when this stimulation is absent for some time.

\subsubsection*{Linearly-Modulated Neural Network}

A linearly-modulated neural network uses discrete time and stimulates NMCs with a simple linear model. Each NMC is connected to a carrier cell or neuron, which itself is part of a modulatory subnetwork. The NMC is stimulated if the output of the carrier neuron is within a specified range (\(\text{S}^{\text{min}}\), \(\text{S}^{\text{max}}\)). At every time step it is checked whether the output of the carrier cell is high enough to stimulate the NMC. If it is, the stimulation level of the NMC increases; otherwise it decreases. Once the stimulation level reaches the threshold \(\text{T}^{\text{prod}}\), the cell enters production mode.
If it falls below \(\text{T}^{\text{red}}\), the cell goes back into reduction mode. Over time the neuromodulator diffuses into the surrounding control subnetwork, where it initiates plasticity that depends on its concentration at the respective synapse.

\subsubsection*{Modulated Random Search}

\begin{table}
\centering
\begin{tabular}{ll}
\toprule
\textbf{Parameter} & \textbf{Description} \\
\midrule
\(Type\) & The neuromodulator type the synapse is sensitive to \\
\(W\) & Weight change probability \\
\(D\) & Disable/enable probability \\
\(W^{min}, W^{max}\) & Minimum and maximum weight of the synapse \\
\(M\) & Maximum neuromodulator sensitivity limit of the synapse \\
\bottomrule
\end{tabular}
\caption{Parameters stored for each synapse. Replication of Table 1 in Toutounji and Pasemann\cite{Toutounji2016}.}
\label{tab:mrs-synapse}
\end{table}

Modulated random search essentially performs random weight changes. Each synapse \(i\) stores a set of parameters (see table \ref{tab:mrs-synapse}). The weight change probability \(p_i^w\) at time \(t\) is the product of the intrinsic weight change probability \(W_i\) and the concentration \(c(t, x_i, y_i)\) of the neuromodulator the synapse is sensitive to at its location \((x_i, y_i)\). The maximum neuromodulator sensitivity \(M_i\) caps the concentration term of this product \eqref{eq:weightchangeprob}, so each synapse has a maximum weight change probability. Since weight changes can happen at any time step, the intrinsic weight change probability has to be very small. Should a weight change occur, a new weight \(w_i\) is chosen randomly from the interval \([W_i^{min}, W_i^{max}]\). The weight change probability \(p_i^w\) tells the network when to learn and leaves room for variation, as it is a probability rather than a binary learn/do-not-learn decision. In this approach, this probability constitutes the second environmental feedback loop.
\begin{equation}\label{eq:weightchangeprob}
p_i^w = \min(M_i, c(t, x_i, y_i)) \cdot W_i, \quad 0 < W_i \lll 1
\end{equation}

Moreover, a synapse can disable or enable itself. The actual disable/enable probability \(p_i^d\) is the product of the intrinsic value \(D_i\), stored as a parameter, and the neuromodulator concentration \(c(t, x_i, y_i)\) \eqref{eq:enableprob}. The concentration is again capped by the maximum sensitivity limit \(M_i\), so there is a maximum disable/enable probability as well. The intrinsic disable/enable probability must be smaller than the intrinsic weight change probability. A disabled synapse is treated as having weight 0, but its actual value is stored so that it can be restored when the synapse is enabled again.

\begin{equation}\label{eq:enableprob}
p_i^d = \min(M_i, c(t, x_i, y_i)) \cdot D_i, \quad 0 \leq D_i < W_i
\end{equation}

Given a so-called neural network structure, or substrate, this approach makes it easier to find different network topologies (structure and weights combined).

\subsection{Modulated Gaussian Walk}
\label{subsec:mgw}

The modulated Gaussian walk was likewise introduced by Toutounji and Pasemann. The key differences begin with the parameters: there is no maximum sensitivity limit for the neuromodulator concentration. When a weight change occurs, the new weight is not chosen at random; instead, a difference to be added to the current weight is sampled from a normal distribution with zero mean and variance \(\sigma^2\) \eqref{eq:gausswalk}. The sampled value is unbounded, so the new weight could fall outside the given bounds. The value is therefore resampled until the sum of the current weight and the sampled value lies within the interval \([W_i^{min}, W_i^{max}]\).
\begin{equation}\label{eq:gausswalk}
w_i (t + 1) = w_i (t) + \Delta w_i \;\text{where}\; \Delta w_i \sim \mathcal{N}(0, \sigma^2)
\end{equation}

Toutounji and Pasemann implemented a mechanism for disabling synapses in the modulated Gaussian walk as well, but since they did not make use of it later, they did not describe how it works.

\subsection{Localized learning}
\label{subsec:diffusion}

Velez and Clune use a small network to solve a foraging task. The network represents an agent with a lifetime of three years, where each year consists of a summer and a winter season. During each season the agent is presented with food and has to decide whether or not to eat it. Half of the food is nutritious and the other half poisonous. The target is a fitness value, which is highest if the agent eats all nutritious food and none of the poisonous food. The associations of nutritious and poisonous differ between summer and winter but remain the same within a season over the whole lifetime; a food that is nutritious in summer will therefore always be nutritious in summer. This setup makes it easy to measure whether the agent remembers the associations learned in previous seasons.

The initial weights of the network are derived from an evolutionary algorithm; all later learning uses neuromodulation. The neurons of the network are spatially located, and there are two sources of neuromodulators in the network, one on either side. The sources are only active in their respective season and encode whether the previously eaten food was nutritious (1) or poisonous (-1); if they are not active, their value is zero. As soon as a source is activated, its neuromodulator fills the space within a radius of 1.5 distance units around the source and potentially triggers weight changes of neurons inside that radius. The strength of the neuromodulator decreases with distance from the source.
The sources are the second environmental feedback loop in this example, as they tell the network, or a part of it, when to learn. How does the actual learning happen? The weight change between two neurons depends on the activations of both neurons, the learning rate, and the concentration of neuromodulators \eqref{eq:hebbian}; in short, Hebbian learning is employed.

\begin{equation}\label{eq:hebbian}
\Delta w_{ij} = \eta \cdot m_i \cdot a_i \cdot a_j
\end{equation}

This explanation should suffice for a general understanding of their method. The neurons within the vicinity of one of the sources only update their weights in one of the seasons. Therefore they learn only for that season and are unaffected by the other. This results in localized learning.

\section{Comparison regarding catastrophic forgetting}
\label{sec:comparison}

Given the presentations of the three approaches, it is interesting to compare them with regard to their ability to mitigate or overcome catastrophic forgetting. For both modulated random search and modulated Gaussian walk, this aspect was analyzed in the experiments conducted by Toutounji and Pasemann\cite{Toutounji2016}; the results of their work will therefore be used for this comparison. Velez and Clune\cite{Velez2017} introduced the presented approach of localized learning specifically to analyze its capability of overcoming catastrophic forgetting; hence their results will be used as well.

The performance of modulated random search and modulated Gaussian walk was tested over multiple experiments of increasing difficulty. The difficulty ranged from a positive light-tropism task in the first experiment (E1), over an obstacle-avoidance task in the second experiment (E2), a combination of E1 and E2 in the third experiment (E3), and a more difficult variant of E3 in the fourth experiment (E4), to a pendulum experiment in the fifth (E5).
In each experiment a robot had to learn the task from scratch. A pre-designed LMNN was given in each case and defined the boundaries within which the learning took place. If a temporary solution was discarded, the learning started again.

Modulated random search was able to find successful behaviours in almost all cases in E1, despite a short training time of only two hours. The slightly longer training time of four hours for E2, however, was apparently far too short to find consistently good solutions. In both E3 and E4 the number of intermediate temporary solutions was significantly higher than the final number of solutions. The pendulum experiment was an easier task, and therefore many successful behaviours were found. Toutounji and Pasemann note that even almost stable networks are destroyed if they have the slightest weakness. Modulated random search therefore does not help at all against catastrophic forgetting.

Modulated Gaussian walk, in contrast to random search, tends to improve temporary solutions when they have weaknesses. For E3 the random search produced 34 temporary solutions that lasted longer than five minutes, averaging \(5.7\) minutes per solution. The Gaussian walk found roughly twice as many temporary solutions and averaged \(12.5\) minutes per solution. This indicates that the Gaussian walk mitigates catastrophic forgetting, although it does not completely remove it.

The experiment setup for the localized learning approach was already described above. After performing some tests, Velez and Clune discovered that two functional modules had formed: one set of connections learns during summer and the other during winter. The connections learning in summer do not change in winter, and vice versa. This completely removes catastrophic forgetting.
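The contrast between the three learning rules can be made concrete with a minimal sketch applying each of them to a single weight that is already part of a working solution. All numbers, probabilities, and the single-weight view are invented for illustration and not taken from either paper; in particular, the modulated probabilities and concentrations are collapsed into fixed values here.

```python
# Hypothetical single-weight sketch of the three update rules; all
# parameter values are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)

W_MIN, W_MAX = -1.0, 1.0   # assumed weight bounds

def modulated_random_search(w, p_change):
    # On a weight change, the new weight is drawn uniformly at random
    # from the bounds; the old (possibly good) value is discarded.
    if rng.random() < p_change:
        return rng.uniform(W_MIN, W_MAX)
    return w

def modulated_gaussian_walk(w, p_change, sigma):
    # On a weight change, a small Gaussian increment is added; it is
    # resampled until the result stays inside [W_MIN, W_MAX].
    if rng.random() < p_change:
        delta = rng.normal(0.0, sigma)
        while not (W_MIN <= w + delta <= W_MAX):
            delta = rng.normal(0.0, sigma)
        return w + delta
    return w

def localized_hebbian(w, eta, m, a_i, a_j):
    # Weight change gated by the local neuromodulator concentration m;
    # with m == 0 (source inactive, i.e. out of season) the weight
    # is left untouched.
    return w + eta * m * a_i * a_j

w0 = 0.8                     # a weight belonging to a working solution
w_mrs, w_mgw = w0, w0
for _ in range(1000):        # 1000 time steps of each modulated rule
    w_mrs = modulated_random_search(w_mrs, p_change=0.01)
    w_mgw = modulated_gaussian_walk(w_mgw, p_change=0.01, sigma=0.02)

# Hebbian update while the local neuromodulator source is inactive:
w_gated = localized_hebbian(w0, eta=0.1, m=0.0, a_i=1.0, a_j=1.0)

print(w_mrs, w_mgw, w_gated)
```

After enough steps, the random search has typically replaced the weight with an arbitrary value from the bounds, the Gaussian walk has only drifted a short distance from the working value, and the gated Hebbian weight is exactly unchanged, mirroring why the three approaches fare so differently against catastrophic forgetting.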
If catastrophic forgetting were the only criterion, localized learning would seem to be the supreme solution to the problem. However, Velez and Clune only showed that it works in a very bespoke setup with a priori information about the linear separability of the learning areas and about the correct solution. It has yet to be shown that localized learning generalizes to larger problems. Modulated random search can be completely discarded as a potential solution. Modulated Gaussian walk is a clear improvement over random search in the analyzed experiments.

While all three approaches use diffusion-based neuromodulation, the first two and the third differ considerably in their setup. First, the general neuromodulation architecture is different (each neuron can diffuse neuromodulators vs. two stationary sources), and second, the actual weight change differs as well. Both modulated Gaussian walk and the presented localized learning improve previous weights instead of completely replacing them. The Gaussian walk samples the weight change from a normal distribution, while localized learning uses Hebbian learning, which depends on the activations of two neurons and incorporates the neuromodulator concentration directly into the weight change formula.

It is important to note that the advantage of the Gaussian walk had nothing to do with the architecture of the neuromodulation, as that was identical for both random search and Gaussian walk; the improvement originated in the learning rule. In their experiments Toutounji and Pasemann used homogeneous diffusion, but it would have been possible to use different diffusion strengths, decays, and so on for every neuron. In the future it would be interesting to compare the LMNN architecture with the "sources" architecture of localized learning to understand the impact of the neuromodulation architecture.
In addition, the Gaussian walk learning rule should be compared with the Hebbian learning rule used by localized learning. For the E3 experiment of Toutounji and Pasemann it is safe to assume that this kind of a priori placement of neuromodulator sources will not work. The experiment requires the robot to solve two tasks at the same time: it has to approach the lights and avoid obstacles. Since both tasks need to be solved simultaneously and at all times, it is not possible to devise two "seasons" or a similar separation of learning time. The LMNN architecture is therefore likely better suited. Nevertheless it would make sense to separate the learning for these two tasks, as a robot might already be very good at approaching lights but only mediocre at avoiding obstacles. In that case improvements on the second task should not affect the first task. The Hebbian learning rule is likely better suited to achieve this effect, as it ties the weight change to the correlation of the connected neurons. Simply adding a value sampled from a normal distribution, as the Gaussian walk does, probably does not result in localized learning.

On the other hand, localized learning will likely only work if it is possible to give the robot distinct feedback about its performance on each task. If it only receives combined feedback, it is more difficult to utilize localized learning, as it is then hard to determine which part (and therefore which weights) performed badly. In situations where there is only one task to solve (E2), or where the feedback is only given as a total without distinct information about each subtask, it is very likely that localized learning will not work, and the Gaussian walk is then better suited than Hebbian learning.

\section{Conclusion}
\label{sec:concl}

The second environmental feedback loop is used to tell autonomous systems when to learn. However, the mere existence of such a loop is not enough.
It matters how this feedback loop works and how it is connected to the rest of the network. The weight change probability of both modulated random search and modulated Gaussian walk constitutes the second environmental feedback loop, yet the two approaches were shown to differ vastly in performance. It is therefore equally important how the learning itself works. The comparison has shown that localized learning utilizing neuromodulator sources can overcome catastrophic forgetting for small networks in a very restricted setup. Furthermore, the comparison revealed that modulated random search is not part of a solution to catastrophic forgetting. In the more general case it is likely that the LMNN architecture is better than the sources architecture and that Hebbian learning is better suited for combined tasks and localized learning than the modulated Gaussian walk. For single-task environments, or those where localized learning is not an option, the modulated Gaussian walk is likely better suited than Hebbian learning.

Future work should examine the assumptions made here and analyze which network architecture and which learning rule are better suited for the kind of autonomous robot experiments conducted by Toutounji and Pasemann. More generally, the applicability of localized learning to bigger problems, for example in the area of deep neural networks, should be researched.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The bibliographic references are included here, at the end of the text.
%
% The actual entries are stored in the file
% ``bib.bib''
%
\newpage
\printbibliography
\addcontentsline{toc}{section}{Bibliography}% Add to the TOC

\end{document}