
Modelling Immediate Serial Recall using a Bayesian Attractor Neural Network



JULIA ERICSON

Master’s Programme, Machine Learning, 120 credits
Date: February 28, 2021

Supervisor: Pawel Herman
Examiner: Erik Fransén

School of Electrical Engineering and Computer Science

Swedish title: Modellering av sekventiellt korttidsminne med hjälp av ett autoassociativt Bayesianskt neuronnätverk


© 2021 Julia Ericson


Abstract

In the last decades, computational models have become useful tools for studying biological neural networks. These models are typically constrained by either behavioural data from neuropsychological studies or by biological data from neuroscience. One model of the latter kind is the Bayesian Confidence Propagating Neural Network (BCPNN) - an attractor network with a Bayesian learning rule which has been proposed as a model for various types of memory. In this thesis, I have further studied the potential of the BCPNN in short-term sequential memory. More specifically, I have investigated if the network can be used to qualitatively replicate behaviours of immediate verbal serial recall, and thereby offer insight into the network-level mechanisms which give rise to these behaviours. The simulations showed that the model was able to reproduce various benchmark effects such as the word length and irrelevant speech effects. It could also simulate the bow-shaped positional accuracy curve as well as some backward recall if the to-be-recalled sequence was short enough. Finally, the model showed some ability to handle sequences with repeated patterns. However, the current model architecture was not sufficient for simulating the effects of rhythm such as temporally grouping the inputs or stressing a specific element in the sequence. Overall, even though the model is not complete, it showed promising results as a tool for investigating biological memory and it could explain various benchmark behaviours in immediate serial recall through neuroscientifically inspired learning rules and architecture.

Keywords

Bayesian Confidence Propagating Neural Network, Phonological Loop, Computational model, Immediate serial recall


Sammanfattning

In recent decades, computer simulations have become an increasingly popular tool for studying biological neural networks. These models are usually inspired by either behavioural data from neuropsychological studies or by biological data from neuroscience. One model of the latter type is the Bayesian Confidence Propagating Neural Network (BCPNN) - an autoassociative network with a Bayesian learning rule, which has previously been used to model several types of memory. In this thesis, I have further investigated whether the network can be used as a model for sequential short-term memory by examining its ability to replicate behaviours in verbal sequential short-term memory. The experiments showed that the model could simulate several important benchmark effects such as the word length effect and the irrelevant speech effect. In addition, the model could simulate the bow-shaped curve that describes the proportion of successful recalls as a function of position, and it could also recall short sequences backwards. The model also showed some ability to handle sequences in which an element recurred later in the sequence. However, the current model was not sufficient for simulating the effects introduced by rhythm, such as temporal grouping or stress on specific elements in the sequence. On the whole, the model nevertheless looks promising, even though it is not complete in its current form, as it could simulate several important benchmark effects and explain them using neuroscientifically inspired learning rules.

Nyckelord

Bayesian Confidence Propagating Neural Network, Phonological loop, Computer simulation, Sequential short-term memory


Acknowledgments

I would like to thank my supervisor Pawel Herman, who has given me very valuable feedback, insights into the process of scientific research and plenty of interesting discussions. I would also like to thank Anders Lansner, who has shown great interest in the project and joined us for discussions. Your input has been much appreciated.

Stockholm, February 2021

Julia Ericson


Contents

1 Introduction
1.1 Problem Statement
1.2 Scope and Delimitations
1.3 Thesis Outline
2 Background
2.1 Phonological loop
2.1.1 Baddeley’s Model of Working Memory
2.1.2 Evidence and Function
2.1.3 Serial Order Effects in Recall
2.1.4 Backward Recall
2.1.5 Computational Models of the Phonological Loop
2.2 Bayesian Confidence Propagating Neural Networks
2.2.1 Hebbian Learning
2.2.2 Derivation of BCPNN Learning Rule
2.2.3 Incremental Learning
2.2.4 Sequential Learning Rules
2.2.5 Stabilizing the network
2.2.6 Sequential Recall
2.2.7 Summary of the sequential BCPNN
3 Method
3.1 Model Specification
3.1.1 Phonological loop algorithm
3.1.2 Input sequence
3.1.3 Initialization
3.1.4 Model Parameters
3.2 Evaluation of Experimental Simulations
3.2.1 Defining Accuracy of Recall
3.2.2 Visualizing Results
3.2.3 Statistical Test
4 Experimental Results
4.1 Sequence Length
4.2 Noise
4.3 Repeated Patterns
4.4 Rhythm
4.5 Backward Recall
5 Discussion
5.1 Evaluation of Simulations
5.2 Evaluation of Learning and Recalling Mechanisms
5.3 Limitations
5.4 The BCPNN in Relation to other Models
5.5 The BCPNN as a Model for Sequential Memory
5.6 Sustainability, Ethics and Social Impact
6 Conclusion
6.1 Open Questions and Future Work
References


Chapter 1 Introduction

Modelling biological memory is challenging in many ways. Unlike most artificial networks, the brain needs to be adaptable to a continuously changing environment and an unlimited number of training samples without exceeding its storage capacity (Sandberg et al. 2002). Furthermore, a computational model of biological memory should not only imitate this behaviour but it should also have a certain level of biological plausibility. Naturally, even though an artificial network can exhibit the same properties as a particular memory structure, it does not necessarily mean that the model is adequate from a neuroscientific point of view. Several constraints should therefore be placed on a computational model. First of all, the architecture and learning rules should be inspired by biological memory and offer an adequate mechanistic understanding of the memory system under investigation. Secondly, if it is assumed that the neural architecture is similar across different subsystems in the brain, a model should be given more credibility if it is able to accurately simulate multiple types of memory without substantially changing its underlying architecture (Sandberg 2003).

A group of memory models which have considerable support from biological data are associative attractor memory models with Hebbian-type learning rules, which account for the mechanisms of biological synaptic plasticity.

One such model is the Bayesian Confidence Propagating Neural Network (BCPNN), where the Hebbian-based learning algorithm uses Bayesian inference to store information as estimated posterior beliefs (Martinez 2019a). In its incremental version (Sandberg et al. 1999), new information can overwrite the currently stored memories, meaning that the network is able to adapt to a changing environment without exceeding its storage capacity. This makes it a suitable model for palimpsest memory, and the BCPNN has shown promising results when modelling both short-term and long-term memory.

For example, Lansner et al. (2013) illustrated how it could reproduce the error patterns of free word-list recall, and Sandberg (2003) showed that the model could simulate both mental aging and working memory by changing parameter values. Furthermore, in 2016 Tully et al. demonstrated how the network could store sequences of patterns by modifying a time constant which gave rise to asymmetrical connections in the associative attractor network. This sequential version has thereafter been used to explain how biological memory could distinguish between overlapping patterns, considered one of the most important properties of sequential memory (Martinez 2019b).

In this thesis, I have studied whether the BCPNN is able to simulate the behaviours observed during neuropsychological studies on immediate serial recall of verbal sequences. According to psychological evidence, the ability to immediately recall short sequences of speech resides in a working memory device referred to as the phonological loop, which temporarily stores the information in phonological codes (Baddeley & Hitch 1974). The phonological loop has successfully been able to qualitatively explain various benchmark effects of immediate serial recall; however, it lacks a more detailed description of the neural mechanisms that create the storage codes (Baddeley & Hitch 2019, Hurlstone et al. 2014). In recent years, many attempts have been made to conceptualise these mechanisms through computational models (e.g. Burgess & Hitch 1999, Hartley et al. 2016, Page & Norris 1998) but to my knowledge, none of the more recognized models use associative attractor networks. In fact, associative attractor networks such as the BCPNN, which link together neighbouring items in the sequence, have generally been rejected as models for the phonological loop due to difficulties in reproducing behavioural observations (Baddeley & Hitch 2019). Thus, this thesis serves two purposes. First of all, it investigates whether the mechanisms of immediate serial recall could in fact be explained using an associative attractor neural network. Secondly, it contributes to a better understanding of how well the BCPNN can model sequential memory in general. Essentially, if the network can simulate the phonological loop, it is likely that it will be able to simulate other types of short-term sequential memory as well.


1.1 Problem Statement

The aim of the thesis is to investigate which behavioural effects of immediate serial recall the BCPNN can qualitatively reproduce. The behavioural effects refer to observations from immediate serial recall tasks performed by human subjects, reported in cognitive neuroscience and psychology research.

The model is expected to offer insights into network-level mechanisms and underlying neural phenomena which give rise to these short-term sequential memory effects.

1.2 Scope and Delimitations

In order to address the problem statement, I have assumed that the abstract BCPNN model, developed largely based on the attractor hypothesis of short-term memory, connects phenomenologically to mesoscopic biological mechanisms at the level of neural networks and synapses. The behaviour of the model was then studied under the influence of noise with the aim of reproducing the error patterns of immediate serial recall presented by neuropsychological research. The error pattern was explored under several different conditions to investigate which behavioural effects the BCPNN could account for. The conditions were specifically chosen because the error patterns which they produce are considered important for understanding immediate serial recall as a whole. These conditions are:

i. the number of patterns in the input sequence
ii. the level of noise
iii. sequences with repeated patterns
iv. the rhythm of the input sequence
v. backward repetition.

A challenge when evaluating the accuracy of the simulated behaviour is the fact that there does not exist an exact quantitative answer to what the error pattern should look like. Even though patterns in immediate serial recall clearly exist, each neuropsychological study will report slightly different results due to experimental setup and variability in the population. Therefore, the comparison with neuropsychological data was qualitative and my aim was to investigate whether the model could exemplify behavioural trends in immediate serial recall rather than to quantitatively compare the results with a specific neuropsychological experiment.

1.3 Thesis Outline

Chapter 2 describes the background to this thesis, which includes a review of the phonological loop and the behavioural patterns in immediate serial recall, as well as a derivation of the BCPNN learning rule. Chapter 3 gives details on the experimental setup and methodology of this investigation. Chapter 4 describes the results of the investigations. For each investigation, the expected results for the experiment, referring to qualitative behavioural patterns which have been highlighted in neuropsychological experiments, are first reviewed. Thereafter, the method of investigation is specified and finally, the actual results are presented and discussed. In Chapter 5, a general discussion is provided. Here, the model is evaluated both as a phonological loop model and as a more general model for sequential memory. Furthermore, the chapter provides a critical evaluation of the experimental approach. In Chapter 6, the findings are summarized and future research is discussed.


Chapter 2 Background

The background to this thesis is divided up into two parts. The first part consists of a review of the current knowledge of the phonological loop, its involvement in immediate serial recall as well as previous computational models of immediate serial recall. In the second part, a derivation of the sequential BCPNN learning rule is presented together with biological interpretations of the model.

2.1 Phonological loop

2.1.1 Baddeley’s Model of Working Memory

The currently dominant model of working memory is Baddeley’s Multicomponent Model, which was introduced in 1974 by Baddeley and Hitch. The model has since been expanded and refined but the core idea remains - working memory is made up of several different components which together carry out an extensive range of cognitive processes such as language comprehension, problem solving and long-term learning (Baddeley & Hitch 1974). The phonological loop is one of the components in Baddeley’s model, responsible for storing information in phonological codes. Information processed by the loop was initially thought to be restricted to auditory inputs. However, the model was later expanded to include other types of inputs, such as visual stimuli, converted to phonological codes through an inner voice. The phonological loop has a very limited storage capacity and the code starts to decay within a few seconds if it is not reactivated by subvocal rehearsal.

Figure 2.1 shows an early model of the phonological loop that includes two components - a phonological store and a unit responsible for the rehearsal process. It should be mentioned that the model has gradually been extended in order to satisfy more recent empirical data. These modifications include a unit that transforms the phonological code into vocalisation and another unit that converts visual information into an articulation code. However, the original two components have remained at the core of the model and will be the focus of this study (Baddeley & Hitch 2019).

Figure 2.1: Original model of the phonological loop (Gathercole 2008)

2.1.2 Evidence and Function

There is substantial empirical evidence of both a phonological storage code and a subvocal rehearsal unit. First of all, experiments have shown that a sequence of phonologically similar letters such as b-t-v-p-c is more difficult to recall compared to the dissimilar letters r-w-y-k-f, supporting the theory of a phonological storage code. This is called the phonological similarity effect (Gathercole 2008). Furthermore, the phonological loop is sensitive to background noise, such as irrelevant speech or music, which suggests that any type of sound has access to the store. This effect is referred to as the irrelevant speech effect, or alternatively the articulatory suppression effect if the test subjects themselves are responsible for creating a background noise by repeating an irrelevant sound. Both types of noise affect the phonological store; however, articulatory suppression is the most disruptive (Neath 2000). Secondly, the word length effect provides evidence for a subvocal rehearsal process. It has been demonstrated that the recall of a sequence of long words is more challenging than the recall of a sequence of short words. In fact, the phonological store seems to be limited by the number of words that can be articulated within approximately two seconds. Thus, it is hypothesised that if the sequence exceeds this limit, the code decays before it can be refreshed using subvocal rehearsal (Baddeley 2016). Consistent with this theory, experiments have shown that irrelevant speech also disrupts the rehearsal process, supporting the idea that new phonological information overwrites old information in the store. This effect is especially noticeable during articulatory suppression, explained by the fact that articulatory suppression completely removes the possibility of subvocal rehearsal (Gathercole 2008).

While there is evidence for the existence of a phonological loop, its purpose long remained unknown. Even though serial recall of words and digits has been the main method used to study the phonological loop, there is clearly no evolutionary advantage in the ability to remember sequences of unrelated words in itself. Baddeley and Hitch (1974) hypothesised that the phonological loop could be needed for verbal processing; however, patients with short-term phonological impairments showed no reduction in either verbal comprehension or sentence construction (Gathercole et al. 1998).

In 1988, Baddeley et al. began to test recalling capabilities in unfamiliar languages instead. In this case, patients with short-term phonological impairments performed significantly worse than the control groups and could even struggle to learn a single new word. A similar effect was also observed in children’s language development. Children who performed well in non-word immediate serial recall tests generally had a larger vocabulary. Thus, there appeared to be a connection between the phonological loop and long-term learning of unfamiliar sound patterns (Gathercole et al. 1998).

If the main purpose of the phonological loop is to store unfamiliar sound patterns, the effects observed in word and digit recall should be even more evident in serial non-word recall. This was also proven to be the case. For example, Papagano et al. (1991) asked participants to remember pairs of unrelated words under articulatory suppression. The pairs were either two words in the native language of the participant, or one word in the native language and one word in a foreign language. The participants had few problems remembering the pairs with two words in the native language, but were unable to learn the pairs which included a foreign language. Similar results were later observed using the phonological similarity and word length effects (Papagano & Vallar 1992). Furthermore, participants that were given words from a foreign language that was phonologically similar to their own language experienced less difficulty remembering the word pairs. These results are consistent with later studies which showed that non-words that resemble words are easier to recall compared to other phonological patterns.

In 1998, Gathercole et al. summarized the experimental results on the phonological loop and language development. They concluded that, although the phonological loop can be used to recall sequences of unrelated words, its primary function is to store unknown phonological patterns over a short period of time. They argue that this sort of temporary storage should also be necessary in order to avoid overloading the long-term memory. Instead of remembering every new sound, the brain can use prior knowledge to detect repeated patterns from codes which have been stored temporarily in the phonological loop. These relevant patterns can then gradually be coded into the long-term memory if needed.

2.1.3 Serial Order Effects in Recall

A common way to study the mechanisms of the phonological loop is to use behavioural data from neuropsychological experiments, which have brought attention to several recalling patterns that occur during sequential recall.

First of all, the recall accuracy as a function of serial position has an asymmetrical bow shape. The accuracy is high at the beginning of the recall sequence and then gradually declines until the last few items, where the accuracy increases again. The effects responsible for the high accuracy at the beginning and the end of the sequence are referred to as the primacy and recency effects. Secondly, transposition errors are more common than omission errors. In other words, it is more likely for patterns to be recalled in the wrong order rather than being completely forgotten. Furthermore, the transposition errors are typically local, meaning that the frequency of transposition errors is highest between nearby positions (Gathercole 2008). Thirdly, presenting a sequence with a regular rhythm (e.g. 529-048 instead of 529048) increases the total accuracy at the same time as it introduces interpositional errors. This refers to patterns being recalled in the wrong group but at the right position within the group (e.g. 549-028 instead of 529-048). Moreover, grouping transforms the accuracy function into several bow shapes, one for each group (Ryan 1969, Hartley et al. 2016).

Another important aspect, and a challenge for any type of sequential learning algorithm, is to understand how repeated patterns are processed. Pattern repetitions were originally thought to increase the error rate in items following the repetitions. However, this was not supported by experimental observations, which instead pointed towards a reduction of accuracy in the repeated pattern itself rather than in those that followed. Furthermore, the second repetition is generally more affected than the first (Baddeley and Hitch 2018). This observation is called the Ranschburg effect after the work of Paul Ranschburg in 1902 and it has been considered a key to understanding how repeated patterns are dealt with in the brain (Jahnke 1969). The results have been replicated on numerous occasions, for instance by Jahnke in 1969, who showed that the accuracy of the repeated patterns would vary depending on the length of the pause between presentation and recall. Figure 2.2 illustrates examples of experimental data on the serial effects mentioned (Hartley et al. 2016). Figure 2.2A exemplifies the positional accuracy curves for both grouped and ungrouped sequences as well as the transposition distance of the items that were recalled at the wrong position. Figure 2.2B shows the experiment on repeated patterns performed by Jahnke (1969). As previously mentioned, every experiment shows slightly different results and thus, these figures provide an idea of what the results could look like rather than claiming that the model should behave in the exact same way.

Finally, it is important to mention that even though most studies on immediate serial recall are based on word, digit or letter recall, the phonological loop is, as previously mentioned, believed to serve above all as a language learning device. Consequently, using data from non-word repetition is perhaps more appropriate. Unfortunately, the amount of data on non-word repetition is considerably smaller than the amount of data on word repetition and therefore, it is difficult to rely solely on non-word data. Yet, I still believe that it is valuable to be aware of these findings and the differences between word and non-word recall. First of all, experiments have demonstrated that the frequency of error types can differ. In word, letter and digit recall transposition errors are the most common, while in non-word recall, Saint-Aubin and Poirier (2000) showed that omissions were the most common type of error. In fact, order memory seems to improve in non-word recall, even though this is overshadowed by the fact that item memory is considerably worse. Secondly, the impact of rhythm has also been investigated in non-word recall, but using syllable stress instead of grouping (Gupta et al. 2005). Interestingly, the recall pattern that emerged was similar to that of grouped patterns, where the recall accuracy function consists of several bow shapes. A summary of the results is illustrated in figure 2.3.


Figure 2.2: Examples of recall patterns from three different neuropsychological experiments. (A) A comparison between ungrouped and grouped inputs. The recall accuracy as a function of position (bottom) and the portion of transposition errors as a function of transposition distance (top) for input sequences with different rhythms (Hartley et al. 2016). (B) The positional recall accuracy for a sequence of digits. The sequence can either include no repeated items, or an item that is repeated at positions 2 and 4, or 3 and 5. The data is divided into two groups where the subjects have either a high or low ability for serial recall. In the graphs at the top, recall starts directly. In the graphs in the middle and at the bottom, recall starts after 2 and 4 seconds respectively. To avoid repetition before recall, articulatory suppression was used (Jahnke 1969).

2.1.4 Backward Recall

Even though backward recall seems to lack an evolutionary value, many believe that it is an important field of research that could potentially help us to understand working memory on a broader level (Norris et al. 2019). Both verbal and spatial backward recall have been areas of interest, but the focus here will lie on verbal recall. There are various theories on how verbal backward recall is performed. For example, the peel-off strategy suggests that backward recall is performed using multiple forward recalls where the last item is peeled off each time. There is also a grouped peel-off strategy hypothesizing that the sequence is divided into groups during backward recall. In this case, each group is recalled in backward order but the items within the group are remembered using a peel-off strategy. Moreover, other studies point towards spatial memory being used. In a neuropsychological study by Norris et al. (2019), participants were asked what sort of strategy they used. Interestingly, several different answers were reported and most participants stated that they used a combination of different strategies. Thus, there is no clear understanding of the backward recall procedure. However, regardless of strategy, neuroscientific studies have demonstrated that backward recall is a more demanding task which activates multiple areas of the brain (Manan et al. 2014).

Figure 2.3: The positional accuracy curve for two neuropsychological experiments on immediate serial recall. Top: Recall of nonwords with seven syllables under articulatory suppression. Bottom: The difference between word and syllable recall under articulatory suppression. The presentation speed of the syllables is modified to have the same speed as the words (Gupta et al. 2005).

The behavioural effects of backward serial recall also remain unclear. Several studies report no accuracy difference between the recall directions (Bireta et al. 2010) or even higher accuracy for backward recall (Guérard et al. 2012). However, the majority of studies demonstrate that backward recall is indeed less accurate (Donolato et al. 2017). Moreover, it is also debated whether or not the typical benchmark effects of forward serial recall (the word length, similarity, articulatory suppression and irrelevant speech effects) show in backward recall. Some studies claim that the effects are removed or diminished (Bireta et al. 2010) while other studies claim that they do not depend on recall direction (Guérard et al. 2012). A possible reason for these contradictory results could be differences in experimental setups, such as list type or sequence length, in combination with the fact that many strategies are potentially used during backward recall (Norris et al. 2019, Hurlstone et al. 2014). However, several observations also remain consistent. First of all, the shape of the accuracy function across positions is reversed, with an enhanced recency effect and a reduced primacy effect (Hurlstone et al. 2014). Secondly, backward recall is generally slower than forward recall (Norris et al. 2019).

2.1.5 Computational Models of the Phonological Loop

In the last decades, computational brain models have become useful tools for evaluating and rejecting hypotheses regarding neural architectures and mechanisms. The procedure usually consists of inducing noise into a model and comparing its error pattern with empirical data from neuropsychology.

For serial recall, three different types of models have been proposed - chaining associations, context associations and ordinal relative markers (Henson 1996).

Chaining associations refer to models where items are chained to their neighbours such that the activation of one item causes the next item in the chain to activate. This theory traces back to Hebb’s initial proposal of the Hebbian learning rule (see section 2.2.1), and it has gained substantial support from neuroscientific data over the years (Botvinick & Plaut 2006). In the chaining theory’s simplest form, each item is only linked to the adjacent ones. However, such a model is not able to deal with repeated patterns, nor can it simulate transposition errors. This has been one of the largest concerns with the chaining theory. In a compound chaining model on the other hand, where the items are also chained with more remote links, information is received from a number of preceding items and not only the nearest neighbour. Therefore, the above mentioned problems are less of an issue. Nevertheless, the chaining theory has largely been rejected as a model for the phonological loop, mostly due to the fact that simpler models can explain serial order patterns equally well and that proposed compound chaining models have often required heavy training regimes (Henson 1996, Baddeley & Hitch 2019).

Today, most models are based on either context associations or ordinal relative markers (Baddeley & Hitch 2019). Both types of models assume that the sequence is retrieved in two stages in order to easily handle repeated items. The first stage selects a position and the next stage retrieves the phonological code from the chosen position. The key difference between the two-stage models is the nature of the positional code (Hartley et al. 2016). In ordinal models, the relative strengths of the positional nodes define the order in which the items are repeated. The first node is given the highest activation strength and thereafter, the strength weakens across positions, creating a primacy gradient. This naturally gives rise to transposition errors as well as primacy and recency gradients due to less competition from neighbours at the beginning and end of the sequence. However, the model has difficulties explaining the grouping effects (Page & Norris 1998, Henson et al. 1998). In contrast to ordinal relative markers, the context association approach assumes that the positional codes, also called the context signal, are already known to the network. During training, associations between the phonological items and the positional nodes are created. When the context signal is replayed during recall, these associations are used to retrieve the phonological codes (Burgess & Hitch 1999, Hartley et al. 2016). With this approach, the primacy and recency effects also arise because of less competition between neighbours. Furthermore, it is possible to simulate the grouping effects by expanding the context signal to a second dimension (Baddeley & Hitch 2019). Even though the use of positional codes has successfully been able to simulate many benchmark behaviours of serial recall, the models have several drawbacks. Most importantly, they still lack the neuroscientific support that the chain-based models have (Botvinick & Plaut 2006). Moreover, a concern with positional codes is that they lack generalization across memory systems, being relevant exclusively in sequential learning (Martinez et al. 2019). Thus, the nature of sequential learning is still widely debated.


2.2 Bayesian Confidence Propagating Neural Networks

2.2.1 Hebbian Learning

The Hebbian learning rule is one of the earliest and simplest learning rules for neural networks. It has been especially valuable in the development of autoassociative neural networks. Proposed by Donald Hebb in 1949, Hebbian learning is based on the idea that "neurons that fire together, wire together". This is achieved by increasing the weights between neurons if they are activated simultaneously and decreasing the weights otherwise. Thus, for each time step t, the weight \(w_{ij}\) between neurons \(x_i\) and \(x_j\) is updated as:

\[
w_{ij}(t) = w_{ij}(t-1) + x_i x_j
\]

where the neurons take either binary or bipolar values. By strengthening the connections between neurons, the activation of one neuron leads to the activation of other neurons. This is used to create identity mappings such that a distorted input pattern can trigger the network to retrieve the originally stored pattern. A group of neurons with associative links is referred to as a cell assembly, and Hebb hypothesized that these assemblies are used in biological memory to represent concepts such as words or images. Since the proposition was made, substantial neuroscientific evidence for the theory has been produced and autoassociative neural networks are often used to describe biological memory (Huyck & Passmore 2013).
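As a concrete illustration of this rule, the sketch below trains a small autoassociative network on two bipolar patterns and retrieves one of them from a distorted cue. The patterns, network size and update schedule are made up for the example; this is a minimal sketch, not code from the thesis.

```python
import numpy as np

def hebbian_train(patterns):
    """Hebbian rule w_ij(t) = w_ij(t-1) + x_i * x_j, applied to bipolar (+1/-1) patterns."""
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for x in patterns:
        w += np.outer(x, x)          # strengthen weights between co-active neurons
    np.fill_diagonal(w, 0.0)         # no self-connections
    return w

def retrieve(w, cue, steps=5):
    """Iteratively update the state so a distorted cue settles into a stored pattern."""
    x = cue.copy()
    for _ in range(steps):
        x = np.sign(w @ x)
        x[x == 0] = 1                # break ties deterministically
    return x

# Two small bipolar patterns, made up for the example.
patterns = np.array([[1, 1, -1, -1, 1, -1],
                     [-1, 1, 1, -1, -1, 1]])
w = hebbian_train(patterns)
cue = patterns[0].copy()
cue[0] = -1                          # distort one element of the first pattern
print(retrieve(w, cue))              # recovers patterns[0]
```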

2.2.2 Derivation of BCPNN Learning Rule

The BCPNN employs a Bayesian view of the Hebbian learning rule, where the weights between neurons represent the probability of the neurons firing together. The input and output are regarded as the confidence in a detected feature and the posterior probability in the retrieved pattern respectively. The BCPNN learning rule is then derived from a naive Bayesian classifier, which calculates the probability of event j occurring given events A. If events A are assumed to be independent as well as conditionally independent given j, the posterior probability for j is derived as

\[
P(j \mid A) = P(j) \prod_{i \in A} \frac{P(i \mid j)}{P(i)} = P(j) \prod_{i \in A} \frac{P(i, j)}{P(i) P(j)}. \tag{2.1}
\]

In logarithmic form, the above equation becomes

\[
\log P(j \mid A) = \log P(j) + \sum_{i \in A} \log\left[ \frac{P(i, j)}{P(i) P(j)} \right]. \tag{2.2}
\]

By assuming that there are N possible events in total and using the indicator function

\[
o_i = \begin{cases} 1 & i \in A \\ 0 & i \notin A, \end{cases} \tag{2.3}
\]

equation 2.2 can be expressed as

\[
\log P(j \mid A) = \log P(j) + \sum_{i=1}^{N} o_i \log\left[ \frac{P(i, j)}{P(i) P(j)} \right]. \tag{2.4}
\]

This can now be interpreted as a single-layered neural network with the network weights \(w_{ij} = \log\left[ \frac{P(i,j)}{P(i)P(j)} \right]\), input \(o_i\) and bias \(\beta_j = \log P(j)\). Moreover, it can also be used as an autoassociative network, where the simultaneously activated inputs \(\{A, j\}\) create a pattern which can be retrieved using the update rule

\[
o_j = \Theta\!\left( \beta_j + \sum_{i=1}^{N} w_{ij} o_i \right), \qquad
\Theta(x) = \begin{cases} \exp(x) & x < 0 \\ 1 & x \geq 0. \end{cases} \tag{2.5}
\]

From a biological point of view, the bias term represents the intrinsic excitability of the neurons and the weights represent the synaptic strength between them.


Finally, in order to derive the network variables, the probabilities have to be estimated. Suppose we have a pattern sequence \(o^k,\ k = 1, \ldots, C\), where \(o_i^k = 1\) if pattern i is observed at time step k. Then the number of occurrences of events i, j and ij are

\[
c_i = \sum_{k=1}^{C} o_i^k, \qquad c_{ij} = \sum_{k=1}^{C} o_i^k o_j^k \tag{2.6}
\]

and the estimated probabilities become

\[
p_i = \frac{c_i}{C}, \qquad p_{ij} = \frac{c_{ij}}{C}. \tag{2.7}
\]
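The sketch below puts equations 2.5-2.7 together: it estimates the probabilities from a small batch of binary patterns, forms the weights and biases, and performs one retrieval step. The toy data and the small floor value eps (standing in for the background rate introduced in the next section) are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def bcpnn_batch(O, eps=1e-4):
    """Estimate p_i, p_ij from a (C, N) binary matrix O (eq. 2.6-2.7) and form weights and biases."""
    C = O.shape[0]
    c_i = O.sum(axis=0)                        # c_i  = sum_k o_i^k
    c_ij = O.T @ O                             # c_ij = sum_k o_i^k o_j^k
    p_i = np.maximum(c_i / C, eps)             # eps keeps the logs finite
    p_ij = np.maximum(c_ij / C, eps**2)
    w = np.log(p_ij / np.outer(p_i, p_i))      # w_ij = log(p_ij / (p_i * p_j))
    beta = np.log(p_i)                         # beta_j = log(p_j)
    return w, beta

def retrieve(w, beta, o):
    """One retrieval step following eq. 2.5: support s_j = beta_j + sum_i w_ij o_i."""
    s = beta + o @ w
    return np.where(s < 0, np.exp(s), 1.0)     # transfer function Theta(x)

# Toy batch: events 0 and 1 co-occur, events 2 and 3 co-occur (data made up for illustration).
O = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
w, beta = bcpnn_batch(O)
print(np.round(retrieve(w, beta, np.array([1.0, 0.0, 0.0, 0.0])), 3))   # units 0 and 1 come out high
```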

2.2.3 Incremental Learning

In the above formulation of the original version of the BCPNN algorithm, the probabilities are estimated based on information about the complete training set. In a biological setting, where the network learns as the patterns are being presented, this is not possible. Instead, rates of events have to be estimated at each time step. The incremental version of the BCPNN learning rule estimates the rates \(p_i\) using exponential moving averages defined as

\[
p_i(t) = \alpha_t o_i(t) + (1 - \alpha_t) p_i(t-1), \qquad \alpha_t = 1 - e^{-\Delta T / \tau}, \tag{2.8}
\]

where \(\tau\) is a constant. With \(\Delta T \ll \tau\), the approximation \(\alpha_t \approx \Delta T / \tau\) can be made, which gives

\[
p_i(t) = p_i(t-1) + \Delta T\, \frac{o_i(t) - p_i(t-1)}{\tau}. \tag{2.9}
\]

In a continuous setting where \(\Delta T \to 0\), equation 2.9 can be expressed as the differential equation

\[
\frac{dp_i(t)}{dt} = \frac{o_i(t) - p_i(t)}{\tau}. \tag{2.10}
\]

Finally, the weights and bias in the incremental learning algorithm become

\[
w_{ij} = \log\left[ \frac{p_{ij}}{p_i p_j} \right], \qquad \beta_i = \log p_i. \tag{2.11}
\]

In order to avoid divisions by zero as well as logarithms of zero, a low background input rate \(\lambda_0 \ll 1\) is introduced. This also means that in the absence of an external input, \(p_i\) converges to \(\lambda_0\) and \(p_{ij}\) converges to \(\lambda_0^2\), giving \(w_{ij} = 0\).

An exponential moving average gives the strongest memory representation to the patterns that have been presented more recently while the weights of older patterns decay. This is an important property that prevents catastrophic forgetting, where all stored memories are lost in an abrupt manner due to information overload. Catastrophic forgetting is a common problem in artificial autoassociative networks but it is not observed in biological networks.

In biological palimpsest memory, old information decays to give space to new information and thus, the incremental version of the algorithm is a crucial part in creating a biologically plausible memory model (Sandberg et al. 1999).
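A minimal discretised sketch of the incremental rule (equations 2.9-2.11) is given below. The time step, time constant and input are illustrative, and the background rate lambda0 is used here simply as a floor on the probability traces; this is an assumed simplification, not the exact implementation used later in the thesis.

```python
import numpy as np

def incremental_update(p_i, p_ij, o, dt=0.001, tau=1.0, lambda0=1e-4):
    """One Euler step of the probability traces (eq. 2.9/2.10) followed by eq. 2.11."""
    p_i = p_i + dt * (o - p_i) / tau
    p_ij = p_ij + dt * (np.outer(o, o) - p_ij) / tau
    p_i = np.maximum(p_i, lambda0)             # the background rate keeps the logs finite
    p_ij = np.maximum(p_ij, lambda0**2)
    w = np.log(p_ij / np.outer(p_i, p_i))      # eq. 2.11
    beta = np.log(p_i)
    return p_i, p_ij, w, beta

# Present unit 0 for 200 ms: its self-weight grows, while unused traces stay at the background
# level (and would decay back towards it if they had previously been raised).
N = 4
p_i, p_ij = np.full(N, 1e-4), np.full((N, N), 1e-8)
o = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(200):
    p_i, p_ij, w, beta = incremental_update(p_i, p_ij, o)
print(np.round(w, 2))
```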

2.2.4 Sequential Learning Rules

Up to this point, I have shown that the BCPNN can be used as a biological autoassociative network, storing inputs that are active simultaneously as a memory pattern. However, in sequential learning, the network needs to create connections between inputs that are not active simultaneously and remember them in serial order. In order to achieve this, exponential moving averages of the inputs, called z-traces, are used in a similar manner to equations 2.8 - 2.10. However, in this case each input creates two averages, one presynaptic and one postsynaptic, which utilise different time constants. For example, suppose input \(o_i\) is presented followed by \(o_j\). The exponential moving averages of these inputs give rise to the following four z-traces

\[
\frac{dz_i^{pre}}{dt} = \frac{o_i - z_i^{pre}}{\tau_{z^{pre}}}, \qquad
\frac{dz_i^{post}}{dt} = \frac{o_i - z_i^{post}}{\tau_{z^{post}}}
\]
\[
\frac{dz_j^{pre}}{dt} = \frac{o_j - z_j^{pre}}{\tau_{z^{pre}}}, \qquad
\frac{dz_j^{post}}{dt} = \frac{o_j - z_j^{post}}{\tau_{z^{post}}}. \tag{2.12}
\]

If \(\tau_{z^{pre}}\) is larger than \(\tau_{z^{post}}\), \(z_i^{pre}\) will decay slower than \(z_j^{post}\) increases, creating an overlap between the two z-traces where they are both active simultaneously. This overlap is used to define the excitatory weight \(w_{ij}\). However, this will also imply that \(z_i^{post}\) decreases faster than \(z_j^{pre}\), resulting in a much smaller overlap between these two z-traces and an inhibitory weight \(w_{ji}\), as illustrated in figure 2.4. With z-traces, the sequential BCPNN learning rule can be summarized as

\[
\frac{dz_i^{pre}}{dt} = \frac{o_i - z_i^{pre}}{\tau_{z^{pre}}}, \qquad
\frac{dz_j^{post}}{dt} = \frac{o_j - z_j^{post}}{\tau_{z^{post}}}
\]
\[
\frac{dp_i}{dt} = \frac{z_i - p_i}{\tau}, \qquad
\frac{dp_j}{dt} = \frac{z_j - p_j}{\tau}, \qquad
\frac{dp_{ij}}{dt} = \frac{z_i z_j - p_{ij}}{\tau}
\]
\[
w_{ij} = \log\left[ \frac{p_{ij}}{p_i p_j} \right], \qquad \beta_i = \log p_i \tag{2.13}
\]

Figure 2.4: The weight \(w_{ij}\) is proportional to the overlap between \(z_i^{pre}\) and \(z_j^{post}\), thus creating an excitatory connection when event j succeeds i but an inhibitory one when j precedes i.
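To make the asymmetry concrete, the sketch below integrates one z-trace pair per unit for two units presented in succession and then forms the weights of equation 2.13. Pulse lengths, time constants and the choice of feeding the unit probabilities from the presynaptic trace are illustrative assumptions; with these values the forward weight \(w_{01}\) comes out clearly larger than the backward weight \(w_{10}\).

```python
import numpy as np

dt, tau_pre, tau_post, tau_p = 0.001, 0.2, 0.01, 1.0       # seconds; illustrative values
z_pre, z_post = np.zeros(2), np.zeros(2)
p, p_ij = np.full(2, 1e-4), np.full((2, 2), 1e-8)

for k in range(1000):                                       # simulate 1 s
    t = k * dt
    o = np.array([1.0 if t < 0.2 else 0.0,                  # unit 0 is presented first...
                  1.0 if 0.2 <= t < 0.4 else 0.0])          # ...followed by unit 1
    z_pre += dt * (o - z_pre) / tau_pre                     # slowly decaying presynaptic trace
    z_post += dt * (o - z_post) / tau_post                  # quickly decaying postsynaptic trace
    p += dt * (z_pre - p) / tau_p                           # unit probability trace (pre-trace used here)
    p_ij += dt * (np.outer(z_pre, z_post) - p_ij) / tau_p   # pairwise trace from the pre/post overlap

w = np.log(p_ij / np.outer(p, p))
print("w01 (forward):", round(float(w[0, 1]), 2), " w10 (backward):", round(float(w[1, 0]), 2))
```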


2.2.5 Stabilizing the network

In order to make the network more robust to noise, previous papers have proposed two modifications to the BCPNN, which are both inspired by neuroscientific findings. First of all, there is evidence suggesting that the cortex is organised into small sub-circuits with dense connections and similar encoding, referred to as minicolumns. These minicolumns are themselves organised into hypercolumns, where a hypercolumn can take on encodings from a complete receptive field. Inhibitory neurons within each hypercolumn exhibit a normalization effect which prevents multiple minicolumns from being active simultaneously (Buxhoeveden & Casanova 2002). Through inter-hypercolumn connections, the network can as a whole become more robust to noise. More specifically, if one hypercolumn activates the wrong minicolumn, inhibitory feedback from the other hypercolumns will help to suppress the incorrect minicolumn and activate the correct one.

In order to incorporate this structure in the BCPNN model, the input is presented as a vector where each element in the vector represents the input to a specific hypercolumn. For example, for the input sequence \((P_1, \ldots, P_m)\), the first pattern corresponds to \(P_1 = (u^1_1, \ldots, u^n_1)\) and the last pattern corresponds to \(P_m = (u^1_m, \ldots, u^n_m)\), where the upper index marks the hypercolumn and the lower index marks the minicolumn within each hypercolumn. All neurons across the two-dimensional sequence are connected as illustrated in figure 2.5 and a winner-takes-all (WTA) mechanism is applied to the hypercolumns in order to normalize the network. The WTA mechanism is inspired by lateral inhibition in hypercolumns (Klampfl & Maass 2013). It should be mentioned that these dense long-range connections between hypercolumns are not quite biologically plausible; however, they are needed due to the small size of the network (Lansner et al. 2013). With the introduction of hypercolumns, the sequential learning algorithm is defined as

\[
\frac{dz^{pre}_{ii'}}{dt} = \frac{o_{ii'} - z^{pre}_{ii'}}{\tau_{z^{pre}}}, \qquad
\frac{dz^{post}_{jj'}}{dt} = \frac{o_{jj'} - z^{post}_{jj'}}{\tau_{z^{post}}}
\]
\[
\frac{dp_{ii'}}{dt} = \frac{z_{ii'} - p_{ii'}}{\tau}, \qquad
\frac{dp_{jj'}}{dt} = \frac{z_{jj'} - p_{jj'}}{\tau}, \qquad
\frac{dp_{ii'jj'}}{dt} = \frac{z_{ii'} z_{jj'} - p_{ii'jj'}}{\tau}
\]
\[
w_{ii'jj'} = \log\left[ \frac{p_{ii'jj'}}{p_{ii'}\, p_{jj'}} \right], \qquad
\beta_{ii'} = \log p_{ii'}, \tag{2.14}
\]

where \(i\) represents a pattern and \(i'\) the hypercolumn.

The second modification is inspired by the fast AMPA channels and the slower NMDA channels observed in the central nervous system. Instead of using one pair of z-traces, a second pair with different time constants is introduced in order to represent these two types of channels. The slow z-trace is created using a large time constant \(\tau_{z^{pre}_{slow}}\) with the purpose of controlling the sequential memory. For the fast z-trace, the presynaptic constant is smaller but, most importantly, \(\tau_{z^{pre}_{fast}} = \tau_{z^{post}_{fast}}\). Thus, the fast z-trace is not involved in the sequential learning. Instead it is used to maintain stability within the activated unit (Tully et al. 2016). Practically, a second z-trace means that there will be two weight matrices in the network, one for each z-trace.

Figure 2.5: Network connectivity across hypercolumns. Excitatory connections are created between units from the same patterns as well as units that follow closely in time. To other units, inhibitory connections are made. \(H_1, \ldots, H_n\) represent hypercolumns and \(u^1_1, \ldots, u^n_4\) represent the units within the hypercolumns. For example, \(u^2_1\) is the first unit in the second hypercolumn. The recall sequence consists of the patterns \(P_1\) to \(P_4\). The grey circles in the hypercolumns are units which are not part of the recall sequence (Adapted from Martinez et al. 2019a).
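A minimal sketch of the hypercolumnar representation and the WTA normalization is given below: each pattern activates exactly one unit per hypercolumn, and a hard winner-takes-all keeps a single unit active per hypercolumn. The sizes and the noise model are arbitrary, and the hard argmax is a simplified stand-in for the lateral inhibition described above.

```python
import numpy as np

H, M = 6, 10                         # 6 hypercolumns with 10 minicolumn units each (arbitrary sizes)

def encode(pattern_id):
    """One active unit per hypercolumn; here the same minicolumn index is used in every hypercolumn."""
    o = np.zeros((H, M))
    o[:, pattern_id] = 1.0
    return o

def wta(s):
    """Hard winner-takes-all within each hypercolumn (each row of s)."""
    o = np.zeros_like(s)
    o[np.arange(s.shape[0]), s.argmax(axis=1)] = 1.0
    return o

s = 0.1 * np.random.randn(H, M)      # weak noise on the support current
s[:, 3] += 1.0                       # every hypercolumn receives support for minicolumn 3...
s[0, 7] += 2.0                       # ...but one hypercolumn is pushed towards the wrong unit
print(wta(s).argmax(axis=1))         # most hypercolumns still select unit 3
print(encode(3).argmax(axis=1))      # the noise-free encoding of pattern 3
```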

2.2.6 Sequential Recall

The final part of the BCPNN concerns the recall algorithm. The incoming z-traces \(z_i\) are multiplied with their respective weights and summed together with the bias \(\beta_j\) to calculate the current \(s_j\) to unit \(u_j\). Furthermore, once a unit is activated, an intrinsic adaptation current \(a_j\) deactivates the unit in order for the next pattern in the sequence to activate. Due to the WTA effect, the current is normalized in each hypercolumn such that only one unit remains active. With H hypercolumns and N patterns in the sequence, the recall algorithm is defined as

\[
\tau_s \frac{ds_{jj'}}{dt} = \beta_{jj'}
+ \frac{1}{H} \sum_{i'}^{H} \sum_{i}^{N}
\left( \eta_{fast}\, w^{fast}_{ii'jj'}\, z^{fast,pre}_{ii'}
+ \eta_{slow}\, w^{slow}_{ii'jj'}\, z^{slow,pre}_{ii'} \right)
- g_a a_{jj'} - s_{jj'} + \sigma + I_{jj'}
\]
\[
o_{jj'} = \begin{cases} 1, & s_{jj'} = \max(s_{j'}) \\ 0, & \text{otherwise} \end{cases}
\qquad
\tau_a \frac{da_{jj'}}{dt} = o_{jj'} - a_{jj'}. \tag{2.15}
\]

Here, \(\sigma\) represents white noise which is used to induce errors during recall, and \(I\) is an external current which is used to activate the first pattern in the sequence. \(\eta_{fast}\) and \(\eta_{slow}\) are the ratios for the fast and slow z-traces respectively. Finally, \(g_a\) is used to regulate the strength of the intrinsic adaptation.

To summarize, a sequential learning algorithm derived from the BCPNN has now been introduced together with a recall algorithm based on the current knowledge of biological learning and recall. Hence, the equations that will be used in the investigation are equations 2.14 and 2.15.
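The sketch below takes a single Euler step of the support dynamics in equation 2.15 for a flattened (hypercolumn, minicolumn) state vector. The random weight matrices and the initial traces are placeholders for a trained network; the point is only to show how the drive, adaptation, noise and hypercolumn-wise WTA enter one update, not to reproduce the trained model.

```python
import numpy as np

H, M = 6, 10                                   # hypercolumns and minicolumn units per hypercolumn
n = H * M                                      # units indexed as (hypercolumn, minicolumn), flattened
tau_s, tau_a, g_a, dt = 0.010, 0.250, 10.0, 0.001
eta_fast, eta_slow, sigma = 0.4, 0.6, 3.5

rng = np.random.default_rng(0)
w_fast = rng.normal(size=(n, n))               # stand-ins for the trained weight matrices
w_slow = rng.normal(size=(n, n))
beta = rng.normal(size=n)
s, a, I = np.zeros(n), np.zeros(n), np.zeros(n)
z_fast, z_slow = np.zeros(n), np.zeros(n)

def recall_step(s, a, z_fast, z_slow, I):
    """One Euler step of the support current s (eq. 2.15) plus the WTA output and adaptation."""
    drive = (eta_fast * (z_fast @ w_fast) + eta_slow * (z_slow @ w_slow)) / H
    noise = sigma * rng.normal(size=n)
    s = s + dt * (beta + drive - g_a * a - s + noise + I) / tau_s
    o = np.zeros(n)
    winners = s.reshape(H, M).argmax(axis=1)   # winner-takes-all within each hypercolumn
    o[np.arange(H) * M + winners] = 1.0
    a = a + dt * (o - a) / tau_a               # intrinsic adaptation of the active units
    return s, a, o

s, a, o = recall_step(s, a, z_fast, z_slow, I)
print(o.reshape(H, M).argmax(axis=1))          # the unit selected in each hypercolumn
```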

2.2.7 Summary of the sequential BCPNN

The presented BCPNN model is developed based on mesoscopic biological mechanisms, which are assumed to give rise to short-term sequential memory.

These mechanisms include:

• An increased synaptic strength between neurons that are activated close in time, modelled as a Bayesian probability estimate of a unit activating given that another unit has recently become activated.

• The decay of old information, which is modelled by weighing the probability estimates using an exponential moving average such that new information is given the most attention. The rate of decay is controlled by the probability time constant \(\tau_p\).

• Asymmetric synaptic strengths, which imply a stronger connection between unit one and unit two if unit one is activated before unit two and a weaker connection if unit two is activated before unit one. This is modelled using z-traces, which are controlled by the z-trace time constants \(\tau_{z^{pre}}\) and \(\tau_{z^{post}}\).

• Two different types of synaptic strengths inspired by the two types of receptors (NMDA and AMPA) involved in short-term memory. This is modelled using two different z-traces, \(z_{fast}\) and \(z_{slow}\).

• Hypercolumns which stabilize the network, making it less susceptible to noise.

• An intrinsic adaptation which deactivates neurons, enabling other neurons to activate. This is modelled using the parameter \(a\), where the strength of \(a\) is controlled by \(g_a\).

• Lateral inhibition which makes sure that only one unit is activated at a time in each hypercolumn. This is modelled using a WTA mechanism.


Chapter 3 Method

3.1 Model Specification

In this section, the model setup is further specified. The section begins with a detailed description of how the BCPNN architecture has been adapted to simulate the phonological loop. This is followed by a definition of the input sequence as well as short studies on initializations and model parameters.

3.1.1 Phonological loop algorithm

In order to model the phonological loop specifically, I have made several adaptations to the BCPNN. One of the most conspicuous properties of the phonological loop is the close relation between recall and learning. For example, recall is needed in order to prevent the memory traces from decaying, and environmental factors such as noise affect learning in the same way as they affect later recall (Gathercole 2007). Due to their close connection, I have merged the BCPNN learning and recalling algorithms (equations 2.14 and 2.15) into one, such that every time step of learning can only be accessed through the recalling algorithm. Essentially, learning is executed by passing an external current I through specific nodes in the network, one at a time.

This creates a current s in the network, which activates the nodes through the WTA mechanism. Thereafter, o is used to update the weights in the learning algorithm. In mathematical terms, equations 2.14 and 2.15 are merged together in order to create a learning algorithm with the following three steps:

\[
\tau_s \frac{ds_{jj'}}{dt} = \eta_{fast}\, \beta^{fast}_{jj'} + \eta_{slow}\, \beta^{slow}_{jj'}
+ \frac{1}{H} \sum_{i'}^{H} \sum_{i}^{N}
\left( \eta_{fast}\, w^{fast}_{ii'jj'}\, z^{fast,pre}_{ii'}
+ \eta_{slow}\, w^{slow}_{ii'jj'}\, z^{slow,pre}_{ii'} \right)
- g_a a_{jj'} - s_{jj'} + \sigma + I_{jj'} \tag{3.1}
\]
\[
o_{jj'} = \begin{cases} 1, & s_{jj'} = \max(s_{j'}) \\ 0, & \text{otherwise} \end{cases}
\qquad
\tau_a \frac{da_{jj'}}{dt} = o_{jj'} - a_{jj'} \tag{3.2}
\]
\[
\frac{dz^{pre}_{ii'}}{dt} = \frac{o_{ii'} - z^{pre}_{ii'}}{\tau_{z^{pre}}}, \qquad
\frac{dz^{post}_{jj'}}{dt} = \frac{o_{jj'} - z^{post}_{jj'}}{\tau_{z^{post}}}
\]
\[
\frac{dp_{ii'}}{dt} = \frac{z_{ii'} - p_{ii'}}{\tau}, \qquad
\frac{dp_{jj'}}{dt} = \frac{z_{jj'} - p_{jj'}}{\tau}, \qquad
\frac{dp_{ii'jj'}}{dt} = \frac{z_{ii'} z_{jj'} - p_{ii'jj'}}{\tau}
\]
\[
w_{ii'jj'} = \log\left[ \frac{p_{ii'jj'}}{p_{ii'}\, p_{jj'}} \right], \qquad
\beta_{ii'} = \log p_{ii'} \tag{3.3}
\]

The recall algorithm also corresponds to the above equations; however, in this case only the first pattern is activated using the external current. The pattern is activated for 50 ms and thereafter, the path of the current is decided by the weights. In previous versions of the incremental BCPNN, learning has been carried out directly through the learning algorithm by controlling the inputs o, and recall is executed without updating the weights. By using this algorithm, the aim is to create a more realistic representation of the phonological loop where the intrinsic adaptation a and the current s are involved in learning and where the plasticity in the weights remains during recall. To summarize, pseudocode for the model is provided below. The differential equations are solved numerically using the Euler method and the time step is set to 1 ms.

Algorithm 1: The phonological loop algorithm

for i in input sequence do
    o = network.recall(W, β, z, p, I_i)
    W, β, z, p = network.learn(o)
end
while t < t_max do
    o = network.recall(W, β, z, p, I_t)
    W, β, z, p = network.learn(o)
end
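Read as code, Algorithm 1 is a loop in which learning is only ever reached through the recall step. The sketch below is a deliberately reduced stand-in, with one unit per pattern, no hypercolumns, a single z-trace pair and untuned parameter values, meant to show this control flow of equations 3.1-3.3 rather than to reproduce the simulations in the thesis.

```python
import numpy as np

dt, tau_s, tau_a, tau_pre, tau_post, tau_p, g_a = 1e-3, 0.01, 0.25, 0.2, 0.01, 1.0, 2.0
N = 5                                             # one unit per pattern in this reduced sketch
st = dict(z_pre=np.zeros(N), z_post=np.zeros(N), p=np.full(N, 1e-4),
          p_ij=np.full((N, N), 1e-8), a=np.zeros(N), s=np.zeros(N))

def recall(st, I):
    """One Euler step of the support current followed by winner-takes-all activation."""
    w = np.log(st["p_ij"] / np.outer(st["p"], st["p"]))
    drive = np.log(st["p"]) + st["z_pre"] @ w - g_a * st["a"] + I
    st["s"] += dt * (drive - st["s"]) / tau_s
    o = np.zeros(N)
    o[np.argmax(st["s"])] = 1.0
    st["a"] += dt * (o - st["a"]) / tau_a
    return o

def learn(st, o):
    """Trace and probability updates in the spirit of equation 3.3 (single z-trace pair)."""
    st["z_pre"] += dt * (o - st["z_pre"]) / tau_pre
    st["z_post"] += dt * (o - st["z_post"]) / tau_post
    st["p"] += dt * (st["z_pre"] - st["p"]) / tau_p
    st["p_ij"] += dt * (np.outer(st["z_pre"], st["z_post"]) - st["p_ij"]) / tau_p

# Presentation phase: each pattern is cued by an external current for 200 ms.
for i in range(N):
    for _ in range(200):
        I = np.zeros(N); I[i] = 5.0
        learn(st, recall(st, I))

# Recall phase: only the first pattern is cued (for 50 ms); afterwards the weights decide the path.
order = []
for t in range(1500):
    I = np.zeros(N); I[0] = 5.0 if t < 50 else 0.0
    o = recall(st, I)
    learn(st, o)
    order.append(int(o.argmax()))
print(sorted(set(order), key=order.index))        # order in which units first became active
```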

3.1.2 Input sequence

The input sequence is a list of phonological patterns as described in section 2.2.5, which is presented to the model through an external current I. Each pattern is presented during a time pulse of \(T_p\) milliseconds followed by an interpulse interval of \(IPI\) milliseconds (see figure 3.1). After the last pattern has been presented, a longer interpulse interval, \(IPI_{last}\), is added to allow the z-traces of the last pattern to return to lower values before recall starts. In order to present a realistic phonetic input, \(T_p\) is set to 200 ms, which is in the right order of magnitude for a phoneme in spoken language. \(IPI\) and \(IPI_{last}\) are set to 10 ms and 500 ms respectively. Furthermore, the magnitude of each input in the sequence is set to 5.

Figure 3.1: Input sequence with three patterns.
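A sketch of how such an input current can be generated at 1 ms resolution is shown below; whether the final 10 ms interval is folded into the 500 ms tail is an assumption made for the example.

```python
import numpy as np

def input_current(sequence, n_units, T_p=200, IPI=10, IPI_last=500, magnitude=5.0):
    """External current I(t) at 1 ms resolution: each pattern in `sequence` drives its unit
    for T_p ms, separated by IPI ms, with a longer IPI_last ms pause after the last pattern."""
    total = len(sequence) * (T_p + IPI) - IPI + IPI_last   # the final gap is folded into the tail
    I = np.zeros((total, n_units))
    t = 0
    for pattern in sequence:
        I[t:t + T_p, pattern] = magnitude
        t += T_p + IPI
    return I

I = input_current([0, 1, 2], n_units=6)
print(I.shape)    # (1120, 6): three 200 ms pulses, two 10 ms gaps and a 500 ms tail
```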


3.1.3 Initialization

The probabilities \(p_{ii'}\) and \(p_{ii'jj'}\) should be initialized in a way such that the patterns can be stored after one epoch of training, and the closer the network is to its steady-state values, the more patterns can be stored. Therefore, a good initialization should bring the network as close as possible to its steady state after the first epoch. Two possible initializations have been suggested in previous BCPNN memory models (Sandberg 2003). One approach is to set \(p_{ii'} = \frac{1}{N_{i'}}\) and \(p_{ii'jj'} = \frac{1}{N_{i'} N_{j'}}\), where \(N_{i'}\) is the number of units in hypercolumn \(i'\). This corresponds to the assumption that the units in a hypercolumn are independent and identically distributed. Another approach is to look at the steady-state values of the network in the absence of a pattern, in which case \(p_{ii'} = \lambda_0\) and \(p_{ii'jj'} = \lambda_0^2\). In both cases, the network will converge to the steady-state values. However, as illustrated in figure 3.2, the second approach converges significantly faster and I have therefore chosen the second initialization approach.

Figure 3.2: The norm of the weight matrix divided by the number of elements in the matrix as a function of epochs.
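The two initializations can be written out as below; the network sizes are arbitrary and the mean absolute weight is used as a rough proxy for the quantity plotted in figure 3.2.

```python
import numpy as np

H, M, lambda0 = 6, 10, 1e-4          # hypercolumns, units per hypercolumn, background rate
n = H * M

# Approach 1: independent, identically distributed units within each hypercolumn.
p_uniform = np.full(n, 1.0 / M)
p_ij_uniform = np.full((n, n), 1.0 / M**2)

# Approach 2: steady-state values in the absence of any pattern.
p_steady = np.full(n, lambda0)
p_ij_steady = np.full((n, n), lambda0**2)

def mean_abs_weight(p, p_ij):
    """Mean absolute weight, used here as a rough proxy for the quantity plotted in figure 3.2."""
    w = np.log(p_ij / np.outer(p, p))
    return np.abs(w).mean()

# Both initializations give w = 0 everywhere before training; they differ in how quickly the
# network approaches its steady state once patterns are presented.
print(mean_abs_weight(p_uniform, p_ij_uniform), mean_abs_weight(p_steady, p_ij_steady))
```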

3.1.4 Model Parameters

The network includes many parameters, of which most have biological interpretations. This means that they are constrained by neuroscientific data and cannot take on any value. In order to simplify the parameter investigation, some values were adapted from Martinez et al. (2019b). This is justified by the fact that Martinez et al. used a BCPNN network to study sequential memory on a similar timescale and hence, the parameter values should be similar due to the fact that they represent the same biological mechanisms. However, a few parameters are further investigated, namely the z-trace time constants, the probability trace time constant and the adaptation gain. These parameters are connected to several key characteristics of the network, and therefore, this section is devoted to investigating their influence on the network behaviour in more detail.

The z-traces are used to create the weight matrix which stores the patterns in the network. The shape of the traces is governed by the z-trace time constants, which therefore become highly relevant for the formation of the weight matrix and the behaviour of the network. From a biological perspective, the weight matrix represents the synaptic strength between neurons. In order to create sequential memory patterns, \(w_{i,i+1}\) must be larger than \(w_{i,i-1}\), meaning that \(\tau_{z^{pre}} > \tau_{z^{post}}\). Figure 3.3A illustrates the storage codes created by three different constellations of z-trace time constants. The larger \(\tau_{z^{pre}}\) is in comparison to \(\tau_{z^{post}}\), the larger are the connections going forward, increasing the network’s ability to handle repeated patterns. However, the forward weights also become more similar in size, making the network more prone to skipping patterns or recalling them in the wrong order. In this model, there are two different weight matrices. \(W^{fast}\) is inspired by the faster AMPA receptors, where both \(\tau_{z^{pre}_{fast}}\) and \(\tau_{z^{post}_{fast}}\) are set to 5 ms, in accordance with the work of Martinez et al. Hence, \(W^{fast}\) is symmetric and not involved in creating the sequential storage codes. Instead, it is used to stabilize the network as demonstrated by Martinez et al. The time constants of the slow z-traces, which create long-range and asymmetric connections in the network, are on the other hand an essential part of sequential learning. Biologically, they correspond to the slower NMDA receptors. The values of \(\tau_{z^{pre}_{slow}}\) and \(\tau_{z^{post}_{slow}}\) were chosen such that the forward connections are large enough to give the network sufficient information about previous states in order to handle repeated patterns. At the same time, the forward connections were not allowed to become too similar in size, as that would increase the frequency of errors.

The probability trace constant \(\tau_p\) regulates the plasticity of the network, where a low \(\tau_p\) results in a high plasticity and a low storage capacity. This is shown in figure 3.3B, where an increase of \(\tau_p\) from 9 ms to 10 ms enables the network to increase its storage capacity from five to ten patterns. For the simulations, \(\tau_p\) was set to 1 s. This meant that the network could remember an approximately two second long sequence under moderate background noise, in accordance with observations of immediate serial recall in neuropsychological experiments.

Finally, the adaptation gain \(g_a\) regulates the size of the intrinsic adaptation \(a\). Since the intrinsic adaptation is responsible for deactivating patterns and enabling the next pattern in the sequence to activate, \(g_a\) offers a simple way of controlling the persistence time \(T^{per}\). \(T^{per}_i\) is defined as the period of time that pattern i is active during recall, which Martinez et al. (2019a) derived as

\[
T^{per}_i = \tau_a \log\!\left( \frac{1}{1 - B_i} \right) + \tau_a \log\!\left( \frac{1}{1 - \tau_s / \tau_a} \right),
\qquad
B_i = \frac{w_{i,i} - w_{i,i+1} + \beta_i - \beta_{i+1}}{g_a}. \tag{3.4}
\]

As seen in the equation, all of the above mentioned parameters affect \(T^{per}\). However, since \(g_a\) does not have a large influence on any other characteristics, it provides an effective way of controlling \(T^{per}\). It should also be noted that equation 3.4 implies that \(0 < B_i < 1\), which means that \(g_a\) has a lower boundary. That is, the intrinsic adaptation has to be large enough to compensate for the difference in size between \(w_{i,i}\) and \(w_{i,i+1}\). Furthermore, when \(g_a\) goes towards infinity, \(T^{per}\) converges to its minimum value \(\tau_a \log\!\left( \frac{1}{1 - \tau_s / \tau_a} \right)\).
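As a worked example of equation 3.4, the sketch below computes the persistence time for a few values of the adaptation gain; the weight and bias differences are invented for illustration, and the garbled constant in the source is read here as the ratio \(\tau_s / \tau_a\).

```python
import numpy as np

def persistence_time(w_ii, w_i_next, beta_i, beta_next, g_a, tau_a=0.25, tau_s=0.01):
    """Persistence time of pattern i during recall, following equation 3.4 (times in seconds)."""
    B = (w_ii - w_i_next + beta_i - beta_next) / g_a      # requires 0 < B < 1
    return tau_a * np.log(1.0 / (1.0 - B)) + tau_a * np.log(1.0 / (1.0 - tau_s / tau_a))

# Larger adaptation gains shorten the persistence time towards the minimum
# tau_a * log(1 / (1 - tau_s / tau_a)); the weight and bias differences below are invented.
for g_a in (5.0, 10.0, 50.0):
    print(g_a, round(persistence_time(2.0, 0.5, -1.0, -1.2, g_a), 3))
```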

A summary of the network parameters and their values is shown in table 3.1. Here \(\tau_s\), \(\tau_a\), \(\tau_{z^{pre}_{fast}}\) and \(\tau_{z^{post}_{fast}}\) have the same values as specified by Martinez et al. (2019b). \(g_a\), \(\tau_p\), \(\tau_{z^{pre}_{slow}}\) and \(\tau_{z^{post}_{slow}}\) have been changed within the limits of what would be biologically plausible in order to exhibit the properties of the phonological loop in terms of persistence time, storage capacity and ability to handle repeated patterns. Furthermore, the parameters \(T_p\), \(IPI\), \(IPI_{last}\) and \(N\) were given values which resemble the typical inputs during neuropsychological studies on immediate serial recall. Finally, the remaining parameters in table 3.1 were not specified by Martinez et al. (2019b) and they were simply set to values for which the network could perform the recall process.


Figure 3.3: (A) The weight matrix after one epoch of training using only the learning algorithm (equation 2.14). To the left, \(\tau_{z^{pre}} = \tau_{z^{post}} = 5\) ms, which is equivalent to the constants of the fast z-trace. In the middle, \(\tau_{z^{pre}} = 10\) ms and \(\tau_{z^{post}} = 5\) ms, and to the right, \(\tau_{z^{pre}} = 50\) ms and \(\tau_{z^{post}} = 5\) ms. (B) The right graph shows the current s during the correct recall of 10 patterns with \(\tau_p = 10\). To the left, \(\tau_p = 9\), resulting in the probabilities and weights decaying faster and the size of \(w_{i,i+1}\) eventually growing too small for \(s_{i+1}\) to outperform \(s_i\).

3.2 Evaluation of Experimental Simulations

In this section, the methods used for evaluating the simulations are defined. As noted earlier, the purpose of the study is to investigate if the characteristic error pattern trends highlighted in psychological literature can be qualitatively reproduced by the model. To perform such evaluations, this section first of all provides a description of the marking system used to classify recalls as correct or incorrect. Thereafter, the methods used for visualizing the results are accounted for. Finally, a test for measuring if there is a statistical difference between two recall distributions is defined.


Parameter         Description                                     Value
τ_s               Synaptic time constant                          10 ms
τ_a               Adaptation time constant                        250 ms
g_a               Adaptation gain                                 10
τ_p               Probability trace constant                      1000 ms
τ_z_pre^fast      Fast presynaptic z-trace constant               5 ms
τ_z_post^fast     Fast postsynaptic z-trace constant              5 ms
τ_z_pre^slow      Slow presynaptic z-trace constant               200 ms
τ_z_post^slow     Slow postsynaptic z-trace constant              10 ms
η_fast            Portion of fast z-traces                        0.4
η_slow            Portion of slow z-traces                        0.6
T_p               Pulse time                                      200 ms
IPI               Inter-pulse interval                            10 ms
IPI_last          Last inter-pulse interval                       500 ms
ΔT                Time step                                       1 ms
N                 Number of patterns                              6 - 15
H                 Number of hypercolumns                          6
λ_0               Background current                              0.0001
σ                 Standard deviation of noise during recall       3.5
σ_0               Standard deviation of noise during learning     1
I_0               Magnitude of external current                   5

Table 3.1: Relevant parameters in the network.

3.2.1 Defining Accuracy of Recall

In psychological experiments on immediate serial recall, an item generally has to be recalled at the right position in the sequence in order for it to be marked as correct. Therefore, the following two types of error can occur - omissions, when an item is forgotten completely, or transpositions, when the item is recalled at the wrong position. Furthermore, repetitions can also occur if the item is recalled more times than it should be.

An issue with the current model is that it does not rely on positional codes and therefore, the accuracy as a function of position is not an obvious measurement. For example, consider the sequence "12345" being recalled in the order "12453". Without positional codes, one could either interpret only 1 and 2 as accurate, or one could interpret all patterns except for 3 to be accurate. I have chosen the latter option, marking all patterns as correct except for 3, which would be considered a transposition error. To generalize, a recalled pattern is marked as correct if it preserves its relative order with respect to the other recalled patterns; patterns recalled out of relative order are counted as transpositions and patterns that are never recalled are counted as omissions.
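One way to implement this marking scheme, under the relative-order interpretation above, is to treat every recalled item that keeps its relative order (a longest common subsequence with the presented list) as correct, count the remaining recalled items as transpositions, and count presented items that never appear as omissions. This is an illustrative sketch, not the scoring code used for the simulations; repetitions are not handled.

```python
def score_recall(presented, recalled):
    """Count correct items as the longest common subsequence of the two lists; recalled items
    outside that subsequence are transpositions, and presented items never recalled are omissions."""
    n, m = len(presented), len(recalled)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            lcs[i + 1][j + 1] = (lcs[i][j] + 1 if presented[i] == recalled[j]
                                 else max(lcs[i][j + 1], lcs[i + 1][j]))
    correct = lcs[n][m]
    transpositions = len(recalled) - correct
    omissions = len(set(presented) - set(recalled))
    return correct, transpositions, omissions

print(score_recall("12345", "12453"))   # (4, 1, 0): only "3" is counted as a transposition
```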

References
