An Experiment in Modelling Learning in Autism Using Self-Organizing Artificial Neural Networks.

Lennart Gustafsson
Luleå University of Technology, Sweden

Andrew P. Papliński
Computer Science and Software Engineering, Monash University, Australia

April 8, 2002


Contents

1 Introduction — autism, attentional impairment and familiarity preference

2 Introduction — artificial neural networks

3 Simulations — methods

4 Simulations — results

5 Early intervention

6 Discussion

7 References

A Self-Organizing Feature Maps

A.1 Feature Maps
A.2 Learning Algorithm for Self-Organizing Feature Maps


Abstract

Autism is a developmental disorder with possibly multiple etiologies. Attention shift impairment and strong familiarity preference, or negative response to novelty, are prevalent in individuals with autism, and researchers debate which of these features is primary to other deficiencies in autism.

Cortical feature maps make it possible to classify stimuli, such as phonemes of speech, disregarding incidental detail. Hierarchies of such maps are instrumental in creating abstract codes and representations of objects and events. It has been theorized that cortical feature maps in individuals with autism are inadequate for forming abstract codes and representations, which would explain the importance paid by autistic individuals to detail rather than salient features.

Self-Organizing Maps (SOMs) are artificial neural networks (ANNs) which offer insights into the development of cortical feature maps.

We model attention shift by presenting a SOM with stimuli from two sources in four different modes, namely: 1) novelty seeking (regarded as normal learning), 2) attention shift impairment (shifts are made with a low probability), 3) familiarity preference (shifts are made with a lower probability to whichever of the two sources is the less familiar to the SOM) and 4) familiarity preference in conjunction with attention shift impairment.

The resulting feature maps from learning with novelty seeking and with attention shift impairment are, perhaps surprisingly, much the same. In contrast, the resulting maps from learning with strong familiarity preference are adapted to one of the sources at the expense of the other, and if one of the sources has a reduced set of stimuli the resulting maps are adapted to stimuli from that reduced source. When familiarity preference is less pronounced the resulting maps show great variation, from normal to fully restricted to one of the sources, and always the reduced source if such a source is present. Such learning, in many different maps, would result in very uneven capacities, which is common in individuals with autism.

Learning with attention shift impairment in conjunction with familiarity preference further reduces the probabilities for normal maps.

Early intervention in the learning process, based on the observation of attention shift misses and implemented as an increased number of stimuli being presented to the SOM from the less familiar source, results in a significantly higher probability for the development of a normal map.


1 Introduction — autism, attentional impairment and familiarity preference

Autism is a developmental disorder first described by Kanner [1] and Asperger [2]. Presently, diagnostic criteria according to DSM-IV [3] are grouped into three main categories:

• impairments in social interaction,

• impairments in verbal and nonverbal communication, and

• restricted repertoire of activities and interests.

The diagnostic criteria are behavior-based, but a number of biological abnormalities have been connected with autism. For an introduction, see Gillberg and Coleman [4].

There is general agreement that attentional impairment is commonly seen in autism. This impairment includes joint attention and attention shifts. However, opinions differ on whether attentional impairment is a primary cause of other autistic characteristics or is itself secondary to some other autistic characteristic.

A large body of research on attentional impairment has been presented by Courchesne and co-workers, see [5, 6, 7] and Townsend et al. [8, 9].

The importance of cerebellar damage (loss of Purkinje neurons, documented in many cases of autism) in causing impairment in shifting attention is stressed in this research. Courchesne et al. argue that impairment in shifting attention will cause autism since memories of events will be incomplete and fragmented if attention shift impairments preclude a perception of some aspects of the events.

They also hypothesize that this will lead to the development of other autistic characteristics, among them obsessive insistence on sameness; it is therefore appropriate to use the term “primary attention shift impairment” to describe a causative relation of autistic characteristics in autism.

Other researchers stress findings that attention shift impairment in children with autism is manifested particularly when social stimuli are present. Dawson et al. [10] find that

“Children with autism showed only a slightly greater number of orienting errors, com- pared to children with Down’s syndrome and typical development, when presented with the sound of a rattle or a musical toy. When they heard their names called or the sounds of hand clapping, however, children with autism often failed to orient to these stimuli.”

Pascualvaca et al. [11] find that

“Children in the autism group were able to make comparisons and shift their attention continuously on the Same-Different Task. They did have difficulty in the WCST, a task that assesses several cognitive processes including problem-solving skills and the ability to shift set. On this task, they clearly understood the instructions and, at least initially, performed like controls. Nevertheless, once they started to use a particular strategy, they could not change it, and failed to benefit from the examiner’s feedback.”

Pascualvaca et al. obtained results “suggesting that they (children with autism) do not have a general deficit in shifting attention.” Minshew et al. [12] have studied saccadic eye movements in individuals with autism and found that such movements, when under cerebellar control, are normal, but, under neocortical influence, are not. Minshew et al. state that deficits in “elementary attentional and sensorimotor systems” were not demonstrated.

Dawson, Pascualvaca and other researchers hypothesize that novelty itself is disagreeable to children with autism (personal stimuli are assumed to be connected with novelty more often than impersonal stimuli) and that novelty avoidance will cause attention impairments and, as a consequence, other autistic characteristics. Kootz et al. [13] note that:

“Rather than responding to novelty with orientation, observation, and exploration, the autistic child often responds with avoidance, thus preventing the development of new schemas and subsequent familiarization.”

Dawson and Lewy [14] find that

“The evidence thus far suggests that novel stimuli may elicit an aversive response (characterized by slow or absent habituation), which is associated with sensory rejection.”

Kanner [1] originally described the cases he studied as possessing two cardinal features, one of them being obsessive insistence on sameness. It is this feature which is here hypothesized to be of primary causative importance, called “primary familiarity preference.”

2 Introduction — artificial neural networks

Artificial neural networks (ANNs) consist of synapses with adjustable connection strength parameters (weights) and signal aggregating nodes representing neurons. During the learning process weight vectors are modified according to specific features of the pre-synaptic signals. ANNs are used for information processing after periods of learning from typical examples and, if they have been modelled properly, to simulate biological neural networks. Of particular importance are the self-organizing maps (SOMs) developed by Kohonen [15, 16], which produce topographical feature maps that may represent feature maps in sensory cortices. There are very convincing demonstrations of the correspondence between self-organizing maps and measured cortical feature maps of animals, see Ritter et al. [17]. Further arguments for such correspondence have been presented by Spitzer [18].

For a general introduction to artificial neural networks, see e.g. Haykin [19] and for a presentation of SOMs, see Kohonen [15, 16]. For an introduction to cortical maps, see e.g. Kandel et al. [20].

Details of the SOM neural networks relevant to this work are presented in Appendix A.

Theories on causes of autism, based on properties of artificial neural networks, have been presented by Cohen [21] and Gustafsson [22]. An artificial neural network is subjected to a learning process in order to enable it to detect and categorize stimuli presented during the learning process.

The learning process of a SOM simulates the development and fine-tuning of a cortical feature map.

Through hierarchies of such cortical feature maps, ever more abstract representations of objects and events enable the coding of experiences without involving masses of details. It was found by Hermelin [23] that autistic children are impaired in their capacity to recode information from sensory to abstract codes, making it difficult for them to see what normal individuals regard as salient features of a situation, see e.g. Happé [24]. A learning process for an artificial neural network may result in an inadequate feature map such that correct classification of stimuli, ignoring incidental detail, is not well accomplished. In the theories presented by Cohen and Gustafsson this is seen as modelling the impairments in forming more abstract codes and representations evident in individuals with autism.

The purpose of this paper is to examine how attention shift impairment and familiarity preference influence the self-organization of an artificial neural network and to discuss the characteristics of the resulting maps. It will be shown that some, but not all, of these maps exhibit characteristics which may be argued to be autistic in kind. A comparison will be made with maps organized when novelty seeking is present. Finally, it will be shown that early intervention in the learning process, based on the observation of attention shift misses and implemented as an increased number of stimuli being presented to the SOM from the less familiar source, results in a significantly larger probability for the development of a normal map.

3 Simulations — methods

In the simulations two sources of stimuli are employed. New stimuli are made available by the sources alternating in presenting stimuli in packets of two consecutive stimuli. (The length of the packets is not at all critical.) The artificial neural network attends to a particular source when the output from that source forms the input to the artificial neural network. Shift of attention to the alternate source is made in four different modes in the self-organization of the artificial neural network.

Mode 1, novelty seeking: attention is shifted to the alternate source if that source presents the next new stimulus. This is regarded as the “normal” mode of learning.

Mode 2, attention shift impairment: attention is shifted to the alternate source with a low probability, in the simulations chosen as 0.01, if that source presents the next new stimulus.

Mode 3, familiarity preference: attention is shifted to the alternate source if that source presents the next new stimulus while both sources are still unfamiliar to the map, i.e. in the first phase of the self-organization; after familiarity with at least one of the sources has been reached, shifts are made with a lower probability to whichever of the two sources is the less familiar to the map. Familiarity of a source to the map is a weighted average of the distances between past stimuli from that source and the weight vectors of the nodes most resembling them. This mode is intended to model self-organization whose attention shifting characteristics are the same as those reported by Pascualvaca et al. [11] (see quote above).

Mode 4, familiarity preference in conjunction with attention shift impairment: attention is shifted to the alternate source, if that source presents the next new stimulus, with a probability of 0.01 multiplied by the probability calculated as in mode 3.
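The four modes can be sketched as a single shift-decision function. This is an illustrative reading, not the authors' code; the familiarity scale (here, higher means more familiar) and the ratio used in modes 3 and 4 are assumptions:

```python
import random

# Shift probability under attention shift impairment (modes 2 and 4),
# as given in the text.
P_IMPAIRED = 0.01

def shift_attention(mode, current, alternate, familiarity, rng=random):
    """Return True if attention shifts to the alternate source when it
    presents the next new stimulus. `familiarity` maps source names to
    scalars in [0, 1]; higher means more familiar (assumed convention)."""
    if mode == 1:                 # novelty seeking: always shift
        return True
    if mode == 2:                 # impairment: shift with low probability
        return rng.random() < P_IMPAIRED
    # Modes 3 and 4: shift less readily toward the less familiar source.
    p = 1.0
    if familiarity[alternate] < familiarity[current]:
        p = familiarity[alternate] / max(familiarity[current], 1e-12)
    if mode == 4:                 # impairment on top of familiarity preference
        p *= P_IMPAIRED
    return rng.random() < p
```

In mode 3, shifting remains certain while the sources are equally (un)familiar, matching the first phase of self-organization described above.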

In the simulations presented below the sources provide two-dimensional stimuli. The node (i.e. artificial neuron) weight vectors are thus also two-dimensional and the learning results can easily be visualized in two-dimensional space. Each source in the first two sets of simulations provides three classes of stimuli. There are ten exemplars in each class for each source. In the last two sets of simulations one source is reduced to provide only two classes of stimuli, one of them with twenty exemplars. The stimuli provided by the sources are represented in the figures by ‘o’ and ‘+’ respectively. The node weight vectors are represented by ‘*’.


The sources can be thought of as producing two dialects of a very limited protolanguage, each with three or two protophonemes. Real sensory stimuli, like phonemes of speech, are of course larger in number and dimension. However, the explanations given for the results obtained do not rely on the dimensionality or the number of classes of the sources. Simulations with sources that provide few classes of low dimensionality have been presented because a complete visualization of the results can easily be achieved.
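As a concrete illustration, the two sources can be generated as noisy exemplars around class centres. The centres, spread, and uniform noise below are hypothetical choices; the paper specifies only the class and exemplar counts:

```python
import random

def make_classes(centres_and_counts, spread=0.1, rng=random):
    """Generate 2-D stimuli: for each (centre, count) pair, draw `count`
    exemplars uniformly within `spread` of the centre. All numeric values
    here are illustrative, not taken from the paper."""
    stimuli = []
    for (cx, cy), count in centres_and_counts:
        for _ in range(count):
            stimuli.append((cx + rng.uniform(-spread, spread),
                            cy + rng.uniform(-spread, spread)))
    return stimuli

# Full source: three classes, ten exemplars each.
source_full = make_classes([((0.5, 0.5), 10), ((0.5, 2.0), 10), ((2.0, 0.5), 10)])
# Reduced source (third and fourth simulation sets): two classes,
# one of them with twenty exemplars.
source_reduced = make_classes([((1.0, 1.0), 20), ((1.0, 2.2), 10)])
```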

4 Simulations — results

In the first set of simulations a canonical SOM (the smallest possible ANN which can successfully learn to classify stimuli) with nodes organized in a 2×2 mesh was used. The resulting maps from learning in modes 1, 2 and 3 are shown in Figure 1.

Figure 1: Learning in a canonical SOM with nodes in a 2×2 mesh. The resulting maps are from self-organization in mode 1, mode 2 and mode 3, respectively. The number of attention shifts during learning (245759, 2502 and 2499, respectively) is shown below each map.

As can be seen, normal learning with novelty seeking and learning with attention shift impairment result in the same maps — the node weights assume values which are the mean values of each class.

Learning with familiarity preference, however, results in a map where the nodes have adapted with a preference for the stimuli from one source. This outcome has been replicated, with the preferred source varying, in many hundreds of simulations.

It might at first seem surprising that the resulting maps from learning with attention shift impairment are the same as those resulting from learning with novelty seeking, but the explanation is straightforward — if the nodes were more adapted to the mean values of the subclasses of one of the sources, then learning from exemplars of the other source would result in greater node weight adjustments, pulling the nodes towards the mean values of the subclasses of both sources. In learning with familiarity preference this mechanism is not present — when the nodes have adjusted more to the subclasses of one of the sources, exemplars from the other source will not cause learning but will be ignored.

It should be noticed that the number of attention shifts in learning with attention shift impairment is very low in spite of the successful self-organization. This is because there is no bias in favor of one of the sources in this mode of learning.

In the second set of simulations a SOM with an excess of nodes, in a 3×3 mesh, was used. The resulting maps are shown in Figure 2.

Figure 2: Learning with an excess of nodes in a 3×3 mesh. The resulting maps are from self-organization in mode 1, mode 2 and mode 3, respectively. The number of attention shifts during learning was 245759, 2334 and 725, respectively.

Learning in mode 1 resulted in maps that in some cases have nodes adapted to the means of the subclasses from both sources combined and in other cases adapted to the means of the subclasses from each source. Learning in mode 2 tended to result (as illustrated) in nodes becoming adapted to the means of the subclasses from each source. Learning in mode 3 resulted in maps which give preference to one source, and most nodes will adapt to subclasses from that source, but nodes may also adapt to one or more subclasses of the other source.

In the third and fourth sets of simulations one source is reduced to provide only two classes of stimuli, one of the classes having twenty exemplars. In the third set a canonical 2×2 SOM with the same degree of familiarity preference as in the first and second sets of simulations is used; in the fourth set a canonical 2×2 SOM with weaker familiarity preference is used.

The results of the third set of simulations were, with no appreciable variation, those shown in Figure 3.

As before, the resulting maps from learning in modes 1 and 2 are somewhat similar and adequately cover all classes of the stimuli. The resulting map from learning in mode 3 shows that the source with a reduced set of stimuli dominates the development of learning, leaving one subclass of stimuli from the full source without any detector node.

Again there is a straightforward explanation for this result — the exemplars from the reduced set source show less diversity than the exemplars from the full set source, and learning from the exemplars of the reduced set source will therefore cause a faster adaptation of the node weights; exemplars from the full set source will subsequently be ignored.

Figure 3: Learning in a canonical SOM with nodes in a 2×2 mesh. The resulting maps are from self-organization in mode 1, mode 2 and mode 3, respectively. The number of attention shifts during learning was 245759, 2526 and 726, respectively.

In the fourth set of simulations learning in mode 3 only is investigated. In this case a weaker familiarity preference than in the third set of simulations has been applied. The resulting maps are shown in Figure 4.

Learning in mode 3 here produces very different results in different simulations, even with very small differences in stimuli and initial weights. The stimuli are presented in a random order, which therefore also differs between simulations. The small simulation differences yield maps of such extremes as shown in Figure 4 and in the third column of Figure 3.

A measure of the “goodness” of a map can be calculated using the sum of the distances between the centre of each subclass and the weight vector of the node which most resembles it. The goodness index has been normalized so that a value of 1 represents a map with perfect representation of the stimuli. There is a correlation between the number of attention shifts during self-organization and the goodness of the resulting map, as shown in Figure 5. In Figure 4 it would seem that there are some “late developers” among the maps, but these have not, as can be seen in Figure 5, obtained a better goodness index. We have not observed any 2×2 map which has succeeded in late development, but in maps with an excess of nodes we have noticed such late development.
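The goodness index can be computed as follows. The exact normalization used by the authors is not given, so the mapping from total distance to the [0, 1] index below (1/(1 + total)) is an assumption:

```python
import math

def goodness_index(subclass_centres, node_weights):
    """For each subclass centre, find the distance to the weight vector of
    the node that most resembles it (the nearest node), and sum these
    distances. The sum is then mapped so that 1.0 means every subclass
    centre coincides with some node weight vector."""
    total = 0.0
    for centre in subclass_centres:
        total += min(math.dist(centre, w) for w in node_weights)
    return 1.0 / (1.0 + total)
```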

Learning in mode 4, i.e. with attention shift impairment in conjunction with familiarity preference, yields the same kind of results as learning in mode 3, but the probability for a normal map is much smaller. This is the only, but important, way in which attention shift impairment has significantly influenced the result of the learning process in our simulations.


Figure 4: Learning in a SOM with 2×2 nodes. The first column illustrates a successfully self-organized map from learning in mode 3; the number of attention shifts during learning (122868) is shown below the map. The second column shows the number of attention shifts as self-organization proceeds (attention shifts against epochs); a total of 100 simulations are represented in this diagram, with the panel annotated “66 out of 100”.

Figure 5: Goodness of the resulting maps and the corresponding number of attention shifts during self-organization. The data is from the simulations presented in Figure 4.

5 Early intervention

As shown in Figure 4, the attention shifts during self-organization in learning with familiarity preference reach a point where they continue to occur in some simulations but fairly rapidly cease in others; as a consequence those maps obtain a low goodness score. This point is reached when familiarity with at least one source has been established, and it can be immediately observed as attention shifts to one source — the least familiar — being omitted in some of the simulations. A simple early intervention strategy is then to make the packets of stimuli from the least familiar source longer, so that when this source is next attended to it has a longer, compensatory effect on self-organization. The efficacy of this scheme is evident in Figure 6, which shows the same set of simulations as Figure 4, with the exception that the packet length of a source which has failed to cause an attention shift is doubled when that source is next attended to. This strategy is clearly successful in these simulations.
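The intervention rule can be sketched as a simple packet-length schedule. This reading, in which a missed shift doubles only that source's next packet and a successful shift resets the length, is one plausible interpretation of the description above:

```python
def packet_lengths(base_len, shift_outcomes):
    """For a sequence of attention-shift outcomes toward the less familiar
    source (True = shift occurred when offered), return the packet length
    that source receives at each presentation: doubled after a miss,
    reset to the base length after a successful shift."""
    lengths, current = [], base_len
    for shifted in shift_outcomes:
        lengths.append(current)
        current = base_len if shifted else 2 * base_len
    return lengths
```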

Figure 6: Learning in mode 3 with early intervention on a 2×2 mesh. The number of attention shifts during learning was 21470; the second panel shows the number of attention shifts against epochs.

6 Discussion

Our results show that familiarity preference results in inadequate maps with characteristic deficits, lending support to the theory that familiarity preference, or novelty avoidance, may be primary in causing other autistic characteristics. The stimuli of one source will be learned precisely, at the expense of the other. If one source has a reduced set of stimuli it will dominate the resulting map.

If the SOM is canonical this domination will preclude the learning of the source with a full set of stimuli. It may be argued that development of cortical maps of this kind in a child will be conducive to the development of narrow interests, commonly present in autism.

If the SOM has an excess of nodes the result will vary greatly between simulations, even though all initial values are the same for all simulations — in some cases only stimuli from the reduced set source are learned and in others the stimuli from the full set source will also be learned well. There are many cortical maps, and if some of them develop to respond only to a reduced set source while others achieve a normal development, this offers a reasonable explanation for the uneven capacities often found in individuals with autism (Kanner's “islets of ability”; for a discussion, see e.g. Frith [25]).

Our results do not lend support to the hypothesis that attention shift impairments by themselves may be primary in causing other autistic characteristics, since self-organization with attention shift impairments has, in our simulations, always resulted in normal maps. However, we have not presented results that refute this hypothesis. The argument of Courchesne et al. [5, 6, 7] is that sequences of events cannot be learned if attention cannot properly be shifted between modalities to take in the sequential stimuli, and that this will cause autistic characteristics. The simulations presented all deal with learning unimodal stimuli where sequence has no significance.

Learning with attention shift impairments in conjunction with familiarity preference, however, will very much reduce the probability for a normal map resulting from self-organization. Thus atten- tion shift impairment may be important, albeit not independently so, in causing autism.

An early intervention strategy presents itself immediately from the simulations. By giving additional exposure to the least familiar source, the difference in familiarity of the SOM to the two sources can be continuously counteracted and normal maps will develop.

Acknowledgements

This work is part of a cooperation between Luleå University of Technology, Luleå, Sweden and Monash University at Clayton, Victoria, Australia. We wish to express our appreciation to the Luleå STINT grant scheme and the Monash SMURF-2 grant scheme for supporting this cooperation.


7 References

[1] L. Kanner. Autistic disturbances of affective contact. Nervous Child, (2):217–250, 1943.

[2] H. Asperger. Die ’autistischen Psychopathen’ im Kindesalter. Arch. Psychiatrie Nervenkrankheiten, (117):76–136, 1944. Translated in Frith U. (ed) (1991): Autism and Asperger Syndrome. Cambridge University Press.

[3] Diagnostic and statistical manual of mental disorders. fourth edition. American Psychiatric Association, 1994. Available from: http://www.psychologynet.org/dsm.html.

[4] C. Gillberg and M. Coleman. The Biology of the Autistic Syndromes. Cambridge University Press, 3rd edition, 2000.

[5] E. Courchesne, J.P. Townsend, N.A. Akshoomoff, R. Yeung-Courchesne, G.A. Press, J.W. Murakami, A.J. Lincoln, H.E. James, O. Saitoh, B. Egaas, R.H. Haas, and L. Schreibman. A new finding: Impairment in shifting attention in autistic and cerebellar patients. In S.H. Broman and J. Grafman, editors, Atypical cognitive deficits in developmental disorders: Implications for brain function, pages 101–137. Erlbaum, Hillsdale, N.J., 1994.

[6] E. Courchesne, J.P. Townsend, N.A. Akshoomoff, O. Saitoh, R. Yeung-Courchesne, A.J. Lincoln, H.E. James, R.H. Haas, L. Schreibman, and L. Lau. Impairment in shifting attention in autistic and cerebellar patients. Behavioral Neuroscience, 108(5):848–865, 1994.

[7] E. Courchesne, N.A. Akshoomoff, J.P. Townsend, and O. Saitoh. A model system for the study of attention and the cerebellum: infantile autism. In G. Karmos, M. Molnár, I. Csépe, and J.E. Desmedt, editors, Perspectives of Event-Related Potentials Research, pages 315–325. 1995. (EEG Suppl. 44).

[8] J.P. Townsend, N.S. Harris, and E. Courchesne. Visual attention abnormalities in autism: Delayed orienting to location. Journal of the International Neuropsychological Society, (2):541–550, 1996.

[9] J.P. Townsend, E. Courchesne, J. Covington, M. Westerfield, N.S. Harris, P. Lyden, P. Lowry, and G.A. Press. Spatial attention deficits in patients with acquired or developmental cerebellar abnormality. Journal of Neuroscience, 19(13):5632–5643, July 1999.

[10] G. Dawson, A.N. Meltzoff, J. Osterling, J. Rinaldi, and E. Brown. Children with autism fail to orient to naturally occurring social stimuli. Journal of Autism and Developmental Disorders, 28(6):479–485, 1998.

[11] D.M. Pascualvaca, B.D. Fantie, M. Papageorgiou, and A.F. Mirsky. Attentional capacities in children with autism: Is there a general deficit in shifting focus? Journal of Autism and Developmental Disorders, 28(6):467–478, 1998.

[12] N.J. Minshew, B. Luna, and J.A. Sweeney. Oculomotor evidence for neocortical systems but not cerebellar dysfunction in autism. Neurology, (52):917–922, 1999.


[13] J.P. Kootz, B. Marinelli, and D.J. Cohen. Modulation of response to environmental stimulation in autistic children. Journal of Autism and Developmental Disorders, 12(2):185–193, 1982.

[14] G. Dawson and A. Lewy. Arousal, attention and the socioemotional impairments of individuals with autism. In G. Dawson, editor, Autism: Nature, diagnosis and treatment, pages 49–74. Guilford, New York, 1989.

[15] T. Kohonen. Self-Organisation and Associative Memory. Springer-Verlag, Berlin, 3rd edition, 1988.

[16] T. Kohonen. Self-Organising Maps. Springer-Verlag, Berlin, 2nd edition, 1997.

[17] H. Ritter, T. Martinetz, and K. Schulten. Neural Computation and Self-Organizing Maps. Addison-Wesley, Reading, MA, 1992.

[18] M. Spitzer. A neurocomputational approach to delusions. Compr. Psychiatry, (36):83–105, 1995.

[19] Simon Haykin. Neural Networks – a Comprehensive Foundation. Prentice Hall, New Jersey, 2nd edition, 1999. ISBN 0-13-273350-1.

[20] E.R. Kandel, J.H. Schwartz, and T.M. Jessell, editors. Principles of Neural Science. McGraw-Hill, New York, 4th edition, 2000. Part V: Perception.

[21] I.L. Cohen. An artificial neural network analogue of learning in autism. Biol. Psychiatry, (36):5–20, 1994.

[22] L. Gustafsson. Inadequate cortical feature maps: A neural circuit theory of autism. Biol. Psy- chiatry, (42):1138–1147, 1997.

[23] B. Hermelin. Images and language. In M. Rutter and E. Schoppler, editors, Autism: A Reap- praisal of Concept and Treatment, pages 141–154. Plenum, New York, 1978.

[24] F. Happé. The autobiographical writings of three Asperger syndrome adults: Problems of identification and implications for theory. In U. Frith, editor, Autism and Asperger Syndrome, pages 207–242. Cambridge University Press, Cambridge, 1991.

[25] U. Frith. Autism: Explaining the Enigma. Basil Blackwell, Oxford, 1989.

[26] C. von der Malsburg. Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14:85–100, 1973.


A Self-Organizing Feature Maps

Self-Organizing Feature Maps, also known as Kohonen maps, topographic maps, or self-organizing maps, were first introduced by von der Malsburg in 1973 [26] and in their present form by Kohonen in 1982 [15, 16]. Self-Organizing Maps (SOMs) are competitive neural networks in which neurons are organized in an l-dimensional lattice (grid) representing the feature space. Such neural networks perform a mapping of a p-dimensional input space into the l-dimensional feature space. For ease of visualisation the dimensionality of the feature space is often restricted to l = 1, 2 or 3.

Consider an example of a self-organizing map consisting of m = 12 neurons in which the input space is 3-dimensional (p = 3) and the feature space is 2-dimensional (l = 2). The structure of such a SOM is illustrated in Figure 7. The first section of the network is a distance-measure layer consisting of m = 12 dendrites, each containing p = 3 synapses excited by the input signal vector x = [x1 x2 x3] and characterised by the weight vector wi = [wi1 wi2 wi3]. The distance-measure layer calculates the distances di between each input vector x and every weight vector wi. This distance information, (d1, . . . , dm), is passed to the competition layer, the MinNet in Figure 7, which calculates the minimal distance dk = min di in order to establish the position of the winning neuron k. The competition is implemented through the lateral inhibitory and local self-excitatory connections between neurons in the competitive layer. In addition, every neuron is located in an l = 2-dimensional lattice and its position is specified by an l-dimensional vector vi = [vi1 vi2].

Figure 7: A 2-D SOFM with p = 3, m = [3 4], l = 2.

The synaptic weight vectors, wi, and the vectors of topological positions of neurons, vi, are grouped into the weight and position matrices, W, V , respectively.
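The distance-measure and MinNet stages described above can be sketched as follows. This is an illustrative NumPy fragment (the paper's simulations use MATLAB); the concrete sizes and the random data are assumptions chosen to match the example in Figure 7:

```python
import numpy as np

m, p = 12, 3                           # 12 neurons, 3-dimensional input space
rng = np.random.default_rng(0)
W = rng.random((m, p))                 # weight matrix W: one weight vector per neuron
# Position matrix V: lattice coordinates of the m neurons on a 3 x 4 grid.
V = np.array([[i, j] for j in range(1, 5) for i in range(1, 4)])

x = rng.random(p)                      # one input vector
d = np.linalg.norm(x - W, axis=1)      # distance-measure layer: d_1 ... d_m
k = int(np.argmin(d))                  # MinNet: index of the winning neuron
v_k = V[k]                             # its position on the 2-D lattice
```

The lateral inhibitory/self-excitatory competition of the MinNet is collapsed here into a single `argmin`, which is its computational effect.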


A.1 Feature Maps

A typical feature map is a plot of the synaptic weights in the input space, in which the weights of neighbouring neurons are joined by lines; it illustrates the mapping from the input space to the feature space. For simplicity, we restrict our attention here to two-dimensional input and feature spaces (p = l = 2).

As an illustrative example, let us consider a SOM with p = 2 inputs and m = 12 neurons organized on a 3 × 4 lattice as in Figure 7. An example of the weight matrix W, the position matrix V, and the resulting feature map is given in Figure 8. Note how the feature map of Figure 8 represents the mapping of the 2-D input space onto the 2-D neuronal lattice of Figure 7.

 k     W            V
 1     0.83 0.91    1 1
 2     0.72 2.01    2 1
 3     0.18 2.39    3 1
 4     2.37 0.06    1 2
 5     1.38 2.18    2 2
 6     1.41 2.82    3 2
 7     2.38 1.27    1 3
 8     2.06 1.77    2 3
 9     2.51 2.61    3 3
10     3.36 0.85    1 4
11     3.92 2.05    2 4
12     3.16 2.90    3 4

Figure 8: Example of the weight and position matrices and the resulting feature map for p = l = 2.

A.2 Learning Algorithm for Self-Organizing Feature Maps

The objective of the learning algorithm for a SOFM neural network is the formation of a feature map which captures the essential characteristics of the p-dimensional input data and maps them onto an l-dimensional feature space. The learning algorithm rests on two essential aspects of map formation, namely, competition and cooperation between the neurons of the output lattice.

Competition is implemented as in competitive learning: each input vector x(n) is compared with every weight vector from the weight matrix W, and the position V(k(n), :) of the winning neuron k(n) is established. For the winning neuron the distance

dk = |xT(n) − W (k(n), :)|

attains its minimum (in MATLAB, which is used as the simulation tool, ':' denotes all column elements of a matrix row).

Cooperation All neurons located in a topological neighbourhood of the winning neuron k(n) will have their weights updated, usually with a strength Λ(j) related to their distance ρ(j) from the


winning neuron,

ρ(j) = |V (j, :) − V (k(n), :)| for j = 1, . . . , m.

The neighbourhood function, Λ(j), is usually an l-dimensional Gaussian function:

Λ(j) = exp(−ρ²(j) / (2σ²))

where σ² is the variance parameter specifying the spread of the Gaussian function.
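As a sketch in the same hypothetical NumPy setting as before, the neighbourhood strengths for all m neurons can be computed in one vectorised step (the winner index and variance are example values, not taken from the paper):

```python
import numpy as np

# Lattice positions of m = 12 neurons on a 3 x 4 grid, as in Figure 7.
V = np.array([[i, j] for j in range(1, 5) for i in range(1, 4)])
k = 4                                    # winning neuron (example index)
sigma2 = 1.5                             # Gaussian spread sigma^2 (example value)

rho = np.linalg.norm(V - V[k], axis=1)   # lattice distances rho(j) to the winner
Lam = np.exp(-rho**2 / (2 * sigma2))     # neighbourhood strengths Lambda(j)
```

The winner itself receives full strength (Λ(k) = 1), and the strength decays smoothly with lattice distance, which is what makes neighbouring neurons cooperate.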

Feature map formation is critically dependent on the learning parameters, namely the learning gain, η, and the spread of the neighbourhood function, specified in the Gaussian case by the variance, σ². In general, both parameters should be time-varying; their values are selected experimentally.

Usually, the learning gain, η, should stay close to unity during the ordering phase of the algorithm, which can last for, say, 1000 iterations (epochs). After that, during the convergence phase, it should be reduced to reach a value of, say, 0.1. The spread, σ², of the neighbourhood function should initially include all neurons for any winning neuron and should be slowly reduced during the ordering phase to eventually include only a few neurons in the winner's neighbourhood. During the convergence phase, the neighbourhood function should include only the winning neuron.

Details of the SOFM learning algorithm

The complete algorithm consists of the following steps:

1. Initialise:

(a) the weight matrix W with a random sample of m input vectors.

(b) the learning gain and the spread of the neighbourhood function.

2. for every input vector, x(n), n = 1, . . . , N :

(a) Determine the winning neuron, k(n), and its position V(k(n), :) as

k(n) = arg min_j |xT(n) − W(j, :)|

(b) Calculate the neighbourhood function

Λ(n, j) = exp(−ρ²(j) / (2σ²))

where

ρ(j) = |V (j, :) − V (k(n), :)| for j = 1, . . . , m.

(c) Update the weight matrix as

∆W(j, :) = η(n) · Λ(n, j) · (xT(n) − W(j, :)) for j = 1, . . . , m

All neurons (unlike in the simple competitive learning) have their weights modified with a strength proportional to the neighbourhood function and to the distance of their weight vector from the current input vector (as in competitive learning).

Step 2 is repeated E times, where E is the number of epochs.
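Putting steps 1 and 2 together, a minimal SOFM training loop might look as follows. This is a NumPy sketch, not the authors' MATLAB implementation; the grid size, epoch count, toy data set, and the particular annealing constants are illustrative assumptions:

```python
import numpy as np

def train_sofm(X, m_grid=(3, 4), epochs=30, eta0=0.9, etap=0.1, sigma2_0=4.0, seed=0):
    """Train a 2-D SOFM on data X (N x p); returns weights W and lattice positions V."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    m = m_grid[0] * m_grid[1]
    # Step 1a: initialise W with a random sample of m input vectors.
    W = X[rng.choice(N, size=m, replace=False)].astype(float).copy()
    # Lattice positions V of the m neurons.
    V = np.array([[i, j] for j in range(m_grid[1]) for i in range(m_grid[0])], dtype=float)
    for e in range(1, epochs + 1):
        sigma2 = sigma2_0 / e          # shrinking neighbourhood spread
        eta = eta0 / (1 + etap * e)    # decreasing learning gain
        for n in rng.permutation(N):   # step 2, for every input vector
            x = X[n]
            # Step 2a: winning neuron.
            k = int(np.argmin(np.linalg.norm(x - W, axis=1)))
            # Step 2b: neighbourhood function.
            rho = np.linalg.norm(V - V[k], axis=1)
            Lam = np.exp(-rho**2 / (2 * sigma2))
            # Step 2c: move every weight vector toward x, scaled by Lambda.
            W += eta * Lam[:, None] * (x - W)
    return W, V

X = np.random.default_rng(1).random((200, 2))   # toy 2-D data in the unit square
W, V = train_sofm(X)
```

Because every update is a convex combination of the current weight and the input, the trained weights stay inside the convex hull of the data, which is why plotting W joined along V (as in Figure 8) yields an ordered grid over the input region.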


3. (a) During the ordering phase, shrink the neighbourhood until it includes only one neuron:

σ²(e) = σ₀² / e

where e is the epoch number and σ₀² is the initial value of the spread (variance).

(b) During the convergence phase, “cool down” the learning process by reducing the learning gain. We use the following formula:

η(e) = η₀ / (1 + ηp e)

where η₀ is the initial value of the learning gain, and ηp is selected so that the final value of the learning gain reaches the prescribed value, η(E) = ηf.
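The two schedules can be written out directly. In this NumPy sketch the epoch count and the initial/final values are example choices; the constant ηp is solved from the prescribed final gain ηf exactly as the text describes:

```python
import numpy as np

E = 100                         # total number of epochs (example)
sigma2_0 = 4.0                  # initial spread sigma_0^2 (example)
eta0, etaf = 0.9, 0.1           # initial and prescribed final learning gain

e = np.arange(1, E + 1)         # epoch numbers 1 ... E

# Ordering phase: sigma^2(e) = sigma_0^2 / e shrinks the neighbourhood.
sigma2 = sigma2_0 / e

# Convergence phase: eta(e) = eta0 / (1 + etap * e); requiring eta(E) = etaf
# and solving for etap gives the constant below.
etap = (eta0 / etaf - 1) / E
eta = eta0 / (1 + etap * e)
```

Both sequences decrease monotonically, and by construction the gain ends exactly at ηf after E epochs.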
