1

 Laws of Association
 Building Associations Into Networks
 The Hebb Rule
 The Delta Rule

2

 Memory has been studied for thousands of years
 Aristotle was interested in chains of memory, and proposed laws of
association to explain successive thoughts
 Aristotle wrote “Acts of recollection happen because one change is of a
nature to occur after another”
 Aristotle considered three different kinds of relationships between the
starting image and its successor: similarity, opposition, and (temporal)
contiguity

3

 Aristotelian theory evolved into British empiricism, and later into
associationist psychology
 Scholars argued over the nature of the laws that permitted two ideas to
be associated, so that the occurrence of one idea would lead naturally
to the subsequent occurrence of the other
 One law that persisted throughout this evolution was some form of the law
of contiguity

4

 One of the key building blocks for a connectionist system is a method
for storing associations between an input pattern and an output pattern
 Let us begin by considering a couple of simple methods by which this
sort of association could be achieved
 We will focus on bringing the law of contiguity to life:
 “When two elementary brain-processes have been active together or in
immediate succession, one of them, on reoccurring, tends to propagate
its excitement into the other” (James, 1890)

7

 “When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or
metabolic change takes place in one or both cells such that A’s
efficiency, as one of the cells firing B, is increased” (Hebb, 1949)
 Principle of contiguity!

8

 Modern views of Hebb learning involve the strengthening of synapses
(both excitatory and inhibitory) as well as the weakening of synapses
 These two processes have been combined to create many interesting models
of content-addressable memory

9

 “Address”-addressable memory
 Retrieve items by content-independent location

10

 A simple distributed memory system consists of two sets of processors,
and a set of modifiable connections between them

11

 Present two patterns of activity
 Associate the patterns because of their temporal contiguity
 Later, one pattern will cue the other

12

 Make more excitatory the connections between same-state processors
 Make more inhibitory the connections between opposite-state processors
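For units whose states are +1 and −1, both halves of this rule collapse into a single operation: the product of two activities is positive for same-state pairs and negative for opposite-state pairs. A minimal sketch (the unit states here are illustrative):

```python
import numpy as np

# Sketch of the contiguity rule on +1/-1 units: the activity product is
# +1 (more excitatory) where states agree, -1 (more inhibitory) where they differ.
inp = np.array([1, -1, 1])    # state of the first set of processors
out = np.array([-1, 1, 1])    # state of the second set of processors

dW = np.outer(out, inp)       # dW[i][j] = out[i] * inp[j]
print(dW)
```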

13

 To recall, activate processors with the cue
 Their activity sends a signal through existing connections

14

 The network signal should reconstruct the other pattern in the second
set of processing units

15

 Let’s examine the Hebb rule in action
 Let us also determine some conditions in which Hebb learning does not
work very well

17

 Let W(t) be a matrix of connection weights at time t
 Let a and b be two to-be-associated vectors
 Hebb learning becomes:
 W(t+1) = W(t) + a b’
 The outer product defines Hebb learning!
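Sticking with this notation (W(t+1) = W(t) + a b′, with ′ denoting transpose), a minimal NumPy sketch of one learning step and one recall; the vectors are illustrative:

```python
import numpy as np

# A minimal sketch of outer-product Hebb learning (illustrative vectors).
a = np.array([1.0, 0.0, 0.0, 0.0])   # pattern to be recalled
b = np.array([0.0, 1.0, 0.0, 0.0])   # cue pattern (unit length)

# One Hebb step: W(t+1) = W(t) + a b'
W = np.zeros((4, 4))
W += np.outer(a, b)

# Recall: r = Wc = a (b'c); cueing with b itself returns a, since b'b = 1
r = W @ b
print(r)
```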

19

 Recall from memory involves filtering a cue signal through the existing
weights to produce output activity
 r = Wc
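Several associations can be superimposed in one weight matrix. With mutually orthogonal unit cues (an assumption made here for clean recall), each cue retrieves only its own associate; a sketch:

```python
import numpy as np

# Sketch: two associations superimposed in one weight matrix, then
# recalled by filtering each cue through the weights (r = Wc).
c1 = np.array([1.0, 0.0, 0.0]); a1 = np.array([0.2, 0.8, -0.4])
c2 = np.array([0.0, 1.0, 0.0]); a2 = np.array([-0.5, 0.1, 0.9])

W = np.outer(a1, c1) + np.outer(a2, c2)

print(W @ c1)  # recovers a1: the c2 term contributes nothing (c2'c1 = 0)
print(W @ c2)  # recovers a2
```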

21

 We can use linear algebra to reveal some interesting limitations of Hebb
learning
 For instance, what if we relax the mutual orthogonality constraint?
 What if the correlation between c and a is equal to 0.5?
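The failure is easy to demonstrate. In this sketch a response pattern is stored against cue a, and a test cue c is built whose correlation (inner product of unit vectors) with a is 0.5; recall then returns the stored response at only half strength, and with several stored pairs such residue becomes crosstalk between memories:

```python
import numpy as np

# Sketch: Hebb recall with a correlated (non-orthogonal) cue.
a = np.array([1.0, 0.0])             # stored cue (unit length)
b = np.array([0.0, 0.0, 1.0])        # stored response
W = np.outer(b, a)                   # store the cue -> response association

c = np.array([0.5, np.sqrt(3) / 2])  # unit cue with correlation a'c = 0.5
r = W @ c
print(r)                             # 0.5 * b: the response at half strength
```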

23

 We would like to develop a new kind of Hebb learning rule
 This rule would permit the network to correctly recall correlated
patterns
 This rule would also allow the network to improve its performance with
repeated presentations of patterns

25

 The delta rule can be viewed as a Hebb-style association between an
input vector and an (output) error vector
 Repeated applications will reduce error
 The amount of learning depends on the amount of error
 The delta rule can be written as:
 ΔW(t+1) = η (t − o) c′
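A sketch of the rule in action (the learning rate, patterns, and epoch count are all illustrative). The correlated cues that defeat one-shot Hebb learning are mastered here, because each pass shrinks the remaining error (t − o):

```python
import numpy as np

# Sketch of iterative delta-rule learning on correlated cues.
cues = np.array([[1.0, 0.0], [0.7, 0.7]])      # correlated, not orthogonal
targets = np.array([[1.0, 0.0], [0.0, 1.0]])   # desired outputs

W = np.zeros((2, 2))
eta = 0.1                                      # learning rate
for epoch in range(500):
    for c, t in zip(cues, targets):
        o = W @ c                              # obtained output
        W += eta * np.outer(t - o, c)          # delta W = eta (t - o) c'

print(W @ cues[0], W @ cues[1])                # both close to their targets
```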

26

 One vector can be created by combining (adding) two others
 If we have a set of vectors, and none of the vectors can be created by
combining the others, the set of vectors is said to be linearly
independent
 If the vectors are such that one can be created by combining some of the
others, then the set is linearly dependent
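One way to test a set of vectors for linear (in)dependence is the rank of the matrix whose rows are the vectors: full rank means independent. A sketch:

```python
import numpy as np

# Sketch: checking linear (in)dependence with matrix rank.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = v1 + v2                      # built from the others -> dependent set

print(np.linalg.matrix_rank(np.stack([v1, v2])))      # 2: independent
print(np.linalg.matrix_rank(np.stack([v1, v2, v3])))  # 2 < 3: dependent
```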

27

 Let’s examine the delta rule in action
 Let us note some instances in which it serves as an improvement over
Hebb learning
 But let us also note that it is still subject to limitations

28

 How do we move beyond the sorts of limitations that we have noted in the
simple distributed memory?
 First, we need to add nonlinearities into the processing units, letting
them make decisions
 Second, we need methods by which layers of these nonlinear units can be
coordinated
 These will be our topics in later lectures
