1

 Laws of Association
 Building Associations Into Networks
 The Hebb Rule
 The Delta Rule

2

 Memory has been studied for thousands of years
 Aristotle was interested in chains of memory, and proposed laws of
association to explain successive thoughts
 Aristotle wrote “Acts of recollection happen because one change is of a
nature to occur after another”
 Aristotle considered three different kinds of relationships between the
starting image and its successor: similarity, opposition, and (temporal)
contiguity

3

 Aristotelian theory evolved into British empiricism, and later into
associationist psychology
 Scholars argued over the nature of the laws that permitted two ideas to
be associated, so that the occurrence of one idea would lead naturally
to the subsequent occurrence of the other
 One law that persisted throughout this evolution was some form of the law
of contiguity

4

 One of the key building blocks for a connectionist system is a method
for storing associations between an input pattern and an output pattern
 Let us begin by considering a couple of simple methods by which this
sort of association could be achieved
 We will focus on bringing the law of contiguity to life:
 “When two elementary brain-processes have been active together or in
immediate succession, one of them, on reoccurring, tends to propagate
its excitement into the other” (James, 1890)

7

 “When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or
metabolic change takes place in one or both cells such that A’s
efficiency, as one of the cells firing B, is increased” (Hebb, 1949)
 Principle of contiguity!

8

 Modern views of Hebb learning involve the strengthening of synapses
(both excitatory and inhibitory) as well as the weakening of synapses
 These two processes have been combined to create many interesting models
of content-addressable memory

9

 “Address”-addressable memory
 Retrieve items by content-independent location

10

 A simple distributed memory system consists of two sets of processors,
and a set of modifiable connections between them

11

 Present two patterns of activity
 Associate the patterns because of their temporal contiguity
 Later, one pattern will cue the other

12

 Make more excitatory the connections between same-state processors
 Make more inhibitory the connections between opposite-state processors
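For units whose states are +1 and −1, both halves of this rule collapse into a single operation: the product of two activities is positive for same-state pairs and negative for opposite-state pairs. A minimal sketch (the unit states here are illustrative):

```python
import numpy as np

# Sketch of the contiguity rule on +1/-1 units: the activity product is
# +1 (more excitatory) where states agree, -1 (more inhibitory) where they differ.
inp = np.array([1, -1, 1])    # state of the first set of processors
out = np.array([-1, 1, 1])    # state of the second set of processors

dW = np.outer(out, inp)       # dW[i][j] = out[i] * inp[j]
print(dW)
```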

13

 To recall, activate processors with the cue
 Their activity sends a signal through existing connections

14

 The network signal should reconstruct the other pattern in the second
set of processing units

15

 Let’s examine the Hebb rule in action
 Let us also determine some conditions in which Hebb learning does not
work very well

17

 Let W(t) be a matrix of connection weights at time t
 Let a and b be two to-be-associated vectors
 Hebb learning becomes:
 W(t+1) = W(t) + a b’
 The outer product defines Hebb learning!
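Sticking with this notation (W(t+1) = W(t) + a b′, with ′ denoting transpose), a minimal NumPy sketch of one learning step and one recall; the vectors are illustrative:

```python
import numpy as np

# A minimal sketch of outer-product Hebb learning (illustrative vectors).
a = np.array([1.0, 0.0, 0.0, 0.0])   # pattern to be recalled
b = np.array([0.0, 1.0, 0.0, 0.0])   # cue pattern (unit length)

# One Hebb step: W(t+1) = W(t) + a b'
W = np.zeros((4, 4))
W += np.outer(a, b)

# Recall: r = Wc = a (b'c); cueing with b itself returns a, since b'b = 1
r = W @ b
print(r)
```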

19

 Recall from memory involves filtering a cue signal through the existing
weights to produce output activity
 r = Wc
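Several associations can be superimposed in one weight matrix. With mutually orthogonal unit cues (an assumption made here for clean recall), each cue retrieves only its own associate; a sketch:

```python
import numpy as np

# Sketch: two associations superimposed in one weight matrix, then
# recalled by filtering each cue through the weights (r = Wc).
c1 = np.array([1.0, 0.0, 0.0]); a1 = np.array([0.2, 0.8, -0.4])
c2 = np.array([0.0, 1.0, 0.0]); a2 = np.array([-0.5, 0.1, 0.9])

W = np.outer(a1, c1) + np.outer(a2, c2)

print(W @ c1)  # recovers a1: the c2 term contributes nothing (c2'c1 = 0)
print(W @ c2)  # recovers a2
```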

21

 We can use linear algebra to reveal some interesting limitations of Hebb
learning
 For instance, what if we relax the mutual orthogonality constraint?
 What if the correlation between c and a is equal to 0.5?
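The failure is easy to demonstrate. In this sketch a response pattern is stored against cue a, and a test cue c is built whose correlation (inner product of unit vectors) with a is 0.5; recall then returns the stored response at only half strength, and with several stored pairs such residue becomes crosstalk between memories:

```python
import numpy as np

# Sketch: Hebb recall with a correlated (non-orthogonal) cue.
a = np.array([1.0, 0.0])             # stored cue (unit length)
b = np.array([0.0, 0.0, 1.0])        # stored response
W = np.outer(b, a)                   # store the cue -> response association

c = np.array([0.5, np.sqrt(3) / 2])  # unit cue with correlation a'c = 0.5
r = W @ c
print(r)                             # 0.5 * b: the response at half strength
```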

23

 We would like to develop a new kind of Hebb learning rule
 This rule would permit the network to correctly recall correlated
patterns
 This rule would also allow the network to improve its performance with
repeated presentations of patterns

25

 The delta rule can be viewed as a Hebb-style association between an
input vector and an (output) error vector
 Repeated applications will reduce error
 The amount of learning depends on the amount of error
 The delta rule can be written as:
 ΔW(t+1) = η (t − o) c′
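A sketch of the rule in action (the learning rate, patterns, and epoch count are all illustrative). The correlated cues that defeat one-shot Hebb learning are mastered here, because each pass shrinks the remaining error (t − o):

```python
import numpy as np

# Sketch of iterative delta-rule learning on correlated cues.
cues = np.array([[1.0, 0.0], [0.7, 0.7]])      # correlated, not orthogonal
targets = np.array([[1.0, 0.0], [0.0, 1.0]])   # desired outputs

W = np.zeros((2, 2))
eta = 0.1                                      # learning rate
for epoch in range(500):
    for c, t in zip(cues, targets):
        o = W @ c                              # obtained output
        W += eta * np.outer(t - o, c)          # delta W = eta (t - o) c'

print(W @ cues[0], W @ cues[1])                # both close to their targets
```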

26

 One vector can be created by combining (adding) two others
 If we have a set of vectors, and none of the vectors can be created by
combining the others, the set of vectors is said to be linearly
independent
 If the vectors are such that one can be created by combining some of the
others, then the set is linearly dependent
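One way to test a set of vectors for linear (in)dependence is the rank of the matrix whose rows are the vectors: full rank means independent. A sketch:

```python
import numpy as np

# Sketch: checking linear (in)dependence with matrix rank.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = v1 + v2                      # built from the others -> dependent set

print(np.linalg.matrix_rank(np.stack([v1, v2])))      # 2: independent
print(np.linalg.matrix_rank(np.stack([v1, v2, v3])))  # 2 < 3: dependent
```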

27

 Let’s examine the delta rule in action
 Let us note some instances in which it serves as an improvement over
Hebb learning
 But let us also note that it is still subject to limitations

28

 How do we move beyond the sorts of limitations that we have noted in the
simple distributed memory?
 First, we need to add nonlinearities into the processing units, letting
them make decisions
 Second, we need methods by which layers of these nonlinear units can be
coordinated
 These will be our topics in later lectures
