
Elman Networks

Based on articles by Jeffrey L. Elman, Laurene Fausett, and Ben Krose & Patrick van der Smagt.

Most of the neural network architectures proposed by Jeffrey Elman were recurrent and designed to learn sequential or time-varying patterns.  In this way, the algorithms could recognize and predict learned series of values or events.  Elman's primary interest in this architecture was language processing, but he suggested its usefulness for just about anything involving sequences.

Elman's definition of a context revolved around prior internal states, and thus he added a layer of "context units" to a standard feedforward net.  In this way, the states of the hidden units could be fed back into the hidden units during the next stage of input.  Elman (1990) best describes it:

"Both the input units and context units activate the hidden units; and then the hidden units feed forward to activate the output units.  The hidden units also feed back to activate the context units.  This constitutes the forward activation.  Depending on the task, there may or may not be a learning phase in this time cycle.  If so, the output is compared with a teacher input and backpropagation of error is used to incrementally adjust connection strengths.  Recurrent connections are fixed at 1.0 and are not subject to adjustment.  At the next time step t+1 the above sequence is repeated.  This time the context units contain values which are exactly the hidden unit values at time t.  These context units thus provide the network with memory (pp. 4-5)."

 

Figure 1.  A common structure of an Elman network.  Note the context layer, which receives input from, and returns values to, the hidden layer.

 

Figure 1 shows an example of a simple recurrent network design with the potential to learn an unlimited number of sequences of varying length.  Elman and others intended the input and output units to represent individual letters, with which the network could be trained to predict the next letter in a string of letters.

However, the nature of this kind of recurrent network is easier to understand (at least to me) simply by referring to each unit's position in serial order (i.e., Y0, Y1, Y2, Y3, ...).  So for the purpose of this illustration, I'll just use strings of numbers like 0, 3, 5, 2, 0, where 0 refers to Y0, 3 refers to Y3, 5 refers to Y5, etc.  Each string begins and ends with a terminal symbol; I'll use 0 for this example.

Other than beginning and ending with the terminal symbol, the numbers within the string can appear in any order.  And with enough units in the network, the length of the string is unlimited.

The neural net architecture for this example has six input units and six output units (for the numbers 1 through 5, plus 0 for the terminal symbol).  There are three hidden units, with three corresponding context units.

Since X0 represents the terminal symbol, its input is 1 and all other inputs are 0.  The terminal symbol therefore corresponds to the vector (1, 0, 0, 0, 0, 0).  The rest follow this pattern (see the sketch after this list):

1 = 0, 1, 0, 0, 0, 0
2 = 0, 0, 1, 0, 0, 0
3 = 0, 0, 0, 1, 0, 0
4 = 0, 0, 0, 0, 1, 0
5 = 0, 0, 0, 0, 0, 1
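
As a quick illustrative sketch (in Python), the encoding is just a one-hot vector with unit 0 reserved for the terminal symbol:

def encode(symbol, size=6):
    # One-hot encoding: unit 0 is the terminal symbol, units 1-5 are the digits 1-5.
    v = [0.0] * size
    v[symbol] = 1.0
    return v

# encode(0) -> [1, 0, 0, 0, 0, 0]   (terminal symbol)
# encode(3) -> [0, 0, 0, 1, 0, 0]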

Training the net for a particular string involves several steps, the number depending on the length of the string.  At the beginning of training, the activations of the context units are set to 0.5.  The terminal symbol is first presented to the input units, and the net predicts its successor.  The error (the difference between the predicted successor and the actual successor specified by the training string) is determined and backpropagated, and the weights are adjusted.  The context units receive a copy of the hidden units' activations, and the next symbol in the training string (which was the target for the output units on the first step of training) is presented to the input units.  Training continues in this manner until another instance of the terminal symbol (0) is reached.

 

Figure 2.  A time-delay neural network remembers the previous few training examples and uses them as input into the network.  The network then works like a feedforward backpropagation network.

 

The Algorithm

For each training string, do Steps 1 through 7.

Step 1:    Set activations of the context units to 0.5.

Step 2:    Do Steps 3 through 7 until second instance of terminal symbol.

Step 3:    Present input symbol.

Step 4:    Present successor to output units as target response.

Step 5:    Calculate predicted successor.

Step 6:    Determine error, backpropagate, and update weights.

Step 7:    Test for stopping condition:

If target = second instance of terminal symbol, then

Stop

Otherwise,

Copy hidden unit activations to context units and continue at Step 3.
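
Here's a minimal Python/NumPy sketch of Steps 1 through 7 for a single training string.  The layer sizes, learning rate, weight-matrix names, and sigmoid activation are assumptions for illustration; only the trainable weights are updated, while the hidden-to-context copy stays fixed at 1.0:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hot(symbol, size=6):
    v = np.zeros(size)
    v[symbol] = 1.0
    return v

def train_string(string, W_xh, W_ch, W_hy, lr=0.3):
    # Step 1: set activations of the context units to 0.5.
    context = np.full(W_ch.shape[0], 0.5)
    # Step 2: loop until the second instance of the terminal symbol.
    for t in range(len(string) - 1):
        x = one_hot(string[t])            # Step 3: present input symbol.
        target = one_hot(string[t + 1])   # Step 4: successor is the target.
        # Step 5: calculate the predicted successor (forward pass).
        h = sigmoid(x @ W_xh + context @ W_ch)
        y = sigmoid(h @ W_hy)
        # Step 6: determine error, backpropagate, and update weights.
        delta_y = (target - y) * y * (1.0 - y)
        delta_h = (delta_y @ W_hy.T) * h * (1.0 - h)
        W_hy += lr * np.outer(h, delta_y)
        W_xh += lr * np.outer(x, delta_h)
        W_ch += lr * np.outer(context, delta_h)
        # Step 7: copy hidden unit activations to the context units
        # (this copy uses fixed 1.0 connections and is not trained).
        context = h.copy()

# Example setup: 6 input/output units, 3 hidden/context units.
rng = np.random.default_rng(0)
W_xh = rng.uniform(-0.5, 0.5, (6, 3))
W_ch = rng.uniform(-0.5, 0.5, (3, 3))
W_hy = rng.uniform(-0.5, 0.5, (3, 6))
for epoch in range(5000):
    train_string([0, 3, 5, 2, 0], W_xh, W_ch, W_hy)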

 

Example

Let's run through the algorithm with an example training string: ( 0, 3, 5, 2, 0 ).

Step 2:    Begin training for this string.

Step 3:    Input terminal symbol ( 1, 0, 0, 0, 0, 0 ).

Step 4:    Target response is the next symbol in the string ( 3 ).

( 0, 0, 0, 1, 0, 0 )

Step 5:    Compute predicted response, a real-valued vector with components between 0 and 1.

Step 6:    Determine error, backpropagate, and update the weights.

Step 7:    Copy hidden unit activations to context units.

Step 2:    Training for the second symbol in this string.

Step 3:    Input symbol 3:  ( 0, 0, 0, 1, 0, 0 ).

Step 4:    Target response is 5:

( 0, 0, 0, 0, 0, 1 )

Step 5:    Compute predicted response.

Step 6:    Determine error, backpropagate, and update the weights.

Step 7:    Copy hidden unit activations to context units.

Step 2:    Training for the third symbol in this string.

Step 3:    Input symbol 5:  ( 0, 0, 0, 0, 0, 1 ).

Step 4:    Target response is 2:

( 0, 0, 1, 0, 0, 0 )

Step 5:    Compute predicted response.

Step 6:    Determine error, backpropagate, and update the weights.

Step 7:    Copy hidden unit activations to context units.

Step 2:    Training for the fourth symbol in this string.

Step 3:    Input symbol 2:  ( 0, 0, 1, 0, 0, 0 ).

Step 4:    Target response is the terminal symbol ( 0 ):

( 1, 0, 0, 0, 0, 0 )

Step 5:    Compute predicted response.

Step 6:    Determine error, backpropagate, and update the weights.

Step 7:    The target response is the second instance of the terminal symbol, so training for this string is complete.

 

After training, the net can be used to determine whether any given string of numbers is a valid sequence, according to the training string.  As each symbol is presented, the net predicts the possible valid successors of that symbol.  The output unit with the highest activation value indicates that the symbol it represents is a valid successor to the current input.  Ideally, the net should be trained with enough iterations and just the right learning rate, resulting in a target activation of 0.3 or better, with the others falling well below.

To determine whether a string of numbers is valid, the symbols (numbers) are presented to the net sequentially, as long as the net predicts valid successors in the string.  If the net fails to predict a successor, the string of numbers is rejected.  If all successors are predicted, the string is accepted as valid.
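
A hypothetical sketch of that acceptance test, reusing the one_hot and sigmoid helpers and the weight matrices from the training sketch above:

def is_valid(string, W_xh, W_ch, W_hy):
    # Accept the string only if, at every step, the output unit with the
    # highest activation matches the actual successor.
    context = np.full(W_ch.shape[0], 0.5)
    for t in range(len(string) - 1):
        x = one_hot(string[t])
        h = sigmoid(x @ W_xh + context @ W_ch)
        y = sigmoid(h @ W_hy)
        if np.argmax(y) != string[t + 1]:
            return False          # the net failed to predict this successor
        context = h.copy()
    return True

# is_valid([0, 3, 5, 2, 0], W_xh, W_ch, W_hy) -> True after successful training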

 

Example Output 1

The following shows only the last couple dozen lines of output.  The full output would show 2117 failed tests, ending with the 2118th, which was the first to succeed.

The numbers in parentheses represent the input vectors.  Each row of fractional values shows the output activations, indicating which input vector the net expected to show up next.

In the first row, note that the value .390 represents the number 3, which the net predicted to be the next input.  The next input was indeed 3, so the test continued; the second row's .812 shows the net then expected a 5.  However, 2 was the next input, so that test failed.

In the next to last test, the net started with a .757 (expecting a 3), and as it turned out, the next input was a 3. Looks good so far, so the net tried for the next number in line. In the next row, the .812 shows it was expecting a 5 in the next input. But, alas, the next input was a 0. Therefore, that test failed, and the net started another test.

And so on, until finally, after 2118 tries, the right sequence of 0, 3, 5, 2, 0 was found.  The remaining "(2)" at the end is only a leftover from the testing process.

... Abbreviated listing ...

(0) 0.141 0.019 0.109 0.390 0.016 0.142 
(3) 0.054 0.015 0.091 0.147 0.018 0.812 
(2) Failed.

(0) 0.134 0.019 0.102 0.749 0.016 0.158 
(1) Failed.

(0) 0.144 0.020 0.111 0.758 0.016 0.136 
(4) Failed.

(0) 0.141 0.019 0.109 0.756 0.016 0.142 
(5) Failed.

(0) 0.150 0.020 0.117 0.762 0.017 0.126 
(0) Failed.

(0) 0.169 0.020 0.112 0.757 0.016 0.116 
(2) Failed.

(0) 0.134 0.019 0.102 0.749 0.016 0.158 
(0) Failed.

(0) 0.169 0.020 0.112 0.757 0.016 0.116 
(5) Failed.

(0) 0.150 0.020 0.117 0.762 0.017 0.126 
(0) Failed.

(0) 0.169 0.020 0.112 0.757 0.016 0.116 
(3) 0.055 0.015 0.090 0.146 0.018 0.812 
(0) Failed.

(0) 0.169 0.020 0.113 0.757 0.016 0.115 
(3) 0.055 0.015 0.090 0.146 0.018 0.812 
(5) 0.153 0.078 0.770 0.184 0.073 0.129 
(2) 0.818 0.014 0.101 0.136 0.017 0.064 
(0) 0.135 0.019 0.102 0.749 0.016 0.157 
(2) Success.
Completed 2118 tests.

Example Output 2

As with my other backpropagation examples, sometimes the network doesn't finish training with 100% accuracy. This can lead to erroneous predictions.

... Abbreviated listing ...

(0) 0.126 0.038 0.076 0.688 0.041 0.365 
(3) Failed.

(0) 0.126 0.038 0.075 0.688 0.041 0.365 
(0) Failed.

(0) 0.117 0.037 0.083 0.677 0.040 0.365 
(2) Failed.

(0) 0.114 0.037 0.087 0.676 0.041 0.363 
(3) Failed.

(0) 0.126 0.038 0.076 0.688 0.041 0.365 
(5) 0.038 0.034 0.813 0.111 0.028 0.232 
(2) 0.805 0.035 0.057 0.096 0.034 0.256 
(0) 0.113 0.037 0.088 0.675 0.041 0.362 
(5) 0.038 0.034 0.811 0.111 0.028 0.233 
(5) Success.
Completed 81 tests.


"0, 5, 2, 0, 5" isn't quit right.

 

Works Cited

Elman, Jeffrey L.  (1990).  "Finding structure in time."  Cognitive Science, 14, pp. 179-211.

Fausett, Laurene.  (1994).  Fundamentals of neural networks: Architectures, algorithms, and applications.  New Jersey: Prentice Hall.

 

 
