Introduction: The XOR Problem
The advent of multilayer neural networks sprang from the need to implement the XOR logic function. Early perceptron researchers ran into the same problem that designers of electronic XOR circuits face: multiple components are needed to realize the XOR logic. In electronics, two NOT gates, two AND gates, and an OR gate are typically used. With neural networks, it likewise appeared that multiple perceptrons were needed (in a manner of speaking). More precisely, perceptron-like units had to be linked together in specific sequences and modified so that they functioned as a single unit. Thus were born multilayer networks.
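The gate-level construction mentioned above can be sketched in a few lines. This is a minimal illustration, not circuit-design code: the gates are modeled as plain functions on 0/1 values, and the XOR composition mirrors the two-NOT, two-AND, one-OR arrangement described in the text.

```python
def NOT(a):
    # Inverter: 0 -> 1, 1 -> 0
    return 1 - a

def AND(a, b):
    # Output is 1 only when both inputs are 1
    return a & b

def OR(a, b):
    # Output is 1 when either input is 1
    return a | b

def XOR(a, b):
    # a XOR b = (a AND NOT b) OR (NOT a AND b),
    # i.e. 2 NOT gates, 2 AND gates, and an OR gate
    return OR(AND(a, NOT(b)), AND(NOT(a), b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", XOR(a, b))
```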
Why go to all the trouble to make the XOR network? Well, two reasons: (1) a lot of problems in circuit design were solved with the advent of the XOR gate, and (2) the XOR network opened the door to far more interesting neural network and machine learning designs.
Figure 1. XOR logic circuit (Floyd, p. 241).
If you're familiar with logic symbols, you can look at this circuit, compare it to Figure 2, and see how the two function alike. The two inverters (NOT gates) do exactly what the -2 weight is doing in Figure 2. The OR gate performs the same function as the 0.5-threshold output unit in Figure 2. And the +1 weights in Figure 2 together perform the same role as the two AND gates in Figure 1.
While a single perceptron is limited to linearly separable Boolean functions such as NOT, AND, and OR, the XOR function is a hybrid that requires multiple units working together (Floyd, p. 241).
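The Boolean functions a single perceptron can compute are easy to sketch: each is a single linear threshold unit, with weights and thresholds chosen by hand here for illustration. No single choice of weights and threshold computes XOR, which is why the hybrid arrangement of Figure 2 is needed.

```python
def perceptron(weights, threshold):
    # A single linear threshold unit: fires (outputs 1) when the
    # weighted sum of its inputs exceeds the threshold.
    return lambda *xs: 1 if sum(w * x for w, x in zip(weights, xs)) > threshold else 0

AND = perceptron([1, 1], 1.5)    # on only when both inputs are on
OR  = perceptron([1, 1], 0.5)    # on when either input is on
NOT = perceptron([-1], -0.5)     # on only when the input is off
```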
Figure 2. XOR Gate. Note that the center-most unit is hidden from outside influence and connects only via the input and output units. The threshold value of 1.5 for the hidden unit ensures that it will be turned on only when both input units are on. The threshold value of 0.5 for the output unit ensures that it will turn on only when its net input exceeds 0.5. The weight of -2 from the hidden unit to the output unit ensures that the output unit will not come on when both inputs are on.
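The network of Figure 2 can be written out directly, using the weights and thresholds given in the caption: +1 weights from each input to both the hidden and output units, a 1.5 threshold on the hidden unit, a -2 weight from hidden to output, and a 0.5 threshold on the output unit. A minimal sketch:

```python
def step(net, threshold):
    # Threshold unit: fires when net input exceeds its threshold
    return 1 if net > threshold else 0

def xor_net(x1, x2):
    # Hidden unit: +1 weight from each input, threshold 1.5,
    # so it turns on only when both inputs are on.
    h = step(1 * x1 + 1 * x2, 1.5)
    # Output unit: +1 from each input, -2 from the hidden unit,
    # threshold 0.5. The -2 suppresses the output when both
    # inputs (and hence the hidden unit) are on.
    return step(1 * x1 + 1 * x2 - 2 * h, 0.5)
```

Tracing the four input patterns by hand confirms the caption: only (0,1) and (1,0) drive the net input above 0.5 without triggering the -2 inhibition.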
A Multilayer Perceptron (MLP) is a type of neural network referred to as a supervised network because it requires a desired output in order to learn. The goal of this type of network is to create a model that correctly maps input to output using pre-chosen data, so that the model can then be used to produce outputs for inputs whose desired output is unknown.
Figure 3. An MLP with two hidden layers. As input patterns are fed into the input layer, they get multiplied by interconnection weights while passing from the input layer to the first hidden layer. Within the first hidden layer, they get summed and then processed by a nonlinear activation function. Each time data is processed by a layer, it gets multiplied by interconnection weights, then summed and processed by the next layer. Finally, the data is processed one last time within the output layer to produce the neural network output.
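The layer-by-layer processing described in the caption can be sketched as follows. The network sizes, random weights, and choice of tanh as the nonlinearity are illustrative assumptions, not values from the figure; the point is the data flow: multiply by weights, sum, apply a nonlinearity, repeat per layer.

```python
import math
import random

random.seed(0)

def layer(inputs, weights, biases):
    # Multiply inputs by interconnection weights, sum them, and pass
    # the result through a nonlinear activation function (tanh here).
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# A hypothetical 2-3-3-1 MLP (two hidden layers, as in Figure 3)
# with randomly chosen weights, purely to show the forward pass.
sizes = [2, 3, 3, 1]
W = [[[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
     for n_in, n_out in zip(sizes, sizes[1:])]
B = [[0.0] * n for n in sizes[1:]]

def forward(x):
    for w, b in zip(W, B):
        x = layer(x, w, b)
    return x

print(forward([0.0, 1.0]))
```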
The MLP learns using an algorithm called backpropagation. With backpropagation, the input data is repeatedly presented to the neural network in a process known as "training". With each presentation the output of the neural network is compared to the desired output and an error is computed. This error is then fed back (backpropagated) to the neural network and used to adjust the weights such that the error decreases with each iteration and the neural model gets closer and closer to producing the desired output.
Figure 4. A neural network learning to model exclusive-or (XOR) data. The XOR data is repeatedly presented to the neural network. With each presentation, the error between the network output and the desired output is computed and fed back to the neural network. The neural network uses this error to adjust its weights such that the error will be decreased. This sequence of events is usually repeated until an acceptable error has been reached or until the network no longer appears to be learning.
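The training sequence in Figure 4 can be sketched end to end: present the XOR data, compute the error between network output and desired output, feed it back to adjust the weights, and repeat until the error is acceptable. The 2-2-1 architecture, sigmoid activation, learning rate, and stopping error below are illustrative choices, not values from the figure.

```python
import math
import random

random.seed(1)

def sig(z):
    # Sigmoid activation, used for both hidden and output units
    return 1.0 / (1.0 + math.exp(-z))

# 2-2-1 network with random starting weights; last entry in each
# weight list is a bias term.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]
lr = 0.5
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

errors = []
for epoch in range(20000):
    err = 0.0
    for (x1, x2), t in data:
        # Forward pass: compute the network output for this pattern.
        h = [sig(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
        y = sig(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
        err += (t - y) ** 2
        # Backward pass: feed the error back and adjust the weights
        # so the error decreases on later presentations.
        d_y = (t - y) * y * (1 - y)
        d_h = [d_y * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(2):
            w_o[j] += lr * d_y * h[j]
            w_h[j][0] += lr * d_h[j] * x1
            w_h[j][1] += lr * d_h[j] * x2
            w_h[j][2] += lr * d_h[j]
        w_o[2] += lr * d_y
    errors.append(err)
    if err < 0.01:  # stop once an acceptable error has been reached
        break
```

As the caption notes, training is not guaranteed to reach the target error; a run can stall in a local minimum, which is why the loop also stops after a fixed number of epochs.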
Floyd, Thomas L. (2003). Digital fundamentals. (8th ed.). New Jersey: Prentice Hall.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). "Learning internal representations by error propagation." In Rumelhart, D. E., & McClelland, J. L. (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1. Cambridge, MA: MIT Press.