Multilayer Neural Network Design

There are several factors involved in the design of a multilayer neural network:

  1. The number and complexity of training samples
  2. The number of hidden layers and their nodes
  3. The activation function used
  4. The learning rate
  5. The momentum

Number Of Training Samples

The output nodes must be able to represent every outcome the network is expected to produce, so it makes sense to have at least one input sample for each outcome. If you want the network to have two possible outputs, you need only one output node (i.e., binary), so you need at least two input samples. If you want six possible outputs, you might choose six output nodes (i.e., only one fires for its category), so you need at least six input samples.
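As a minimal sketch of the six-output case, the target for each sample can be written so that only the node for its category fires. The function name `one_hot` and the use of 1.0/0.0 as the "firing" and "silent" values are assumptions for illustration, not part of the original design.

```python
def one_hot(category, num_categories):
    """Return a target vector with 1.0 at the category's node and 0.0 elsewhere."""
    if not 0 <= category < num_categories:
        raise ValueError("category out of range")
    return [1.0 if i == category else 0.0 for i in range(num_categories)]

# One target per category: six outcomes need at least six samples.
targets = [one_hot(c, 6) for c in range(6)]
for t in targets:
    print(t)
```

With two outcomes, a single output node with targets 0.0 and 1.0 would serve the same purpose.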

Ultimately, though, we're talking about statistical functions, and statistics can only represent reality as well as the sample population does. The "richness" and accuracy of the network depend on having samples that represent as many as possible of the patterns the network might encounter.

In some cases it may be impossible to provide every input pattern the network can accept. With visual patterns, for example, the total number of possible patterns is virtually incalculable. You may have to make do with as diverse a subset of patterns as you can find.

If your input layer has three nodes and each node takes a binary value, that implies 8 possible input patterns (2^3). Using all 8 would be best. The number of possible input patterns doubles with each added input node. With six nodes, for example, there are 2^6 = 64 possibilities. Using all 64 will train your network to recognize the most possible patterns.
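The pattern count can be checked by enumerating the patterns directly. This is a small sketch that assumes every input node takes only the values 0 or 1; the function name `all_binary_patterns` is made up for the example.

```python
from itertools import product

def all_binary_patterns(num_inputs):
    """List every 0/1 input pattern for a layer with num_inputs nodes."""
    return [list(bits) for bits in product([0, 1], repeat=num_inputs)]

three_node_patterns = all_binary_patterns(3)
six_node_patterns = all_binary_patterns(6)

print(len(three_node_patterns))  # 2^3 patterns
print(len(six_node_patterns))    # 2^6 patterns
```

For inputs that take continuous values, the set of patterns cannot be enumerated this way, and a diverse subset is the practical option.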

The short answer is this: it's better to use as many different input patterns as your network can receive and categorize.

Hidden Layers

The number of nodes in a hidden layer determines the 'expressive power' of the network. It can be said that too many hidden layer nodes cause a neural net to fit the noise of the input.

For 'smooth', easy functions with stable, softly changing variables, fewer hidden layer nodes are needed.

But for wildly fluctuating functions, more nodes will be needed.

This really is an important exercise in moderation. If you have too few hidden units, the net doesn't have enough "brains" and the quality of its predictions will drop. If you have too many, it will tend to "remember" the right answers rather than predict them. Then your neural net will work very well on familiar data, but will fail on data it has never been presented with. Too many hidden layer nodes cause the network to "specialize" when it really should "generalize". Neural networks are most often applied to real-world problems, which are fraught with unknowns, so networks are designed to make "educated guesses", not give exact answers.
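One rough way to see how hidden nodes add capacity is to count the adjustable weights in the network. The sketch below assumes a fully connected feed-forward network with one bias per non-input node; the function name `count_weights` and the layer sizes are illustrative only, and the count is a crude proxy for expressive power rather than a rule for choosing a size.

```python
def count_weights(layer_sizes):
    """Count weights and biases in a fully connected feed-forward network.

    layer_sizes lists the node count of each layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # connection weights plus one bias per node
    return total

small = count_weights([3, 2, 1])    # 3 inputs, 2 hidden nodes, 1 output
large = count_weights([3, 20, 1])   # same inputs and output, 20 hidden nodes

print(small, large)
```

With only 8 possible input patterns for three binary inputs, the larger network has far more adjustable values than samples, which is the situation where "remembering" becomes easy.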

Activation Functions


Sigmoid

Range: between 0.0 and 1.0


y(x) = 1 / (1 + e^-x)
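A minimal sketch of this function in Python follows. Splitting on the sign of x is an implementation choice made here to avoid overflow in `math.exp` for large negative inputs; it computes the same value as the formula.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^-x), with output between 0.0 and 1.0."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) so exp never overflows.
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x))
```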





Bipolar Sigmoid

Range: between -1.0 and 1.0


y(x) = 2 / (1 + e^-x) - 1
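The formula translates directly into code. The sketch below also compares the result with tanh(x / 2), which is algebraically the same function. The direct form is kept for readability, so it assumes moderate inputs: `math.exp(-x)` raises OverflowError for very large negative x.

```python
import math

def bipolar_sigmoid(x):
    """Bipolar sigmoid, 2 / (1 + e^-x) - 1, with output between -1.0 and 1.0."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

for x in (-3.0, 0.0, 3.0):
    # The two columns should agree to within rounding error.
    print(x, bipolar_sigmoid(x), math.tanh(x / 2.0))
```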





Sigmoid Derivative

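Range: between 0.0 and 0.25 if the Sigmoid function is used.

between 0.0 and 0.5 if the Bipolar Sigmoid function is used.

Format: note that the derivative is written in terms of either the Sigmoid or the Bipolar Sigmoid function from above.

If the Sigmoid function is used:

y'(x) = Sigmoid(x) * (1 - Sigmoid(x))

If the Bipolar Sigmoid function is used:

y'(x) = (1 / 2) * (1 + BipolarSigmoid(x)) * (1 - BipolarSigmoid(x))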
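The derivative can be checked numerically. The sketch below writes the derivative of each function in terms of that function's own output, s * (1 - s) for the Sigmoid and (1 / 2) * (1 + b) * (1 - b) for the Bipolar Sigmoid, and compares each with a central finite difference. The function names, step size, and tolerance are assumptions chosen for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def bipolar_sigmoid_derivative(x):
    b = bipolar_sigmoid(x)
    return 0.5 * (1.0 + b) * (1.0 - b)

def finite_difference(f, x, h=1e-6):
    """Central-difference estimate of f'(x), used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 1.5):
    assert abs(sigmoid_derivative(x) - finite_difference(sigmoid, x)) < 1e-6
    assert abs(bipolar_sigmoid_derivative(x) - finite_difference(bipolar_sigmoid, x)) < 1e-6

# Both derivatives peak at x = 0.
print(sigmoid_derivative(0.0), bipolar_sigmoid_derivative(0.0))
```

Expressing the derivative through the function's output is convenient during training, since that output has usually already been computed for the forward pass.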


Hyperbolic Tangent (TANH)

Range: between -1.0 and 1.0


y(x) = (e^x - e^-x) / (e^x + e^-x)
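As a quick sketch, the definition can be coded directly and compared with Python's built-in `math.tanh`. The name `tanh_from_definition` is made up for the example. In practice the library function is the safer choice, because the direct form overflows for large |x|.

```python
import math

def tanh_from_definition(x):
    """Hyperbolic tangent computed from (e^x - e^-x) / (e^x + e^-x)."""
    ex = math.exp(x)
    enx = math.exp(-x)
    return (ex - enx) / (ex + enx)

for x in (-2.0, 0.0, 0.7):
    # The two columns should agree to within rounding error.
    print(x, tanh_from_definition(x), math.tanh(x))
```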





Copyright© 2009-2012 John McCullock. All Rights Reserved.