Q-Learning Example 2
This example is simply a 16-node version of Example 1, using similar code with a few refinements.
Note how much memory is wasted in matrix Q: all the zeros in the results correspond to node pairs with no links between them, so there is no learning to record for those entries. This implies the need for a better method of storing learned information. At first we might see geometric patterns of zeros in matrix Q, but these are really just a consequence of how the node/link pattern was laid out during design. Now consider that the indices of matrix Q are what we're using to map the agent's progress. Why not just record those index coordinates along with their learning scores, and discard anything with a zero (see the sketch below)? In any case, this example illustrates the disadvantages of using an entire matrix for Q.
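As one possible illustration, here is a minimal sketch in C++ of the sparse storage idea, assuming integer state and action indices as in the examples; the class and member names are hypothetical, not taken from the example code. Only (state, action) pairs that actually receive a learning score are stored, so no memory is spent on the zeros.

#include <map>
#include <utility>

class SparseQTable {
public:
    // Look up the learned value for (state, action); unstored pairs read as zero.
    double get(int state, int action) const {
        auto it = table_.find({state, action});
        return it == table_.end() ? 0.0 : it->second;
    }

    // Record a learned value; zero entries are simply never stored.
    void set(int state, int action, double value) {
        if (value != 0.0) {
            table_[{state, action}] = value;
        }
    }

private:
    // Keyed by (state, action), so only pairs with real links and learning exist.
    std::map<std::pair<int, int>, double> table_;
};

For this 16-node example, such a table would replace the full 16x16 matrix with only the handful of entries that are ever updated.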
Example 2 Results
Shortest routes from initial states:
1, 0, 4, 8, 9, 5, 6, 2, 3, 7, 11, 15
3, 7, 11, 15
5, 6, 2, 3, 7, 11, 15
2, 3, 7, 11, 15
4, 8, 9, 5, 6, 2, 3, 7, 11, 15
0, 4, 8, 9, 5, 6, 2, 3, 7, 11, 15