# Reversi Example 2: Thinking Ahead, Using MiniMax

This example uses both the Minimax algorithm and region-favoring rules to choose its moves. The Minimax algorithm provides a way of looking multiple moves ahead, maximizing the AI's score while minimizing the human player's. Defining strategic regions on the board shapes the AI's ability to assess the risk of each move.

Minimax is a recursive algorithm that evaluates scores over a series of turns, choosing maximum scores at some steps and minimum scores at others. Applied to a strategy game, Minimax explores all possible moves of the first player, simulates each one, and then explores the second player's possible responses to it. The algorithm looks for the smallest scores during the second player's simulated responses and the largest scores during its own simulated moves, thereby discovering which moves maximize its own score while minimizing its opponent's.
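Stripped of the Reversi specifics, the recursion can be sketched over a plain game tree. The `TreeNode` and `Minimax` classes below are illustrative names, not part of this example's code: internal nodes alternate between maximizing and minimizing, and leaves carry a static evaluation.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, game-agnostic Minimax sketch (hypothetical classes, not from the example).
class TreeNode {
    int score;                                      // used only at leaf nodes
    List<TreeNode> children = new ArrayList<>();
    TreeNode(int score) { this.score = score; }
    TreeNode add(TreeNode child) { children.add(child); return this; }
}

class Minimax {
    static int minimax(TreeNode node, boolean maximizing) {
        if (node.children.isEmpty()) {
            return node.score;                      // leaf: static evaluation
        }
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (TreeNode child : node.children) {
            // Levels alternate: the opponent answers each of our moves.
            int score = minimax(child, !maximizing);
            best = maximizing ? Math.max(best, score) : Math.min(best, score);
        }
        return best;
    }
}
```

With a two-level tree whose leaves score 3, 12 under one move and 8, 2 under the other, the maximizing player's best guaranteed outcome is 3: the opponent will answer the first move with 3 and the second with 2.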

Figure 1. Exploring the possible moves creates a decision tree. Each level of the tree represents either the player's or the opponent's turn. The Minimax algorithm determines the moves which will yield the maximum score for the player, and the minimum score for the opponent.

John von Neumann is credited with the Minimax Theorem, which he described in 1928. Basically, in any zero-sum game (i.e., where each player's gain is exactly equal to the opponent's loss), and where each player can know all possible moves at any given turn, a strategy must exist for both players which maximizes their gain and minimizes their loss.
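For a small zero-sum game written as a payoff matrix, the row player's guaranteed gain can be computed directly. This is a pure-strategy sketch only (the full theorem also covers mixed strategies); the `Maximin` class is an illustration, not part of the example's code.

```java
// Maximin for the row player of a zero-sum payoff matrix: for each row,
// find the worst (minimum) payoff the column player could force, then
// pick the row whose worst case is best.
class Maximin {
    static int maximin(int[][] payoff) {
        int best = Integer.MIN_VALUE;
        for (int[] row : payoff) {
            int worst = Integer.MAX_VALUE;
            for (int p : row) {
                worst = Math.min(worst, p);
            }
            best = Math.max(best, worst);
        }
        return best;
    }
}
```

For the matrix {{3, 5}, {1, 8}}, the row minimums are 3 and 1, so the row player can guarantee a gain of 3 by choosing the first row.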

In this implementation of Minimax, I've used two recursive functions: one for simulating possible moves n steps ahead, and another for calculating which move yields the best score.

The base simMoves() function starts by creating the root node of a decision tree, mRoot. From there, all possible moves the computer can make are passed to the second simMoves() function and subjected to all possible moves the human player can make in response, which are then subjected to all possible moves the computer can make in response, ...and so on for n steps ahead.

Note how a temporary gameboard matrix is created for each simulated move. These temporary gameboards contain the simulated moves for evaluation in each step of the recursion.

```java
private void simMoves()
{
    this.mRoot = new Node();

    // Get a list of all immediately available moves.
    ArrayList<Move> moves = this.findAllPossibleMoves(this.mComputer, this.mPlayer, this.matrix);

    if(moves.size() > 0){
        // Simulate moves from the immediate list, starting at depth zero.
        this.simMoves(this.mRoot, moves, this.matrix, this.mComputer, this.mPlayer, 0);
    }
}

private void simMoves(final Node root,
                      final ArrayList<Move> moves,
                      final SquareState[][] aMatrix,
                      final SquareState playerA,
                      final SquareState playerB,
                      final int depth)
{
    // The search depth is passed down as a parameter so that every branch is
    // explored to the full look-ahead, not just the first one expanded.
    if(depth < this.mTotalMovesAhead){
        for(Move aMove : moves)
        {
            Node aNode = new Node(aMove, root);
            root.setChild(GridMath.getID(aMove.X(), aMove.Y()), aNode);

            // Make a copy of the game board.
            SquareState[][] tempMatrix = new SquareState[Globals.GRID_SIZE_INTEGER][Globals.GRID_SIZE_INTEGER];
            for(int y = 0; y < Globals.GRID_SIZE_INTEGER; y++)
                for(int x = 0; x < Globals.GRID_SIZE_INTEGER; x++)
                    tempMatrix[x][y] = aMatrix[x][y];

            // Make a prospective move on the copy.
            tempMatrix[aMove.X()][aMove.Y()] = playerA;
            // Flip the simulated pieces for the move.
            Traverse t = new Traverse(aMove.X(), aMove.Y(), playerA, playerB, tempMatrix);
            ArrayList flips = t.getFlips();
            this.flipPieces(flips, playerA, tempMatrix, false);

            // Simulate the opponent's possible counter moves, one level deeper.
            ArrayList<Move> tempMoves = this.findAllPossibleMoves(playerB, playerA, tempMatrix);
            if(tempMoves.size() > 0){
                this.simMoves(aNode, tempMoves, tempMatrix, playerB, playerA, depth + 1);
            }
        }
    }
}
```

After the tree of simulated moves is built, a second pair of recursive functions traverses the tree backward, from the leaf nodes all the way back to the root, calculating which move has the best future effects (the best score).

The giant for-loop in the base findBestMove() function implements an additional scripted strategy for favoring some areas while avoiding others (I'm calling them risk regions).

```java
public Move findBestMove()
{
    Move bestMove = null;
    ArrayList<Node> children = this.mRoot.getChildren();
    if(children.size() > 0){
        findBestMove(this.mRoot, true);
        // Now get the max from the root's children.
        int bestIndex = 0;
        int last = Globals.GRID_SIZE_INTEGER - 1;
        for(int i = 0; i < children.size(); i++)
        {
            int x = children.get(i).getMove().X();
            int y = children.get(i).getMove().Y();
            // Bias is imposed here to simulate more strategic behavior.  Occupying corners
            // and edges of the game board often leads to strategic advantages in the game.
            if((x == 0 || x == last) && (y == 0 || y == last)){
                // Highest bias toward corners.
                children.get(i).setMaxScore(children.get(i).getMaxScore() + this.mCornerBias);
            }else if((x <= 1 || x >= last - 1) && (y <= 1 || y >= last - 1)){
                // Bias against Region 4: the squares adjacent to each corner.
                children.get(i).setMaxScore(children.get(i).getMaxScore() + this.mRegion4Bias);
            }else if(x == 0 || x == last || y == 0 || y == last){
                // Lower bias toward edges.
                children.get(i).setMaxScore(children.get(i).getMaxScore() + this.mEdgeBias);
            }
            if(children.get(i).getMaxScore() > children.get(bestIndex).getMaxScore()){
                bestIndex = i;
            }
        }
        bestMove = children.get(bestIndex).getMove();
    }

    return bestMove;
}

private void findBestMove(final Node root, final boolean getMaxFromMin)
{
    // The idea behind this recursive method is to traverse all the way out to the leaves
    // of the tree, then calculate scores for parent nodes while returning back to the root.
    ArrayList<Node> children = root.getChildren();
    if(children.size() > 0){
        for(Node child : children)
        {
            // Alternate between maximizing and minimizing at each level.
            this.findBestMove(child, !getMaxFromMin);
            child.setMaxScore(child.getMinMaxFromChildren(getMaxFromMin));
            child.setMinScore(child.getMinMaxFromChildren(!getMaxFromMin));
        }
    }
}
```

# Risk Regions

Beyond Minimax, this implementation of reversi also includes risk regions defined on the board:

Figure 2. Arbitrary regions representing very general movement risks. Typically, Regions 3 and 5 are very valuable strategic areas. I've found that Region 4 is a real weakness in example 1, often making it easy for me to take the corners.

Regions 3 and 5 are coveted real estate on the Reversi game board, because it's harder for opponents to threaten pieces placed there, especially in Region 5. These two regions also make it easy to capture large numbers of opponent pieces at once. So the AI in this example is encouraged to place pieces in these regions whenever possible.

In contrast, Regions 2 and 4 are risky territory, because opponents gain access to Regions 3 and 5 after capturing pieces placed here. So the AI is discouraged from occupying these two regions.

To calculate the best move, the number of opponent pieces a move would capture is weighed against the bias of the risk regions involved.
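That weighing can be sketched as a single scoring function. The class, method names, and default bias values below are assumptions for illustration, not the example's actual API; the region layout assumes an 8x8 board, with corners biased up, squares adjacent to corners (Region 4) biased down, and the remaining edge squares biased slightly up.

```java
// Hypothetical sketch: combine the flips a move earns with the bias of
// the region it lands in (assumed 8x8 board; names and values are made up).
class RegionScore {
    static final int SIZE = 8;

    static int regionBias(int x, int y, int cornerBias, int region4Bias, int edgeBias) {
        int last = SIZE - 1;
        boolean xCorner = (x == 0 || x == last);
        boolean yCorner = (y == 0 || y == last);
        boolean xNear   = (x <= 1 || x >= last - 1);
        boolean yNear   = (y <= 1 || y >= last - 1);
        if (xCorner && yCorner) return cornerBias;   // corner square
        if (xNear && yNear)     return region4Bias;  // adjacent to a corner
        if (xCorner || yCorner) return edgeBias;     // remaining edge squares
        return 0;                                    // interior: no bias
    }

    static int moveScore(int flips, int x, int y) {
        // Assumed default biases, within Figure 3's -50..+50 range.
        return flips + regionBias(x, y, 50, -50, 10);
    }
}
```

Under these assumed values, a corner move that flips four pieces scores 4 + 50 = 54, while a move onto a square next to that corner flipping the same four pieces scores 4 - 50 = -46, so the AI avoids it unless the capture is overwhelming.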

Note that this example allows you to vary the amount of bias from the risk regions:

Figure 3. These are the default options. Moves Ahead is limited to between 2 and 10. Edges (Region 3), Corners (Region 5) and Region 4 are limited to between -50 and +50.

Funny thing is, you can even change these values in the middle of a game!

Copyright © 2009-2012 John McCullock. All Rights Reserved.