Monte-Carlo Tree Search

Game Playing with Monte-Carlo Tree Search

Aims: To design and implement game playing program(s) with Monte-Carlo Tree Search.

Background: Monte-Carlo Tree Search (MCTS) is a newly developed class of methods for heuristic search in game trees, and for other problems in which tree search is used. In a nutshell, the idea is to explore the game tree by simulating a great many complete sequences of random play (`rollouts’), gradually concentrating attention on the more promising paths found, and so growing an initial promising game tree, and finally selecting the estimated best move. The number of rollouts considered at each move may be tens of thousands (or even millions, depending on the problem and how fast your code is). It is a different game-tree search strategy from the older method of alpha-beta pruning.

MCTS is an active research area, and has scored some striking successes – it is the basis of the current best Go programs (Go has too high a branching factor for alpha-beta pruning to be used).

In this project, the aim is to produce and evaluate a simple implementation of MCTS for a simple game, and to do some experimental evaluations of its performance. Possible games might be Connect4 (probably a good choice) or 9×9 capture Go (requires some subtle programming).

Early Deliverables

  1. Proof of concept programs: bandit problem solver (‘bandit problems’ are particularly trivial statistical games, which you should tackle first), with experiments showing how upper-confidence-bound (UCB) methods compare to naive strategies.
  2. Proof of concept programs: Program that enables 2 human players to play the selected game (no AI moves).
  3. Reports: UCB method for bandit problems.
  4. Reports: Rules of the game, expressed semi-formally as state and allowable transitions.
  5. Reports: Algorithm for MCTS.

Final Deliverables

  1. The program must have a full object-oriented design, with a full implementation life cycle using modern software engineering principles
  2. The program should have a GUI for game play, for the case of two human players, and one player against the computer.
  3. The report should include experimental assessments of effects on program performance of parameters of MCTS (depth of search, number of rollouts used, etc).
  4. The report will describe the software engineering process involved in generating your software.
  5. The report will have sections on: the theory of bandit processes, and the algorithm for MCTS; the game used, the software engineering process in development of the program; an explanation and justification of design decisions for the GUI; and an experimental analysis of the performance of MCTS.

Suggested Extensions

  • A possible enhancement would be a comparison of MCTS with alpha-beta pruning.
  • The project could be done for a relatively simple