Herein, the main investigational problem is learning. This project focus on how a system based on a hybrid attitude between total competition and total cooperation, might be able to extract knowledge in order to improve its own performance over a specific working problem: Learn how to play a given game well.
Games are noise-free, cheap and easily replicable environments where the concepts of tactics, strategy, searching, learning and other relevant features of Artificial Intelligence (AI) can be tested in their purest state. Games, due to their conceptual flexibility, allow an open testing field for many different ideas.
Games are also fun. They are spread by all cultures, regions and epochs, and might have been (and continue to be) very important to our evolution, since they provided an entertainment dimension over intellectual and rational assignments. A game may be a tool for learning how to think properly.
Games embark on an ocean of ideas, concepts, strategies and attention from many different people. In fact, anybody trying to collect all available games, is rapidly overwhelmed by their number and variety. Here, when the word game is used, it refers to games without chance (no dice or whatever stochastic gadget) and with no hidden information (everybody knows the same thing about the game at any time). Games of this type are usually called perfect-information games. To restrict even more this still huge set of possible games, we should deal with games just for 2 players (without the interesting, but complex problems of Diplomacy, Bluffing, Politics, Alliances, ...), played on regular discrete grids (namely, square and hexagonal grids) and that are turn-based (this is a direct implication of perfect-information). More, the game theme (if any) is of no concern, and so this set of games also fit into the abstract games category.
From all these, we shall focus on games with only one or two types of pieces; this will exclude Chess (the Drosophila of AI, as Alexander Kronrod said once) but still including Checkers, Othello, Go, Tic-Tac-Toe and many dozens more (for a extensive list of such games, go to The World of Abstract Games). However, this will not imply a severe restriction on the project (the maximum number of piece types will easily be extended to more than just 2) but rather to keep the project goals as focused as possible in the fundamental problem of learning, that we intend to study.
The actual field of investigation on board game software players includes several different approaches. Some of them are reviewed here (also check an extensive list of papers from the Machine Learning in Games):
The specific knowledge-based approach has had great achievements in some games, the most famous case being IBM's Deep Blue victory against Kasparov on the game of Chess. The GAMES group of the University of Alberta works on several world-class software's like Checkers, Hex or Lines-of-Action. Victor Allis' work solved Gomoku, Connect4 and Qubic. John Romein's Multigame is a program using a rule-oriented language to define how to play, but still requires an human coded evaluation for each given game.
The meta-player approach tries to find known patterns from the rules of a given game, and applies them by tuning the set of features that defines the evaluation function. Barney Pell's Metagame is a program that analyses the rules stated on a specific language, and automatically creates a proper evaluation for each game position. The most successful system using this approach is the commercial application Zillions of Games. These two systems do have very distinct performances, they may have very good results for games that are fit for their meta-evaluation function, and play very poorly for other cases.
The learning approach gives the least possible amount of specific knowledge for a certain game, and uses mechanisms of optimization to improve game play. Some of them insert a lot of knowledge into the set of features, and the learning is reduced to the optimization of relative weights for those features. Examples of this approach are Samuel's self-learning player of the 50's and 60's dealing with Checkers, and Michael Gherrity's SAL [thesis].
Another possibility is to leave that task to the program, instead of inserting an expertise account of good features (even if the relevance of each feature is left undefined). A consequence is that unexpected features for a given game may appear, providing an extra dimension of knowledge acquisition. Related work can be found in Levinson's Morph II, Utgoff's ELF, Buro's GLEM, and Fawcett's Zenith [thesis].
Yet another approach was given by Susan Epstein's HOYLE, a program that decides its next move based on a previous agreement among several heuristics, each one with a certain concern in mind (each heuristic may have different ways to compute its knowledge).
The evolutionary or A-Life approach attacks the problem by using the genetic paradigm to evolve decisional structures, which from generation to generation, hopefully improve their way to play a game. A good example is David Fogel's work on evolving neural networks for playing Checkers. Another related work (using a special neural network architecture) is SANE, where two different populations co-exist, the hosts trying to improve their game strength while a population of parasites improve in order to defeat the hosts, providing a constant "aggressive" background from where the hosts must evolve. Another game tried with evolving networks is Dots'n'Boxes.
General information can be found in the next links [1, 2]. Check also the Computer Games Group, in the Maastricht Univ. - Netherlands, and the Computer Games Research Institute at Shizuoka Univ. - Japan.
A scientist would rather use someone else's toothbrush
than another's scientist's nomenclature.
Inside LUD∆, there will be a set of agents, called players, able to perform several contests of a given game. A tournament is a set of contests between a set of players, where all players play against the others.
A game consists of a set of gaming material (the board and the piece set), the rule set, i.e., a set of rules (which defines the set of valid moves for a given game state), the endgame function (a function that defines when the game ends) and an initial setup (i.e., the initial state for every contest). A game state is defined by a certain board position, the next player and some extra information (e.g., the number of off-board pieces, the actual game phase, number of captured pieces, ...) visible to both players. More formally:
- The board is a set of labels, or cells, and a set of ordered pairs of cells, or links
- Each cell is associated with a value, which defines if it's empty or occupied by a certain piece (white/black soldier/king)
- A board may, optionally, have labels associated with set of cells, called sectors.
- A position is a board with a set of values associated for each cell
- A state consists of:
- A board position
- Which is the next player
- An optional set of registers defining extra information.
- The set S is the set of all states for game G.
- S0 is a special state called the game setup
- The rule set R is a relation SxS
- Each pair (Si, Sj) in R is called a legal move
- S is a valid state iff S = Rn(S0), i.e., if there is a set of n>=0 legal moves that reaches S from the setup
- All valid states SF from which there are no (SF, S) in R are called final states.
- A game allows passes if R is reflexive
- A game is invertible if R is symmetric
- A game has a super-KO rule if R is transitive
- The endgame function: S -> NxN is defined has:
- endgame( S ) = <0,0> if S is not a final state
- endgame( S ) = <1,1> if S is a final state and a win for the first player
- endgame( S ) = <1,0> if S is a final state and a draw for both player
- endgame( S ) = <1,-1> if S is a final state and a win for the second player
- The rule set R is also called the game tree
- The set of legal moves starting at S0 which are known and considered as good or bad moves, is called the opening theory of the game
- The set of valid states from which it is known all legal moves until some final state is reached, is called the endgame theory of the game
- The set of all other valid states is called the middlegame.
- An action is a function S->S.
- An action is an atomic change on the game board (e.g., insert/remove a piece or cell, change a game register, ...).
- A set of actions A1, A2, ..., An define a legal move for a given state S, iff (S, A1oA2o...oAn(S)) belongs to R
- A game policy is a function P:S->An that decides for each valid state, what should be the next actions.
- A tactic is an utility function returning a real for each valid state.
- A strategy is a weighted sequence of tactics.
- The policy then uses one or more strategies to maximize the expected value of the next state, and so to decide which will be the next move, i.e., P(state) = arg maxactions (strategy(actions(state)))
In most games, the rule set R cannot be stated explicitly (the game tree may be infinite or extremely large), so there must be a implicit description to define R. A low-level description focuses on coding the procedure validMoves(state), which gives the set of valid moves from a given legal state. A high-level description focuses on defining a language to represent rules (Zillions' ZRF is the most used and successful example).
LUD∆ is a system which tries to produce knowledge between cooperation and competition. This is the basis for the two main structures, the Arena and the Academy. The Arena is where all tournaments occur; the Academy is where all knowledge collected by all previous tournaments is kept and made accessible to all existing players for learning and improvement.
Inside the Arena, at each tournament, there is a set of players. These players may have different ways of decision making (decide which is the best move among all possible options) and learning (update the parameters of the decision structure). This plethora of different attitudes of playing a game seems useful to enhance the global performance of the system. In any optimization problem, the avoidance of local minima is a main concern to achieve better results. With the use of different searching strategies, some local minima may be easier to escape by some players than others, who will, eventually, be able to send out that information to others, via the Academy.
After a fixed number of tournaments (or just one tournament, if there is no learning involved), the Arena activates a selection mechanism, favoring statistically the best performances. The next generation of players will be found among those best entries of the previous tournament, plus some new players that are genetic combinations and mutations of them (at least, for those players to whom the combination or mutation of their internal structures makes sense).
The Academy is populated with knowledge gathered by those players. There are many open questions in this subject:
- What should be the criteria to insert/remove knowledge?
- Drastic increase/decrease of performance after some change in the player's control structure
- How to quantify the quality of each piece of knowledge? (the number of uses, the number of successes, in this case, the value of a new tactic should be proportional to victory increment)
- A forgetting value for unused knowledge, updated on each generation
- What type of knowledge can be kept?
- The opening theory, or Fuseki
- The endgame theory
- Know patterns should be merged if possible, to achieve more generalized ones.
- Local/middle game pattern moves (tactics), or Tesujis
- Use patterns to represent them?
- Does it make sense to search for regular piece patterns when analyzing a contest? (e.g., is it good to have a T pentomino on game G?)
- How to extract Tesujis from expert games?
- How to generalize two similar Tesujis?
- Feature evaluators (heuristics) and decision mechanisms (strategies)
- All players for historical or even genetic purposes (?)
- What is the representational language used?
- A language able to be translated to and from the players own languages (an Esperanto)
- All knowledge is the same, or is different?
- Can expert Human knowledge be added online?
- Can knowledge be mixed? How?
A player is able to perform a set of contests for a certain game. It is made of:
- A communication module to talk with the Arena (and thus, with other players)
- A communication module to talk with the Academy (to send/receive commands)
- Needs to know how to translate his knowledge to the Academy language
- A search mechanism over the game tree (alpha-beta, ...)
- A game policy
- The policy internal structure is defined by the player's type (it may use genes, neural nets, expert features, ...)
A player should:
- know the game rules in order to create the set of valid moves for a given state (i.e., it must be able to create the game tree if given sufficient time)
- be able to decide between the options he has.
- (optionally) be able to learn, so that it can (hopefully) improve its own ability to decide
- (optionally) be able to analyze the game tree, independently of actual game experience.
The last two points refer to opposite ways to gather data. Analysis is an active exploration of the game tree, which produces knowledge without any experience, in response to eventual game states. Learning is passive exploration of the game tree, which produces knowledge in response to occurring game states. In principle, both methods could gather the same information if sufficient resources were allocated, but each has its weaknesses. Even a deep analysis of the game tree may not be sufficient to find specific problems that only learning experience against an expert can come upon. On the other hand, learning by making a bad move that cannot be reverted is worse, if the player would have been able to predict it and avoid it in the first place. Analysis can create useless knowledge about situations that will only occur very rarely in real gaming. Learning only from games against novice players, could be bad in the long term.
When speaking of learning, the first two points above are included in what is called the weak theory of game domains (check Epstein's work for more details). In principle, a learning player would not even have to know the rules, it could play randomly at the beginning and learn the game tree (i.e., the set of legal moves) by experience. However, this is a major waste of computer resources and a questionable way to learn a game (no human learn a game this way). A more relevant point is to decide what else a player should know. Should it know some "general" heuristics (like mobility, capturing potential to compute piece value)? And Alpha-Beta? Specific game heuristics (e.g., corner cells are good in Othello)?
In principle, a weak theory should give the minimum and most general knowledge in order to support the acquisition of valuable and specific knowledge. Learning should be helped by this information, but not directed by it. In this framework, the player should learn how to use the given weak theory of meta-games for the specific game it plays. But computational resources are bounded (too bounded for most cases) and a compromise must be achieved between the minimum theoretical information (i.e., no information at all) and the minimum possible without destroying the player abilities. [This is still a very open question, at the present initial stage of LUD∆]
An important point is the following. Until now, the related works (mentioned early) have not mixed active and passive game tree exploration; analysis and learning do not cooperate in building the player's game strength. If a high-level description of the rules is available (even if it is harder to implement and has slower execution), it gives the player the possibility to check some game properties (e.g., piece power or goal type), and this knowledge can be used to adapt the learning structure, before the learning even begins! This attitude of planning before start playing, probably implies a more effective information acquisition. How much pre-built knowledge should be inserted on the making of this planning structure is a very relevant and (as far as I know) a new question (the answer should on the minimum possible expert information, without seriously compromising performance).
If a high level description is not available, the player can even so, be able to analyze the next features:
- The board topology (number of cells, average links per cell, ...).
- The average number of moves for each piece type; this will give an expected piece value.
- Is the game symmetric? By analyzing the setup and random positions while applying validMoves(), it would be possible to achieve a high degree of certainty about this feature. But this should be a part of the weak theory domain.
- In this case, the player should use two distinct evaluation functions (one for each color).
The player decision mechanism, also called policy, is based on strategies, i.e., a weighted sequences of tactical evaluations (or just tactics). The better player has a better policy, even if it may have the same tactical knowledge of a worse player (it is the way he chooses from them, that's better). An optimal policy identifies a path from the initial game state to the maximum final state value achievable, taken a certain opponent's behavior. A possible relevant point is that, even in symmetric games, it is known that perfect play leads to the victory of first or second player (or ending in a draw). Does this mean that players should have two different strategies, one when playing first, and another when playing second?
Mark Thomson's comment: In a losing game, the player should, ideally, play in such a way that its opponent has the greatest possible chance of making a mistake and losing his advantage. That would involve inferring the other playerís strategy, and setting up a situation where it would lead the other player astray. Alternatively, and more simply, the losing player could choose the move that delays loss for the longest time, on the assumption that this would tend to give the other player many chances to make a mistake
Each player's strength in a given game, is directly based on the performance of its strategy to play that game. How are we to compare different players' strengths? International game federations use established ratings to classify players by their compared performance in actual contests. For example, FIDE Chess ELO (proposed by Prof. Arpad Elo), uses a normal distribution and a formula. After a contest, the new ELO for the player is calculated by:
New ELO = Old ELO + C . (Score of the Game - Expected Score)
where C is a constant that decreases when the player's strength increases. Namely, C = 30 for players with ELO less than 2000, C = 10 for ELO > 2400, or otherwise C = 130 - (opponent's rating)/20.
The score is, as usual, 0 for a loss, 0.5 for a draw and 1.0 for a win. The expected score of a game is given by the normal distribution, with mean 0.5 (the value for the draw) and variance of 200. This means that two players with a difference of 200 points, the stronger will have an expected result of 0.84. A difference of 100 ELO, will result in an expected score of 0.69 for the better player. A difference greater than 426 will not earn ELO points for the better player (after rounding the decimal part). A fast way to compute this expected value is: Excepted Result = 1 / (1 + 10^(-D/400)), where D is the actual ELO difference between players.
The worst player possible is Random, which strategy is to choose randomly one legal move. This player, by definition, has a fixed ELO zero. Notice that a player with a misŤre strategy (i.e., trying to loose) will have eventually a negative ELO.
Let's assume that a strategy S2 is one playing level above strategy S1, when the expected score of S2 against S1 is around 2/3 (wins two out of three games), i.e., their ELO difference is approx. 100. If another strategy S3 is one level above S2, is it possible that the expected score of S3 against S1 is 0.84 (ELO difference is 200)? Is this true? Only if the relation is transitive, which it probably is not. Playing strength only makes sense within a pool of players in a tournament. In this condition, ELO gives an average of the expected score, when compared with the players of that tournament. If the number of players is very big - like in Chess or Go worlds - each player is only able to perform a restricted number of contests, giving an expected value for its ELO.
A strategy is stable if when facing the same pool of players, its ELO is the same within an interval of 50 ELO points; otherwise, it's unstable.
A move is reasonable for a player with ELO E, if it would be chosen by another player with a strategy with ELO > E - 100. A move is good if it would be chosen by another player with a strategy with ELO > E + 100. Otherwise, it is a poor move. Obviously, this refers to a specific tournament's point of view (a good move may be considered bad because of poor analysis or just because they are all bad players).
A tactic, i.e., a weighted feature within a strategy, is essential (harmful), if its removal will decrease (increase) the strategy performance by one level (ELO increase/decrease of 100). It is critical (destructive), if the increase (decrease) is greater than 200 ELO points. A tactic is redundant if its removal does not affect the strategy strength. If a tactic removal changes the overall strategy strength for every tournament, it's structural.
This section presents some proposals for qualitative and quantitative properties to define and classify abstract games.
- The dimension of the search space, i.e., the number of possible positions inside the game tree, can be used to define the potential game complexity.
Size( G ) = log10 ( NT )
where NT is the number of nodes of game tree for G.
- The potential number of levels of expertise that can be found by different strategies.
Depth( G ) = E / 100 + 6 / log10( PL )
where E is the ELO of the best known player, and PL is the greater number between 10 or the number of existing human players of G. The use of log is two-fold, a crude approximation of the number of players is sufficient, and the value added to E, slowly decreases as PL increases (the progression of expertise does not seem linear with the number of players). The number 6 points that when we get a population of one million serious players, the best one should be, approximately, one level below the perfect player.
- The branching factor of the game, i.e.,
Width( G ) = The average expected number of legal moves of a valid position
This measure can be statistically approximated by automatic search on actual contests.
- The timing between actual interaction (i.e., the possibility to capture or the entering on territory with direct enemy influence) between reasonable adversaries.
Tempo( G ) = The average number of moves to piece interaction
This measure can be statistically approximated by game analysis of actual contests against players positioned in the top 25% ELO.
- The overall piece mobility
Mobility( G ) = The average (number of moves per piece * percentage of empty cells)
This measure can be statistically approximated by automatic search on positions from actual contests.
- The rules simplicity may also be a measure of a game's quality (especially when judged with the game's depth). We can use Kolmogorov notion of complexity in this measure (for games with random elements, the Shannon notion of entropy should also be useful).
Clearness( G ) = The size of the smallest programs that compute validMoves( ) and endGame( )
In this context, the informal notion of a game overall strategy may be related with the algorithmic size of the (usually not available given bounded resources) function findOptimumMove( ). Another point, the precise number given by this definition of Clearness is uncomputable (!) We are only able to produce an estimation of its true value.
These measures deal with the relevance of the overall player's ignorance about the game tree. Their exact values would only be computable if the entire game tree analysis were possible.
- After a positional or material advantage, is it possible to secure that advantage in order to win? Or is easy to reverse it, to achieve a draw?
Decisiveness( G ) = The average percentage of positions with positional advantage for which there is a winning strategy.
A positional advantage is defined by a positive evaluation using the strategy of the best known player. If decisiveness approaches zero, it means that the game tree endgame states are densely covered with draws. This implies that a winning state can just be achieved with the help of both players (i.e., with a bad enemy move), meaning that the game is drawish, and thus, flawed.
A game with good decisiveness should easily be able to reach pre-endgame positions, where the positional or material advantage of those positions are enough to win the game with reasonable playing. Using Go terminology, a sensible sente/gote unbalance should be enough to determine the winning outcome of a contest.
- After a positional advantage, is there any reasonable chance of recovering?
Drama( G ) = The average percentage of reasonable moves that lead to a balanced position, given a positional advantage.
Drama is, in a sense, is the opposite of decisiveness. Poor drama leads to quick and decisive action; high drama should cause unexpected results, despite any positional superiority. But their definitions are not mutually exclusive, games may have excellent decisiveness, even though only very good players can profit from it!
- How easy is it to define a tactical and strategic plan?
Clarity( G ) = the average depth of the sub-tree necessary to find a reasonable move / log10 (the average number of nodes of that sub-tree)
So, clarity tells us about how far a strategy must search down the tree to identify reasonable moves. For the same number of nodes, if the sub-tree is thinner (i.e., deeper), it means better clarity (less lateral analysis is needed). The use of the logarithm is just to avoid numbers very close to zero.
Using some poetic license, the lack of clarity is the fog of war that clouds the abyss of perfect strategies.
Another interesting property is Topologic Invariance (TI) which relates the game playability with the varying size of different boards. Perhaps TI is a factor that should be added to each measure, indicating how they suffer accordingly when the board is changed.
I wish to thank David Ploog for a fruitful discussion on criteria for game classification, and to refer the notions of Tempo and Hardness (another name for Decisiveness). There are some open discussions about these subjects with Claude Chaunier, John Lawson and Jean-Pierre Queille, which contributed a lot to this document. Also a big thanks to Cameron Browne for his insightful thoughts about game tree complexity and to Mark Thomson for his excellent article "Defining the Abstract" and also for reviewing this text. I also wish to acknowledge all the scientific work referred to herein, that helped me with terminology and conceptual tools to elaborate LUD∆.
And I do not want to forget all the seeds ( o o o ) and spiders ( x x x ) that populated thousands of email games in the last 10 years! Finally, long live the Norwegian Blue!
Jo„o Pedro Neto (c) 2002