Sign In

Communications of the ACM

Research highlights

Technical Perspective: Solving Imperfect Information Games


The study of games is as old as computer science itself. Babbage, Turing, and Shannon devised algorithms and hardware to play the game of chess. Game theory began with questions regarding optimal strategies in card games and chess, later developed into a formal system by von Neumann. Chess subsequently became the drosophila—or common fruitfly, the most studied organism in genetics—of artificial intelligence research. Early successes in chess and other games shaped the emerging field of AI: many planning algorithms first used in games became pillars of subsequent research; reinforcement learning was first developed for a checkers playing program; and the performance of game-playing programs has frequently been used to measure progress in AI.

Most of this research focused on perfect information games, in which all events are observed by all players, culminating in programs that beat human world champions in checkers, chess, Othello, backgammon, and most recently, Go. However, many applications in the real world have imperfect information: each agent observes different events. This leads to the possibility of deception and a wealth of social strategies. Imperfect information games provide a microcosm of these social interactions, while abstracting away the messiness of the real world.

Among imperfect information games, Poker is the most widely studied—the latest drosophila—due to its enormous popularity and strategic depth. The smallest competitively played variant by humans, and the most widely played by computers, is the two-player game known as Heads-Up Limit Hold'Em (HULHE), in which each player holds two private cards in addition to five public cards. Two decades of research in this game has led to powerful methods, such as counterfactual regret minimization (CFR), for approximating a Nash equilibrium. Several years ago, a program called Polaris—created by many of the authors of the following paper—defeated for the first time a human professional poker player in HULHE.


The methods used to solve poker are quite general, and therefore have potential applications far beyond this one game.


However, Polaris was still far from perfect; indeed, it turns out in retrospect that it was exploitable, due to the approximations it made, by a very large margin. The obvious remaining question was whether a "near- perfect" solution could be found—a strategy so close to a Nash equilibrium that it cannot be differentiated in a lifetime of play.

The following paper takes the CFR methods used in previous work to the next level. Using a number of innovations—and several hundred machine-years of computation—they were able to find a near-perfect solution to HULHE. Their solution also provides insights into the game itself, showing the dealer holds a significant advantage, and that seemingly poor hands should be played surprisingly often.

For the game of poker, the next step beyond HULHE is no-limit poker, which has a much larger action space. This too has recently been cracked, with the programs Libratus (from CMU) and DeepStack (also from Alberta) both defeating human professionals using variants of CFR—although a near-perfect solution remains out of reach. The final challenge would be the variant most widely played by humans: multiplayer no-limit poker.

The methods used to solve poker are quite general, and therefore have potential applications far beyond this one game. Many other imperfect information games played by humans, including a wide variety of card games, board games, and video games, are tractable to these methods. Furthermore, there are many real-world applications, such as auctions, negotiations, and security, in which agents receive different information, and must make a sequence of decisions to maximize a final pay-off—and therefore belong to the same class of imperfect information games as HULHE.

Solving a problem attains perfection in one domain. The frontier of solved domains is an incontrovertible measure of current computer capabilities. That frontier has now been extended by one significant step, to include for the first time a challenging imperfect information game.

Back to Top

Author

David Silver leads the reinforcement learning research group at Google DeepMind, London, and is lead researcher on AlphaGo.

Back to Top

Footnotes

To view the accompanying paper, visit doi.acm.org/10.1145/3131284


Copyright held by author
Request permission to (re)publish from the owner/author

The Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.


 

No entries found