Although games of skill like Go and chess have long been touchstones for intelligence, programmers have gotten steadily better at crafting programs that can now beat even the best human opponents. Only recently, however, has artificial intelligence (AI) begun to successfully challenge humans in the much more popular (and lucrative) game of poker.
Part of what makes poker difficult is that the luck of the draw in this card game introduces an intrinsic randomness (although randomness is also an element of games like backgammon, at which software has beaten humans for decades). More important, though, is that in the games where computers previously have triumphed, players have "perfect information" about the state of the play up until that point.
"Randomness is not nearly as hard a problem," said Michael Bowling of the University of Alberta in Canada. In fact, "the techniques developed for perfect-information games don't work in imperfect-information games."
Perfect information games have largely been conquered using machine learning techniques that explore a huge number of ways the game could proceed to infer the best move for any situation. With poker, as in much of the real world, opponents know facts that you do not; the challenge is to infer something about this hidden knowledge from their actions, even as their actions are guided by their assessment of your hidden knowledge, and so on. Such recursive logic cannot be analyzed with simple machine learning.
Instead, effective poker programs must exploit the techniques of game theory, the mathematical framework for analyzing the rational response of multiple actors whose actions affect each other. Game theory is also useful in a broad range of non-recreational situations, such as modeling evolution, negotiating trade deals, and military strategy. The goal is to assess systematic strategies by explicitly considering strategies other participants might use.
In a simple example, in the children's game Rock-Paper-Scissors (familiar to some as Rochambeau), two players each choose one of these three items, within the rule that rock beats scissors, which beats paper, which beats rock. A pure strategy, say always choosing rock, is a clear loser since the opponent can beat it with "always choose paper." However, this strategy is itself vulnerable to "always choose scissors"; there is a trade-off between exploiting a foolish opponent and being vulnerable oneself. Even for this deterministic game, the best strategy includes probabilistic decisions, choosing among the three items at random.
In this simple case, a strategy is a table that lists, for each possible situation, the ideal probability to choose different actions. For complex games addressed by computer programs, the strategy is effectively the algorithm, which may encompass on-demand analysis of configurations that have never been encountered before.
Among the many forms of poker, one ingredient common to most is that players compete with hands comprising five cards, generally with some cards hidden from other players. Hands are ranked by whether they contain certain combinations, ranging from common ones like a pair to the highest hand, a royal flush, containing the five highest cards in a single suit. Estimating the probability that an opponent will randomly have a better hand, even when cards remain to be dealt, is complex but straightforward for a computer.
Comparing hands is just a small part of the game, however, and as the game progresses, the opponent's choices mean their hand is no longer random. The crux of poker is betting, which generally unfolds in multiple rounds as the players learn about new cards and reassess their odds of winning—and what they choose to signal about those odds. At each round, every player in turn can just match the previous bets by contributing to the common pot, bet even more money, or abandon the hand (fold) to avoid throwing good money after bad. At the end, the best hand among the remaining players wins the pot.
"Bluffing is a mathematical phenomenon that automatically emerges in the output of these algorithms."
Getting more money requires deception about the strength of one's hidden cards. A player with an excellent hand may bet timidly to avoid scaring other players out of keeping up with the bets. Conversely, a player with a poor hand could benefit by (sometimes) bluffing with aggressive bets to get others who might have had a better hand to fold. Success requires extracting information about the other players' hidden cards from this potentially deceptive information.
The face-to-face aspects of deception are well known, but in computer and on-line poker, all signaling is done with bets. "A lot of people think that bluffing is somehow a psychological phenomenon," said Tuomas Sandholm of Carnegie Mellon University. "That's not it at all. Bluffing is a mathematical phenomenon that automatically emerges in the output of these algorithms."
Computer scientists have focused on the two-person "heads-up" version of the very popular Texas Hold'em poker. This game begins with players receiving two hidden cards. After a round of betting, additional (community) cards are revealed to all players in a group of three, then one more and then another, with rounds of betting each time. Each player constructs their best hand by combining their hidden cards with community cards.
Researchers study two variants of this game with different betting rules. The Annual Computer Poker Competition (ACPC) started in 2006 with the "limit" version, in which bets are fixed in size and the number of raises permitted is limited. Despite these restrictions, there are more than 1014 distinct potential decision points in this game. This makes it smaller than checkers, but incomplete information makes it much harder.
In 2015, Bowling and his colleagues declared limit Hold'em solved by their program Cepheus, which uses a variety of techniques to make the problem manageable, although it still required more than two months of processing with a system of almost 5,000 cores. In game theory, a game is "weakly solved" by identifying a strategy that does as well as any opposing strategy, starting at the beginning of the game. (It is "strongly solved" if it can also later recover from previous non-ideal play.)
For games with randomness, however, any strategy will sometimes lose. It can only be assessed by averaging over many repetitions, and even then with limited precision. Bowling and his colleagues suggested that limit Hold'em was "essentially weakly solved" in that any imperfection in the strategy would be undetectable over the equivalent of a human life-time of full-time play.
In recent years, researchers have extended their attention to "no-limit" Hold'em, which "has become the main challenge problem and benchmark in the AI community for testing these algorithms," Sandholm said. This version allows players to place arbitrary bets on each round, with no limit on the number of rounds of betting. As a result, there are some 10161 possible situations a player might face. "You can't store the game, and you can't go through it even once," Sandholm said. "So you need different kinds of techniques."
Poker is a common metaphor for complex situations where actions can communicate intentions or private information, but also have direct consequences.
With student Noam Brown, Sandholm used their application-independent platform to create a program called Libratus, building on previous wins in no-limit Hold'em at ACPC. In a December 2017 article in Science, they showed the program beat four top human professionals in a 120,000-hand competition "with very high statistical significance." With so many games, there were many opportunities for the humans to find and exploit weaknesses in the computer player, but they did not. "In terms of level, it's like beating Kasparov with Deep Blue," or the other AI successes in Go and "Jeopardy," Sandholm said.
Among many innovations, he said the most novel features in Libratus were in the subgame solver, which analyzes situations in detail as they arise. The challenge is that, to avoid creating a strategy that would leave the program vulnerable in other situations, "subgames cannot be solved in isolation," Sandholm said. "What I do in this subgame depends on what I should do in another subgame." The need for consistency is somewhat similar to the assessment of bluffing, which is effective only if used sparingly (and unpredictably).
Poker is a common metaphor for complex situations in the real world, where actions can communicate intentions or private information, but also have direct consequences. For example, self-driving cars may need to know what to expect based on subtle motions of human drivers or pedestrians. Businesses and countries also may signal their intent through aggressive or accommodating actions.
"It feels deeply psychological," admitted Bowling, but the most successful no-limit programs do not include explicit modeling of opponents. For two-person Hold'em, where one player's loss is the other's gain, Bowling said, "if they make mistakes, then what do I care?" But in real-world situations, and even in three-person Hold'em, it becomes important to try to understand the other players' goals, which may be more complex than just the expected monetary return. In addition, "by not doing opponent exploitation, we're maybe not milking the most money from an opponent, especially from a weak opponent," said Sandholm.
"I'm not sure people appreciate how much more difficult [solving imperfect-information] games is," said Daphne Koller, who popularized a way of efficiently representing these games in the 1990s. The recent poker successes, however, required "development of a whole slew of sophisticated methods," she said.
Still, Koller, now at biotech startup insitro, cautioned, "It's likely that the techniques won't work for the real world that's rife with 'unknown unknowns'." She said it's important to ask, "can we identify a subset of interesting, relevant problems where the techniques actually work?" For his part, Sandholm thinks he has already found such real-world problems and has formed two startups to pursue them.
Bowling, M., Burch, N., Johanson, M., and Tammelin, O.
Heads-up limit hold'em poker is solved, CACM, Nov. 2017, http://bit.ly/2JLhZWr
Moravčík, M., et al.
DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, Science, Mar. 2017, http://bit.ly/2JJQIDA
Brown, N., and Sandholm, T.
Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, Science, Dec. 2017, http://bit.ly/2rbeRfC
©2018 ACM 0001-0782/18/9
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.
No entries found