Poker AIs can usually compete well with human opponents when the game is limited to just two players. Now, researchers at Carnegie Mellon University and Facebook have raised the bar with an AI dubbed Pluribus, in which 1
Five copies of the AI each played more than 5,000 hands against two top professional players: Chris "Jesus" Ferguson, a six-time winner of World Series of Poker events, and Darren Elias, who currently holds the record for most World Poker Tour titles. Pluribus defeated them both. The same thing happened in a second experiment, in which Pluribus played against five professionals at a time, drawn from a pool of 13 human players, over 10,000 hands.
Co-author Tuomas Sandholm of Carnegie Mellon University has been tackling the unique challenges poker poses for AI for 16 years. No-limit Texas Hold'em is a so-called imperfect-information game because there are hidden cards (held in the opponents' hands) and there are no restrictions on the size of the bets you can make. In contrast, in chess and Go, the status of the board and all the pieces is known to all players. Poker players can (and do) bluff on occasion.
Claudico sired Libratus
In 2015, an early version of Sandholm's poker-playing AI, Claudico, took on four professional players in heads-up Texas Hold'em, where only two players are in the hand, at a Brains vs. Artificial Intelligence tournament at Rivers Casino in Pittsburgh. After 80,000 hands played over two weeks, Claudico did not quite reach the statistical threshold for declaring victory: the margin must be large enough to give 99.98% certainty that the AI's win is not due to chance.
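To make that threshold concrete, here is a minimal Python sketch of a one-sided significance check on per-hand winnings. The function name `confidence_of_win` and all of the sample numbers are illustrative assumptions, not tournament data, and the normal approximation is a simplification rather than the method the organizers actually used.

```python
import math
from statistics import NormalDist


def confidence_of_win(mean_per_hand: float, stdev_per_hand: float, hands: int) -> float:
    """Confidence that the true win rate is above zero (normal approximation)."""
    standard_error = stdev_per_hand / math.sqrt(hands)
    z_score = mean_per_hand / standard_error
    return NormalDist().cdf(z_score)


# Hypothetical numbers for illustration only (not tournament data).
THRESHOLD = 0.9998
small_sample = confidence_of_win(mean_per_hand=0.05, stdev_per_hand=10.0, hands=10_000)
large_sample = confidence_of_win(mean_per_hand=0.05, stdev_per_hand=10.0, hands=1_000_000)
print(small_sample >= THRESHOLD, large_sample >= THRESHOLD)
```

With the same assumed win rate, only the larger sample clears the bar, which is why playing more hands matters so much for a claim of statistical significance.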
Sandholm et al. followed up in 2017 with another AI named Libratus. This time, the AI did not focus on exploiting its opponents' mistakes but on improving its own game, apparently a more reliable approach. "We've tried to close gaps in our own strategy because it makes our own game safer and safer," Sandholm told IEEE Spectrum at the time. "When you exploit your opponents, you are increasingly open to exploitation." The researchers also increased the number of hands played to 120,000.
The AI prevailed even though the four human players tried to conspire against it, coordinating on strange bet sizes to confuse Libratus. As Ars' Sam Machkovech wrote at the time, across 120,000 combined poker hands played against four human online poker pros, Libratus's $1.7m win, combined with so many hands, cleared the main hurdle: victory with statistical significance.
But Libratus was still playing against a single opponent in heads-up action. Playing poker with multiple players is a much more challenging problem. Pluribus therefore builds on the earlier work with Libratus and introduces some important innovations that enable it to develop winning strategies in multiplayer games.
Sandholm and his former graduate student Noam Brown, who is currently working on his doctorate with the Facebook Artificial Intelligence Research (FAIR) group, used "action abstraction" and "information abstraction" approaches to reduce the number of different actions the AI must consider when developing its strategy. Whenever Pluribus reaches a point in the game where it has to act, it forms a subgame: a representation that, according to Sandholm, provides a finer-grained abstraction of the real game, similar to a blueprint.
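As a rough illustration of action abstraction, the Python sketch below snaps any bet to the nearest of a few allowed sizes, expressed as fractions of the pot. The function `abstract_bet` and the particular fractions are assumptions chosen for illustration; the article does not say which bet sizes Pluribus actually considers.

```python
ALLOWED_POT_FRACTIONS = (0.5, 1.0, 2.0)  # assumed values, for illustration only


def abstract_bet(bet: float, pot: float) -> float:
    """Map a real bet to the closest allowed bet size for the given pot."""
    if pot <= 0:
        raise ValueError("pot must be positive")
    candidates = [fraction * pot for fraction in ALLOWED_POT_FRACTIONS]
    return min(candidates, key=lambda size: abs(size - bet))


print(abstract_bet(bet=130, pot=100))  # treated as a pot-sized bet of 100.0
```

Collapsing the continuous range of possible bets into a handful of representative sizes is what keeps the number of actions the strategy has to reason about manageable.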
"It goes back some actions and does some sort of game-theoretical thinking," he said. Each time, Pluribus must develop four continuation strategies for each of the five human players through a new search algorithm with limited lookahead. According to Sandholm, this results in "four to six million different continuation strategies in total".
Like Libratus, Pluribus does not use any poker-specific algorithms; it simply learns the rules of this imperfect-information game and then plays against itself to develop its own winning strategy. Pluribus thus figured out on its own that it is best to play a mixed strategy and be unpredictable, which is the conventional wisdom among today's top human players. "We did not even say, 'The strategy should be randomized,'" Sandholm said. "The algorithm automatically found out that it should be randomized, in what ways, and with what probabilities in which situations."
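A mixed strategy can be pictured as a probability distribution over actions that is sampled each time a decision comes up. The Python sketch below is a minimal illustration; the action names and probabilities are made up, whereas Pluribus arrives at its own probabilities through self-play rather than from a hand-written table.

```python
import random
from collections import Counter

# Hypothetical mixed strategy for one situation: action -> probability.
MIXED_STRATEGY = {"fold": 0.2, "call": 0.5, "raise": 0.3}


def sample_action(strategy: dict[str, float], rng: random.Random) -> str:
    """Draw one action at random according to the strategy's probabilities."""
    actions = list(strategy)
    weights = list(strategy.values())
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the run is repeatable
counts = Counter(sample_action(MIXED_STRATEGY, rng) for _ in range(10_000))
print(counts)  # frequencies should be close to the stated probabilities
```

Because each decision is drawn at random, an opponent who sees the same situation twice cannot count on seeing the same action twice.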
Pluribus actually confirmed a bit of conventional poker wisdom: it's just not a good idea to "limp" into a hand, that is, to call the big blind rather than fold or raise. The exception, of course, is when you are in the small blind, where merely calling costs half as much as it does for the other players. And while human players generally avoid so-called "donk betting," where a player ends one round with a call but begins the next round with a bet, Pluribus placed donk bets far more frequently than its human opponents.
"In a sense Pluribus plays just like humans," Sandholm said. "In other ways, it plays completely Martian strategies." In particular, Pluribus makes unusual bet sizes and is better at randomization.
"Its main strength is the ability to apply mixed strategies," said Elias. "It's the same thing people try to do. It's a matter of execution for people: to do it in a completely random and consistent way. Most people simply can't."
"It was incredibly fascinating to play against the poker bot and see some of the strategies it chose," said Michael Gagliano, another participating poker player. "There were some plays that people simply do not make, especially in terms of bet sizing. Bots/AI play an important role in the development of poker, and it was amazing to have first-hand experience in this big step towards the future."
This type of AI could be used to develop drugs that could, for example, take on antibiotic-resistant bacteria, or to enhance cybersecurity or military robotic systems. Sandholm cites multi-party bargaining or pricing (among, say, Amazon, Walmart, and Target) as a specific application. Sandholm has already licensed much of the poker technology developed in his laboratory to two startups: Strategic Machine and Strategy Robot. Strategic Machine focuses on games and other entertainment applications; Strategy Robot focuses on defense and intelligence applications.
When Libratus defeated human players in 2017, there were concerns about whether poker could still be considered a skill-based game and whether online games in particular would soon be dominated by disguised bots. Some took comfort in the fact that Libratus needed huge supercomputer hardware to analyze the game and work out how to improve its play: 15 million core hours, plus 1,400 CPU cores during live play. Pluribus, however, requires far less processing capacity: it can complete its blueprint strategy in just eight days using only 12,400 core hours, and it runs on 28 cores in live play.
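The gap between the two systems is easier to see as ratios. This short Python calculation uses only the figures quoted above; the variable names are illustrative, and the average-cores figure assumes the eight days of training ran around the clock.

```python
LIBRATUS_CORE_HOURS = 15_000_000
PLURIBUS_CORE_HOURS = 12_400
LIBRATUS_LIVE_CORES = 1_400
PLURIBUS_LIVE_CORES = 28
PLURIBUS_TRAINING_DAYS = 8

training_ratio = LIBRATUS_CORE_HOURS / PLURIBUS_CORE_HOURS
live_ratio = LIBRATUS_LIVE_CORES / PLURIBUS_LIVE_CORES
# Assumes continuous training, 24 hours a day, for the full eight days.
avg_training_cores = PLURIBUS_CORE_HOURS / (PLURIBUS_TRAINING_DAYS * 24)

print(f"Core hours: about {training_ratio:,.0f}x fewer")
print(f"Live-play cores: {live_ratio:.0f}x fewer")
print(f"Average cores busy during training: about {avg_training_cores:.0f}")
```

In other words, Pluribus needs roughly a thousandth of the core hours and a fiftieth of the live-play cores, which puts it within reach of a single well-equipped server.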
So, is this the death blow for skill-based poker? Well, the algorithm was so successful that the researchers decided not to release their code. "It could be very dangerous for the poker community," Brown told Technology Review.
Sandholm acknowledges the risk of sophisticated bots appearing in online poker forums, but destroying poker has never been his goal, and he still thinks it's a game of skill. "I loved the game because these AIs really showed there was a whole extra depth of play that people did not understand, even brilliant professional players who played millions of hands," he said. "So I hope this adds to the excitement of poker as a leisure game."
DOI: Science, 2019. 10.1126/science.aay2400.