Machines have raised the stakes once again. A superhuman poker-playing bot called Pluribus has beaten top human professionals at six-player no-limit Texas hold’em poker, the most popular variant of the game. It is the first time that an artificial-intelligence (AI) program has beaten elite human poker players at a game with more than two players [1].
“While going from two to six players might seem incremental, it’s actually a big deal,” says Julian Togelius at New York University, who studies games and AI. “The multiplayer aspect is something that is not present at all in other games that are currently studied.”
The team behind Pluribus had already built an AI, called Libratus, that had beaten professionals at two-player poker. It built Pluribus by updating Libratus, creating a bot that needs much less computing power to play matches. “A lot of AI researchers didn’t think it was possible to do this using the kinds of techniques we’re using,” says Noam Brown at Carnegie Mellon University in Pittsburgh, Pennsylvania, and Facebook AI Research in New York, who developed Pluribus with his Carnegie colleague Tuomas Sandholm.
Other AIs that have mastered human games — such as Libratus and DeepMind’s Go- and StarCraft II-playing bots — have proved unbeatable in two-player zero-sum matches. In these scenarios, there is always one winner and one loser, and game theory offers a well-defined best strategy.
But game theory is less helpful for scenarios involving multiple parties with competing interests and no clear-cut win–lose conditions — those that reflect most real-life challenges. By solving multiplayer poker, Pluribus lays the foundation for future AIs to tackle more complex problems of this sort, says Brown. He thinks that their success is a step towards applications such as automated negotiations, better fraud detection and self-driving cars.
To take on six-player, no-limit Texas hold’em, Brown and Sandholm radically overhauled Libratus’s search algorithm. Most game-playing AIs search forwards through decision trees for the best move to make in a given situation. Libratus searched to the end of a game before choosing an action.
But the complexity introduced by extra players makes this tactic impractical. Poker requires reasoning with hidden information — players must work out a strategy by considering what cards their opponents might have and what opponents might guess about their hand based on previous betting. With more players, choosing an action at any given moment becomes harder, because there are far more possibilities to assess.
The key breakthrough was developing a method that allowed Pluribus to make good choices after looking ahead only a few moves rather than to the end of the game.
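The contrast between full-game search and depth-limited lookahead can be sketched in a few lines. The toy game and every name below are hypothetical, chosen only to illustrate the idea of stopping after a few moves and estimating the leaf value instead of playing out to the end; Pluribus’s real search additionally reasons about hidden information and multiple opponents.

```python
# Depth-limited lookahead (toy, single-agent sketch): instead of expanding
# the game tree to terminal states, stop after `depth` moves and score the
# resulting position with a cheap estimate. All names are hypothetical.

def search(state, depth, actions, step, value_estimate, is_terminal, payoff):
    """Return the best (action, value) from `state`, looking `depth` moves ahead."""
    if is_terminal(state):
        return None, payoff(state)
    if depth == 0:
        # Leaf of the limited search: guess the value instead of playing on.
        return None, value_estimate(state)
    best_action, best_value = None, float("-inf")
    for a in actions(state):
        _, v = search(step(state, a), depth - 1, actions, step,
                      value_estimate, is_terminal, payoff)
        if v > best_value:
            best_action, best_value = a, v
    return best_action, best_value

# Tiny made-up game: walk along a number line; a bigger final position is better.
actions = lambda s: [1, 2]          # move forward by 1 or 2
step = lambda s, a: s + a
is_terminal = lambda s: s >= 5
payoff = lambda s: s
value_estimate = lambda s: s        # crude heuristic: current position

a, v = search(0, 2, actions, step, value_estimate, is_terminal, payoff)
```

With a lookahead of only two moves, the search still picks the larger step (action 2) because the leaf estimate stands in for the unplayed remainder of the game.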
Pluribus teaches itself from scratch using a form of reinforcement learning similar to that used by DeepMind’s Go AI, AlphaZero. It starts off playing poker randomly and improves as it works out which actions win more money. After each hand, it looks back at how it played and checks whether it would have made more money with different actions, such as raising rather than sticking to a bet. If the alternatives lead to better outcomes, it will be more likely to choose them in future.
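That look-back update is, in spirit, a regret-based rule: tally how much better each alternative action would have done, then play actions in proportion to their accumulated “regret”. Below is a minimal regret-matching sketch under toy assumptions (the counterfactual payoff of every action is known after the hand); it is an illustration of the general technique, not Pluribus’s actual training code.

```python
# Minimal regret matching (toy sketch, not Pluribus's implementation).

def update_regrets(regrets, payoffs, chosen):
    """After a hand, add how much more each alternative would have earned
    than the action actually taken."""
    for a, payoff in enumerate(payoffs):
        regrets[a] += payoff - payoffs[chosen]
    return regrets

def regret_matching_policy(regrets):
    """Turn accumulated regrets into action probabilities: actions with more
    positive regret are chosen more often."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # no signal yet: play uniformly
    return [p / total for p in positive]

# Toy hand with three actions (fold, raise, call) and made-up payoffs.
# The bot folded (index 0) but raising would have won 5.
regrets = update_regrets([0.0, 0.0, 0.0], payoffs=[1.0, 5.0, 2.0], chosen=0)
policy = regret_matching_policy(regrets)  # raising now gets most of the probability
```

After one such update the policy already shifts most of its probability to the action that would have made the most money, which is exactly the behaviour the paragraph above describes.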
By playing trillions of hands of poker against itself, Pluribus created a baseline strategy (its “blueprint”) that it draws on in matches. At each decision point, it compares the state of the game with this blueprint, searches a few moves ahead to see how the action might play out, and then decides whether it can improve on the blueprint’s choice.
Because it taught itself to play without human input, the AI settled on a few strategies that human players tend not to use, such as “donk betting”: opening a betting round with a bet or raise when you ended the previous betting round with a call, that is, by merely matching another player’s bet.
Pluribus’s success is largely down to its efficiency. When playing, it runs on just two central processing units (CPUs). By contrast, AlphaGo, the predecessor of DeepMind’s AlphaZero, used 1,920 CPUs and 280 graphics processing units to run its search algorithm when it first beat the world’s top Go player in 2016. Libratus used 100 CPUs in its 2017 matches against top professionals. When playing against itself, Pluribus takes around 20 seconds to play a hand — roughly twice as fast as professional humans.
Games have proved to be a great way to measure progress in AI because bots can be scored against top humans — and objectively be hailed as superhuman if they triumph. But Brown thinks that AIs are outgrowing their playpen. “This was the last remaining challenge in poker,” he says.
Still, Togelius thinks there is mileage yet for AI researchers and games. “There’s a lot of unexplored territory,” he says. For a start, few AIs have mastered more than one game, and to do so would require them to demonstrate a general ability rather than a niche skill. AlphaZero has taught itself to play Go, chess and shogi — a form of Japanese chess — but only one at a time. For example, a neural network trained by AlphaZero to play Go cannot play chess, and vice versa. In other words, a single instance of the AI cannot play Go, chess and shogi, as a human could.
And, of course, there’s more than simply playing games, says Togelius. “There’s also designing them. A great AI challenge if there ever was one.”
Article credit to: http://feeds.nature.com/~r/nature/rss/current/~3/K_9rfItOvCE/d41586-019-02156-9