Sponsor Content

Issue 1
Chapter: Can AI navigate human emotion?

Much of the current conversation around the rise of artificial intelligence falls into one of two categories: uncritical optimism or dystopian fear. The truth tends to land somewhere in the middle—and the truth is much more interesting. These stories are meant to help you explore, understand, and get even more curious about it, and to remind you that as long as we’re willing to confront the complexities, there will always be something new to discover.

Feature

When AI Meets Its Match

Today’s poker bots can crush even the best human players. Still, the game—one of bluffing, deception, and intention—remains technically unsolved.

By Maria Konnikova • Illustration by Camilo Huinca

The emerald beast is begging me to engage. And I, of course, take the bait. With a single motion, I bring the creature to life. It throws out its first punch. I parry with a raise of my own. It defends itself. I slash again. It strikes back with even greater force. But I have plans for a counterattack and I rush forward. My furor finally cows it, and it admits defeat. Meek. Deflated. Beaten. At least until the next hand.

Our weapons are cards. Our battlefield, a virtual poker felt. And my opponent’s name is Slumbot—a poker bot that was, up until 2018, when the last Annual Computer Poker Competition was held, the world’s toughest virtual AI opponent in Heads-Up No-Limit Texas Hold’em. And though that competition is no more, Slumbot remains the benchmark program against which all future poker AIs will test themselves. All future poker AIs, that is, and me: the AIs, in the service of technological breakthroughs into the very nature of human decision-making; me, in the service of journalistic exploration. After I finish battling the Slumbot, I will move on to its most feared nemesis, a wily model formerly known as Ruse but now called GTO Wizard AI—the current gold standard for optimal poker play.

Against Slumbot, I may stand a chance. Against GTO Wizard AI, I’m certain to lose, by definition—I’ll be playing on its own platform, judged by its own standard of perfection. But that doesn’t mean I’ll lose altogether. After all, AI researchers have yet to fully solve poker—and, as I came to find, even against the toughest bots, humans have a certain advantage. Though AI can sometimes mimic human emotion, it lacks the intuitive grasp that defines the human decision-making experience in the face of limited data—those nuances of behavior that can, at a moment’s notice, change the entire tone and direction of a battle of wits. Even a battle with as much mathematical precision as No-Limit Texas Hold’em.

*

Poker isn’t the first game AI has tried to solve. In 1989, a program called Chinook began to churn out a series of computations that would, at their peak, occupy more than 200 computer processors around the world. It was one of the longest-running computations of all time. The end result was announced in 2007: Chinook had solved the game of checkers. It had crafted an AI-driven approach that would never lose against any opponent.

There’s a reason why AI researchers have set their sights on games: They have rules. They have a fixed, defined world. Even the most complex, sophisticated games are cleaner and less noisy than life. But with all their rules and stipulations and neat parameters, they still have the element that is most true to the real world: humanity. And that’s what makes them such a powerful proxy for studying real-life decision-making.

In the 1950s, John von Neumann, a polymath mathematician best known as the father of game theory, proposed that the true prize lay not in the world of checkers and other perfect information games, but in the world of imperfect information games, where, as in life, the unknown was just as crucial as the known, if not more so. The most lifelike game of all? Poker. The game of human intention and bluffing and emotion and seemingly endless recursive thinking. As von Neumann saw it, games like poker were much more than games. If you could tackle them, they would help form a rubric for taking on the thorniest problems of humanity.

But though many researchers took up the challenge to solve these infinitely more complex puzzles, for decades none came close. To date, even perfect information games, such as chess and Go, haven’t been solved in the technical sense of the word. The AIs can beat the best humans, consistently, but they are unable to enumerate every possible situation that may arise in the game tree—a necessity for a real solution. So how could any researcher hope to conquer poker? Certainly not by the brute force approach that had cracked checkers.

In poker of the No-Limit Hold’em variety—the most popular form in the world, in which a player can bet any amount, up to her entire holding—the number of possible situations is greater than the number of atoms in the universe. Add to that mathematical unwieldiness the very human nature of the game, and you have a problem of compounding difficulty. How can an AI parse the shifting emotional dynamics of a table? How can it fight back if a few human opponents decide to single it out and collude, even on an unconscious basis, against it? (While outright collusion is against the rules, subconsciously altering your play to single out the “other” at the table, be it an AI or a human outsider, is far from rare.)

For Michael Bowling, that very complexity was the draw. In poker, he saw a problem that neither checkers nor chess could approximate: how other humans respond. “You can’t ignore other agents in poker,” he says. “You need to know how everyone is going to behave.” In other words, exactly what von Neumann had proposed decades earlier.

Working as the head of the Computer Poker Research Group at the University of Alberta, Bowling started with a more manageable poker variant: Heads-Up (that is, one-on-one) Limit Texas Hold’em (in which the size and number of bets are limited). It took many years, but at last he had it: a program called Cepheus that could decisively beat the game. Its main breakthrough was an algorithm known as CFR, or counterfactual regret minimization. Over vast numbers of simulated hands, the algorithm tracks how much it regrets each action it didn’t take—how much better the alternative would have turned out—and keeps adjusting its strategy to minimize that regret, until no other possible action would have led to a better outcome. To date, Limit Hold’em is the largest imperfect information game that AI has been able to solve—and it is orders of magnitude simpler than Heads-Up No-Limit, which, in turn, is orders of magnitude simpler than Multiplayer No-Limit.
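The full CFR algorithm walks an entire game tree, but its core primitive—regret matching, in which you play each action in proportion to how much you regret not having played it before—fits in a few lines. The toy sketch below is purely illustrative (it is not Cepheus’s code, and rock-paper-scissors is a stand-in for poker): regret matching faces a fixed, rock-heavy opponent and, by accumulating regret, learns to exploit it by playing paper.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]

def payoff(a, b):
    """Payoff for playing action a against action b: +1 win, -1 loss, 0 tie."""
    wins = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
    if a == b:
        return 0
    return 1 if (a, b) in wins else -1

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        # No positive regrets yet: fall back to the uniform strategy.
        return [1.0 / len(ACTIONS)] * len(ACTIONS)
    return [p / total for p in positive]

def train(iterations=20000, seed=0):
    rng = random.Random(seed)
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        for i, s in enumerate(strategy):
            strategy_sum[i] += s
        my_action = rng.choices(range(3), weights=strategy)[0]
        # A fixed, exploitable opponent that plays rock half the time.
        opp_action = rng.choices(range(3), weights=[0.5, 0.25, 0.25])[0]
        actual = payoff(ACTIONS[my_action], ACTIONS[opp_action])
        # Regret for each action: how much better it would have done
        # than what we actually earned this round.
        for i in range(3):
            regrets[i] += payoff(ACTIONS[i], ACTIONS[opp_action]) - actual
    # The *average* strategy over all iterations is what converges.
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

avg = train()  # avg[1], the weight on paper, comes to dominate
```

Against a fixed opponent, regret matching converges to a best response; CFR’s insight was to run this kind of regret accounting in self-play, at every decision point of the game tree, so that the average strategy approaches an unexploitable equilibrium.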

The complexity, however, is the allure—and the reason that Bowling next set out to conquer Heads-Up No-Limit poker. And while he didn’t solve it in the same way he could solve its Limit variant, he did deploy an important new tool: a neural network that could decompose the bigger game into smaller, subgame problems and recalculate an appropriate strategy at every step. The new program, DeepStack, was able to defeat 11 poker professionals over the course of 44,000 hands. That was still for Heads-Up only, and it left the game far from solved, but DeepStack was nearing superhuman ability—and reached it, at least as far as some humans were concerned.

At Carnegie Mellon University, a team led by Tuomas Sandholm was approaching Heads-Up No-Limit from a different angle: Rather than use neural networks, they would start with sophisticated abstractions of the game. Claudico, their first bot, failed miserably. Noam Brown, then Sandholm’s Ph.D. student, who programmed most of the bot’s algorithms, eventually determined the problem: Whereas the human players would sit and think, the bot would act immediately. It had spent countless hours training in advance, playing trillions of hands on a supercomputer, and would use that training to act instantaneously.

Brown decided to program in-game thinking into the bot’s abilities (something DeepStack did as well). The result was Libratus, a bot that looked at the subgame during play and recalculated its strategy accordingly. When Libratus challenged the top humans to a match, it was much better prepared. What’s more, every night, it would hook back up to CMU’s supercomputing center, analyze how the humans had played, and adjust its strategy. The result, in 2017, was a decisive victory, one that took DeepStack’s win to the next level.

And then came the next milestone: multiplayer poker. After the success of Libratus, the CMU team turned to six-max poker, a variant with six players. The new algorithm had one major difference: Instead of just solving a subgame, it would solve a depth-limited subgame. “It can start solving the game when you are already in the game, not from beginning to end,” Sandholm says. This was Pluribus, a bot that performed quite well against some humans in a multiplayer format. CMU declared a victory.

*

I haven’t come to my matches against Slumbot and GTO Wizard AI, my two poker AI nemeses, empty-handed. Before playing, I consulted with Kevin Rabichow, one of the best Heads-Up players in the world—and the poker consultant for GTO Wizard AI. Rabichow’s initial prognosis is grim: Slumbot is superhuman, he tells me. But it isn’t perfect—and, crucially, it does not adjust to its opponents. I can adjust, while it plays the same game it was programmed to play.

I play a few trial hands. I lose some. I win some. I start taking notes, much the way I would against a human opponent. One of the first things I notice is that Slumbot likes to bet small with value hands that are not the absolute nuts (nuts being the best possible hand). My adjustment? Either fold or raise, depending on the situation—because just as Rabichow predicted it might, it overfolds to big raises, even when it holds a solid hand.

I start accumulating chips quite rapidly. And then I make a mistake, running a big bluff even though Slumbot has called every bet. I should know by now that if the machine doesn’t fold to aggression, it has something strong—as indeed it does. It’s a costly mistake that brings me down for the session. I chalk my loss up to two factors: I’m tired, and I was distracted by some texts on my phone. (Both true, but neither a good excuse.)

Here’s one major edge that Slumbot has over me: It doesn’t get distracted or tired. It doesn’t think about what’s for dinner or how close I am to the requisite 500 hands I promised Kevin I’d play. It just executes its strategy, over and over and over, with precision.

I dial back in. I fight back. It turns out Slumbot is teaching me something important about myself as I play: Without the distractions of the live table or the time pressures of real online play, I can pay attention to my own strategy, my own feelings, my own shortcomings much more clearly. Slumbot may be an AI, but it’s evoking very human responses.

We play on. At the end of 750 hands, I emerge victorious, having won 122.5 big blinds. If we’d been playing for real money, that would have been $12,250 in just over two hours. “LFG!” Kevin texts me after I send him my results. “Humans can still win this thing.”
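For readers who track results the way players do, the session math is simple multiplication; a quick check, assuming the $100 big blind implied by the dollar figure (the stakes themselves aren’t stated):

```python
# Back-of-the-envelope check on the session result.
# Assumption: a $100 big blind (i.e., a $50/$100 game) -- inferred, not stated.
big_blinds_won = 122.5
big_blind_size_usd = 100

winnings_usd = big_blinds_won * big_blind_size_usd       # dollars won
hands_played = 750
win_rate_bb_per_100 = big_blinds_won / hands_played * 100  # standard win-rate unit

print(winnings_usd)                    # 12250.0
print(round(win_rate_bb_per_100, 1))   # 16.3
```

A win rate around 16 big blinds per 100 hands would be an enormous edge over a long sample—though 750 hands is, of course, a short one.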

Yes, they can, even in a format like Heads-Up, poker AI’s strongest suit. At least, they can against the best bots of yore. Because here’s another thing about Slumbot—a limitation that currently marks every poker program and plays a crucial role in evaluating any AI against humanity. It’s stuck in 2018. Even though a few tweaks have been made since it won its final competition, it is essentially the same bot now as it was five years ago. Poker, however, has evolved.

The world is not static. A strategy that was optimal last year, in poker or in anything else, may no longer be optimal if the environment has changed. Someone who is at the top of their field may find themselves struggling if they stop learning while their competitors keep evolving.

When I began playing poker, as research for a book on the nature of chance and decision-making, I had to keep learning to stay competitive. If I ignored a new tool or tactic, I would lose. But if I embraced it, I still had a chance of winning. So when I faced Slumbot, as relatively bad as I am at Heads-Up poker compared to someone like Rabichow, I could still win. The me that has been studying the game’s evolution is better than the superhuman AI of the past.

Soon after I beat Slumbot, it’s time to face GTO Wizard AI. I feel like I will be lucky to survive. Even though, like Slumbot, GTO Wizard AI is unable to adapt to its opponent, its base algorithms are much more powerful and, in Heads-Up combat, GTO Wizard AI has left Slumbot in the dust.

I quickly realize that in my 750 hands against Slumbot, I’ve picked up some bad habits. Against the easier opponent, I started playing far too many hands. GTO Wizard AI will have none of that. If I play a marginal hand, I am immediately punished. The result: two massive blunders within the first few hands, which account for most of the big blinds I will lose this session. For the remaining 100 hands, I’ll be battling back with a big handicap.

I find myself enraged at this stupid AI, which judges players by a standard of “optimal” play, when it labels my decision to call, instead of raise, a bet in a particular spot a blunder. But I had my reasons. Given the board, there could have been a higher straight than mine. And I didn’t think my raise would ever be called by a worse hand. GTO Wizard disagrees and dings me an insane 14.9 big blinds.

Eventually, as so often happens, anger gives way to self-reflection. Maybe the program is right—maybe it spotted a leak in my game, a risk aversion that prompts me to take the more cautious route when I should instead opt for aggression. Then again, against flesh-and-blood opponents, maybe my intuitions are the better guide. Most players in that situation wouldn’t bet in that way unless they had me beat—and most wouldn’t call a raise unless they had me beat. That’s the difficulty of playing an optimal opponent and being judged by those standards: They are incredibly useful but can lead you astray against humans who play anything but optimally. The AI has played vastly more hands than I ever will—but has never learned to parse the movements of an opponent’s fingers, the look in her eyes, the pulse in her neck.

In the end, I recover. At least somewhat. I lose an average of just under .06 big blinds per hand—5.8 big blinds over the 100 hands I play this session. “How bad is that?” I text Kevin. To my surprise, he responds, “Overall seems quite good.” Jubilation.

I will go on to play several more sessions. My performance remains at a steady .06 average. I’m sad I don’t improve, but happy I don’t go up in flames. I feel like I’ve held my own. Even in this format. Even against the best of the best.

*

To researchers like Bowling and Sandholm and Brown, it doesn’t matter that poker isn’t solved as such or that the game has progressed beyond their models. Their goal was always the same as von Neumann’s: poker as a tool. Sure, many of the researchers—von Neumann first and foremost—love the game. But as a research program, it is a benchmark, a waypoint to AI in service of the greater good of humanity.

Amy Greenwald, an AI researcher at Brown University who collaborated with Bowling on his DeepStack research, is working on negotiation, which she sees as the most important game theoretic problem in the world. “Can we try to predict how agents will act? Can we steer them toward positive outcomes? That’s what poker has given us,” she says. Consider even the simplest negotiation problem, between two agents. Who acts first? What do they say? Did you offend the other person with your initial offer? Did your stance make you seem overeager? “In negotiation, I need to give you an offer now without revealing so much about my hand that it undercuts me,” Greenwald says. “I need to learn how you think—your function, in machine terms—to try to sway you in my direction, eventually.” Every time humans negotiate, it’s like playing a hand of poker to the best of our ability—trying to discern what the other player holds, how far you can push them, and how far they can push you, all without revealing too much about your own cards.

Sandholm would agree, and he’s directly leveraging the algorithmic insights of poker into very real problems via several start-ups. At one of them, Strategic Machine, he works on applications like political campaign planning—a game of poker if ever there was one. “Take a very simple campaign problem: How do you allocate money on various types of media?” Sandholm says. “It all depends on your opponent. It’s pure game theory. But people don’t usually take game theoretical approaches to campaign allocation.”

Poker has, to these researchers, served its purpose. Humans tilt—a poker term for the human tendency to inject emotions into their decision process. Humans celebrate. They cry. They lie—and not just when bluffing. They get greedy. They become risk averse. They become risk-seeking. They like each other and hate each other. They feel like you are out to get them. Sometimes, they don’t know why. Dynamics change, often at a subconscious level. Humans become more aggressive against someone, less aggressive against someone else, a give-and-take that changes strategies and outcomes.

That very humanity is what drew John von Neumann to the game. And in his theory, he challenged us to remove it, to reduce it to equations that would ultimately be solvable. He and his successors have almost succeeded. As Bowling put it, “Poker has a very human element. But von Neumann was so successful he almost removed it.” That almost, however, does a lot of heavy lifting.

As a psychologist, I know this about the human mind: What we don’t know far outnumbers what we do. We can’t accurately say why we act the way we do, let alone why others do. When Marion Tinsley played against Chinook in the first Man-Machine World Championship in 1992, he was certain he would win. “I have a better programmer than Chinook,” he told The Independent. “His was Jonathan, mine was the Lord.”

AI can improve all it wants, but humans will always be human. We don’t know quite what that means. We can’t quite assess how it will play out. That’s our downfall. But it’s also our strength.

Maria Konnikova is a writer, psychologist, and poker player. She is the author of the books The Biggest Bluff, The Confidence Game, and Mastermind: How to Think Like Sherlock Holmes. She is currently working on a book about cheating in games.