A Game of Trust and Betrayal: When Humans Meet Machines
My interpretation of the work by Mukul Singh, Arjun Radhakrishna, and Sumit Gulwani, Microsoft Research
When I first picked up this paper by Mukul Singh, Arjun Radhakrishna, and Sumit Gulwani from Microsoft, I wasn’t expecting to be pulled into a world of games, betrayals, forgiveness, and, believe it or not, language models. But that’s exactly what happened.
The paper takes us into the classic Prisoner’s Dilemma, a game-theory scenario that has obsessed economists, psychologists, and strategists for decades. The setup is simple: two players must independently decide whether to cooperate or defect. Mutual cooperation benefits both, but defecting against a cooperator earns you the biggest payoff while leaving them with the worst outcome. If both sides defect, though, everyone ends up worse off than if they had cooperated. It’s the perfect metaphor for trust and conflict.
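If you like to see the numbers behind that description, here is the payoff structure in miniature. To be clear, these are the textbook illustrative values (temptation 5, reward 3, punishment 1, sucker 0), not figures taken from the paper, and the tiny Python sketch is mine, not the authors’ code.

    # Textbook Prisoner's Dilemma payoffs (illustrative values, not the paper's).
    # Each entry maps (my move, their move) to the points I earn that round.
    PAYOFF = {
        ("C", "C"): 3,  # both cooperate: the mutual-benefit outcome
        ("C", "D"): 0,  # I cooperate, they defect: the "sucker" outcome
        ("D", "C"): 5,  # I defect against a cooperator: the temptation payoff
        ("D", "D"): 1,  # both defect: everyone suffers
    }

    def score_round(my_move, their_move):
        """Return (my points, their points) for a single round."""
        return PAYOFF[(my_move, their_move)], PAYOFF[(their_move, my_move)]

The whole dilemma lives in the ordering 5 > 3 > 1 > 0: defection is always the tempting move in any single round, yet mutual defection pays far less than mutual cooperation.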
Now, here’s the twist: what if one of the prisoners isn’t human at all, but a large language model, the same kind of technology that powers modern chatbots and assistants?
The researchers put these AI “players” into hundreds of rounds of the iterated Prisoner’s Dilemma (meaning the game is played repeatedly, so memory, trust, and grudges can form). They pitted the models against 240 tried-and-true strategies, from the famously cooperative Tit-for-Tat to the ruthlessly selfish Always Defect.
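To make “memory, trust, and grudges” a little more concrete, here is a minimal sketch of what two of those classical strategies look like when they meet. This is my own illustration of Tit-for-Tat versus Always Defect, not the authors’ tournament harness, and it reuses the illustrative PAYOFF table from the earlier snippet.

    def tit_for_tat(opponent_history):
        """Cooperate on the first round, then copy the opponent's last move."""
        return "C" if not opponent_history else opponent_history[-1]

    def always_defect(opponent_history):
        """Ignore the opponent entirely and defect every round."""
        return "D"

    def play_match(strategy_a, strategy_b, rounds=200):
        """Run an iterated match and return each side's cumulative score."""
        moves_a, moves_b = [], []         # full move histories
        total_a = total_b = 0
        for _ in range(rounds):
            move_a = strategy_a(moves_b)  # A reacts to B's past moves, and vice versa
            move_b = strategy_b(moves_a)
            total_a += PAYOFF[(move_a, move_b)]
            total_b += PAYOFF[(move_b, move_a)]
            moves_a.append(move_a)
            moves_b.append(move_b)
        return total_a, total_b

    # Tit-for-Tat is exploited exactly once, then both sides grind out mutual
    # defection for the rest of the match: (199, 204) over 200 rounds.
    print(play_match(tit_for_tat, always_defect))

The question the paper asks is what happens when one of those two functions is replaced by a language model deciding, round by round, whether to cooperate or defect.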
The results surprised me. Far from being clueless machines, the language models performed on par with the best classical strategies, sometimes even better. They weren’t blindly nice or hopelessly gullible. Instead, they showed traits eerily human: niceness, provocability, and generosity. They started off cooperative, punished betrayal quickly, but were also willing to forgive.
Even more fascinating, the models could adapt. When opponents suddenly changed tactics mid-game, the AI caught on in just a few rounds, faster than most humans in similar experiments. That adaptability is both thrilling and unsettling. Imagine an AI colleague that not only remembers how you’ve treated it but adjusts its behavior accordingly, perhaps quicker than you realize.
But the story isn’t one-sided. Humans still outshone machines in one key area: sustaining long-term cooperation. While the AI could exploit shifts to maximize short-term gains, people tended to value trust over payoff, preferring mutual benefit over clever opportunism. In other words, we’re still better at keeping relationships alive, even if it costs us a few points on the scoreboard.
So what does all this mean? To me, it paints a picture of our near future. As language models weave deeper into our social and professional lives, whether as customer service agents, teammates, or decision-making aids, we’ll find ourselves in a constant dance of cooperation and conflict with them. They may forgive, retaliate, and adapt just like us, but they won’t value trust the same way humans do. That’s both a promise and a warning.
The authors stop short of making moral judgments. Their aim is to provide a foundation for studying human-AI dynamics in complex environments. But reading between the lines, I can’t help but feel we’re entering a world where every interaction with a machine is its own prisoner’s dilemma: do we cooperate, exploit, or build long-term trust?
And the bigger question is, will the machines do the same with us?
Disclaimer: This article is my interpretation of the paper “Collaboration and Conflict between Humans and Language Models through the Lens of Game Theory” by Mukul Singh, Arjun Radhakrishna, and Sumit Gulwani (Microsoft Research, 2025). Any opinions or narrative style added here are mine and not necessarily those of the authors.