Michal Valko: Building Smarter AI Through Self-Play and Game Theory
At Innovations United 2025, Michal Valko, an AI pioneer with nearly three decades of experience, delivered a keynote that resonated far beyond the stage. Sharing insights from his work at DeepMind, Meta, and his current stealth startup, Valko took the audience on a journey through the evolution of AI training—and why game theory may hold the key to unlocking the next generation of models.
“I didn’t want AI that needed years of training. I wanted AI that could learn to be useful on its own.”
A Human-Centered View of AI
Valko opened by contrasting how AI learns today, from massive labeled datasets and costly compute, with how humans learn, from a handful of examples and rich context. This gap, he argued, demands a smarter learning paradigm.
“My mother worked in AI in the 1970s. What’s changed is not the idea—but the recipe.”
His Two Core Bets: Self-Supervision & Game Theory
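- Self-supervised learning: Models learn by predicting parts of the data that are hidden from them (for example, the next word in a sentence), so the training signal comes from the data itself rather than from costly manual labeling.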
- Game theory and self-play: Models improve by competing with similarly skilled versions of themselves, forming a feedback loop akin to human learning in sports or games.
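To make the self-supervision idea concrete, here is a minimal sketch, not anything shown in the keynote. It assumes a toy "model" that simply counts which word follows which; the function names (`make_training_pairs`, `fit_bigram`, `predict_next`) are hypothetical. The point is only that the labels are carved out of the raw text itself.

```python
from collections import Counter, defaultdict

def make_training_pairs(tokens):
    """Build (context, target) pairs from raw text; the target is just the next token."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

def fit_bigram(pairs):
    """Toy stand-in for training: count how often each target follows each context."""
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context][target] += 1
    return counts

def predict_next(model, context):
    """Return the most frequent follower of `context`, or None if it was never seen."""
    if context not in model:
        return None
    return model[context].most_common(1)[0][0]

# No human-written labels anywhere: the text supervises itself.
tokens = "the cat sat on the mat and the cat slept".split()
pairs = make_training_pairs(tokens)
model = fit_bigram(pairs)
print(predict_next(model, "the"))  # prints: cat
```

Real systems replace the counting step with a neural network, but the data preparation follows the same pattern: hide part of the input and ask the model to predict it.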
These approaches aren’t just theoretical. Valko has implemented them in real-world models: Gemini, Llama, Spawn, Sparrow, T5X (some open, some internal).
“The goal isn’t just to beat GPT-4. The goal is to be better than every other model.”
Cracking an Open Problem in AI
Valko described how large language models (LLMs) present a scaling problem: tree-search strategies collapse under the weight of 220,000-token contexts. Traditional methods don’t scale. His team’s breakthrough?
➡️ A gradient-descent-based optimization framework using self-play to simulate Nash-like competitive improvements.
This research took 10 years, involved multiple labs and teams, and culminated in a Best Paper Award in 2022. Its outputs are embedded in Gemini and Meta models today.
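The keynote did not spell out the algorithm, so the following is only an illustrative sketch of the general idea: gradient-style updates in self-play drifting toward a Nash equilibrium. It swaps in a textbook technique (exponentiated-gradient, also called multiplicative-weights, updates) on rock-paper-scissors, where the Nash strategy is to play each move one third of the time. The game, the function names, and the hyperparameters are all assumptions chosen for illustration.

```python
import math

# PAYOFF[i][j] is the reward for playing i against j
# (order: rock, paper, scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def payoff_gradient(policy):
    """Expected payoff of each pure move against an opponent playing `policy`.

    This is the gradient of the player's expected payoff with respect to
    its own mixed strategy.
    """
    return [sum(PAYOFF[i][j] * policy[j] for j in range(3)) for i in range(3)]

def self_play(initial_policy, steps=5000, learning_rate=0.02):
    """Train a policy against a copy of itself; return the time-averaged policy."""
    policy = list(initial_policy)
    running_sum = [0.0, 0.0, 0.0]
    for _ in range(steps):
        running_sum = [s + p for s, p in zip(running_sum, policy)]
        grad = payoff_gradient(policy)  # the opponent is the current policy
        # Exponentiated-gradient step, then renormalize to a probability vector.
        weights = [p * math.exp(learning_rate * g) for p, g in zip(policy, grad)]
        total = sum(weights)
        policy = [w / total for w in weights]
    return [s / steps for s in running_sum]

def exploitability(policy):
    """How much a best-responding opponent can win against `policy` (0 at Nash)."""
    return max(payoff_gradient(policy))

start = [0.6, 0.3, 0.1]
average = self_play(start)
print("start exploitability:  ", exploitability(start))
print("average exploitability:", exploitability(average))
```

The individual iterates keep cycling (rock invites paper, paper invites scissors), so the sketch reports the time-averaged policy, which is the quantity that approaches equilibrium in this kind of zero-sum self-play. Scaling the same intuition to language models, where the "moves" are whole responses, is the hard part the talk was about.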
A New Era of AI Training
The process involves:
- Starting with identical models
- Using randomized prompts to find winning behaviors
- Replacing weaker models with better-performing ones
- Repeating until only top-performing models survive
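The loop above can be sketched as a toy tournament. This is a hedged illustration, not the production procedure: each "model" is a single number, the judge (`score`) is a hypothetical function that prefers values near 1.0, and a small random mutation stands in for a real training step.

```python
import random

def score(model, prompt):
    """Hypothetical judge: higher is better; the ideal model value is 1.0."""
    return -abs(model - 1.0) * prompt

def evolve(population_size=8, rounds=1000, mutation=0.1, seed=0):
    rng = random.Random(seed)
    population = [0.0] * population_size       # start with identical models
    for _ in range(rounds):                    # repeat
        a, b = rng.sample(range(population_size), 2)
        prompt = rng.uniform(0.5, 1.5)         # randomized prompt
        if score(population[a], prompt) >= score(population[b], prompt):
            winner, loser = a, b
        else:
            winner, loser = b, a
        # Assumed stand-in for training: perturb a copy of the winner.
        mutant = population[winner] + rng.gauss(0.0, mutation)
        # Replace the weaker model: keep the mutant only if it is at least
        # as good as the model it replaces, else clone the winner unchanged.
        if score(mutant, prompt) >= score(population[loser], prompt):
            population[loser] = mutant
        else:
            population[loser] = population[winner]
    return population

final = evolve()
errors = [abs(m - 1.0) for m in final]
print("best error:", min(errors), "worst error:", max(errors))
```

Because a model is only ever overwritten by something that scores at least as well, no slot in the population gets worse over time. That is the selection pressure the list describes, stripped down to a few lines.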
It’s AI Darwinism—with game theory as the engine.
“Self-play enables AI to train against itself—evolving faster, smarter, and stronger.”
Final Word
Michal Valko’s keynote wasn’t just a technical masterclass—it was a call to rethink how we train intelligent systems. His ultimate goal: one model to rule them all, trained not by brute force, but by clever competition and elegant optimization.