David Silver and Richard Sutton, two renowned AI scientists, argue in a new paper that artificial intelligence is about to enter a new phase, the “Era of Experience,” in which AI systems rely less and less on human-provided data and instead improve themselves by interacting with the world and learning from the data those interactions generate.
While the paper is conceptual and forward-looking, it has direct implications for enterprises that aim to build with and for future AI agents and systems.
Both Silver and Sutton are seasoned scientists with a track record of making accurate predictions about the future of AI. The validity of their predictions can be seen directly in today’s most advanced AI systems. In 2019, Sutton, a pioneer in reinforcement learning, wrote the famous essay “The Bitter Lesson,” in which he argues that the greatest long-term progress in AI consistently arises from leveraging large-scale computation with general-purpose search and learning methods, rather than from encoding complex, human-derived domain knowledge.
David Silver, a senior scientist at DeepMind, was a key contributor to AlphaGo, AlphaZero and AlphaStar, all landmark achievements in deep reinforcement learning. He also co-authored the 2021 paper “Reward Is Enough,” which argued that reinforcement learning with a well-designed reward signal would be sufficient to create very advanced AI systems.
The most advanced large language models (LLMs) leverage those two concepts. The wave of new LLMs that has swept the AI scene since GPT-3 has relied primarily on scaling compute and data to internalize vast amounts of knowledge. The most recent wave of reasoning models, such as DeepSeek-R1, has demonstrated that reinforcement learning with a simple reward signal is sufficient for learning complex reasoning skills.
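To make “simple reward signal” concrete, consider a rule-based, verifiable reward of the kind used to train reasoning models on math problems: the model is scored only on whether its final answer matches a known-correct one, with no learned reward model in the loop. The sketch below is our own illustration, not DeepSeek’s actual implementation:

```python
import re

def verifiable_reward(model_output: str, reference_answer: str) -> float:
    """Rule-based reward in the spirit of verifiable-reward RL training:
    score only the final boxed answer against ground truth and ignore
    the chain of thought entirely. Illustrative sketch only."""
    # Extract the last \boxed{...} answer from the model's output.
    matches = re.findall(r"\\boxed\{([^}]*)\}", model_output)
    if not matches:
        return 0.0  # no parseable final answer, no reward
    return 1.0 if matches[-1].strip() == reference_answer.strip() else 0.0

# Example: a correct final answer earns the full reward.
print(verifiable_reward(r"... so the result is \boxed{42}", "42"))  # 1.0
```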
What is the era of experience?
The “Era of Experience” builds on the same concepts that Sutton and Silver have been discussing in recent years, and adapts them to recent advances in AI. The authors argue that the “pace of progress driven solely by supervised learning from human data is demonstrably slowing, signalling the need for a new approach.”
And that approach requires a new source of data, which must be generated in a way that continually improves as the agent becomes stronger. “This can be achieved by allowing agents to learn continually from their own experience, i.e., data that is generated by the agent interacting with its environment,” Sutton and Silver write. They argue that eventually, “experience will become the dominant medium of improvement and ultimately dwarf the scale of human data used in today’s systems.”
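What the authors describe is the classic reinforcement learning interaction cycle, run continually rather than over a fixed dataset. Here is a minimal sketch; the Agent and Environment interfaces are hypothetical placeholders, not anything specified in the paper:

```python
from typing import Protocol, Any

class Environment(Protocol):
    def observe(self) -> Any: ...
    def act(self, action: Any) -> float: ...  # returns a reward signal

class Agent(Protocol):
    def decide(self, observation: Any) -> Any: ...
    def learn(self, observation: Any, action: Any, reward: float) -> None: ...

def experience_loop(agent: Agent, env: Environment) -> None:
    """Continual learning from self-generated experience: every action
    produces new training data, so the data stream improves as the
    agent does -- the property Sutton and Silver emphasize."""
    while True:  # a lifelong stream, not a fixed dataset
        observation = env.observe()
        action = agent.decide(observation)
        reward = env.act(action)
        agent.learn(observation, action, reward)  # update on own experience
```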
According to the authors, in addition to learning from their own experiential data, future AI systems will “break through the limitations of human-centric AI systems” across four dimensions:
- Streams: Instead of working across disconnected episodes, AI agents will “have their own stream of experience that progresses, like humans, over a long time-scale.” This will allow agents to plan for long-term goals and adapt to new behavioral patterns over time. We can see glimmers of this in AI systems that have very long context windows and memory architectures that continuously update based on user interactions (a rough sketch of such a memory follows this list).
- Actions and observations: Instead of focusing on human-privileged actions and observations, agents in the era of experience will act autonomously in the real world. Examples of this are agentic systems that can interact with external applications and resources through tools such as computer use and Model Context Protocol (MCP).
- Rewards: Current reinforcement learning systems mostly rely on human-designed reward functions. In the future, AI agents should be able to design their own dynamic reward functions that adapt over time, matching user preferences with real-world signals gathered from the agent’s actions and observations. We’re seeing early versions of self-designed rewards in systems such as Nvidia’s DrEureka (a toy example follows this list).
- Planning and reasoning: Current reasoning models have been designed to imitate the human thought process. The authors argue that “More efficient mechanisms of thought surely exist, using non-human languages that may, for example, utilise symbolic, distributed, continuous, or differentiable computations.” AI agents should engage with the world, observe and use data to validate and update their reasoning process and develop a world model.
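On the “streams” point, here is a rough sketch of what a continuously updating memory might look like. The interfaces are assumptions for illustration, not an architecture from the paper: every interaction is appended to a lifelong log, and a compact summary is refreshed so the agent can draw on history that no longer fits in its context window.

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceStream:
    """Append-only record of an agent's lifelong interactions, plus a
    rolling summary that stands in for memory consolidation."""
    events: list[str] = field(default_factory=list)
    summary: str = ""

    def record(self, event: str) -> None:
        self.events.append(event)
        self.summary = self._consolidate()

    def _consolidate(self) -> str:
        # Placeholder: a real system might use an LLM to summarize or a
        # vector store to retrieve; here we simply keep the recent tail.
        return " | ".join(self.events[-5:])
```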
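On the “rewards” point, here is a toy example of a dynamic reward that blends a user-preference score with a signal measured from the environment, adjusting the blend as feedback arrives. The weighting scheme is our own simplification; DrEureka’s actual method, which uses LLMs to generate and tune reward functions for sim-to-real transfer, is considerably more involved.

```python
class DynamicReward:
    """Toy self-adjusting reward: blends a user-preference score with a
    grounded, real-world signal, shifting weight toward whichever term
    downstream feedback favors. Illustrative only."""
    def __init__(self, weight: float = 0.5):
        self.weight = weight  # balance between preference and world signal

    def score(self, preference: float, world_signal: float) -> float:
        return self.weight * preference + (1 - self.weight) * world_signal

    def adapt(self, feedback: float) -> None:
        # Nudge the blend based on downstream feedback (e.g., did the
        # user accept the outcome?), keeping the weight in [0, 1].
        self.weight = min(1.0, max(0.0, self.weight + 0.1 * feedback))
```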
The idea of AI agents that adapt to their environment through reinforcement learning is not new. Previously, however, such agents were limited to highly constrained environments such as board games. Today, agents that can interact with complex environments (e.g., computer-use agents), combined with advances in reinforcement learning, are poised to overcome these limitations and bring about the transition to the era of experience.
What does it mean for the enterprise?
Buried in Sutton and Silver’s paper is an observation that will have important implications for real-world applications: “The agent may use ‘human-friendly’ actions and observations such as user interfaces, that naturally facilitate communication and collaboration with the user. The agent may also take ‘machine-friendly’ actions that execute code and call APIs, allowing the agent to act autonomously in service of its goals.”
The era of experience means that developers will have to build their applications not only for humans but also with AI agents in mind. Machine-friendly actions require secure, accessible APIs that agents can reach directly or through interfaces such as MCP. It also means making your own agents discoverable through protocols such as Google’s Agent2Agent. Your APIs and agentic interfaces should expose both actions and observations, enabling agents to gradually reason about and learn from their interactions with your applications.
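As a concrete illustration of a machine-friendly surface, here is a minimal server sketch based on the FastMCP quickstart from the official MCP Python SDK; the specific tool and resource are invented examples, not a prescribed schema. Tools expose actions an agent can take, while resources expose observations it can read back.

```python
from mcp.server.fastmcp import FastMCP

# A minimal MCP server sketch (following the MCP Python SDK's FastMCP
# quickstart). The "orders" domain below is a hypothetical example.
mcp = FastMCP("orders")

@mcp.tool()
def place_order(sku: str, quantity: int) -> str:
    """Machine-friendly ACTION: an agent can invoke this directly."""
    return f"order placed: {quantity} x {sku}"

@mcp.resource("orders://{order_id}/status")
def order_status(order_id: str) -> str:
    """Machine-friendly OBSERVATION: lets the agent check the outcome
    of its actions and learn from the result."""
    return f"order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()
```

Pairing every action with a corresponding observation is what lets an agent close the loop Sutton and Silver describe: act, observe the consequence, and learn from it.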
If the vision that Sutton and Silver present becomes reality, billions of agents will soon be roaming the web (and, eventually, the physical world) to accomplish tasks. Their behaviors and needs will be very different from those of human users and developers, and an agent-friendly way to interact with your application will improve your ability to leverage future AI systems (and help prevent the harms they can cause).
“By building upon the foundations of RL and adapting its core principles to the challenges of this new era, we can unlock the full potential of autonomous learning and pave the way to truly superhuman intelligence,” Sutton and Silver write.
DeepMind declined to provide additional comments for this story.