Roundtables: Can AI Learn to Understand the World?
Original reporting by MIT Technology Review

Large language models (LLMs) have reshaped artificial intelligence, demonstrating unprecedented capabilities in processing and generating text. Yet, despite their prowess, these systems remain largely tethered to the digital realm, lacking an intrinsic understanding of the physical world we inhabit. Their knowledge stems from vast datasets of human language and imagery, but without direct experience or a robust internal model of reality, their capacity to interact with and reason about the external environment remains severely limited. This fundamental hurdle is now driving a significant shift in AI research.
AI companies are increasingly focusing on "world models" – sophisticated AI systems designed to build an internal representation of the physical world, predict its behavior, and learn through interaction. These models aim to equip AI with a more profound grasp of causality, space, and object permanence, moving beyond the statistical patterns of language to a more intuitive understanding of how things work in reality. The ultimate goal: create AI that can not only converse intelligently but also navigate, manipulate, and learn within complex physical settings.
Entering the physical world
This shift could open up new applications for AI, from robots capable of genuine problem-solving to virtual agents that simulate real-world scenarios with high fidelity. It points to a future in which AI systems are not just digital assistants but active participants in the physical world. To explore this trend and the implications of AI's physical embodiment, we recently brought together MIT Technology Review's editor in chief Mat Honan, senior AI editor Will Douglas Heaven, and AI reporter Grace Huckins for a discussion.
The discussion around world models underscores a reorientation within AI research: a move beyond purely linguistic or abstract reasoning toward systems that can understand and interact with the physical environment. This shift represents not an incremental improvement but a quest for grounded intelligence. By enabling AI to build internal representations of reality (predicting outcomes, grasping causality, and navigating three-dimensional space), researchers aim to overcome the limitations of large language models, which often lack common-sense physics and a direct connection to the world they describe. This effort is central to unlocking the next generation of AI capabilities.