PhyDrawGen: Physically Grounded Diagram Generation from Natural Language
Original reporting by arXiv (cs.AI)

Generating accurate physics diagrams from descriptive text has long been a challenge for artificial intelligence. Powerful generative models can produce visually plausible images, but they frequently violate fundamental physical laws: they hallucinate force vectors, ignore conservation principles, and break geometric constraints. This gap between visual fidelity and scientific accuracy has limited their usefulness in educational and technical applications.
A Neuro-Symbolic Solution
Enter PhyDrawGen, a novel neuro-symbolic pipeline designed to overcome these persistent inaccuracies. Instead of trying to learn complex physical laws directly from visual data, PhyDrawGen decouples semantic scene understanding from rigorous physical constraint satisfaction. First, a large language model interprets the problem text and builds a detailed scene graph. This symbolic representation is then passed to a deterministic solver, which translates the governing physics (such as force balance, optical paths, and electromagnetic fields) into exact geometric primitives. Finally, a fine-tuned vision-language model refines the visual output through a propose-verify loop, iteratively correcting any remaining constraint violations.
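To make the division of labor concrete, here is a minimal Python sketch of the three stages for one assumed scenario: a single block resting on an incline. Every name in it (`SceneGraph`, `Arrow`, `solve_static_forces`, `verify`, `propose_verify`) is hypothetical and not taken from the paper. The LLM parsing step is replaced by a hand-built scene graph, and the vision-language model's correction step is replaced by a deterministic stand-in that fixes one violating arrow per round. It illustrates the idea of solver-grounded verification, not PhyDrawGen's actual implementation.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2

# Hypothetical scene-graph types; the paper's real schema is not described in this article.
@dataclass
class Body:
    name: str
    mass_kg: float

@dataclass
class Incline:
    angle_deg: float
    mu_static: float  # coefficient of static friction

@dataclass
class SceneGraph:
    body: Body
    support: Incline

@dataclass
class Arrow:
    """Geometric primitive: a force vector (in newtons) drawn from the body's centre."""
    label: str
    dx: float
    dy: float

def solve_static_forces(scene: SceneGraph) -> list[Arrow]:
    """Deterministic force-balance solver for a block at rest on an incline rising to the right."""
    m, theta = scene.body.mass_kg, math.radians(scene.support.angle_deg)
    weight = m * G
    normal = weight * math.cos(theta)
    friction = weight * math.sin(theta)  # static friction must cancel the down-slope pull
    if friction > scene.support.mu_static * normal:
        raise ValueError("block would slide; static equilibrium is impossible")
    return [
        Arrow("W", 0.0, -weight),                                          # straight down
        Arrow("N", -normal * math.sin(theta), normal * math.cos(theta)),   # perpendicular to surface
        Arrow("f", friction * math.cos(theta), friction * math.sin(theta)),  # up the slope
    ]

def verify(proposal: list[Arrow], reference: list[Arrow], tol: float = 1e-3) -> list[str]:
    """Return labels of proposed arrows that deviate from the solver's vectors.

    This toy assumes the proposal uses the same labels as the reference.
    """
    ref = {a.label: a for a in reference}
    return [a.label for a in proposal
            if math.hypot(a.dx - ref[a.label].dx, a.dy - ref[a.label].dy) > tol]

def propose_verify(proposal: list[Arrow], scene: SceneGraph, max_rounds: int = 5) -> list[Arrow]:
    """Toy propose-verify loop: re-check, then correct the first violation, until clean."""
    reference = solve_static_forces(scene)
    ref = {a.label: a for a in reference}
    current = list(proposal)
    for _ in range(max_rounds):
        bad = verify(current, reference)
        if not bad:
            break
        # Stand-in for the VLM's correction step: replace one violating arrow per round.
        current = [ref[a.label] if a.label == bad[0] else a for a in current]
    return current

scene = SceneGraph(Body("block", 2.0), Incline(angle_deg=30.0, mu_static=0.7))
hallucinated = [Arrow("W", 0.0, -19.62), Arrow("N", 0.0, 15.0), Arrow("f", 0.0, 0.0)]
fixed = propose_verify(hallucinated, scene)
print(verify(fixed, solve_static_forces(scene)))  # []
```

The design choice mirrors the article's argument: the solver, not the generative model, is the source of truth, so the verifier can state exactly which primitives are wrong instead of judging whether an image merely looks right.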
Evaluated on a benchmark of 1,449 problems spanning mechanics, optics, and electromagnetism, PhyDrawGen significantly outperforms leading generative models such as GPT-5-image, Gemini 2.5 Flash, and Gemini 3 Pro in physical accuracy, and its advantage holds even on unusual or complex scenarios.
PhyDrawGen marks a significant leap in AI's ability to interpret and visualize scientific concepts. Because semantic understanding is kept separate from a deterministic solver, and a propose-verify loop catches residual errors, the pipeline avoids the hallucinations and constraint violations common in purely generative models. Its performance across mechanics, optics, and electromagnetism, including problems with unusual objects, signals a shift from AI outputs that merely look plausible to outputs whose physical accuracy can be verified. The advance is not just about drawing better diagrams; it is about embedding scientific rigor directly into AI's creative process.
Expanding AI's Precision
The implications of PhyDrawGen extend well beyond textbook illustrations. In education, it promises to change how students interact with physics, offering accurate visual aids that clarify complex principles and reduce misconceptions. For researchers and engineers, rapidly generating physically correct representations could speed up hypothesis formation, experimental design, and prototyping across scientific and industrial domains. PhyDrawGen also offers a blueprint for AI systems in other fields that demand high-fidelity, constraint-aware generation, such as chemical diagrams, architectural blueprints, circuit designs, and medical imaging. Its combination of large language model interpretation with symbolic solvers underscores the growing power of neuro-symbolic AI. It moves the field closer to systems that respect the fundamental laws of the physical world, opening new avenues for scientific discovery and innovation.