Generative AI & Tools
Monday, June 1, 2026

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Original reporting by Hugging Face

JetBrains has unveiled Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model trained from scratch on natural language and code. Released under an Apache 2.0 license, this open-source model is engineered for high-throughput, low-latency inference, a critical factor for real-world AI deployment. Mellum2 achieves this efficiency by activating only 2.5 billion parameters per token, enabling it to deliver competitive benchmark performance at more than twice the speed of similarly sized models. It extends the foundation of its predecessor, originally a code completion model, to a broader range of software engineering and natural language tasks, always with an eye toward efficient inference and deployability.
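To make the "2.5 billion active out of 12 billion total" figure concrete, here is a minimal sketch of sparse top-k expert routing, the general mechanism that lets an MoE layer run only a subset of its experts for each token. It is an illustration only: the article does not describe Mellum2's router, so the expert count, the value of top_k, the toy scalar "experts", and the choice to renormalize weights over the selected experts are all assumptions.

```python
import math

# Figures quoted in the article.
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # roughly one fifth of the weights per token


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k):
    """Pick the top_k experts for one token.

    Returns (expert_index, weight) pairs. Weights are a softmax over the
    selected logits only, which is one common variant (an assumption here,
    not a statement about Mellum2).
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, experts, router_logits, top_k):
    """Weighted sum of the outputs of the selected experts only.

    Experts that are not selected are never called, which is where the
    compute saving comes from.
    """
    out = 0.0
    for idx, weight in route_token(router_logits, top_k):
        out += weight * experts[idx](token)
    return out


if __name__ == "__main__":
    # Hypothetical toy experts: each just scales its scalar input.
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical router scores for one token
    print(route_token(logits, top_k=2))
    print(moe_layer(1.0, experts, logits, top_k=2))
    print(f"active fraction: {active_fraction:.3f}")
```

With two of four toy experts selected, only half of the expert functions run for that token. The same idea, at a different ratio, underlies the article's claim that Mellum2 touches about a fifth of its parameters per token.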

A Strategic Shift

In an era often dominated by calls for ever-larger, monolithic AI models, Mellum2 represents a strategic pivot toward specialized "focal" models. Modern AI systems increasingly rely on multiple, often latency-sensitive, components (for routing, retrieval, summarization, validation, and tool use), many of which do not require the largest available models. Mellum2 is optimized for these high-frequency tasks, serving as a powerful component within multi-model architectures. It excels in roles like orchestrating multi-model systems, enhancing RAG pipelines, powering agent subtasks, and facilitating private deployments of proprietary code. By focusing on well-scoped, efficient capabilities, Mellum2 aims not to replace every model in the stack, but to make the overall AI system faster, more cost-effective, and ultimately more controllable, particularly for complex software engineering workflows.
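The multi-model pattern described above can be sketched as a small dispatcher that sends well-scoped, high-frequency tasks to a compact model and everything else to a larger one. This is a hypothetical illustration, not JetBrains' architecture: the task labels, the ModelEndpoint type, and the stub handlers are invented for the example, and a real system would call actual inference endpoints.

```python
from dataclasses import dataclass
from typing import Callable

# Task types the article lists as not needing the largest model.
# The string labels themselves are an assumption for this sketch.
LIGHTWEIGHT_TASKS = {"routing", "retrieval", "summarization", "validation", "tool_use"}


@dataclass
class ModelEndpoint:
    """Hypothetical wrapper around a model; handler stands in for an inference call."""
    name: str
    handler: Callable[[str], str]


def pick_model(task_type: str, small: ModelEndpoint, large: ModelEndpoint) -> ModelEndpoint:
    """Send well-scoped, high-frequency tasks to the small model, the rest to the large one."""
    return small if task_type in LIGHTWEIGHT_TASKS else large


def run(task_type: str, prompt: str, small: ModelEndpoint, large: ModelEndpoint) -> str:
    """Dispatch one request and tag the result with the model that served it."""
    model = pick_model(task_type, small, large)
    return f"{model.name}: {model.handler(prompt)}"


if __name__ == "__main__":
    # Stub handlers so the sketch runs without any model installed.
    small = ModelEndpoint("small-moe", lambda p: f"summary of {p!r}")
    large = ModelEndpoint("large-generalist", lambda p: f"plan for {p!r}")
    print(run("summarization", "build log", small, large))
    print(run("planning", "refactor module", small, large))
```

Keeping the dispatch rule in one function makes it easy to replace the static set with a learned or latency-aware policy later.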

Mellum2 represents a significant step in optimizing AI for practical, production-level deployment. Its open-source release, combined with an MoE architecture that activates only 2.5 billion of its 12 billion parameters per token, positions it as a fast component for text-and-code tasks in evolving AI systems. By focusing on specific, high-frequency operations like routing, RAG, and sub-agent tasks, Mellum2 addresses the growing need for specialized, efficient models that complement larger, more generalist frontier models. This focus reflects a shift away from monolithic AI and toward modular, cost-effective solutions for diverse software engineering applications.

A Modular Future

The implications of models like Mellum2 extend beyond performance metrics. Its introduction signals a maturation in AI development, emphasizing specialized efficiency and architectural flexibility. This modular approach promises to broaden access to advanced AI capabilities by lowering operational costs and improving deployability, particularly for sensitive enterprise applications requiring private hosting. As AI systems become more complex, the ability to selectively invoke optimized models for specific tasks will matter more. Mellum2’s impact will likely be seen in the accelerated development of more robust, scalable, and controllable AI workflows, enabling organizations to build sophisticated applications without the expense or latency of relying solely on the largest available models. This trend toward specialized, interoperable AI components may change how intelligent systems are designed, built, and integrated into digital infrastructure.

Intro and outro generated by Printing Press AI from the source article above. Always consult the original reporting for verbatim quotes and primary sources.