Inception Labs introduces ‘diffusion’ approach that could transform how we interact with artificial intelligence
In what could be a paradigm shift for artificial intelligence, Inception Labs has introduced a new type of language model that generates text up to 10 times faster and at one-tenth the cost of current systems. The technology, called Mercury, borrows techniques from image generation to fundamentally change how AI produces text.
Traditional large language models, which power services like ChatGPT and Claude, generate text one token at a time in a strictly sequential process. Mercury takes a radically different approach: it produces an entire response at once as a rough draft, then iteratively refines that draft into coherent text.
“This is the first production-grade diffusion-based large language model,” said Inception Labs in materials demonstrating the technology.
How Diffusion Models Work
The technique draws inspiration from how AI creates images. When generating pictures, these systems start with random noise and gradually refine it into a recognizable image. Mercury applies this same principle to text generation.
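The idea can be illustrated with a toy sketch. The snippet below is not Mercury's actual algorithm (Inception Labs has not published one); it is a minimal stand-in in which a fully masked draft is progressively "denoised" by unmasking positions over a handful of refinement passes, with a fixed target sentence standing in for the model's predictions:

```python
import math
import random

def toy_diffusion_decode(target_tokens, steps):
    """Toy illustration of diffusion-style text generation:
    start from a fully masked draft and unmask a share of the
    positions on each refinement pass. A real model would
    *predict* tokens here; we simply reveal a fixed target."""
    n = len(target_tokens)
    draft = ["[MASK]"] * n
    masked = list(range(n))
    random.shuffle(masked)  # reveal positions in random order
    for step in range(1, steps + 1):
        # Spread the remaining masked positions evenly over
        # the remaining passes, so the draft finishes on time.
        remaining_passes = steps - step + 1
        k = math.ceil(len(masked) / remaining_passes)
        reveal, masked = masked[:k], masked[k:]
        for i in reveal:
            draft[i] = target_tokens[i]
        print(f"pass {step}: {' '.join(draft)}")
    return draft

toy_diffusion_decode("the quick brown fox jumps".split(), steps=3)
```

The key property the toy preserves is that the whole sequence exists, in rough form, from the very first pass; refinement touches every position, not just the next one.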
In a demonstration comparing traditional models to Mercury, the difference is striking. While conventional systems needed 75 iterations to complete a response, Mercury Coder required just 14. The resulting speed difference is dramatic, with Mercury operating at over 1,000 tokens (roughly words) per second on standard hardware.
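Those iteration counts are where the speedup comes from. As a back-of-envelope model, assume (this is an illustrative assumption, not a figure from the demonstration) that each forward pass costs roughly the same wall-clock time in both systems; latency then scales with the number of passes:

```python
# Back-of-envelope latency model using the iteration counts quoted above.
# PASS_TIME_S is an assumed, illustrative per-pass latency, not a measurement.
PASS_TIME_S = 0.02

autoregressive_passes = 75  # one pass per generated token (quoted)
diffusion_passes = 14       # refinement passes for Mercury Coder (quoted)

ar_latency = autoregressive_passes * PASS_TIME_S
diff_latency = diffusion_passes * PASS_TIME_S
print(f"autoregressive: {ar_latency:.2f}s  diffusion: {diff_latency:.2f}s")
print(f"speedup: {ar_latency / diff_latency:.1f}x")
```

Under that simplifying assumption, 75 versus 14 passes works out to a bit over a 5x speedup, and the advantage grows with response length, since an autoregressive model needs one pass per token while the number of refinement passes stays small.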
This breakthrough could solve one of AI’s current limitations: waiting time. Coding assistant users often wait 5-15 minutes for the AI to generate complex solutions. With diffusion models, that time could be reduced to seconds.
Real-World Performance

In a live demonstration, within seconds of receiving the prompt, Mercury produced a working particle system that follows mouse movements. It generated a Bayesian model in Python almost as quickly.
The system’s speed becomes apparent when compared directly with leading models. In a side-by-side test, Mercury completed a coding task in just 6 seconds, while Claude and ChatGPT took 28 and 36 seconds, respectively.
Expert Analysis
Andrej Karpathy, a prominent AI researcher, noted the significance of this approach. “Most image and video generation AI tools actually work this way and use diffusion, not autoregression. It’s only text and sometimes audio that have resisted,” he commented.
Karpathy suggested this model “has the potential to be different and possibly showcase new unique psychology or new strengths and weaknesses.”
Beyond Speed: New Capabilities
According to Inception Labs, diffusion models offer advantages beyond mere speed:
Better reasoning: Since these models can see their entire output at once, they can better structure responses and potentially correct their own mistakes.
Agent acceleration: AI agents, which perform complex tasks through multiple steps of reasoning, could work much faster when powered by diffusion models.
Controllable generation: The ability to edit output and generate text in any order could allow for more precise formatting and better alignment with safety requirements.
Edge applications: The smaller computational footprint means these models could run effectively on personal computers rather than requiring cloud servers.
A New Era for AI?
This breakthrough comes as major AI labs have increasingly focused on “test-time computation”—allowing models more time to think through problems. Diffusion models could make this approach much more practical by dramatically reducing the time and cost involved.
While a paper proposing similar techniques was published about a month ago, Mercury represents the first working implementation of a diffusion-based language model available for public testing.
If Inception Labs’ claims hold true in widespread use, this technology could fundamentally change how developers and companies interact with AI, potentially making advanced language models more accessible and useful in time-sensitive applications.