NVIDIA CEO Jensen Huang opened the Computex 2026 keynote in Taipei on June 1 with a sweeping announcement: Nemotron 3 Ultra, a 550-billion-parameter open-weights AI model that the company is calling America’s most intelligent publicly available model. The reveal signals NVIDIA’s ambition to compete not only in AI chips but also in the models that run on them.
What Is Nemotron 3 Ultra?
Nemotron 3 Ultra is a mixture-of-experts (MoE) model with 550 billion total parameters, of which only 55 billion are active for any given token. That architectural choice is the key to its efficiency: because each token touches roughly a tenth of the weights, the model achieves 300+ tokens per second while costing 30% less to run than comparable open-weight alternatives. For enterprises self-hosting large models on NVIDIA hardware, that cost gap matters enormously.
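The ratio behind that efficiency claim is simple arithmetic. The sketch below is a minimal illustration in Python that uses only the 550B and 55B figures quoted above. The bytes-per-parameter values are assumptions about storage precision, not published specifications, and the function names are hypothetical.

```python
TOTAL_PARAMS = 550e9   # total parameters, as quoted in the article
ACTIVE_PARAMS = 55e9   # parameters active per token, as quoted in the article


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that take part in computing one token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param is an assumption: 2 for 16-bit weights, 1 for 8-bit.
    """
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction per token: {frac:.0%}")
    for label, nbytes in (("16-bit", 2), ("8-bit", 1)):
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"all weights at {label}: about {gb:,.0f} GB")
```

In a typical MoE deployment, per-token compute tracks the active parameters while storage tracks the total, so the 10% ratio speaks to speed and cost per token rather than to how much memory the full model needs.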
The model scores 48 on the Artificial Analysis Intelligence Index, placing it solidly at the frontier for open-weights models in the US. For context, proprietary closed models like GPT-5.5 and Claude Opus 4.8 still lead on overall intelligence scores, but those come with API pricing and data-sharing agreements. Nemotron 3 Ultra can run entirely on-premises.
The Catch: China Is Still Ahead on Open Models
Despite the fanfare, Nemotron 3 Ultra doesn’t hold the global open-model crown. China’s Kimi K2.6 from Moonshot AI scores 54 on the same benchmark — ranking fourth among all AI models globally, closed or open. NVIDIA’s model is America’s best open-weight option, but the gap with leading Chinese open models is a persistent concern in the open-source AI community.
NVIDIA chose its words carefully at Computex, calling Nemotron 3 Ultra “the most intelligent US open weights model,” a geographically scoped claim that reflects the current state of open AI model competition between the US and China.
Built for Agentic Workflows
Nemotron 3 Ultra was designed from the ground up for agentic AI — systems that can plan, execute, and iterate on multi-step tasks without human input at each stage. This is where the 550B architecture pays off: complex reasoning chains and long-horizon planning benefit from more parameters, even if only a fraction are active per token.
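To make "only a fraction are active per token" concrete, here is a toy top-k routing sketch in Python. It illustrates the general mixture-of-experts technique only: the source does not describe Nemotron 3 Ultra's actual router, expert count, or value of k, so every name and number below is a hypothetical chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]  # toy stand-in for an expert sub-network


def top_k_route(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    if not 0 < k <= len(gate_scores):
        raise ValueError("k must be between 1 and the number of experts")
    # Stable sort: ties keep their original order.
    chosen = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)[:k]
    peak = max(gate_scores[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(gate_scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x: float, experts: Sequence[Expert],
                gate_scores: Sequence[float], k: int) -> float:
    """Weighted sum of the selected experts' outputs; the others never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))


if __name__ == "__main__":
    # Four hypothetical experts, each just scaling its input.
    experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]
    scores = [0.1, 2.0, 0.3, 2.0]  # hypothetical router scores for one token
    print(top_k_route(scores, k=2))
    print(moe_forward(3.0, experts, scores, k=2))
```

With k=2 of four experts, half of the toy model's expert weights sit idle for this token. Production MoE models apply the same idea across many more experts and layers, which is how total capacity can grow without a matching rise in per-token compute.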
This positions it squarely against models like Anthropic’s upcoming Mythos model and Google’s Gemini agent line — both of which also target agentic enterprise use cases. The difference is deployment model: Nemotron 3 Ultra is fully open-weights, making it attractive to regulated industries (finance, healthcare, government) that can’t send data to external APIs.
Availability and the Full Computex Stack
The model releases on June 4, 2026, across Hugging Face, ModelScope, and OpenRouter for download, plus NVIDIA’s NIM microservice on build.nvidia.com for managed deployment. Enterprises can also access it as a managed API.
Nemotron 3 Ultra sits atop a three-tier family: the base Nemotron 3 for lightweight tasks, the Super (120B parameters, launched March 2026) for mid-range enterprise work, and Ultra for the most demanding reasoning and agentic applications.
Huang’s Computex keynote also included new chip announcements, a personal AI PC called Project DIGITS 2, and updates to the Cosmos 3 world-model platform — but Nemotron 3 Ultra was the headline for enterprise AI buyers.
Why This Matters
NVIDIA’s entry into the open-weights model race is a strategic pivot. For years, the company’s pitch was simple: buy our GPUs to run others’ models. Nemotron 3 changes that to: buy our GPUs, run our model, and deploy via our microservices. It’s a full-stack lock-in play.
For developers and enterprises, Nemotron 3 Ultra is worth benchmarking, particularly for agentic use cases where on-premises deployment is a hard requirement. At 300+ tokens/second and 30% cheaper than comparable open-weight alternatives, the efficiency story is real.
Whether it narrows the gap with China’s leading open models remains to be seen. For now, NVIDIA has America’s best open-weights model — and that’s a claim it didn’t hold two weeks ago.