NVIDIA Nemotron 3 Ultra: Benchmark Results, Architecture, and How It Compares

NVIDIA’s Nemotron 3 Ultra, a 550 billion parameter open-weight model announced at Computex 2026, went live on Hugging Face on June 4. It scores 48 on the Artificial Analysis Intelligence Index — the highest of any US-built open-weight model ever released — and runs at over 300 tokens per second, three to six times faster than comparable Chinese models available through commercial APIs today.

NVIDIA published the benchmark results alongside the Hugging Face release, framing the model as America’s answer to the open-weight arms race with China. The timing is deliberate: DeepSeek and Kimi have dominated open-weight leaderboards for most of 2026, and Nemotron 3 Ultra is NVIDIA’s most direct challenge yet.

NVIDIA Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index, the highest of any US open-weight model. (Source: Artificial Analysis)

The 550B Number Is Misleading — Here’s What’s Actually Running

Nemotron 3 Ultra uses a mixture-of-experts (MoE) architecture. The 550 billion total parameters are split across expert networks, but only 55 billion, or 10 percent of the total, are active during any given inference pass. That’s the same design principle behind DeepSeek V3 and Kimi K2: you get the reasoning depth of a 500B+ model at a fraction of the compute cost.
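
To make the total-versus-active distinction concrete, here is a minimal Python sketch that uses only the two parameter counts quoted above. The "about 2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass, not a number NVIDIA has published for this model, and the function names are illustrative.

```python
# Parameter counts as reported in the article.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that take part in one inference pass."""
    return active / total


def approx_flops_per_token(params: float) -> float:
    """Rule-of-thumb forward-pass cost (assumption): ~2 FLOPs per parameter used."""
    return 2.0 * params


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
moe_cost = approx_flops_per_token(ACTIVE_PARAMS)
dense_cost = approx_flops_per_token(TOTAL_PARAMS)

print(f"Active share per pass: {frac:.0%}")
print(f"Approx. compute vs. a dense 550B model: {moe_cost / dense_cost:.0%}")
```

The estimate covers compute only. All 550 billion parameters still have to be held in memory, because different tokens can be routed to different experts.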

The practical upside is speed. On a pre-release DeepInfra endpoint, NVIDIA clocked 300+ output tokens per second. DeepSeek V4 Pro and Kimi K2.6 — the strongest Chinese open models — run at 50–100 tokens per second through their commercial APIs. For developers building latency-sensitive applications, that gap matters.
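
As a rough illustration of what that throughput gap means for a single response, the sketch below converts the quoted tokens-per-second figures into generation time. The 1,000-token response length is a hypothetical example, and the calculation ignores time to first token, queueing, and network latency. Because the article quotes "300+", the Nemotron figure is treated as a lower bound.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady output rate."""
    return output_tokens / tokens_per_second


def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times faster the first rate is than the second."""
    return fast_tps / slow_tps


RESPONSE_TOKENS = 1_000  # hypothetical response length, for illustration only

# Output rates quoted in the article (tokens per second).
rates = {
    "Nemotron 3 Ultra (pre-release DeepInfra endpoint)": 300,
    "Chinese open models, upper bound": 100,
    "Chinese open models, lower bound": 50,
}

for name, tps in rates.items():
    secs = generation_seconds(RESPONSE_TOKENS, tps)
    print(f"{name}: {secs:.1f} s")

print(f"Speedup range: {speedup(300, 100):.0f}x to {speedup(300, 50):.0f}x")
```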

NVIDIA’s hardware advantage is doing real work here. Running large open-weight models on H100 or GB200 infrastructure with software tuned in-house is a different proposition than spinning up a model on commodity cloud GPUs. Nemotron 3 Ultra is, in part, an advertisement for NVIDIA’s full stack.

Where It Sits Against Closed Models and China’s Best

The honest benchmark picture is more complicated. At 48 on the Artificial Analysis Intelligence Index, Nemotron 3 Ultra is the top US open-weight model — but it’s not the top model, full stop. China’s Kimi K2.6 sits at 54 on the same index. Among closed commercial models, Anthropic’s Claude Opus 4.8 scores 61.

The gap to the next-best US open-weight models is significant. Google’s Gemma 4 31B sits at 39. OpenAI’s gpt-oss-120b reaches 33. Nemotron 3 Ultra doesn’t just edge past these — it resets the American open-weight ceiling by 9 full points.
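
The gaps in this section are simple differences on the same index. A short Python sketch makes the arithmetic explicit, using only the scores quoted in this article. The labels and the helper name are illustrative.

```python
# Artificial Analysis Intelligence Index scores as quoted in the article.
SCORES = {
    "Claude Opus 4.8 (closed)": 61,
    "Kimi K2.6 (open, China)": 54,
    "Nemotron 3 Ultra (open, US)": 48,
    "Gemma 4 31B (open, US)": 39,
    "gpt-oss-120b (open, US)": 33,
}


def gaps_from(reference: str, scores: dict[str, int]) -> dict[str, int]:
    """Score difference of every other model relative to the reference model."""
    base = scores[reference]
    return {name: s - base for name, s in scores.items() if name != reference}


gaps = gaps_from("Nemotron 3 Ultra (open, US)", SCORES)
for name, gap in sorted(gaps.items(), key=lambda item: -item[1]):
    print(f"{name}: {gap:+d}")
```

On these figures, Nemotron 3 Ultra is 9 points ahead of the next US open-weight model, 6 points behind Kimi K2.6, and 13 points behind Claude Opus 4.8.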

For enterprise teams that need a self-hosted model for compliance, security, or cost reasons, that ceiling matters. Until now, accepting an open-weight model meant a noticeable quality drop versus frontier APIs. Nemotron 3 Ultra narrows that gap without requiring a proprietary API agreement.

Who This Is Actually Built For

NVIDIA isn’t competing with OpenAI for consumer mindshare. Nemotron 3 Ultra targets developers and enterprises who want to run inference on their own infrastructure — hospitals that can’t send patient data to an API, defense contractors, financial firms under data-residency rules, and AI startups that want frontier-class performance without per-token costs that scale to millions of dollars a month.

The Computex announcement positioned it alongside NVIDIA’s NIM microservices platform, which packages model weights with optimized inference software for enterprise deployment. That pairing — weights plus runtime — is how NVIDIA intends to compete with hyperscaler AI APIs on the same customers’ budgets.

For context on where the wider model landscape is heading, Gemini 3.5 Flash’s aggressive pricing earlier this year already pushed open-weight economics into uncomfortable territory for closed-API providers. Nemotron 3 Ultra adds another pressure point from the other direction.

💡 Our Take: Nemotron 3 Ultra is a genuine milestone: the first US open-weight model that enterprises can seriously consider as a closed-API replacement for non-coding workloads. But the “America’s best” framing papers over the fact that China’s open frontier is still six points ahead, with Kimi K2.6 at 54 versus Nemotron 3 Ultra’s 48. NVIDIA has the chip advantage and the deployment infrastructure; the intelligence gap to Kimi K2.6 is the next problem to solve.

Frequently Asked Questions

What is NVIDIA Nemotron 3 Ultra?

Nemotron 3 Ultra is a 550 billion parameter open-weight AI model from NVIDIA, using a mixture-of-experts architecture with 55 billion active parameters per inference pass. It was announced at Computex 2026 and released on Hugging Face on June 4, 2026.

How does Nemotron 3 Ultra benchmark against OpenAI’s and Anthropic’s models?

On the Artificial Analysis Intelligence Index, Nemotron 3 Ultra scores 48. Anthropic’s Claude Opus 4.8 (a closed model) scores 61. OpenAI’s open-weight gpt-oss-120b scores 33. Nemotron 3 Ultra is the top US open-weight model but ranks below leading closed commercial models.

How fast is Nemotron 3 Ultra?

On a pre-release DeepInfra endpoint, NVIDIA measured over 300 output tokens per second. The strongest Chinese open-weight models, DeepSeek V4 Pro and Kimi K2.6, run at 50–100 tokens per second through their commercial APIs, which makes Nemotron 3 Ultra three to six times faster on those figures.

Can anyone download and run Nemotron 3 Ultra?

Yes. The weights are available on Hugging Face as of June 4, 2026. Running the full 550B parameter model requires significant GPU infrastructure. NVIDIA also offers it through its NIM microservices platform for enterprise deployment on NVIDIA hardware.
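
For a rough sense of what "significant GPU infrastructure" means, the sketch below estimates the memory needed for the weights alone at a few numeric precisions. The precisions listed are assumptions (the article does not say which formats the weights ship in), the 80 GB figure assumes the 80 GB H100 variant, and the estimate leaves out KV cache, activations, and runtime overhead, so real deployments need more headroom.

```python
import math

TOTAL_PARAMS = 550e9  # total parameter count reported in the article
GPU_MEMORY_GB = 80    # assumption: 80 GB H100 variant

# Assumed storage precisions, in bytes per parameter (illustrative only).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_gpus(memory_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory fits the weights."""
    return math.ceil(memory_gb / gpu_memory_gb)


for precision, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    print(f"{precision}: {gb:,.0f} GB of weights, at least {min_gpus(gb, GPU_MEMORY_GB)} GPUs")
```

Under these assumptions the weights alone take roughly 1,100 GB at BF16, 550 GB at FP8, and 275 GB at 4-bit, which works out to at least 14, 7, and 4 such GPUs respectively before any working memory is counted.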

Does Nemotron 3 Ultra beat China’s best open-weight models?

Not yet. China’s Kimi K2.6 scores 54 on the Artificial Analysis Intelligence Index versus Nemotron 3 Ultra’s 48. Nemotron 3 Ultra is the strongest US open-weight model ever released, but China’s open frontier still leads in raw intelligence benchmarks.

Nemotron 3 Ultra sets a new bar for what US open-weight AI can do — and for organizations that can’t use closed APIs, it’s now a serious option. The closed-vs-open model decision just got more interesting.

Last Updated: June 2026
