WithO2
    NVIDIA Nemotron 3 Ultra Is Now Live — First Look at Benchmarks

    By Amitabh Sarkar | June 3, 2026 | 5 Mins Read

    NVIDIA’s Nemotron 3 Ultra, a 550 billion parameter open-weight model announced at Computex 2026, went live on Hugging Face on June 4. It scores 48 on the Artificial Analysis Intelligence Index, the highest of any US-built open-weight model ever released. NVIDIA also clocked it at over 300 output tokens per second on a pre-release endpoint, three to six times faster than comparable Chinese models available through commercial APIs today.

    NVIDIA published the benchmark results alongside the Hugging Face release, framing the model as America’s answer to the open-weight arms race with China. The timing is deliberate: DeepSeek and Kimi have dominated open-weight leaderboards for most of 2026, and Nemotron 3 Ultra is NVIDIA’s most direct challenge yet.

    NVIDIA Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index, the highest of any US open-weight model. (Source: Artificial Analysis)

    The 550B Number Is Misleading — Here’s What’s Actually Running

    Nemotron 3 Ultra uses a mixture-of-experts (MoE) architecture. The 550 billion total parameters are split across expert networks, but only 55 billion are active during any given inference pass. That’s the same design principle behind DeepSeek V3 and Kimi K2 — you get the reasoning depth of a 500B+ model at a fraction of the compute cost.
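To make the "only some parameters are active" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration of the general MoE technique, not NVIDIA's implementation: the expert count (16), the choice of k = 2, and the random gate scores are all hypothetical. Only the 550B total and 55B active figures come from the article.

```python
import math
import random


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected scores only, so they sum to 1.
    Experts that are not selected do no work for this token.
    """
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    peak = max(gate_scores[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(total_params_b, active_params_b):
    """Share of parameters used per inference pass (both in billions)."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical gate scores for one token across 16 hypothetical experts.
    scores = [random.gauss(0.0, 1.0) for _ in range(16)]
    for expert, weight in top_k_route(scores, k=2):
        print(f"expert {expert}: weight {weight:.3f}")
    # Figures from the article: 550B total, 55B active.
    print(f"active share: {active_fraction(550, 55):.0%}")
```

Note that in real MoE models the active share is not simply k divided by the number of experts, because attention layers and any shared components run for every token.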

    The practical upside is speed. On a pre-release DeepInfra endpoint, NVIDIA clocked 300+ output tokens per second. DeepSeek V4 Pro and Kimi K2.6 — the strongest Chinese open models — run at 50–100 tokens per second through their commercial APIs. For developers building latency-sensitive applications, that gap matters.
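As a rough way to see what that gap means for a latency-sensitive application, the sketch below converts throughput into generation time. The 1,000-token response length is a hypothetical example, and the calculation ignores time to first token, queuing, and network latency, so treat it as a back-of-the-envelope estimate rather than a benchmark.

```python
def seconds_to_generate(num_tokens, tokens_per_second):
    """Time to stream num_tokens at a steady output rate."""
    return num_tokens / tokens_per_second


def speedup_range(fast_tps, slow_tps_low, slow_tps_high):
    """(min, max) speedup of the fast rate over a slower range of rates."""
    return fast_tps / slow_tps_high, fast_tps / slow_tps_low


if __name__ == "__main__":
    response_tokens = 1000  # hypothetical response length
    # Throughput figures quoted in the article.
    for label, tps in [("Nemotron 3 Ultra (300 tok/s)", 300),
                       ("Chinese open models, high end (100 tok/s)", 100),
                       ("Chinese open models, low end (50 tok/s)", 50)]:
        print(f"{label}: {seconds_to_generate(response_tokens, tps):.1f} s")
    low, high = speedup_range(300, 50, 100)
    print(f"speedup: {low:.0f}x to {high:.0f}x")
```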

    NVIDIA’s hardware advantage is doing real work here. Running large open-weight models on H100 or GB200 infrastructure with software tuned in-house is a different proposition than spinning up a model on commodity cloud GPUs. Nemotron 3 Ultra is, in part, an advertisement for NVIDIA’s full stack.

    Where It Sits Against Closed Models and China’s Best

    The honest benchmark picture is more complicated. At 48 on the Artificial Analysis Intelligence Index, Nemotron 3 Ultra is the top US open-weight model — but it’s not the top model, full stop. China’s Kimi K2.6 sits at 54 on the same index. Among closed commercial models, Anthropic’s Claude Opus 4.8 scores 61.

    The gap to the next-best US open-weight models is significant. Google’s Gemma 4 31B sits at 39. OpenAI’s gpt-oss-120b reaches 33. Nemotron 3 Ultra doesn’t just edge past these — it resets the American open-weight ceiling by 9 full points.
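The point gaps are easy to check from the scores quoted above. This short sketch lists each model's distance from Nemotron 3 Ultra on the Artificial Analysis Intelligence Index. The scores are the ones reported in this article, and the helper name `gaps_to` is just an illustrative choice.

```python
# Artificial Analysis Intelligence Index scores as quoted in the article.
SCORES = {
    "Claude Opus 4.8": 61,
    "Kimi K2.6": 54,
    "Nemotron 3 Ultra": 48,
    "Gemma 4 31B": 39,
    "gpt-oss-120b": 33,
}


def gaps_to(reference, scores):
    """Point difference of every other model relative to the reference."""
    base = scores[reference]
    return {name: score - base
            for name, score in scores.items() if name != reference}


if __name__ == "__main__":
    for name, gap in gaps_to("Nemotron 3 Ultra", SCORES).items():
        print(f"{name}: {gap:+d}")
```

On these numbers Nemotron 3 Ultra leads Gemma 4 31B by 9 points and gpt-oss-120b by 15, while trailing Kimi K2.6 by 6 and Claude Opus 4.8 by 13.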

    For enterprise teams that need a self-hosted model for compliance, security, or cost reasons, that ceiling matters. Until now, accepting an open-weight model meant a noticeable quality drop versus frontier APIs. Nemotron 3 Ultra narrows that gap without requiring a proprietary API agreement.

    Who This Is Actually Built For

    NVIDIA isn’t competing with OpenAI for consumer mindshare. Nemotron 3 Ultra targets developers and enterprises who want to run inference on their own infrastructure — hospitals that can’t send patient data to an API, defense contractors, financial firms under data-residency rules, and AI startups that want frontier-class performance without per-token costs that scale to millions of dollars a month.

    The Computex announcement positioned it alongside NVIDIA’s NIM microservices platform, which packages model weights with optimized inference software for enterprise deployment. That pairing — weights plus runtime — is how NVIDIA intends to compete with hyperscaler AI APIs on the same customers’ budgets.

    For context on where the wider model landscape is heading, Gemini 3.5 Flash’s aggressive pricing earlier this year already pushed open-weight economics into uncomfortable territory for closed-API providers. Nemotron 3 Ultra adds another pressure point from the other direction.

    💡 Our Take: Nemotron 3 Ultra is a genuine milestone: the first US open-weight model that enterprises can seriously consider as a closed-API replacement for non-coding workloads. But the “America’s best” framing papers over the fact that China’s open frontier is still six points ahead (Kimi K2.6 at 54 versus Nemotron 3 Ultra at 48). NVIDIA has the chip advantage and the deployment infrastructure; the intelligence gap to Kimi K2.6 is the next problem to solve.

    Frequently Asked Questions

    What is NVIDIA Nemotron 3 Ultra?

    Nemotron 3 Ultra is a 550 billion parameter open-weight AI model from NVIDIA, using a mixture-of-experts architecture with 55 billion active parameters per inference pass. It was announced at Computex 2026 and released on Hugging Face on June 4, 2026.

    How does Nemotron 3 Ultra benchmark against OpenAI and Claude models?

    On the Artificial Analysis Intelligence Index, Nemotron 3 Ultra scores 48. Anthropic’s Claude Opus 4.8 (a closed model) scores 61. OpenAI’s open-weight gpt-oss-120b scores 33. Nemotron 3 Ultra is the top US open-weight model but ranks below leading closed commercial models.

    How fast is Nemotron 3 Ultra?

    On a pre-release DeepInfra endpoint, NVIDIA measured over 300 output tokens per second. The strongest Chinese open-weight models, DeepSeek V4 Pro and Kimi K2.6, run at 50–100 tokens per second through their commercial APIs, which puts Nemotron 3 Ultra at roughly three to six times their throughput on these figures.

    Can anyone download and run Nemotron 3 Ultra?

    Yes. The weights are available on Hugging Face as of June 4, 2026. Running the full 550B parameter model requires significant GPU infrastructure. NVIDIA also offers it through its NIM microservices platform for enterprise deployment on NVIDIA hardware.
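To give a rough sense of what "significant GPU infrastructure" means, the sketch below estimates the memory needed just to hold the weights. Everything beyond the 550B parameter count is an assumption: the article does not say what numeric precision the released weights use, and the 80 GB per-GPU figure is a hypothetical capacity chosen for illustration. The estimate also ignores KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
import math


def weight_memory_gb(params_billion, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def min_gpus(memory_gb, gpu_memory_gb):
    """Smallest GPU count whose combined memory fits the weights."""
    return math.ceil(memory_gb / gpu_memory_gb)


if __name__ == "__main__":
    total_params_b = 550   # from the article
    gpu_memory_gb = 80     # assumption: hypothetical per-GPU capacity
    for bits in (16, 8, 4):  # assumption: candidate precisions
        mem = weight_memory_gb(total_params_b, bits)
        gpus = min_gpus(mem, gpu_memory_gb)
        print(f"{bits}-bit weights: {mem:.0f} GB, at least {gpus} GPUs")
```

Although only 55 billion parameters are active per pass, all 550 billion generally need to be resident in memory, because which experts are used changes from token to token.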

    Does Nemotron 3 Ultra beat China’s best open-weight models?

    Not yet. China’s Kimi K2.6 scores 54 on the Artificial Analysis Intelligence Index versus Nemotron 3 Ultra’s 48. Nemotron 3 Ultra is the strongest US open-weight model ever released, but China’s open frontier still leads in raw intelligence benchmarks.

    Nemotron 3 Ultra sets a new bar for what US open-weight AI can do — and for organizations that can’t use closed APIs, it’s now a serious option. The closed-vs-open model decision just got more interesting.

    Last Updated: June 2026

    Amitabh Sarkar

    I am a software engineer with a passion for working with cutting-edge technologies and staying up to date with the latest developments in the field. In my articles, I share my knowledge and insights on a range of topics, including business software, how to set up tools, and the latest trends in the tech industry.

    © 2026 WITHO2.COM